
I have the following data.table (DT):

DT <- data.table(V1 = 1:3, V2 = 4:6, V3 = 7:9)

I would like to select a subset of the columns programmatically (dynamically), using an object that stores the relevant column names. For example, I want to select the two columns "V1" and "V3", whose names are stored in a variable "keep":

keep <- c("V1", "V3")

If DT were a data.frame, the following would select the "keep" columns:

DT[keep]

Unfortunately, this does not work when DT is a data.table. I thought data.frame and data.table behaved identically here, but apparently they don't. Can anybody advise on the correct syntax?

2 Answers


This is covered in FAQ 1.1, 1.2 and 2.17.

Some possibilities:

DT[, keep, with = FALSE]
DT[, c('V1', 'V3'), with = FALSE]
DT[, c(1, 3), with = FALSE]
DT[, list(V1, V3)]
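As a quick check that the four forms above are interchangeable, here is a minimal sketch using the DT and keep objects from the question. The result variable names (by_name, by_literal, by_position, by_symbol) are made up for illustration.

```r
library(data.table)

DT   <- data.table(V1 = 1:3, V2 = 4:6, V3 = 7:9)
keep <- c("V1", "V3")

by_name     <- DT[, keep, with = FALSE]            # names stored in a variable
by_literal  <- DT[, c("V1", "V3"), with = FALSE]   # names written inline
by_position <- DT[, c(1, 3), with = FALSE]         # column positions
by_symbol   <- DT[, list(V1, V3)]                  # unquoted column symbols

# Each form should yield a data.table holding only V1 and V3.
stopifnot(
  isTRUE(all.equal(by_name, by_literal)),
  isTRUE(all.equal(by_name, by_position)),
  isTRUE(all.equal(by_name, by_symbol))
)
```

Only the first form is fully programmatic; the list(V1, V3) form needs the column names typed out in the code.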

The reason DF[c('V1','V3')] works as it does for a data.frame is covered in ?`[.data.frame`:

Data frames can be indexed in several modes. When [ and [[ are used with a single vector index (x[i] or x[[i]]), they index the data frame as if it were a list. In this usage a drop argument is ignored, with a warning.
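To illustrate the difference, the sketch below builds a data.frame DF with the same columns as the question's DT (DF is named here only for the comparison). With a single index, the data.frame is treated as a list of columns, while a data.table treats a single character index as a row lookup, which fails on an unkeyed table.

```r
library(data.table)

DF   <- data.frame(V1 = 1:3, V2 = 4:6, V3 = 7:9)
keep <- c("V1", "V3")

# data.frame: a single index selects columns, as if DF were a list.
df_list_style   <- DF[keep]
df_matrix_style <- DF[, keep]
stopifnot(isTRUE(all.equal(df_list_style, df_matrix_style)))

# data.table: the first argument (i) refers to rows, so a character
# vector there is not interpreted as a set of column names.
DT  <- as.data.table(DF)
res <- tryCatch(DT[keep], error = function(e) e)
inherits(res, "error")   # no key is set on DT, so this lookup fails
```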


From data.table 1.10.2, you may use the .. prefix when subsetting columns programmatically:

When j is a symbol prefixed with .. it will be looked up in calling scope and its value taken to be column names or numbers [...] It is experimental.

Thus:

DT[ , ..keep]
#    V1 V3
# 1:  1  7
# 2:  2  8
# 3:  3  9
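The .. prefix is also convenient inside a function, where the column names arrive as an argument. A minimal sketch, assuming a hypothetical helper named select_cols (not part of data.table):

```r
library(data.table)

DT <- data.table(V1 = 1:3, V2 = 4:6, V3 = 7:9)

# Hypothetical helper: `cols` is a character vector of column names.
select_cols <- function(dt, cols) {
  # `..cols` is looked up in the calling scope (this function's frame),
  # so it refers to the argument rather than to a column of `dt`.
  dt[, ..cols]
}

res <- select_cols(DT, c("V1", "V3"))
names(res)
```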


Some more possibilities:

DT[, .SD, .SDcols = keep]
DT[, mget(keep)]
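A short sketch, using the DT and keep objects from the question, to confirm that both forms return the same two columns (the names by_sd and by_mget are illustrative):

```r
library(data.table)

DT   <- data.table(V1 = 1:3, V2 = 4:6, V3 = 7:9)
keep <- c("V1", "V3")

by_sd   <- DT[, .SD, .SDcols = keep]   # .SD restricted to the columns in keep
by_mget <- DT[, mget(keep)]            # mget() returns a named list of columns

stopifnot(isTRUE(all.equal(by_sd, by_mget)))
names(by_sd)
```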

1 Comment

I'm glad mget() works this way, but I'm a little confused about how data.table handles this. Say I want to do it dynamically: dt2[, names(dt1)] just returns the equivalent of names(dt1) (the column names of dt1), whereas dt2[, mget(names(dt1))] returns the subset of dt2 (a data.table, with all its rows) containing those columns. If one of the columns in dt1 is not in dt2, it throws Error: value for ‘missing_column’ not found. I just find it curious.
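The behaviour described in this comment can be reproduced with a small sketch. The tables dt1 and dt2 are made up here to match the comment's description. The difference arises because j is evaluated as an ordinary R expression: names(dt1) evaluates to a character vector, which is returned as is, while mget() looks those names up among the columns of dt2 and returns them as a list.

```r
library(data.table)

dt1 <- data.table(V1 = 0L, V3 = 0L)                  # only supplies column names
dt2 <- data.table(V1 = 1:3, V2 = 4:6, V3 = 7:9)

as_names  <- dt2[, names(dt1)]         # j evaluates to a plain character vector
as_subset <- dt2[, mget(names(dt1))]   # j evaluates to a list of dt2's columns

is.character(as_names)
is.data.table(as_subset)

# A name that is not a column of dt2 makes mget() fail.
bad <- tryCatch(dt2[, mget(c("V1", "missing_column"))],
                error = function(e) e)
inherits(bad, "error")
```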
