How can we select multiple columns using a vector of their numeric indices (positions) in data.table?
This is how we would do it with a data.frame:
df <- data.frame(a = 1, b = 2, c = 3)
df[ , 2:3]
# b c
# 1 2 3
For versions of data.table >= 1.9.8, the following all just work:
library(data.table)
dt <- data.table(a = 1, b = 2, c = 3)
# select single column by index
dt[, 2]
# b
# 1: 2
# select multiple columns by index
dt[, 2:3]
# b c
# 1: 2 3
# select single column by name
dt[, "a"]
# a
# 1: 1
# select multiple columns by name
dt[, c("a", "b")]
# a b
# 1: 1 2
For versions of data.table < 1.9.8 (for which numerical column selection required the use of with = FALSE), see this previous version of this answer. See also NEWS on v1.9.8, POTENTIALLY BREAKING CHANGES, point 3.
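The literal forms above are detected automatically, but the question asks about a *vector* of indices, which usually lives in a variable. A bare variable in j is not treated as a column selector: depending on the data.table version it is either evaluated as an ordinary expression or rejected with an error. Setting with = FALSE explicitly still works in that case. A minimal sketch, assuming a data.table version that supports with = FALSE (all versions I am aware of); the variable name idx is illustrative only:

```r
library(data.table)

dt <- data.table(a = 1, b = 2, c = 3)

# Hypothetical variable holding the column positions to keep
idx <- c(2, 3)

# with = FALSE makes data.table treat j as column positions/names
# instead of evaluating it as an expression inside the table
sel <- dt[, idx, with = FALSE]
print(sel)

# Negating the same vector drops those columns instead of keeping them
rest <- dt[, -idx, with = FALSE]
print(rest)
```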
…dt[,"a"] and dt[,"a", with=FALSE] to see what a helpful option it really is.
…DT[,list(b:c)], as I found it convenient to transform the columns directly in the data table, e.g. I can do DT[,list(1/b,2*c)], but this does not work with with=FALSE.
…with=FALSE unnecessary in this case: github.com/Rdatatable/data.table/issues/…
…data.frame-compatible way to use with=FALSE. However, as of about 3 weeks ago, the development version of data.table has been modified so that calls like dt[, 2], dt[, 2:3], dt[, "b"], and dt[, c("b", "c")] behave the same as they do with data.frames, without having to explicitly set with=FALSE. It's terrific! See here for the particular commit, including the NEWS entry describing the change.

It's a bit verbose, but I've gotten used to using the hidden .SD variable:
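library(data.table)
b <- data.table(a = 1, b = 2, c = 3, d = 4)
# .SD holds only the columns chosen by .SDcols (here, positions 1 and 2)
b[, .SD, .SDcols = 1:2]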
It's a bit of a hassle, but I don't think you lose out on any other data.table features, so you should still be able to use other important functionality such as joins.
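Because .SD is an ordinary data.table subset, the positions passed to .SDcols can come from a variable, and .SD can be fed straight into lapply to transform the selected columns, which covers the DT[,list(1/b,2*c)]-style use case mentioned in the comments. A minimal sketch; the variable name idx and the doubling function are illustrative only:

```r
library(data.table)

dt <- data.table(a = 1, b = 2, c = 3, d = 4)

# Column positions stored in a variable; .SDcols accepts this directly
idx <- c(2, 3)

# Plain selection: .SD contains only the columns listed in .SDcols
sel <- dt[, .SD, .SDcols = idx]

# Transformation: apply the same function to every selected column
doubled <- dt[, lapply(.SD, function(x) x * 2), .SDcols = idx]
print(doubled)
```

lapply over .SD returns a named list, so the result keeps the original column names of the selected columns.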
From v1.10.2 onwards, you can also use the .. prefix:
library(data.table)
dt <- data.table(a=1:2, b=2:3, c=3:4)
keep_cols = c("a", "c")
# .. looks keep_cols up in the calling scope instead of among the columns
dt[, ..keep_cols]
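The question asks about numeric positions, and the .. prefix accepts those too, as long as the vector is stored in a plain variable; an inline literal vector needs no prefix at all. A minimal sketch, assuming data.table >= 1.10.2; the name idx is illustrative only:

```r
library(data.table)

dt <- data.table(a = 1:2, b = 2:3, c = 3:4)

# Numeric positions stored in a variable, looked up one level up via ..
idx <- c(1, 3)
by_pos <- dt[, ..idx]

# An inline literal vector is recognised without any .. prefix
literal <- dt[, c(1, 3)]

print(by_pos)
```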
dt[, !..keep_cols] and dt[, -..keep_cols] work as expected!
The .. prefix is very limited: cols <- c(1:2); dt[x, ..cols] succeeds, but dt[, ..c(1:2)] fails.
@Tom, thank you very much for pointing out this solution. It works great for me.
I was looking for a way to exclude just one column from printing. Building on the example above, to exclude the second column you can do something like this:
library(data.table)
dt <- data.table(a=1:2, b=2:3, c=3:4)
# drop the second column with a negative index
dt[,.SD,.SDcols=-2]
# or, equivalently, keep columns 1 and 3
dt[,.SD,.SDcols=c(1,3)]
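On a recent data.table release, the same column can also be dropped without .SD, by a negative position or a negated name written directly in j. A minimal sketch, assuming a version where these literal forms are auto-detected (check against your installed version if in doubt):

```r
library(data.table)

dt <- data.table(a = 1:2, b = 2:3, c = 3:4)

# Drop the second column by a negative position
no_b_pos <- dt[, -2]

# Drop the same column by name, negated with !
no_b_name <- dt[, !"b"]

print(no_b_pos)
```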