Select multiple columns in data.table by their numeric indices

Question

How can we select multiple columns using a vector of their numeric indices (position) in data.table?

This is how we would do it with a data.frame:

df <- data.frame(a = 1, b = 2, c = 3)
df[ , 2:3]
#   b c
# 1 2 3

Henrik · Accepted Answer · 2019-10-29 09:09:08Z

196

For versions of data.table >= 1.9.8, the following all just work:

library(data.table)
dt <- data.table(a = 1, b = 2, c = 3)

# select single column by index
dt[, 2]
#    b
# 1: 2

# select multiple columns by index
dt[, 2:3]
#    b c
# 1: 2 3

# select single column by name
dt[, "a"]
#    a
# 1: 1

# select multiple columns by name
dt[, c("a", "b")]
#    a b
# 1: 1 2

For versions of data.table < 1.9.8 (for which numerical column selection required the use of with = FALSE), see this previous version of this answer. See also NEWS on v1.9.8, POTENTIALLY BREAKING CHANGES, point 3.
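As a rough sketch of the older idiom mentioned above: passing with = FALSE tells data.table to treat j as a vector of column positions or names rather than as an expression. The variable names cols and res are illustrative only, and the call should also still run on current versions.

```r
library(data.table)

dt <- data.table(a = 1, b = 2, c = 3)

# Column positions held in an ordinary variable (hypothetical name)
cols <- 2:3

# with = FALSE makes j behave like data.frame-style column selection
res <- dt[, cols, with = FALSE]
print(res)
```

Without with = FALSE, a bare variable name in j is looked up as a column of dt first, which is why the explicit flag (or the newer shortcuts) is needed when the indices live in a variable.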

edited Oct 29, 2019 at 9:09

Henrik


answered Nov 14, 2012 at 17:28

Josh O'Brien


14 Comments

Josh O'Brien Over a year ago

No problem. Compare also dt[,"a"] and dt[,"a", with=FALSE] to see what a helpful option it really is.

jamborta Over a year ago

Is there any way to do this without with? For example DT[, list(b:c)]. I find it convenient to transform the columns directly in the data.table, e.g. I can do DT[, list(1/b, 2*c)], but this does not work with with.

Frank Over a year ago

A change to the package will make with=FALSE unnecessary in this case: github.com/Rdatatable/data.table/issues/…

Josh O'Brien Over a year ago

@Frank -- That's great news! Thanks for bringing it to my attention. Once that change makes its way into the version of data.table distributed on CRAN, I'll edit this answer to announce the change up top. (And please -- you or anyone else who reads this -- feel free to ping me with a reminder as soon as that happens.)

Josh O'Brien Over a year ago

@Valentas Funny you should ask. There is not a data.frame-compatible way to use with=FALSE. However, as of about 3 weeks ago, the development version of data.table has been modified so that calls like dt[, 2], dt[, 2:3], dt[, "b"], and dt[, c("b", "c")] behave the same as they do with data.frames, without having to explicitly set with=FALSE. It's terrific! See here for the particular commit, including the NEWS entry describing the change.


Artem Klevtsov · Accepted Answer · 2015-09-05 16:00:09Z

46

It's a bit verbose, but I've gotten used to using the hidden .SD variable.

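library(data.table)
b <- data.table(a = 1, b = 2, c = 3, d = 4)
# .SD holds the columns named in .SDcols (here, positions 1 and 2)
b[, .SD, .SDcols = 1:2]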

It's a bit of a hassle, but as far as I know you don't lose any other data.table features, so you should still be able to use important operations such as joins.
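A minimal sketch of why the .SD route can pay off when the column list is built programmatically: the same index vector can drive both plain selection and a transformation of the selected columns. The names idx, sel and doubled are made up for this illustration.

```r
library(data.table)

dt <- data.table(a = 1, b = 2, c = 3, d = 4)

# Build the positions programmatically (here: wherever "b" and "d" sit)
idx <- which(names(dt) %in% c("b", "d"))

# Plain selection of those columns
sel <- dt[, .SD, .SDcols = idx]

# Apply a function to each selected column in the same call
doubled <- dt[, lapply(.SD, function(x) x * 2), .SDcols = idx]

print(sel)
print(doubled)
```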

edited Sep 5, 2015 at 16:00

Artem Klevtsov


answered May 5, 2015 at 20:28

Tom


1 Comment

Chris Over a year ago

Not a hassle and very useful when creating the column list programmatically

Henrik · Accepted Answer · 2016-05-04 16:50:03Z

39

If you want to use column names to select the columns, simply use .(), which is an alias for list():

library(data.table)
dt <- data.table(a = 1:2, b = 2:3, c = 3:4)
dt[ , .(b, c)] # select the columns b and c
# Result:
#    b c
# 1: 2 3
# 2: 3 4
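Because .() is just list(), each element can be an arbitrary expression and can be given a name, which is what makes this form handy for transforming columns while selecting them. A short sketch (the output column name c_doubled is invented for the example):

```r
library(data.table)

dt <- data.table(a = 1:2, b = 2:3, c = 3:4)

# Select b unchanged and compute a new column from c in one step
res <- dt[, .(b, c_doubled = c * 2)]
print(res)
```

Note that .() works on unquoted column names, not on positions: numbers inside .() are treated as constants rather than column indices.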

edited May 4, 2016 at 16:50

Henrik


answered Aug 16, 2015 at 20:57

R Yoda


rafa.pereira · Accepted Answer · 2017-05-02 06:48:13Z

22

From v1.10.2 onwards, you can also use the .. prefix to refer to a variable defined outside the data.table:

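library(data.table)
dt <- data.table(a = 1:2, b = 2:3, c = 3:4)

keep_cols <- c("a", "c")

# the .. prefix looks up keep_cols in the calling scope, not as a column of dt
dt[, ..keep_cols]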
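The same prefix should also work when the variable holds numeric positions rather than names, which ties back to the original question. A small sketch, assuming a recent data.table release; idx and res are illustrative names:

```r
library(data.table)

dt <- data.table(a = 1:2, b = 2:3, c = 3:4)

# Positions stored in a variable first; the .. prefix applies to a
# variable name, not to an inline expression
idx <- c(1, 3)

res <- dt[, ..idx]
print(res)
```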

answered May 2, 2017 at 6:48

rafa.pereira


2 Comments

IceCreamToucan Over a year ago

Thanks for this answer. I also found that dt[, !..keep_cols] and dt[, -..keep_cols] work as expected!

Ofek Shilon Over a year ago

Be aware that .. is very limited. cols <- c(1:2); dt[, ..cols] succeeds, but dt[, ..c(1:2)] fails.

Bhoom Suktitipat · Accepted Answer · 2015-11-04 08:46:04Z

3

@Tom, thank you very much for pointing out this solution. It works great for me.

I was looking for a way to exclude just one column when printing. Building on the example above, either of the following drops the second column:

library(data.table)
dt <- data.table(a = 1:2, b = 2:3, c = 3:4)
dt[, .SD, .SDcols = -2]       # drop the second column
dt[, .SD, .SDcols = c(1, 3)]  # equivalent: keep the first and third columns
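On data.table >= 1.9.8, where positions and names are accepted directly in j, the exclusion can probably be written without .SD as well. This is a sketch under that version assumption; by_pos and by_name are names chosen for the example.

```r
library(data.table)

dt <- data.table(a = 1:2, b = 2:3, c = 3:4)

# Drop the second column by negative position
by_pos <- dt[, -2]

# Drop the same column by negated name
by_name <- dt[, !"b"]

print(by_pos)
print(by_name)
```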

answered Nov 4, 2015 at 8:46

Bhoom Suktitipat
