Selecting only numeric columns from a data frame

Question

Suppose you have a data.frame like this:

x <- data.frame(v1=1:20,v2=1:20,v3=1:20,v4=letters[1:20])

How would you select only those columns in x that are numeric?

mdsumner · Accepted Answer · 2022-05-16 11:30:43Z

382

EDIT: updated to avoid use of ill-advised sapply.

Since a data frame is a list we can use the list-apply functions:

nums <- unlist(lapply(x, is.numeric), use.names = FALSE)

Then standard subsetting

x[ , nums]

## don't use sapply, even though it's less code
## nums <- sapply(x, is.numeric)
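One possible variant of the base-R approach above, offered as a sketch rather than the answer's own code: vapply() declares the expected result type up front, and drop = FALSE keeps the result a data frame even when only one column is numeric. The name x2 is made up for illustration.

```r
# Hypothetical example data: one integer column, one character column
x2 <- data.frame(v1 = 1:5, v4 = letters[1:5])

# vapply() must get exactly one logical value per column, otherwise it errors
nums <- vapply(x2, is.numeric, logical(1))

# drop = FALSE keeps a single-column result as a data frame, not a vector
res <- x2[, nums, drop = FALSE]
str(res)
```

Without drop = FALSE, x2[, nums] would return a plain vector here because only one column is selected.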

For a more idiomatic modern R I'd now recommend

x[ , purrr::map_lgl(x, is.numeric)]

The following is less code, depends less on R's particular quirks, is more straightforward, and is robust on database-backed tibbles:

dplyr::select_if(x, is.numeric)

Newer versions of dplyr also support the following syntax:

x %>% dplyr::select(where(is.numeric))

edited May 16, 2022 at 11:30

answered May 2, 2011 at 22:28

mdsumner

29.5k6 gold badges85 silver badges91 bronze badges


8 Comments

Marek Over a year ago

x[nums] or x[sapply(x, is.numeric)] works as well, and both always return a data.frame. Compare x[1] with x[, 1]: the first is a data.frame, the second is a vector. To prevent the conversion, use x[, 1, drop = FALSE].

derelict Over a year ago

Any way to select continuous data only? This method returns continuous as well as integer.

Yohan Obadia Over a year ago

When there is no numeric column, the following error arises: undefined columns selected. How do you avoid it?

Brandon Bertelsen Over a year ago

@SoilSciGuy continuous data should be as.numeric. Perhaps you have factor data that's in numeric form? You should open a new question.

Brandon Bertelsen Over a year ago

@YohanObadia You can use a tryCatch() to deal with this. Please consider opening a new question.
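On the question raised in the comments above about keeping continuous columns but not integer ones, a minimal sketch, assuming "continuous" means double-precision storage: swap the predicate is.numeric for is.double. The data frame x3 is hypothetical.

```r
# Hypothetical example data: integer, double, and character columns
x3 <- data.frame(i = 1:3, d = c(1.5, 2.5, 3.5), s = letters[1:3])

# is.numeric() is TRUE for both integer and double columns
names(Filter(is.numeric, x3))

# is.double() is TRUE only for columns stored as doubles
dbl <- Filter(is.double, x3)
names(dbl)
```

One caveat: is.double() tests the storage type only, so it is also TRUE for Date and POSIXct columns, which is.numeric() rejects. Combining both predicates may be safer on mixed data.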


Sharon · Accepted Answer · 2016-11-25 16:08:16Z

98

The dplyr package's select_if() function is an elegant solution:

library("dplyr")
select_if(x, is.numeric)

answered Nov 25, 2016 at 16:08

Sharon

3,8063 gold badges26 silver badges21 bronze badges

2 Comments

Max F Over a year ago

select_if has been superseded, any idea how to do this with the current version?

bryn Over a year ago

select(where(is.numeric)) tidyselect.r-lib.org/reference/where.html

Kevin Zarca · Accepted Answer · 2019-10-02 09:13:24Z

60

Filter() from the base package is well suited to this use case. You simply write:

Filter(is.numeric, x)

It is also much faster than select_if():

library(microbenchmark)
microbenchmark(
    dplyr::select_if(mtcars, is.numeric),
    Filter(is.numeric, mtcars)
)

returns (on my computer) a median of 60 microseconds for Filter, and 21 000 microseconds for select_if (350x faster).
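A small sketch of how base::Filter() behaves on a data frame, including the case where no column is numeric. The data frame chars is made up for illustration.

```r
# Hypothetical data frame with no numeric columns
chars <- data.frame(a = letters[1:3], b = LETTERS[1:3])

# base::Filter() treats the data frame as a list of columns and keeps
# those for which the predicate returns TRUE
res <- Filter(is.numeric, chars)

is.data.frame(res)  # the result is still a data frame
dim(res)            # rows are kept, but no columns remain
```

Because Filter() subsets with list-style indexing, an empty selection gives a zero-column data frame rather than an error.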

edited Oct 2, 2019 at 9:13

answered Nov 9, 2016 at 10:31

Kevin Zarca

2,7321 gold badge20 silver badges18 bronze badges

6 Comments

bli Over a year ago

This solution doesn't fail when no numeric columns are present. Are there any drawbacks to using it?

Michael Over a year ago

Filter only applies to rows of a dataframe rather than columns. As such, this solution wouldn't give the correct result.

Kevin Zarca Over a year ago

@Michael don't confuse Filter from the base package and filter from dplyr package!

Kevin Zarca Over a year ago

@bli I can't see any drawback to using Filter. Its input is a data.frame object and it returns a data.frame.

Mobeus Zoom Over a year ago

Just chiming in here for reference: what Filter() doesn't work for here is replacing, e.g. Filter(is.numeric,iris) <- 0.5*Filter(is.numeric,iris) won't work.
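For the in-place replacement mentioned in the last comment, one workaround (a sketch, not something the commenters proposed) is to compute a logical column index once and use it on both sides of the assignment. The names df2 and num are hypothetical.

```r
# Work on a copy of a few rows so the built-in iris data set is untouched
df2 <- head(iris)

# Logical index of the numeric columns
num <- vapply(df2, is.numeric, logical(1))

# Indexing with [ ] supports assignment, unlike a call to Filter()
df2[num] <- 0.5 * df2[num]
```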


user3065757 · Accepted Answer · 2018-04-05 09:44:54Z

9

If you are interested only in the column names, use this:

names(dplyr::select_if(train,is.numeric))

answered Apr 5, 2018 at 9:44

user3065757

5031 gold badge5 silver badges15 bronze badges

Comments

AlexB · Accepted Answer · 2020-10-20 09:38:54Z

9

iris %>% dplyr::select(where(is.numeric)) #as per most recent updates

Another option with purrr is to negate the predicate passed to discard():

iris %>% purrr::discard(~!is.numeric(.))

If you want the names of the numeric columns, you can add names or colnames:

iris %>% purrr::discard(~!is.numeric(.)) %>% names

edited Oct 20, 2020 at 9:38

answered Oct 20, 2020 at 7:30

AlexB

3,2812 gold badges22 silver badges23 bronze badges

1 Comment

GuedesBF Over a year ago

negating discard() is pretty much the same as using keep().

Enrique Pérez Herrero · Accepted Answer · 2016-11-13 19:54:45Z

8

This is an alternative to the code in the other answers:

x[, sapply(x, class) == "numeric"]

with a data.table

x[, lapply(x, is.numeric) == TRUE, with = FALSE]
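A caution on comparing class names, shown as a sketch on the data frame from the question (renamed x_df here to avoid clashing with the data.table above): columns created with 1:20 have class "integer", so testing for the string "numeric" skips them, while is.numeric() accepts both integer and double columns.

```r
# The question's data frame, under a hypothetical name
x_df <- data.frame(v1 = 1:20, v2 = 1:20, v3 = 1:20, v4 = letters[1:20])

# Class-name comparison: integer columns do not match "numeric"
by_class <- x_df[, sapply(x_df, class) == "numeric", drop = FALSE]

# Predicate-based selection: integer and double columns both qualify
by_pred <- x_df[, vapply(x_df, is.numeric, logical(1)), drop = FALSE]

ncol(by_class)
ncol(by_pred)
```

The class comparison can also misbehave when a column has more than one class, since sapply() then no longer returns a simple character vector.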

edited Nov 13, 2016 at 19:54

answered Nov 13, 2016 at 16:11

Enrique Pérez Herrero

3,9122 gold badges37 silver badges36 bronze badges

2 Comments

Brandon Bertelsen Over a year ago

This is more of a comment on the selected answer than a unique answer.

Rich Scriven Over a year ago

Columns can have more than one class.

Brandon Bertelsen · Accepted Answer · 2020-03-23 18:28:22Z

6

library(purrr)
x <- x %>% keep(is.numeric)

edited Mar 23, 2020 at 18:28

Brandon Bertelsen

44.8k37 gold badges170 silver badges261 bronze badges

answered Mar 23, 2020 at 15:50

Yash Khokale

611 silver badge1 bronze badge

Comments

Krishna · Accepted Answer · 2017-11-13 15:48:57Z

3

The PCAmixdata package has a function, splitmix(), that splits a given data frame ("YourDataframe" below) into its quantitative (numerical) and qualitative (categorical) columns:

install.packages("PCAmixdata")
library(PCAmixdata)
split <- splitmix(YourDataframe)
X1 <- split$X.quanti  # numerical columns in the dataset
X2 <- split$X.quali   # categorical columns in the dataset

edited Nov 13, 2017 at 15:48

answered Nov 13, 2017 at 15:42

Krishna

4215 silver badges30 bronze badges

Comments

서영재 · Accepted Answer · 2017-01-06 00:19:05Z

1

If you have many factor variables, you can use the select_if() function from the dplyr package (install it first). dplyr has many functions that select data satisfying a condition, and you can set the condition yourself.

Use like this.

categorical<-select_if(df,is.factor)
str(categorical)

answered Jan 6, 2017 at 0:19

서영재

1062 silver badges9 bronze badges

1 Comment

Brandon Bertelsen Over a year ago

Looks like a duplicate of this earlier answer stackoverflow.com/a/40808873/170352

greg-449 · Accepted Answer · 2018-10-09 07:04:22Z

0

Another way could be as follows:-

# extracting numeric columns from the iris dataset
(iris[sapply(iris, is.numeric)])

edited Oct 9, 2018 at 7:04

greg-449

112k235 gold badges112 silver badges164 bronze badges

answered Oct 9, 2018 at 6:00

Ayushi

91 bronze badge

1 Comment

Brandon Bertelsen Over a year ago

Hi Ayushi, this probably was downvoted because it's a repeat of the first answer, but this method has some issues that were identified. Take a look at the comments in the first answer, you'll see what I mean.

Brandon Bertelsen · Accepted Answer · 2020-07-31 01:10:28Z

0

Numerical_variables <- which(sapply(df, is.numeric))
# then extract column names 
Names <- names(Numerical_variables)

edited Jul 31, 2020 at 1:10

Brandon Bertelsen

44.8k37 gold badges170 silver badges261 bronze badges

answered Jul 30, 2020 at 21:08

Mohamed Ali Hefnawy

398 bronze badges

1 Comment

Brandon Bertelsen Over a year ago

Hey Mo! I recommend you try this in the console because I don't think it's going to give you what you think it will.

RJMCMC · Accepted Answer · 2018-03-29 16:32:46Z

-1

This doesn't directly answer the question but can be very useful, especially if you want something like all the numeric columns except for your id column and dependent variable.

numeric_cols <- sapply(dataframe, is.numeric) %>% which %>% 
                   names %>% setdiff(., c("id_variable", "dep_var"))

dataframe %<>% dplyr::mutate_at(numeric_cols, function(x) your_function(x))

answered Mar 29, 2018 at 16:32

RJMCMC

1
