245

Suppose you have a data.frame like this:

x <- data.frame(v1=1:20,v2=1:20,v3=1:20,v4=letters[1:20])

How would you select only those columns in x that are numeric?

12 Answers

382

EDIT: updated to avoid use of ill-advised sapply.

Since a data frame is a list, we can use the list-apply functions:

nums <- unlist(lapply(x, is.numeric), use.names = FALSE)  

Then use standard subsetting:

x[ , nums]

## don't use sapply, even though it's less code
## nums <- sapply(x, is.numeric)
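Why avoid sapply() here? Its return type depends on its input: given zero columns it cannot simplify and falls back to returning a list instead of a logical vector. A minimal sketch contrasting it with base R's vapply() (a swapped-in alternative, not part of the original answer), which fixes the result type up front:

```r
# A data frame with no columns at all (e.g. the result of an earlier selection)
empty <- data.frame()

# sapply() cannot simplify an empty result, so it falls back to a list
s <- sapply(empty, is.numeric)
class(s)  # "list"

# vapply() declares the result type up front: always a logical vector
v <- vapply(empty, is.numeric, logical(1))
class(v)  # "logical"

# On the question's data frame it gives the usual column mask
x <- data.frame(v1 = 1:20, v2 = 1:20, v3 = 1:20, v4 = letters[1:20])
nums <- vapply(x, is.numeric, logical(1))
names(x)[nums]  # "v1" "v2" "v3"
```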

For more idiomatic modern R, I'd now recommend:

x[ , purrr::map_lgl(x, is.numeric)]

Less code, less reliance on R's particular quirks, more straightforward, and robust when used on database-backed tibbles:

dplyr::select_if(x, is.numeric)

Newer versions of dplyr also support the following syntax:

library(dplyr)
x %>% select(where(is.numeric))

8 Comments

x[nums] or x[sapply(x, is.numeric)] works as well, and both always return a data.frame. Compare x[1] with x[, 1]: the first is a data.frame, the second is a vector. To prevent that conversion, you must use x[, 1, drop = FALSE].
Any way to select continuous data only? This method returns continuous as well as integer.
When there is no numeric column, the following error arises: undefined columns selected. How do you avoid it?
@SoilSciGuy continuous data should be as.numeric. Perhaps you have factor data that's in numeric form? You should open a new question.
@YohanObadia You can use a tryCatch() to deal with this. Please consider opening a new question.
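The drop behaviour described in the first comment matters whenever exactly one column is numeric. A small illustration with a hypothetical data frame y (not from the question), using the answer's unlist(lapply(...)) mask:

```r
# Hypothetical data frame with exactly one numeric column
y <- data.frame(id = 1:3, label = c("a", "b", "c"))
nums <- unlist(lapply(y, is.numeric), use.names = FALSE)

a <- y[, nums]                # matrix-style indexing drops to a bare vector
b <- y[nums]                  # list-style indexing always keeps a data.frame
d <- y[, nums, drop = FALSE]  # drop = FALSE also keeps a data.frame

class(a)  # "integer"
class(b)  # "data.frame"
class(d)  # "data.frame"
```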
98

The dplyr package's select_if() function is an elegant solution:

library("dplyr")
select_if(x, is.numeric)

2 Comments

select_if has been superseded, any idea how to do this with the current version?
60

Filter() from base R is the perfect function for this use case. You simply write:

Filter(is.numeric, x)

It is also much faster than select_if():

library(microbenchmark)
microbenchmark(
    dplyr::select_if(mtcars, is.numeric),
    Filter(is.numeric, mtcars)
)

On my computer this returns a median of 60 microseconds for Filter and 21,000 microseconds for select_if (350x faster).
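One property of Filter() worth knowing: it keeps the data.frame class even when no column matches, returning a zero-column result rather than an error. A quick check on a hypothetical all-character data frame:

```r
# Hypothetical data frame with no numeric columns
chars <- data.frame(a = letters[1:3], b = LETTERS[1:3])

res <- Filter(is.numeric, chars)
class(res)  # "data.frame"
dim(res)    # 3 0: all rows kept, zero columns
```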

6 Comments

This solution doesn't fail when no numeric columns are present. Are there any drawbacks to using it?
Filter only applies to rows of a dataframe rather than columns. As such, this solution wouldn't give the correct result.
@Michael don't confuse Filter from the base package and filter from dplyr package!
@bli I can't see any drawback to using Filter. Its input is a data.frame and it returns a data.frame.
Just chiming in here for reference: what Filter() doesn't work for here is replacing, e.g. Filter(is.numeric,iris) <- 0.5*Filter(is.numeric,iris) won't work.
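For the replacement case in the last comment, one workaround is to compute a logical mask and assign through it, since Filter() has no replacement form. A minimal sketch on the built-in iris data set, halving every numeric column as in the comment (using vapply() to build the mask is a swapped-in choice):

```r
# Logical mask of the numeric columns in the built-in iris data set
nums <- vapply(iris, is.numeric, logical(1))

# Filter() has no replacement form, so assign through the mask instead
scaled <- iris
scaled[nums] <- lapply(scaled[nums], function(col) 0.5 * col)

# Non-numeric columns (Species) are left untouched
str(scaled)
```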
9

If you are only interested in the column names, use this:

names(dplyr::select_if(train,is.numeric))
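The same names can be obtained in base R without dplyr. A minimal sketch, using the question's x in place of the answer's train data set (which isn't shown):

```r
x <- data.frame(v1 = 1:20, v2 = 1:20, v3 = 1:20, v4 = letters[1:20])

# Logical mask of numeric columns, then index the column names with it
numeric_names <- names(x)[vapply(x, is.numeric, logical(1))]
numeric_names  # "v1" "v2" "v3"
```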

9
library(dplyr)
iris %>% select(where(is.numeric))  # as per the most recent updates

Another option with purrr would be to negate the predicate passed to discard():

iris %>% purrr::discard(~!is.numeric(.))

If you want the names of the numeric columns, you can add names or colnames:

iris %>% purrr::discard(~!is.numeric(.)) %>% names

1 Comment

Negating discard() is essentially the same as using keep().
8

This is an alternative to the other answers:

# class() of 1:20 is "integer", not "numeric", so match both classes
x[, sapply(x, class) %in% c("numeric", "integer")]

With a data.table:

library(data.table)
as.data.table(x)[, unlist(lapply(x, is.numeric)), with = FALSE]

2 Comments

This is more of a comment on the selected answer than a unique answer.
Columns can have more than one class.
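To see why multiple classes break the class()-comparison approach: sapply(x, class) only simplifies to a character vector when every column has exactly one class. A sketch with a hypothetical POSIXct column, whose class vector has two elements:

```r
# Hypothetical data frame: "when" is POSIXct, whose class is c("POSIXct", "POSIXt")
z <- data.frame(
  n    = 1:2,
  when = as.POSIXct(c("2020-01-01", "2020-01-02"), tz = "UTC")
)

cls <- sapply(z, class)
is.list(cls)  # TRUE: the class vectors differ in length, so sapply() can't simplify

# is.numeric() gives exactly one TRUE/FALSE per column regardless of class
vapply(z, is.numeric, logical(1))  # n is TRUE, when is FALSE
```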
6
library(purrr)
x <- x %>% keep(is.numeric)

3

The PCAmixdata package has a function, splitmix(), that splits a data frame ("YourDataframe" below) into its quantitative (numerical) and qualitative (categorical) columns:

install.packages("PCAmixdata")
library(PCAmixdata)
split <- splitmix(YourDataframe)
X1 <- split$X.quanti  # numerical columns in the dataset
X2 <- split$X.quali   # categorical columns in the dataset

1

If you have many factor variables, you can use the select_if() function from the dplyr package (install it first). dplyr has several functions that select data satisfying a condition, and you can set the condition yourself.

Use like this.

library(dplyr)
categorical <- select_if(df, is.factor)
str(categorical)

1 Comment

Looks like a duplicate of this earlier answer stackoverflow.com/a/40808873/170352
0

Another way is as follows:

# extract the numeric columns from the iris dataset
(iris[sapply(iris, is.numeric)])

1 Comment

Hi Ayushi, this probably was downvoted because it's a repeat of the first answer, but this method has some issues that were identified. Take a look at the comments in the first answer, you'll see what I mean.
0
Numerical_variables <- which(sapply(df, is.numeric))
# then extract column names 
Names <- names(Numerical_variables)

1 Comment

Hey Mo! I recommend you try this in the console because I don't think it's going to give you what you think it will.
-1

This doesn't directly answer the question, but it can be very useful, especially if you want something like all the numeric columns except your id column and dependent variable.

library(dplyr)
library(magrittr)  # %<>% (the assignment pipe) comes from magrittr, not dplyr
numeric_cols <- sapply(dataframe, is.numeric) %>% which %>% 
                   names %>% setdiff(., c("id_variable", "dep_var"))

dataframe %<>% dplyr::mutate_at(numeric_cols, function(x) your_function(x))
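The same idea in base R, without pipes or mutate_at(). This is a sketch on hypothetical data; id_variable and dep_var follow the answer's placeholder names, and scale() merely stands in for your_function:

```r
# Hypothetical data: an id, a 0/1 outcome, one numeric predictor, one character column
df <- data.frame(
  id_variable = 1:4,
  dep_var     = c(0, 1, 0, 1),
  a           = c(1.5, 2, 3, 4),
  b           = letters[1:4]
)

# Numeric columns, minus the ones we want to leave alone
num_mask     <- vapply(df, is.numeric, logical(1))
numeric_cols <- setdiff(names(df)[num_mask], c("id_variable", "dep_var"))
numeric_cols  # "a"

# Apply a transformation (scale(), standing in for your_function) to those columns only
df[numeric_cols] <- lapply(df[numeric_cols], function(col) as.numeric(scale(col)))
```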
