
I have a data frame with 80k rows and 874 columns. Some of these columns are entirely empty. I use sum(is.na) in a for loop to find the indices of the empty columns: since no row index is missing, a column is empty when its count of NA values equals the number of rows.

for (i in 1:ncol(loans)) {
  if (sum(is.na(loans[[i]])) == nrow(loans)) {
    print(i)
  }
}

Now that I know the indices of the empty columns, I want to drop them from the data. I thought about storing those indices in a vector and dropping them one at a time in a loop, but I don't think that will work: each deletion shifts the remaining columns left, so later indices would point at columns that still contain data. How can I drop them?
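To make the concern concrete on a toy data frame: collecting the indices first and then dropping them all in a single subsetting call sidesteps the shifting problem entirely, since nothing moves until the one deletion happens (a minimal sketch; column names here are made up):

```r
loans <- data.frame(a = c(NA, NA), b = 1:2, c = c(NA, NA), d = 3:4)

empty <- c()
for (i in 1:ncol(loans)) {
  if (sum(is.na(loans[[i]])) == nrow(loans)) {
    empty <- c(empty, i)
  }
}

# Drop everything in one subsetting call. Guard against empty being NULL:
# loans[-NULL] is an error, not a no-op.
if (length(empty) > 0) {
  loans <- loans[-empty]
}
names(loans)   # "b" "d"
```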

4 Answers


You should try to provide a toy dataset for your question.

loans <- data.frame(
  a = c(NA, NA, NA),
  b = c(1,2,3),
  c = c(1,2,3),
  d = c(1,2,3),
  e = c(NA, NA, NA)
)


loans[!sapply(loans, function(col) all(is.na(col)))]

sapply loops over columns of loans and applies the anonymous function checking if all elements are NA. It then coerces the output to a vector, in this case logical.
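With the toy data frame above, the intermediate logical vector produced by the sapply step looks like this (a quick sanity check of the approach):

```r
loans <- data.frame(
  a = c(NA, NA, NA),
  b = c(1, 2, 3),
  c = c(1, 2, 3),
  d = c(1, 2, 3),
  e = c(NA, NA, NA)
)

# Named logical vector: TRUE marks a column to keep.
keep <- !sapply(loans, function(col) all(is.na(col)))
keep
#     a     b     c     d     e
# FALSE  TRUE  TRUE  TRUE FALSE

loans[keep]   # only columns b, c, d remain
```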

The tidyverse option:

loans[!purrr::map_lgl(loans, ~all(is.na(.x)))]




Does this work:

df <- data.frame(col1 = rep(NA, 5),
                 col2 = 1:5,
                 col3 = rep(NA,5),
                 col4 = 6:10)
df
  col1 col2 col3 col4
1   NA    1   NA    6
2   NA    2   NA    7
3   NA    3   NA    8
4   NA    4   NA    9
5   NA    5   NA   10
df[,which(colSums(df, na.rm = TRUE) == 0)] <- NULL
df
  col2 col4
1    1    6
2    2    7
3    3    8
4    4    9
5    5   10

Another approach:

df[!apply(df, 2, function(x) all(is.na(x)))]
  col2 col4
1    1    6
2    2    7
3    3    8
4    4    9
5    5   10
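One caveat worth noting: apply() first coerces the data frame to a matrix, so with mixed column types everything becomes character. all(is.na(x)) still behaves correctly after that coercion, but iterating over the columns as a list avoids it entirely; a sketch using vapply, which also guarantees a logical(1) result per column:

```r
df <- data.frame(col1 = rep(NA, 5), col2 = 1:5,
                 col3 = rep(NA, 5), col4 = 6:10)

# vapply walks the list of columns directly (no matrix coercion) and
# type-checks that each result is a single logical value.
df[!vapply(df, function(x) all(is.na(x)), logical(1))]
```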

3 Comments

Wouldn't colSums(df, na.rm = TRUE) also evaluate to 0 for a column in which every value is 0?
@zerz, it would, but considering the OP's data frame has 874 columns and 80k rows, the probability of that happening is very remote. I have added another approach to address the same.
Also: df[-which(colSums(df, na.rm = TRUE) == 0)]
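A caveat on the negative-indexing variant suggested here: when no column matches, which() returns integer(0), and df[-integer(0)] selects zero columns rather than all of them, silently emptying the data frame. A guard avoids that edge case (a minimal sketch):

```r
df <- data.frame(col2 = 1:5, col4 = 6:10)   # no all-NA (or all-zero) columns

drop <- which(colSums(df, na.rm = TRUE) == 0)
# drop is integer(0) here; df[-drop] would select ZERO columns,
# so only subset when there is actually something to remove.
if (length(drop) > 0) df <- df[-drop]
ncol(df)   # 2
```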

A dplyr solution:

df %>%
  select_if(!colSums(., na.rm = TRUE) == 0)
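In recent dplyr (1.0 and later), select_if() is superseded; the same idea is usually written with select(where(...)). A sketch, assuming the dplyr package is installed (the data frame here is a made-up toy):

```r
library(dplyr)

df <- data.frame(a = c(NA, NA), b = 1:2, c = c(NA, NA))

# where() takes a predicate applied to each column; the purrr-style
# lambda keeps columns that are not entirely NA.
df %>%
  select(where(~ !all(is.na(.x))))
```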



You can try to use fundamental skills like if/else statements and for loops for almost any problem, although a drawback is that it will be slower.

# Evaluate each column; if it is entirely NA, remove it.
# Loop from the last column to the first: deleting a column shifts the
# later columns left, so a forward loop would skip columns and eventually
# index past the end of the shrunken data frame.
for (i in ncol(loans):1){
  if (sum(is.na(loans[[i]])) == nrow(loans)){
    loans[[i]] <- NULL
  }
}

3 Comments

The problem here is that when an empty column is deleted, the next column shifts into its place. Therefore, I guess, in each iteration there is a chance that you could delete columns with data.
@VolkanDemir Well, I get your concerns. But the "if" statement decides whether a column should be removed, so no matter what, the columns with data won't be affected.
@VolkanDemir As for your concern that the next column's data replaces the removed one: actually, I never thought about this before. If you test my approach on a small sample data set, you will see it actually works. But you raised a good point; I may post a question about that. Thanks!
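For anyone testing the concern raised in this thread: with two adjacent empty columns, a forward loop that deletes in place really does skip one, because every deletion slides the later columns left while the loop counter keeps advancing. Iterating from the last column backwards avoids the shift, since deletions then only affect columns already visited (a minimal sketch):

```r
loans <- data.frame(a = c(NA, NA), b = c(NA, NA), c = 1:2)

# Forward loop: after "a" is deleted, "b" slides into position 1,
# but i moves on to 2, so "b" is never checked. (The i <= ncol(bad)
# guard is needed because 1:ncol(bad) is fixed at loop entry.)
bad <- loans
for (i in 1:ncol(bad)) {
  if (i <= ncol(bad) && all(is.na(bad[[i]]))) bad[[i]] <- NULL
}
names(bad)   # "b" "c" -- the adjacent empty column survived

# Backward loop: deletions never shift the columns still to be visited.
good <- loans
for (i in ncol(good):1) {
  if (all(is.na(good[[i]]))) good[[i]] <- NULL
}
names(good)  # "c"
```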
