64

I have this dataframe:

state county city  region  mmatrix  X1 X2 X3    A1     A2     A3      B1     B2     B3      C1      C2      C3

  1      1     1      1     111010   1  0  0     2     20    200       Push      8     12      NA      NA      NA
  1      2     1      1     111010   1  0  0     4     NA    400       Shove      9     NA 

Now I want to exclude columns whose names end with a certain string, say "1" (i.e. A1 and B1). I wrote this code:

df_redacted <- df[, -grep("\\1$", colnames(df))]

However, this seems to delete every column. How can I modify the code so that it only deletes the columns that matches the pattern (i.e. ends with "3" or any other string)?

The solution has to be able to handle a dataframe with has both numerical and categorical values.

8 Answers 8

99

I found a simple answer using dplyr/tidyverse. If your colnames contain "This", then all variables containing "This" will be dropped.

library(dplyr) 
df_new <- df %>% select(-contains("This"))
Sign up to request clarification or add additional context in comments.

5 Comments

If I need to drop columns contains this or that? I use df %>% select(-contains("this|that")), it seems not working.
You are close: df %>% select(-(contains("this") | contains("that")))
Or df %>% select(-contains(c('this', 'that'))) works as well.
Hi @SamuelSaari, how to remove columns which start with this or that?
You can try: select(-startsWith(-c("this", "that" )). It will delete only the columns that begin with the string 'this' or 'that'..
53

Your code works like a charm if I apply it to a minimal example and just search for the string "A":

df <- data.frame(ID = 1:10,
                 A1 = rnorm(10),
                 A2 = rnorm(10),
                 B1 = letters[1:10],
                 B2 = letters[11:20])
df[, -grep("A", colnames(df))]

So your problem is more a regular expression problem, not how to drop columns. If I run your code, I get an error:

df[, -grep("\\3$", colnames(df))]
Error in grep("\\3$", colnames(df)) : 
  invalid regular expression '\3$', reason 'Invalid back reference'

Update: Why don't you just use this following expression?

df[, -grep("1$", colnames(df))]
   ID         A2 B2
1   1  2.0957940  k
2   2 -1.7177042  l
3   3 -0.0448357  m
4   4  1.2899925  n
5   5  0.7569659  o
6   6 -0.5048024  p
7   7  0.6929080  q
8   8 -0.5116399  r
9   9 -1.2621066  s
10 10  0.7664955  t

3 Comments

.@SimonO'Hanlon - I am using grepl() as data <- data [, !grepl("Unique-",names(data))] to remove column where column name starts with Unique-. I see that grepl() is appending duplicate column name with .1,.2,.3 etc. Is it possible to use grepl() without affecting column name even if there are duplicates in the data frame?
When running df[, -grep("A", colnames(df))], if no match is found grep returns 0 and a data.frame with zero columns is returned.
The regex can be fixed just by getting rid of the backslashes: grep("3$", colnames(df)). As for @min
17

Just as an additional answer, since I stumbled across this, when looking for the data.table solution to this problem.

library(data.table)
dt <- data.table(df)
drop.cols <- grep("1$", colnames(dt))
dt[, (drop.cols) := NULL]

Comments

6

For excluding any string you can use...

 # Search string to exclude
 strng <- "1"
 df <- data.frame(matrix(runif(25,max=10),nrow=5))
 colnames(df) <- paste( "EX" , 1:5 )
 df_red <- df[, -( grep(paste0( strng , "$" ) , colnames(df),perl = TRUE) ) ]

    df
#         EX 1     EX 2        EX 3     EX 4     EX 5
#   1 7.332913 4.972780 1.175947853 6.428073 8.625763
#   2 2.730271 3.734072 6.031157537 1.305951 8.012606
#   3 9.450122 3.259247 2.856123205 5.067294 7.027795
#   4 9.682430 5.295177 0.002015966 9.322912 7.424568
#   5 1.225359 1.577659 4.013616377 5.092042 5.130887

    df_red
#         EX 2        EX 3     EX 4     EX 5
#   1 4.972780 1.175947853 6.428073 8.625763
#   2 3.734072 6.031157537 1.305951 8.012606
#   3 3.259247 2.856123205 5.067294 7.027795
#   4 5.295177 0.002015966 9.322912 7.424568
#   5 1.577659 4.013616377 5.092042 5.130887

2 Comments

Sorry for the inconsistency. The point is that I want to be able to specify any string that is the end of a colname, and then delete all columns that has that string at the end. So for this example data "1", "3", "ity", "ion", & "rix" would all be valid examples.
Yes - as long as it works with other strings as well. For example, in one dataframe I need to delete every column that ends with the string "timestamps"
4

If you are specifically looking for a pattern that appears at the end of the column name, to drop those columns, you can use the following command:

library(dplyr) 
df_new <- df %>% select(-ends_with("linear"))

All the columns that end with the string linear will be dropped.

2 Comments

Hi @sandy, the columns start with X or NA?
Did you mean removing all columns with names beginning with X or NA? You can try this : df %>% select(-(startsWith("X") | startsWith("NA")))
1

If you're using data.table, try the built-in data.table::patterns function:

df <- data.table(ID = 1:10, A1 = rnorm(10), B2 = head(LETTERS, 10))
df[, .SD, .SDcols = !patterns("1$")]

Comments

0

You can expand it further using regex for a broader pattern search. I have a data frame that has a bunch of columns with "name", "upper_name"and"lower_name"` as they represent confidence intervals for a bunch of series, but I don't need them all. So, using regex, you can do the following:

pattern = "(upper_[a-z]*)|(lower_[a-z]*)"
policyData <- policyData[, -grep(pattern = pattern, colnames(policyData))]

The "|" allows me to include an or statement in the regex so I can do it once with a single patter rather than look for each pattern.

Comments

0

If the first solution is failing for you with the following error:

Error in select(., -contains("unknown")) : 
unused argument (-contains("unknown"))

Your R may be be trying to use 'select' from the MASS module. To fix this, use

new_df <- df %>% dplyr::select(-contains("unknown"))

(Would have said this in a comment, but not enough rep yet)

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.