I have problems solving this issue. Let's assume a dataframe like this:

COL_1 COL_2           COL_3  COL_4
1     UP_RED_LIGHT    23.43  UP_R
2     UP_YELLOW_LIGHT 23.33  UP_Y
3     DP_GREEN_DARK   43.76  DP_G
4     DP_BROWN_LIGHT  45.65  DP_B
5     R_BLACK_DARK    12.32  R_B

I want to find every string in this dataframe that starts with "DP_" and remove that prefix from the string.

The result I want to have:

COL_1 COL_2           COL_3  COL_4
1     UP_RED_LIGHT    23.43  UP_R
2     UP_YELLOW_LIGHT 23.33  UP_Y
3     GREEN_DARK      43.76  G
4     BROWN_LIGHT     45.65  B
5     R_BLACK_DARK    12.32  R_B

So basically, I want to replace DP_ with '' whenever a string in my dataframe starts with DP_, in every column. The fact that it is at the start is important: if DP_ appears in the middle of a string, the solution should leave it alone. This is why a solution like this:

df<- gsub('DP_', '', df)

doesn't work for me.

Is there a nice and clean solution to this?

Thank you in advance for the help.

2 Answers

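Your use of gsub is almost correct, except that you only want to remove DP_ at the beginning of the string, which the ^ anchor does. Also, sub and gsub work on character vectors, so apply the replacement to a column rather than to the whole data frame, e.g. to COL_2: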

df$COL_2 <- sub("^DP_", "", df$COL_2)
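To see what the ^ anchor changes, here is a small sketch on a made-up vector. The values in x are illustrative and not taken from the question's data:

```r
# Illustrative vector: one value starts with DP_, one contains it mid-string
x <- c("DP_GREEN_DARK", "UP_DP_LIGHT", "R_BLACK_DARK")

# Without the anchor, the first DP_ found anywhere in each string is removed
unanchored <- sub("DP_", "", x)

# With ^, only a DP_ at the very start of the string is removed
anchored <- sub("^DP_", "", x)

print(unanchored)
print(anchored)
```

Only the anchored version leaves "UP_DP_LIGHT" unchanged, which is the behaviour the question asks for.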

To do this replacement on one or more columns, e.g. on COL_2 and COL_4, we can try:

cols <- c("COL_2", "COL_4")
df[cols] <- lapply(df[cols], function(x) sub("^DP_", "", x))
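If the column names are not known in advance, one possible approach is to select every text column programmatically. This is a minimal sketch, assuming that "text column" means character or factor; the helper name is_text is made up for illustration, and factor columns come back as character:

```r
# Sample data matching the question
df <- data.frame(
  COL_1 = 1:5,
  COL_2 = c("UP_RED_LIGHT", "UP_YELLOW_LIGHT", "DP_GREEN_DARK",
            "DP_BROWN_LIGHT", "R_BLACK_DARK"),
  COL_3 = c(23.43, 23.33, 43.76, 45.65, 12.32),
  COL_4 = c("UP_R", "UP_Y", "DP_G", "DP_B", "R_B"),
  stringsAsFactors = FALSE
)

# Flag every column that holds text (character or factor); is_text is a
# hypothetical name used only in this sketch
is_text <- vapply(df, function(col) is.character(col) || is.factor(col),
                  logical(1))

# Strip a leading DP_ in the flagged columns; numeric columns are untouched
df[is_text] <- lapply(df[is_text], function(x) sub("^DP_", "", x))

print(df)
```

Restricting the replacement to text columns avoids turning numeric columns such as COL_3 into character.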

2 Comments

Since my columns are created dynamically, I can't tell in advance which ones will need changing. And as in the example, COL_4 is changed too. Would 'df<- gsub('^DP_', '', df)' work for all columns?
@Luigi I have also given you an option which would work for multiple columns.

You can also use mutate_at and str_replace to get the desired output.

library(dplyr)
library(stringr)
df %>% 
    mutate_at(vars("COL_2", "COL_4"), ~ str_replace(., "^DP_", ""))
 

#  COL_1           COL_2 COL_3 COL_4
#1     1    UP_RED_LIGHT 23.43  UP_R
#2     2 UP_YELLOW_LIGHT 23.33  UP_Y
#3     3      GREEN_DARK 43.76     G
#4     4     BROWN_LIGHT 45.65     B
#5     5    R_BLACK_DARK 12.32   R_B 

Data

df <- data.frame(COL_1 = c(1L:5L), COL_2 = c("UP_RED_LIGHT","UP_YELLOW_LIGHT", "DP_GREEN_DARK",
                "DP_BROWN_LIGHT","R_BLACK_DARK"), COL_3 = c(23.43,23.33,43.76,45.65,12.32),
                COL_4 = c("UP_R", "UP_Y", "DP_G", "DP_B", "R_B"))
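For columns that are not known in advance, a similar idea can be sketched with across(), assuming dplyr 1.0.0 or later is installed. The name result is made up for illustration, and stringsAsFactors = FALSE is set so the text columns are character on any R version:

```r
library(dplyr)
library(stringr)

# Sample data matching the question
df <- data.frame(
  COL_1 = 1:5,
  COL_2 = c("UP_RED_LIGHT", "UP_YELLOW_LIGHT", "DP_GREEN_DARK",
            "DP_BROWN_LIGHT", "R_BLACK_DARK"),
  COL_3 = c(23.43, 23.33, 43.76, 45.65, 12.32),
  COL_4 = c("UP_R", "UP_Y", "DP_G", "DP_B", "R_B"),
  stringsAsFactors = FALSE
)

# Apply the anchored replacement to every character column
result <- df %>%
  mutate(across(where(is.character), ~ str_replace(.x, "^DP_", "")))

print(result)
```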
