346

I would like to remove specific characters from strings within a vector, similar to the Find and Replace feature in Excel.

Here are the data I start with:

group <- data.frame(c("12357e", "12575e", "197e18", "e18947")

I start with just the first column; I want to produce the second column by removing the e's:

group       group.no.e
12357e      12357
12575e      12575
197e18      19718
e18947      18947

8 Answers

495

With a regular expression and the function gsub():

group <- c("12357e", "12575e", "197e18", "e18947")
group
[1] "12357e" "12575e" "197e18" "e18947"

gsub("e", "", group)
[1] "12357" "12575" "19718" "18947"

What gsub does here is replace each occurrence of "e" with an empty string "".

See ?regexp or ?gsub for more help.


7 Comments

fixed = TRUE would make this faster.
@RichScriven could you briefly elaborate on why?
fixed=TRUE prevents R from using regular expressions, which allow more flexible pattern matching but take time to compute. If all that's needed is removing a single constant string "e", they aren't necessary.
Would sub("e", "", group) give the same result?
No, sub() would replace only the first e it finds in each element.
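To make the two points in these comments concrete, here is a small base R sketch. It reuses the group vector from the answer and adds one extra element, "e1e2e3", purely for illustration; the variable names all_regex, all_fixed and first_only are also made up here.

```r
# The vector from the answer, plus one illustrative element with several e's
group <- c("12357e", "12575e", "197e18", "e18947", "e1e2e3")

# gsub() removes every "e"; fixed = TRUE matches the pattern as a literal string
all_regex <- gsub("e", "", group)
all_fixed <- gsub("e", "", group, fixed = TRUE)
identical(all_regex, all_fixed)  # TRUE

# sub() removes only the first "e" in each element
first_only <- sub("e", "", group)
first_only[5]  # "1e2e3"
```

For the original four strings, sub() and gsub() happen to agree, because each string contains exactly one "e". The difference only shows up on elements with repeated matches.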
58

Regular expressions are your friends:

R> ## also adds missing ')' and sets column name
R> group <- data.frame(group=c("12357e", "12575e", "197e18", "e18947"))
R> group
   group
1 12357e
2 12575e
3 197e18
4 e18947

Now use gsub() with the simplest possible replacement, an empty string:

R> group$groupNoE <- gsub("e", "", group$group)
R> group
   group groupNoE
1 12357e    12357
2 12575e    12575
3 197e18    19718
4 e18947    18947
R> 

4 Comments

Also...require(stringr);group$groupNoE <- str_replace(group$group, "e", "")
Well, I could snicker that "Those who do not understand base functions are doomed to replace them". Exactly what does stringr gain here, besides increasing the number of underscores in your source file?
"stringr is a set of simple wrappers that make R's string functions more consistent, simpler and easier to use" from the author of the package. So if what you say is true (many underscores to wrap base functions...) there is no reason for this package to exist (disclaimer : I mainly use base regex functions but I know that they can be difficult for new users...)
@dickoa: str_replace wraps sub, so it will only replace the first occurrence of the pattern. You would need to use str_replace_all if you wanted the same behavior as gsub.
41

Summarizing 2 ways to replace strings:

group<-data.frame(group=c("12357e", "12575e", "197e18", "e18947"))

1) Use gsub

group$group.no.e <- gsub("e", "", group$group)

2) Use the stringr package

library(stringr)
group$group.no.e <- str_replace_all(group$group, "e", "")

Both will produce the desired output:

   group group.no.e
1 12357e      12357
2 12575e      12575
3 197e18      19718
4 e18947      18947

4 Comments

At the time you had to read the whole page including comments to learn the syntax for stringr, my preferred method, as it was mostly discussed in comments. This solution quickly presents both options, which is why I offered it. My hope was to help other users filter through much like I had to do when I was new to R. I struggled with gsub before finding stringr because it wasn't mentioned in a highly upvoted answer. Again, the objective is not to collect upvotes but try to help new R users out.
If you find information in other answers/comments that you find useful and would like to convert into an answer, you could at least provide some attribution to show where you got the information from, or make the answer a Community Wiki instead of just presenting it as your own.
Thanks - will keep in mind for next time. Have never made a community wiki before, so didn't know it was an option.
Option 2 works great when applied to a column of data in a data frame, without specifying all the values in the column. Obviously option 1 is a repeat, but option 2 works very well, and deserves an up-vote for the added functionality.
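As a rough sketch of the "apply it to a column of data" idea from the comment above, extended to several columns at once: the data frame df and its columns other and n are hypothetical, and the example sticks to base R's gsub() so it runs without extra packages.

```r
# Hypothetical data frame with two character columns and one numeric column
df <- data.frame(
  group = c("12357e", "12575e", "197e18", "e18947"),
  other = c("e1", "2e", "3", "ee4"),
  n     = 1:4,
  stringsAsFactors = FALSE
)

# Find the character columns, then strip every literal "e" from each of them
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], gsub, pattern = "e", replacement = "", fixed = TRUE)
df
```

Because pattern and replacement are passed by name, each column is matched to gsub()'s x argument. The numeric column n is left untouched.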
26

You do not need to create a data frame from a vector of strings if you want to replace some characters in it. Regular expressions are a good choice for this, as already mentioned by @Andrie and @Dirk Eddelbuettel.

Note that if you want to replace characters that have a special meaning in regular expressions, such as dots, you have to match them literally, for example by putting them in a character class, as in the example below:

ctr_names <- c("Czech.Republic","New.Zealand","Great.Britain")
gsub("[.]", " ", ctr_names)

This will produce:

[1] "Czech Republic" "New Zealand"    "Great Britain" 

1 Comment

You can also just escape them, but you have to escape the escape character as well, because the pattern is inside a quoted string: gsub("\\.", " ", ctr_names)
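The options for matching a literal dot can be compared side by side. This is a minimal base R sketch using the ctr_names vector from the answer; the result names (by_class, by_escape, by_fixed, unescaped) are illustrative only.

```r
ctr_names <- c("Czech.Republic", "New.Zealand", "Great.Britain")

# Three equivalent ways to match a literal dot
by_class  <- gsub("[.]", " ", ctr_names)              # character class
by_escape <- gsub("\\.", " ", ctr_names)              # escaped dot
by_fixed  <- gsub(".", " ", ctr_names, fixed = TRUE)  # no regex at all
identical(by_class, by_escape) && identical(by_class, by_fixed)  # TRUE

# An unescaped "." is a regex wildcard, so every character gets replaced
unescaped <- gsub(".", " ", ctr_names)
nchar(unescaped[2])  # 11, and all of them are spaces
```

The last call shows why the special handling matters: without it, the pattern matches any character rather than only the dot.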
7

Use the stringi package:

require(stringi)

group<-data.frame(c("12357e", "12575e", "197e18", "e18947"))
stri_replace_all(group[,1], "", fixed="e")
[1] "12357" "12575" "19718" "18947"

Comments

2

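chartr is sometimes suggested as well, but it translates characters one-for-one and cannot delete them: new must be at least as long as old, so the call below stops with an error instead of removing the e's. Use gsub() for removal.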

group$group.no.e <- chartr("e", "", group$group)
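A short base R sketch of what chartr() does and does not do, assuming the same group values as in the question. The names swapped, deletion_attempt and removed are made up for this example, and tryCatch() is used only to capture the error message instead of stopping the script.

```r
group <- c("12357e", "12575e", "197e18", "e18947")

# chartr() maps each character in `old` to the character at the same position in `new`
swapped <- chartr("e", "E", group)

# An empty `new` is shorter than `old`, so chartr() signals an error;
# here the error message is captured as a string
deletion_attempt <- tryCatch(
  chartr("e", "", group),
  error = function(err) conditionMessage(err)
)

# gsub() with an empty replacement performs the actual deletion
removed <- gsub("e", "", group, fixed = TRUE)
```

In this sketch, swapped holds the strings with "e" turned into "E", deletion_attempt holds a single error message, and removed holds the strings with the e's stripped.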

Comments

0
> library(stringr)
> group <- c('12357e', '12575e', '12575e', ' 197e18',  'e18947')
> pattern <- "e"
> replacement <- ""
> group <- str_replace_all(group, pattern, replacement)
> group
[1] "12357"  "12575"  "12575"  " 19718" "18947"

Comments

0

You can use gsub or stringr.

Or, this:

library (magrittr); 

#' @author y.ypa.yhm
#' @license agpl-3.0
#' 

char.apart = 
function (str) str %>% nchar %>% {.+1} %>% seq %>% sample(1) %>% intToUtf8 %>% 
    {if (! (. %in% strsplit(str,"")[[1]])) . else char.apart (str)} ;

strtr = `%strtr%` = 
function (old, new) 
function (strs) (\ (rchar) strs %>% 
    paste0 (rchar) %>% strsplit(old) %>% 
    lapply (\ (s) s %>% paste (collapse = new)) %>% 
    unlist %>% substr(., 0, nchar(.) - 1) %>% 
    `names<-` (strs) 
    ) (old %>% char.apart) ;

#' @examples
#' 
#' `c("aaa bbb CCC ddd bb CC", "bb CC eee 1bb CCC PPP") %>% ("bb CC" %strtr% "tt TT")`
#' 
#' should output:
#'   aaa bbb CCC ddd bb CC   bb CC eee 1bb CCC PPP 
#' "aaa btt TTC ddd tt TT" "tt TT eee 1tt TTC PPP"
#' 

Use it like this:

c("12357e"
, "12575e"
, "197e18"
, "e18947") %>% 
    
    ("e" %strtr% "")

Output:

 12357e  12575e  197e18  e18947 
"12357" "12575" "19718" "18947"

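This approach is not meant to rely on regex features (note, though, that strsplit() treats old as a regular expression unless fixed = TRUE is passed), and it needs no libraries beyond the pipe (you can replace the magrittr pipe with the native pipe).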


Tested on webR REPL app
