Remove a specific part of a string in R with stringr

Question

I am working on a project where I need to remove the parts of a string before and after a specific piece. I have attached an example below, and I may only use the packages stringr, tidyverse and dplyr. The examples have different lengths, but I only need to keep the "r1" or "r2" part. The values run from r1 to r4, for 96 different examples. Can anybody help me keep only this part of the variable, so that I end up with a variable containing only r1, r2, r3 and r4?

[19] "data/r1-23-8-312.json"    "data/r1-23-8-66.json"     "data/r1-23-8-68.json"    
[22] "data/r1-23-8-85.json"     "data/r1-23-8-88.json"     "data/r2-65-12-200.json"  
[25] "data/r2-65-12-202.json"   "data/r2-65-12-214.json"   "data/r2-65-12-215.json"  

class(dat2$route)
[1] "character"

I have figured out that I can use "substr(dat2$route, 6, 7)", but if I use it this way:

dat2 <- substr(dat2$route, 6, 7)

It removes all the other variables besides route. Why is that? I have 11 other variables as well.

deschen · Accepted Answer · 2021-12-07 15:11:15Z

2

There are several ways. If your strings always start with data/, you can do

library(tidyverse)
dat2 %>%
  mutate(new_route = str_sub(route, start = 6L, end = 7L))

Other options are to extract the 'r' followed by a number or to remove the data/ part and the stuff after the rX part. Plenty of options.
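The two other options mentioned above could look roughly like this. This is a minimal sketch, assuming the paths always have the form data/rX-... as in the question; the vector route and the result names extracted and stripped are made up for illustration.

```r
library(stringr)

# Sample paths taken from the question
route <- c("data/r1-23-8-312.json", "data/r2-65-12-200.json")

# Option 1: extract the first "r" followed by a digit from 1 to 4
extracted <- str_extract(route, "r[1-4]")

# Option 2: remove the "data/" prefix, then everything from the first "-" onwards
stripped <- str_remove(str_remove(route, "^data/"), "-.*$")

extracted
stripped
```

Both approaches should give the same result on this kind of input. Option 1 is shorter, while Option 2 does not depend on the kept part starting with "r".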


2 Comments

JBerggreen Over a year ago

Thank you, I fixed it now. Appreciated. New to R, so still learning :)

deschen Over a year ago

Please consider marking one of the answers as accepted and/or upvoting useful answers so that others can benefit from them.

jpdugo17 · Accepted Answer · 2021-12-07 15:30:39Z

If we want to be stricter, we can use stringr::str_match() to capture an r followed by a digit from 1 to 4, located between / and -.

The first column of matching contains the whole match, and the second contains the capture group created by wrapping part of the pattern in parentheses.

library(tidyverse)

data <- c(
  "data/r1-23-8-312.json", "data/r1-23-8-66.json", "data/r1-23-8-68.json",
  "data/r1-23-8-85.json", "data/r1-23-8-88.json", "data/r2-65-12-200.json",
  "data/r2-65-12-202.json", "data/r2-65-12-214.json", "data/r2-65-12-215.json"
)

(matching <- stringr::str_match(data, '/(r[1-4])-'))
#>       [,1]   [,2]
#>  [1,] "/r1-" "r1"
#>  [2,] "/r1-" "r1"
#>  [3,] "/r1-" "r1"
#>  [4,] "/r1-" "r1"
#>  [5,] "/r1-" "r1"
#>  [6,] "/r2-" "r2"
#>  [7,] "/r2-" "r2"
#>  [8,] "/r2-" "r2"
#>  [9,] "/r2-" "r2"

matching[, 2]
#> [1] "r1" "r1" "r1" "r1" "r1" "r2" "r2" "r2" "r2"

Created on 2021-12-07 by the reprex package (v2.0.1)

But when all you want is the complete match, str_extract is more convenient and str_match is a little overcomplicated.
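If the strictness of the / and - anchors is still wanted, one way to keep str_extract is to use lookarounds, so that the complete match is already just the rX part. This is a sketch rather than anything from the answers; the third path (r9) is a made-up value added to show what a non-matching input returns.

```r
library(stringr)

# Two paths from the question plus one hypothetical non-matching path
route <- c("data/r1-23-8-312.json", "data/r2-65-12-200.json", "data/r9-1-1-1.json")

# (?<=/) requires a preceding "/" and (?=-) requires a following "-";
# neither character is included in the returned match
strict <- str_extract(route, "(?<=/)r[1-4](?=-)")

strict
```

Inputs that do not fit the pattern come back as NA, which makes unexpected file names easy to spot.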

Gregor Thomas · Accepted Answer · 2021-12-07 16:00:54Z

0

library(stringr)
str_extract(dat2$route, pattern = "r[0-9]")

edited Dec 7, 2021 at 16:00

answered Dec 7, 2021 at 15:11


4 Comments

deschen Over a year ago

Would probably add the stringr library command, just to be on the safe side.

JBerggreen Over a year ago

Thank you, just fixed it. Appreciated.

Gregor Thomas Over a year ago

@JBerggreen This code produces the result you want and prints it. It does not change your data dat2 at all. From your edits it seems like you maybe want to assign the result to a new column or perhaps a new object and you don't know how to do that? For that, use the assignment operator <-....

Gregor Thomas Over a year ago

You can give it a new name as a separate object, my_result <- str_extract(...), or you can make it a new column in your data, dat2$my_new_column <- str_extract(...), or you can overwrite the old column replacing the values that were there, dat2$route <- str_extract(...). This will work the same for any of these answers. Or you can use mutate which adds columns to data frames, dat2 <- dat2 %>% mutate(new_column = str_extract(...)).
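The assignment options from the comment above can be sketched on a small stand-in data frame. The dat2 built here is hypothetical (the real one has 11 other variables), and the column value is invented only to show that other columns survive.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for dat2: route plus one other column
dat2 <- data.frame(
  route = c("data/r1-23-8-312.json", "data/r2-65-12-200.json"),
  value = c(10, 20)
)

# A separate object: dat2 itself is left unchanged
my_result <- str_extract(dat2$route, "r[0-9]")

# A new column in the data frame; the other columns are kept
dat2$my_new_column <- str_extract(dat2$route, "r[0-9]")

# The same idea with mutate(); the result must be assigned back to dat2
dat2 <- dat2 %>% mutate(new_column = str_extract(route, "r[0-9]"))

# By contrast, this would replace the whole data frame with a character vector,
# which is why the other variables disappeared in the question:
# dat2 <- substr(dat2$route, 6, 7)

names(dat2)
```

Assigning to dat2$some_column or using mutate() changes one column, whereas assigning to dat2 replaces the entire object.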
