
I am currently working on a project where I have to remove the parts of a string before and after a certain substring. I have attached an example below, and I can only use the packages stringr, tidyverse and dplyr. The strings have different lengths, but I only need to keep the "r1" or "r2" part. There are r1 to r4 across 96 different examples. Can anybody help me keep only this part of the variable, so that I end up with a variable containing only r1, r2, r3 and r4?

[19] "data/r1-23-8-312.json"    "data/r1-23-8-66.json"     "data/r1-23-8-68.json"    
[22] "data/r1-23-8-85.json"     "data/r1-23-8-88.json"     "data/r2-65-12-200.json"  
[25] "data/r2-65-12-202.json"   "data/r2-65-12-214.json"   "data/r2-65-12-215.json"  

class(dat2$route)
[1] "character"

I have figured out that I can use "substr(dat2$route, 6, 7)", but if I use it this way:

dat2 <- substr(dat2$route, 6, 7)

It removes all the other variables besides route. Why is that? I have 11 other variables as well.


3 Answers

2

There are several ways. If your string always starts with data/, you can do:

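library(tidyverse)

# Characters 6-7 hold the "rX" code when every path starts with "data/".
# Assign the result back to dat2 so the new column is kept.
dat2 <- dat2 %>%
  mutate(new_route = str_sub(route, start = 6L, end = 7L))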

Other options are to extract the 'r' followed by a number or to remove the data/ part and the stuff after the rX part. Plenty of options.
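The two other options mentioned above could look roughly like this. It is only a minimal sketch, assuming a small made-up dat2 with a route column; the column names by_extract and by_remove are hypothetical.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the asker's data
dat2 <- tibble(
  route = c("data/r1-23-8-312.json", "data/r2-65-12-200.json"),
  other = c(1, 2)
)

dat2 <- dat2 %>%
  mutate(
    # Option 1: pull out the first "r" that is followed by one digit
    by_extract = str_extract(route, "r[0-9]"),
    # Option 2: strip the leading "data/" and everything from the first "-" on
    by_remove = str_remove_all(route, "^data/|-.*$")
  )
```

Both new columns should hold the same "rX" codes here, and the other columns of dat2 stay in place.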


2 Comments

Thank you, I fixed it now. Appreciated. New to R, so still learning :)
Please consider flagging one of the answers as your preferred/accepted and/or upvoting useful answers so that others can benefit from it.
0

If we want to be stricter, we can use stringr::str_match() to capture an r followed by a digit from 1 to 4, located between / and -.

The first column of matching will contain the whole match, and the second the capture group created by surrounding the pattern with parentheses.

library(tidyverse)  # attaches stringr, which provides str_match()

data <- c(
  "data/r1-23-8-312.json", "data/r1-23-8-66.json", "data/r1-23-8-68.json",
  "data/r1-23-8-85.json", "data/r1-23-8-88.json", "data/r2-65-12-200.json",
  "data/r2-65-12-202.json", "data/r2-65-12-214.json", "data/r2-65-12-215.json"
)

(matching <- stringr::str_match(data, '/(r[1-4])-'))
#>       [,1]   [,2]
#>  [1,] "/r1-" "r1"
#>  [2,] "/r1-" "r1"
#>  [3,] "/r1-" "r1"
#>  [4,] "/r1-" "r1"
#>  [5,] "/r1-" "r1"
#>  [6,] "/r2-" "r2"
#>  [7,] "/r2-" "r2"
#>  [8,] "/r2-" "r2"
#>  [9,] "/r2-" "r2"

matching[, 2]
#> [1] "r1" "r1" "r1" "r1" "r1" "r2" "r2" "r2" "r2"

Created on 2021-12-07 by the reprex package (v2.0.1)
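To illustrate what "more strict" means here, the following sketch compares this pattern with a looser one on a few made-up paths. The vector paths and the names strict and loose are hypothetical and not part of the original data.

```r
library(stringr)

# Hypothetical inputs: the second and third are deliberately out of pattern
paths <- c("data/r3-10-2-5.json", "data/r7-10-2-5.json", "other/xr2y.json")

# Strict: "r" plus a digit from 1 to 4, directly between "/" and "-"
strict <- str_match(paths, "/(r[1-4])-")[, 2]

# Loose: the first "r" followed by any digit, anywhere in the string
loose <- str_extract(paths, "r[0-9]")
```

The strict version returns NA for the two out-of-pattern paths, while the loose version still picks up "r7" and "r2" from them.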

1 Comment

But when all you want to do is extract the complete match, str_extract is nicely convenient and str_match is a little overcomplicated.
0
library(stringr)
str_extract(dat2$route, pattern = "r[0-9]")

4 Comments

Would probably add the stringr library command, just to be on the safe side.
Thank you, just fixed it. Appreciated.
@JBerggreen This code produces the result you want and prints it. It does not change your data dat2 at all. From your edits it seems like you maybe want to assign the result to a new column or perhaps a new object and you don't know how to do that? For that, use the assignment operator <-....
You can give it a new name as a separate object, my_result <- str_extract(...), or you can make it a new column in your data, dat2$my_new_column <- str_extract(...), or you can overwrite the old column replacing the values that were there, dat2$route <- str_extract(...). This will work the same for any of these answers. Or you can use mutate which adds columns to data frames, dat2 <- dat2 %>% mutate(new_column = str_extract(...)).
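The assignment variants described in these comments can be sketched as follows, assuming a small made-up dat2 with one extra column (the real data has 11 others). The last step shows why dat2 <- substr(dat2$route, 6, 7) dropped the other variables: substr() returns a plain character vector, and assigning that to dat2 replaces the whole data frame.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the asker's data frame
dat2 <- data.frame(
  route = c("data/r1-23-8-66.json", "data/r2-65-12-202.json"),
  other = c(10, 20)
)

# 1. Separate object: dat2 itself is untouched
my_result <- str_extract(dat2$route, "r[0-9]")

# 2. New column: the other columns are kept
dat2$my_new_column <- str_extract(dat2$route, "r[0-9]")

# 3. Same idea with mutate(); the result must be assigned back to dat2
dat2 <- dat2 %>% mutate(new_column = str_extract(route, "r[0-9]"))

# What went wrong in the question: this is a character vector, not a
# data frame, so assigning it to dat2 would overwrite every column
overwritten <- substr(dat2$route, 6, 7)
is.data.frame(overwritten)
```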
