
I have a vector where each element is a string. I only want to keep the part of the string right before the '==' regardless of whether it is at the beginning of the string, after the & symbol, or after the | symbol. Here is my data:

data <- c("name=='John'", "name=='David'&age=='50'|job=='Doctor'&city=='Liverpool'", 
"job=='engineer'&name=='Andrew'", 
"city=='Manchester'", "age=='40'&city=='London'"
)

My ideal format would be something like this:

[1] "name"
[2] "name" "age" "job" "city"
[3] "job" "name"
[4] "city" 
[5] "age" "city"

The closest I have got is using genXtract from the qdap library, which puts the data in the format above, but I only know how to use it with one condition, i.e.

qdap::genXtract(data, "&", "==")

But I don't just want the part of the string between & and ==. I also want the part between | and ==, and between the beginning of the string and ==.

2 Answers


This regex captures every run of letters and digits (a-z, A-Z, 0-9) that is immediately followed by ==. The (?=...) part is a lookahead, so the == itself is not included in the match.

stringr::str_extract_all( data, "[0-9a-zA-Z]+(?=(==))")

[[1]]
[1] "name"
[[2]]
[1] "name" "age"  "job"  "city"
[[3]]
[1] "job"  "name"
[[4]]
[1] "city"
[[5]]
[1] "age"  "city"

If you want the output as a character vector with one string per input element, use

L <- stringr::str_extract_all( data, "[0-9a-zA-Z]+(?=(==))" )
unlist( lapply( L, paste, collapse = " " ) )

results in

[1] "name"             
[2] "name age job city"
[3] "job name"         
[4] "city"             
[5] "age city"  

4 Comments

Is there any way this could also work if the names contain numbers? Sometimes I have job1=="artist"&job2=="actor".
Just add 0-9 between the square brackets; see the updated answer above.
Also, on the other side, is there any way of saying before '==' or before '>='?
Sure. I suggest you do some reading about regexes ;-)
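
For the last question in the comments, one possible approach is to put an alternation inside the lookahead. This is only a sketch: the input vector x below is made up for illustration and is not part of the original data.

```r
library(stringr)

# Hypothetical input mixing '==' and '>=' comparisons (not from the question)
x <- c("job1=='artist'&job2=='actor'", "age>='40'|city=='London'")

# (==|>=) inside the lookahead accepts either operator after the name
res <- str_extract_all(x, "[0-9a-zA-Z]+(?=(==|>=))")
res
```

Further operators such as <= or != could be added to the alternation in the same way.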

In base R, this can be done with regmatches and gregexpr:

lst1 <- regmatches(data, gregexpr("\\w+(?=\\={2})", data, perl = TRUE))
sapply(lst1, paste, collapse = " ")
#[1] "name"
#[2] "name age job city"
#[3] "job name"
#[4] "city"
#[5] "age city"
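
Note that \w is not quite the same as the character class [0-9a-zA-Z] used in the other answer: \w also matches the underscore. The sketch below compares the two patterns on a made-up string (x is a hypothetical input, not part of the question's data), which may matter if your names contain underscores.

```r
# Hypothetical input with underscores in the names
x <- "first_name=='John'&job_1=='artist'"

# \w matches letters, digits and the underscore
with_w <- regmatches(x, gregexpr("\\w+(?=\\={2})", x, perl = TRUE))[[1]]

# [0-9a-zA-Z] stops at the underscore, so only the last part is kept
with_class <- regmatches(x, gregexpr("[0-9a-zA-Z]+(?=(==))", x, perl = TRUE))[[1]]

with_w      # "first_name" "job_1"
with_class  # "name" "1"
```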
