0

I have some strings and I'd like to convert each one into a number, so I'd like to use regular expressions. Each string is one of the following:

["star"]
["near-star"]
["shared"]
["near-shared"]
["complete"]
["near-complete"]
["null"]
["near-null"]

My problem is that both of these statements are true:

> grepl("star", "[\"near-star\"]")
[1] TRUE
> grepl("near-star", "[\"near-star\"]")
[1] TRUE

The same applies to the other labels. Any advice on how to write code that matches each label exactly is much appreciated.

best regards, Simone

3
  • It seems to me that this question calls for a regex tutorial, not a simple answer. Commented Feb 5, 2014 at 18:17
  • What do you mean by "convert each string in a number"? I don't see any numbers... If you want to convert N different strings to the numbers 1 to N then you can go via factors... Commented Feb 5, 2014 at 18:19
  • You can test for absolute string equality with ==. "star" == "near-star" returns FALSE. Commented Feb 5, 2014 at 18:22

3 Answers

3

Trying to answer what I think might be your real problem (convert each string "to" a number)...

Given data:

> strings = c('["star"]', '["near-star"]', '["shared"]', '["near-shared"]')
> data = sample(strings,20,TRUE)

such that:

> head(data)
[1] "[\"near-star\"]"   "[\"star\"]"        "[\"near-shared\"]"
[4] "[\"near-shared\"]" "[\"shared\"]"      "[\"star\"]"

Simply do:

> dataf=factor(data)
> as.numeric(dataf)
 [1] 2 4 1 1 3 4 1 2 2 1 2 3 4 4 3 4 4 1 1 4

The mapping is given by the factor levels, which are sorted by default:

> levels(dataf)
[1] "[\"near-shared\"]" "[\"near-star\"]"   "[\"shared\"]"
[4] "[\"star\"]"
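
If the default sorted order is not the numbering you want, one option is to pass the levels explicitly. This is only a sketch: the order in `labels` below is an arbitrary assumption, so substitute whatever coding scheme you actually need.

```r
# Hypothetical label order; the position in this vector becomes the code.
labels <- c('["star"]', '["near-star"]', '["shared"]', '["near-shared"]',
            '["complete"]', '["near-complete"]', '["null"]', '["near-null"]')

x <- c('["near-star"]', '["star"]', '["null"]', '["shared"]')

# With explicit levels, the integer codes follow the order of `labels`
# instead of the default sorted order.
codes <- as.integer(factor(x, levels = labels))
codes
# [1] 2 1 7 3

# match() returns the same positions without building a factor.
match(x, labels)
# [1] 2 1 7 3
```

Any string that is not in `labels` comes back as NA, which is a handy check for unexpected values.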

2

Others have mentioned using factors or the fixed argument (either of which will work fine for your stated question). More generally, if you want to match a string or pattern only when it is not preceded by a given string, you can use a negative lookbehind, an extension available in Perl-style regular expressions:

> test <- c('star','near-star')
> grepl('(?<!near-)star', test, perl=TRUE )
[1]  TRUE FALSE

The regular expression here says to match the string "star", but only if it is not preceded by the string "near-". The help page ?regexp has details (you need to scroll almost all the way to the bottom).
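
As a rough sketch, the same lookbehind can be applied to the bracketed strings from the question. An alternative that avoids lookbehind is to anchor the pattern to the whole string with ^ and $. The vector `x` and the result names are made up for illustration.

```r
x <- c('["star"]', '["near-star"]', '["shared"]', '["near-shared"]')

# Negative lookbehind: "star" not immediately preceded by "near-".
plain_star <- grepl('(?<!near-)star', x, perl = TRUE)
plain_star
# [1]  TRUE FALSE FALSE FALSE

# Anchored alternative: the whole string must be exactly ["star"].
# The square brackets are escaped so they are not read as a character class.
anchored_star <- grepl('^\\["star"\\]$', x)
anchored_star
# [1]  TRUE FALSE FALSE FALSE
```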

1

You can include the square brackets and quotes in your pattern. Furthermore, you can use fixed = TRUE to match the string as is; without it, the square brackets would be interpreted as a regex character class.

> grepl("[\"star\"]", "[\"near-star\"]", fixed = TRUE)
[1] FALSE
> grepl("[\"star\"]", "[\"star\"]", fixed = TRUE)
[1] TRUE
