
As the title says, I want to split these strings

strsplit(c("aaa,aaa", "bbb, bbb", "ddd , ddd"), ",")

into this:

[[1]]
[1] "aaa" "aaa"

[[2]]
[1] "bbb, bbb"

[[3]]
[1] "ddd , ddd"

So the regular expression must split on a comma only when no whitespace follows it. This could be a duplicate, but I was not able to find a solution by googling.

  • The pattern has already been posted: stackoverflow.com/questions/19480101/… Commented Apr 16, 2018 at 13:22
  • @WiktorStribiżew Only once? Then it is no dupe ;) Commented Apr 16, 2018 at 13:25
  • Since the regex usage in R is rather different than the regex usage in Java, I agree it is not. Commented Apr 16, 2018 at 16:47

2 Answers


regular expression has to consider that no whitespace should occur after the comma

Use a negative lookahead assertion:

> strsplit(c("aaa,aaa", "bbb, bbb", "ddd , ddd"), ",(?!\\s)", perl = TRUE)
[[1]]
[1] "aaa" "aaa"

[[2]]
[1] "bbb, bbb"

[[3]]
[1] "ddd , ddd"

,(?!\\s) matches a comma only if it is not followed by a whitespace character (\\s covers spaces, tabs, and newlines). Lookaheads need perl = TRUE here.
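
To see which commas the lookahead actually accepts, here is a minimal sketch that lists the match position in each string with gregexpr. The variable names x, hits, and positions are illustrative and not part of the answer.

```r
x <- c("aaa,aaa", "bbb, bbb", "ddd , ddd")

# Find commas that are NOT followed by whitespace.
# gregexpr returns -1 for a string with no match.
hits <- gregexpr(",(?!\\s)", x, perl = TRUE)

# Keep the first match position per string.
positions <- vapply(hits, function(m) m[1], integer(1))
print(positions)
# The first string matches at position 4; the other two give -1 (no match).
```

Only the first string contains a comma that passes the lookahead, which is why strsplit leaves the other two strings intact.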


4 Comments

Is there also a solution using perl = FALSE?
@Jimbou Why do you ask that? What OS are you working on?
@WiktorStribiżew More or less just out of curiosity. But I'm using the pattern within tidyr's separate_rows function. Fortunately it works as expected.
@Jimbou I see that the separate_rows relies on stringi package for splitting, so it is not surprising the lookaheads are supported by the ICU regex library.
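
On the perl = FALSE question: the default regex engine in base R does not support lookaheads, so the same pattern cannot be used directly. One possible workaround, sketched here under the assumption that the byte "\001" never occurs in the data, is to mark the splitting commas with gsub first and then split on the marker. The names x, marked, and result are illustrative.

```r
x <- c("aaa,aaa", "bbb, bbb", "ddd , ddd")

# Step 1: replace each comma that is followed by a non-whitespace character
# with a marker, keeping that following character via the backreference \\1.
# "\001" is an assumed marker that must not appear in the input.
marked <- gsub(",([^[:space:]])", "\001\\1", x)

# Step 2: split on the marker as a literal string (no regex needed).
result <- strsplit(marked, "\001", fixed = TRUE)
print(result)
```

For the strings in the question this gives the same list as the lookahead version. It is not a full equivalent: because the character after the comma is consumed by the match, a comma at the very end of a string or two adjacent commas would be handled differently.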

Just to provide an alternative using (*SKIP)(*FAIL):

# Skip any comma followed by whitespace; split on the remaining commas
pattern <- ",\\s(*SKIP)(*FAIL)|,"
data <- c("aaa,aaa", "bbb, bbb", "ddd , ddd")
strsplit(data, pattern, perl = TRUE)

This yields the same as above.
