Split string after comma without trailing whitespace

Question

As the title says, I want to split these strings

strsplit(c("aaa,aaa", "bbb, bbb", "ddd , ddd"), ",")

into this:

[[1]]
[1] "aaa" "aaa"

[[2]]
[1] "bbb, bbb"

[[3]]
[1] "ddd , ddd"

In other words, the regular expression should match a comma only when no whitespace follows it. This may be a duplicate, but I could not find a solution by searching.

The pattern has already been posted: stackoverflow.com/questions/19480101/…
– Wiktor Stribiżew, commented Apr 16, 2018 at 13:22
Since regex usage in R is rather different from regex usage in Java, I agree it is not a duplicate.
– Wiktor Stribiżew, commented Apr 16, 2018 at 16:47

Avinash Raj · Accepted Answer · 2018-04-16 13:23:21Z

5

regular expression has to consider that no whitespace should occur after the comma

Use a negative lookahead assertion:

> strsplit(c("aaa,aaa", "bbb, bbb", "ddd , ddd"), ",(?!\\s)", perl = TRUE)
[[1]]
[1] "aaa" "aaa"

[[2]]
[1] "bbb, bbb"

[[3]]
[1] "ddd , ddd"

,(?!\\s) matches a comma only if it is not followed by a whitespace character (\s covers spaces as well as tabs and newlines).
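To see why \s rather than a literal space matters, here is a small sketch. The second input string, with a tab after the comma, is a made-up example and not from the question. A lookahead on a literal space, ,(?! ), still splits at a comma followed by a tab, while ,(?!\\s) leaves it alone.

```r
# Made-up input: the second string has a tab (not a space) after the comma.
x <- c("aaa,aaa", "bbb,\tbbb")

# \s in the lookahead: a comma followed by any whitespace is not a split point.
by_whitespace <- strsplit(x, ",(?!\\s)", perl = TRUE)

# Literal space in the lookahead: the comma before the tab IS a split point.
by_space <- strsplit(x, ",(?! )", perl = TRUE)

print(by_whitespace)  # second element is kept whole
print(by_space)       # second element is split into "bbb" and "\tbbb"
```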

edited Apr 16, 2018 at 13:23

answered Apr 16, 2018 at 13:19

Avinash Raj



4 Comments

Roman Over a year ago

Is there also a solution using perl = FALSE?

Wiktor Stribiżew Over a year ago

@Jimbou Why do you ask that? What OS are you working on?

Roman Over a year ago

@WiktorStribiżew Mostly just out of curiosity. I'm using the pattern within tidyr's separate_rows function, and fortunately it works as expected there.

Wiktor Stribiżew Over a year ago

@Jimbou I see that separate_rows relies on the stringi package for splitting, so it is not surprising that lookaheads are supported: stringi uses the ICU regex library.
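On the perl = FALSE question: as far as I know, R's default regex engine (TRE, used when perl = FALSE) does not support lookaheads, so ,(?!\\s) cannot be used there directly. One possible workaround, sketched below, is to temporarily replace every comma that is followed by whitespace with a placeholder, split on the remaining commas literally, and then put the protected commas back. The helper name split_comma_no_ws is hypothetical, and the sketch assumes the input never contains the control character \001 used as the placeholder.

```r
# Hypothetical helper; assumes x never contains the placeholder character.
split_comma_no_ws <- function(x, placeholder = "\001") {
  # 1. Protect every comma that IS followed by whitespace (TRE regex, perl = FALSE).
  protected <- gsub(",([[:space:]])", paste0(placeholder, "\\1"), x)
  # 2. Every remaining comma is not followed by whitespace: split on it literally.
  parts <- strsplit(protected, ",", fixed = TRUE)
  # 3. Turn the placeholders back into commas.
  lapply(parts, gsub, pattern = placeholder, replacement = ",", fixed = TRUE)
}

x <- c("aaa,aaa", "bbb, bbb", "ddd , ddd")
res <- split_comma_no_ws(x)
print(res)
```

Note that [[:space:]] is locale-dependent, so it may not match exactly the same set of characters as PCRE's \s.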

Jan · Accepted Answer · 2018-04-16 14:42:50Z

0

Just to provide an alternative using (*SKIP)(*FAIL):

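# A comma followed by whitespace is matched and skipped; any other comma splits.
pattern <- ",\\s(*SKIP)(*FAIL)|,"
data <- c("aaa,aaa", "bbb, bbb", "ddd , ddd")
strsplit(data, pattern, perl = TRUE)

This yields the same result as above: a comma followed by whitespace is matched by the left alternative and then discarded by (*SKIP)(*FAIL), so only the commas matched by the right alternative are used as split points.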
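A minimal sketch comparing the (*SKIP)(*FAIL) approach with the lookahead from the first answer. It assumes the alternative to be skipped is "a comma followed by whitespace", written ,\\s(*SKIP)(*FAIL)|,, and the fourth input string with a tab is a made-up example. The left alternative matches such a comma, (*SKIP)(*FAIL) throws that match away and resumes the search after it, and every other comma is caught by the right alternative.

```r
# Made-up fourth element: a tab instead of a space after the comma.
x <- c("aaa,aaa", "bbb, bbb", "ddd , ddd", "eee,\teee")

# Left alternative: comma + whitespace, matched and then skipped.
# Right alternative: any other comma, used as a split point.
skip_fail <- strsplit(x, ",\\s(*SKIP)(*FAIL)|,", perl = TRUE)

# Negative lookahead from the first answer, for comparison.
lookahead <- strsplit(x, ",(?!\\s)", perl = TRUE)

print(skip_fail)
stopifnot(identical(skip_fail, lookahead))
```

The lookahead is the shorter option here; (*SKIP)(*FAIL) becomes more useful when the context to exclude is longer than a single following character.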

answered Apr 16, 2018 at 14:42

Jan

