Split on first comma in string

Question

How can I efficiently split the following string on the first comma using base R?

x <- "I want to split here, though I don't want to split elsewhere, even here."
strsplit(x, ???)

Desired outcome (2 strings):

[[1]]
[1] "I want to split here"   "though I don't want to split elsewhere, even here."

Thank you in advance.

EDIT: I didn't think to mention this. The solution needs to generalize to a column or vector of strings like this:

y <- c("Here's comma 1, and 2, see?", "Here's 2nd sting, like it, not a lot.")

The outcome can be two columns, one long vector (that I can take every other element of), or a list of strings with each index ([[n]]) holding two strings.

Apologies for the lack of clarity.

Extremely hacky, but what about something like list(head(y[[1]],1), paste(tail(y[[1]],-1), collapse = ",")) where y is the output of strsplit(x, ...)?
– Chase, Commented Apr 25, 2012 at 4:08
@Chase I tried it but couldn't get it to work for a vector of similar strings. I edited my original post to further explain the problem.
– Tyler Rinker, Commented Apr 25, 2012 at 4:17
str_locate_all(string=y, ',') will find all index locations of your pattern (a comma in your case), which can then be used to select from a vector or column.
– John, Commented Apr 25, 2012 at 4:23

Josh O'Brien · Accepted Answer · 2012-04-25 15:22:58Z

13

Here's what I'd probably do. It may seem hacky, but since sub() and strsplit() are both vectorized, it will also work smoothly when handed multiple strings.

XX <- "SoMeThInGrIdIcUlOuS"
strsplit(sub(",\\s*", XX, x), XX)
# [[1]]
# [1] "I want to split here"                               
# [2] "though I don't want to split elsewhere, even here."
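For the vector case from the question's edit, the same idea can be sketched as below. This is only an illustration, assuming the placeholder never occurs in the input and that every string contains at least one comma; the names `parts` and `res` are made up for the example.

```r
# Sketch: the placeholder approach applied to the vector `y` from the question.
y <- c("Here's comma 1, and 2, see?", "Here's 2nd sting, like it, not a lot.")
XX <- "SoMeThInGrIdIcUlOuS"   # placeholder, assumed not to occur in `y`

# sub() replaces only the first match in each element; strsplit() then
# splits each element on the (literal) placeholder.
parts <- strsplit(sub(",\\s*", XX, y), XX, fixed = TRUE)

# Bind the list into a two-column character matrix, one row per input string.
# This step assumes every element was split into exactly two pieces.
res <- do.call(rbind, parts)
res
```

Taking `res[, 1]` and `res[, 2]` then gives the "two columns" form asked for in the question.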

edited Apr 25, 2012 at 15:22

answered Apr 25, 2012 at 4:23

Josh O'Brien


5 Comments

John Over a year ago

@josh-obrien How would you extend that code to trim the leading space in [2]?

Tyler Rinker Over a year ago

I'll wrap it with gsub("^\\s+|\\s+$", "", JOSH's STUFF)

Tyler Rinker Over a year ago

I like it, Josh. It works, is pretty simple, and stays in base. Thank you. +1

Marek Over a year ago

You could check whether your XX is OK with any(grepl(XX,x)). If it is FALSE, then XX does not occur in x and is safe to use.

Josh O'Brien Over a year ago

@established1969 -- To trim spaces following the comma, I'd do strsplit(sub(",\\s*", XX, x), XX) instead.
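The suggestions in these comments (checking the placeholder and trimming whitespace) can be combined. A minimal sketch, assuming `fixed = TRUE` is acceptable so the placeholder and the comma are treated as literal strings; `pieces` is a name invented for the example.

```r
x <- "I want to split here, though I don't want to split elsewhere, even here."
XX <- "SoMeThInGrIdIcUlOuS"

# The placeholder must not already appear in the input. fixed = TRUE makes
# grepl() treat it as a literal string rather than a regular expression.
stopifnot(!any(grepl(XX, x, fixed = TRUE)))

# Replace the first comma only, split on the placeholder, then trim
# whitespace from both ends of each piece with the gsub() pattern
# suggested in the comments.
pieces <- strsplit(sub(",", XX, x, fixed = TRUE), XX, fixed = TRUE)
pieces <- lapply(pieces, function(p) gsub("^\\s+|\\s+$", "", p))
pieces
```

Trimming after the split also removes trailing whitespace, which the `",\\s*"` pattern alone does not.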

flodel · Accepted Answer · 2012-04-25 04:40:30Z

10

From the stringr package:

library(stringr)
str_split_fixed(x, pattern = ', ', n = 2)
#      [,1]                  
# [1,] "I want to split here"
#      [,2]                                                
# [1,] "though I don't want to split elsewhere, even here."

(That's a matrix with one row and two columns.)
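Since the question asked for base, a similar two-column matrix can be built without stringr. The sketch below is one possible approach, not the stringr implementation: it locates the first comma with `regexpr()` and cuts each string with `substring()`. Keeping comma-free strings whole in the first column is a choice made for this example.

```r
y <- c("Here's comma 1, and 2, see?", "Here's 2nd sting, like it, not a lot.")

# Position of the first comma in each string (-1 where there is none).
# as.integer() drops the extra attributes that regexpr() attaches.
pos <- as.integer(regexpr(",", y, fixed = TRUE))
has_comma <- pos > 0

# Text before the comma, and text after it with leading whitespace removed.
# Strings without a comma are kept whole in the first column.
first  <- ifelse(has_comma, substring(y, 1, pos - 1), y)
second <- ifelse(has_comma, sub("^\\s+", "", substring(y, pos + 1)), "")

out <- cbind(first, second)
out
```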

answered Apr 25, 2012 at 4:40

flodel


Vincent Zoonekynd · Accepted Answer · 2012-04-25 13:52:02Z

4

Here is yet another solution, with a regular expression to capture what is before and after the first comma.

x <- "I want to split here, though I don't want to split elsewhere, even here."
library(stringr)
str_match(x, "^(.*?),\\s*(.*)")[,-1] 
# [1] "I want to split here"                              
# [2] "though I don't want to split elsewhere, even here."
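The same capture-group idea can be sketched in base R with `regexec()` and `regmatches()`. This is an assumed variation rather than the answer's own code: the pattern uses `[^,]*` instead of the lazy `.*?`, and `caps` is a name made up for the example.

```r
x <- "I want to split here, though I don't want to split elsewhere, even here."

# [^,]* matches everything up to (not including) the first comma, so no
# lazy quantifier is needed; \\s* swallows whitespace after that comma.
pattern <- "^([^,]*),\\s*(.*)$"

# regexec() records the capture groups and regmatches() extracts them.
# Element 1 of each result is the full match, so it is dropped.
m <- regmatches(x, regexec(pattern, x))
caps <- lapply(m, function(g) g[-1])
caps[[1]]
```

Because `regmatches()` returns a list with one element per input string, this also works on a vector; strings without a comma yield `character(0)`.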

answered Apr 25, 2012 at 13:52

Vincent Zoonekynd


John · Accepted Answer · 2012-04-25 04:13:32Z

3

library(stringr)

str_sub(x,end = min(str_locate(string=x, ',')-1))

This will get the first bit you want. Change the start= and end= arguments in str_sub to get whatever else you want.

Such as:

str_sub(x,start = min(str_locate(string=x, ',')+1 ))

and wrap in str_trim to get rid of the leading space:

str_trim(str_sub(x,start = min(str_locate(string=x, ',')+1 )))
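Note that `min()` collapses the result of `str_locate()` to a single number, so the calls above only suit one string at a time. For the vector from the question's edit, a possible variation is to take the "start" column instead. This is a sketch assuming stringr is installed; `comma`, `before`, and `after` are names invented for the example.

```r
library(stringr)

y <- c("Here's comma 1, and 2, see?", "Here's 2nd sting, like it, not a lot.")

# str_locate() returns one row per string; the "start" column holds the
# position of the first comma in each (NA where there is none).
comma <- str_locate(y, ",")[, "start"]

# str_sub() is vectorized over start/end, so each string is cut at its
# own comma position.
before <- str_sub(y, end = comma - 1)
after  <- str_trim(str_sub(y, start = comma + 1))

cbind(before, after)
```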

edited Apr 25, 2012 at 4:13

answered Apr 25, 2012 at 4:06

John


Tyler Rinker · Accepted Answer · 2012-10-07 01:29:06Z

2

This works, but I like Josh O'Brien's better:

y <- strsplit(x, ",")
# For each split string, keep the first piece and re-join the rest with commas
sapply(y, function(x) data.frame(x = x[1],
    z = paste(x[-1], collapse = ",")), simplify = FALSE)

Inspired by Chase's response.

A number of people gave non-base approaches, so I figured I'd add the one I usually use (though in this case I needed a base response):

y <- c("Here's comma 1, and 2, see?", "Here's 2nd sting, like it, not a lot.")
library(reshape2)
colsplit(y, ",", c("x","z"))

edited Oct 7, 2012 at 1:29

answered Apr 25, 2012 at 4:30

Tyler Rinker


1 Comment

Dason Over a year ago

In your first part I don't see why you would use sapply over the seq_along(y) instead of just y. You don't look like you ever actually need the index explicitly. It also looks like you're removing all the commas even though you wanted them to be kept in the other strings?
