
How can I efficiently split the following string on the first comma using base?

x <- "I want to split here, though I don't want to split elsewhere, even here."
strsplit(x, ???)

Desired outcome (2 strings):

[[1]]
[1] "I want to split here"   "though I don't want to split elsewhere, even here."

Thank you in advance.

EDIT: I didn't think to mention this. The solution needs to generalize to a column, i.e. a vector of strings like this, as in:

y <- c("Here's comma 1, and 2, see?", "Here's 2nd sting, like it, not a lot.")

The outcome can be two columns, one long vector (from which I can take every other element), or a list of strings with each element ([[n]]) holding two strings.

Apologies for the lack of clarity.

  • extremely hacky, but what about something like list(head(y[[1]],1), paste(tail(y[[1]],-1), collapse = ",")) where y is the output of strsplit(x, ...)? Commented Apr 25, 2012 at 4:08
  • @Chase I tried it but couldn't get it to work for a vector of similar strings. I edited my original post to explain the problem further. Commented Apr 25, 2012 at 4:17
  • str_locate_all(string=y, ',') will find the index locations of every match of your pattern (a comma, in your case), which can then be used to select from a vector or column. Commented Apr 25, 2012 at 4:23
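For reference, one way to make the head()/tail() idea from the first comment work on a whole vector is to apply it to each element of the strsplit() output with lapply(). This is a sketch using the y vector from the question's edit; res is just an illustrative name, and the trimws() call is an addition to drop the space after the comma.

```r
y <- c("Here's comma 1, and 2, see?", "Here's 2nd sting, like it, not a lot.")

# Split every string on all commas...
pieces <- strsplit(y, ",", fixed = TRUE)

# ...then keep the first piece and paste the rest back together with commas,
# so only the first comma ends up acting as a separator.
res <- lapply(pieces, function(p) {
  c(head(p, 1),                                 # text before the first comma
    trimws(paste(tail(p, -1), collapse = ","))) # remaining text, commas restored
})

# res[[1]] is c("Here's comma 1", "and 2, see?")
res
```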

5 Answers

Here's what I'd probably do. It may seem hacky, but since sub() and strsplit() are both vectorized, it will also work smoothly when handed multiple strings.

XX <- "SoMeThInGrIdIcUlOuS"
strsplit(sub(",\\s*", XX, x), XX)
# [[1]]
# [1] "I want to split here"                               
# [2] "though I don't want to split elsewhere, even here."
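A sketch of the same trick applied to the vector y from the question's edit, with the results stacked into a two-column matrix via do.call(rbind, ...). The stopifnot() line is an added safety check in the spirit of the any(grepl(XX, x)) tip in the comments below. Note that a string with no comma would yield only one piece, which rbind() would silently recycle, so handle that case separately if it can occur in your data.

```r
y <- c("Here's comma 1, and 2, see?", "Here's 2nd sting, like it, not a lot.")
XX <- "SoMeThInGrIdIcUlOuS"

# Make sure the sentinel string never occurs in the data.
stopifnot(!any(grepl(XX, y, fixed = TRUE)))

# Replace only the first comma (plus any following whitespace) with XX,
# then split on XX, so each string in y yields two pieces.
parts <- strsplit(sub(",\\s*", XX, y), XX, fixed = TRUE)

# Stack the pieces: one row per input string, two columns.
mat <- do.call(rbind, parts)
mat
```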

5 Comments

@josh-obrien How would you extend that code to trim the leading space in [2]?
I'll wrap it with gsub("^\\s+|\\s+$", "", JOSH's STUFF)
I like it, Josh. It works, is pretty simple, and stays in base. Thank you. +1
You could check whether your XX is safe with any(grepl(XX, x)). If it is FALSE, then it's OK.
@established1969 -- To trim spaces following the comma, I'd do strsplit(sub(",\\s*", XX, x), XX) instead.

From the stringr package:

library(stringr)
str_split_fixed(x, pattern = ', ', n = 2)
#      [,1]                  
# [1,] "I want to split here"
#      [,2]                                                
# [1,] "though I don't want to split elsewhere, even here."

(That's a matrix with one row and two columns.)
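If you'd rather stay in base R, as the question asks, regexpr() plus regmatches(..., invert = TRUE) behaves much like str_split_fixed(..., n = 2): regexpr() records only the first match in each string, and invert = TRUE returns the text on either side of it. A sketch on the question's vector y (m, halves and res are illustrative names):

```r
y <- c("Here's comma 1, and 2, see?", "Here's 2nd sting, like it, not a lot.")

# Position and length of the FIRST comma (plus trailing spaces) in each string.
m <- regexpr(",\\s*", y)

# The non-matched text around that single match: two pieces per string.
halves <- regmatches(y, m, invert = TRUE)

# Two-column matrix, similar in shape to str_split_fixed() output.
res <- do.call(rbind, halves)
res
```

A string without a comma comes back as a single piece, so it is worth checking lengths(halves) before rbind()-ing if that can happen.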

Here is yet another solution, with a regular expression to capture what is before and after the first comma.

x <- "I want to split here, though I don't want to split elsewhere, even here."
library(stringr)
str_match(x, "^(.*?),\\s*(.*)")[,-1] 
# [1] "I want to split here"                              
# [2] "though I don't want to split elsewhere, even here."
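The same capture-group idea can be done in base R with regexec() and regmatches(). This sketch uses [^,]* for the first group instead of the lazy .*?, so it does not rely on non-greedy matching in the default regex engine; a string with no comma would come out as a row of NAs. caps and res are illustrative names.

```r
y <- c("Here's comma 1, and 2, see?", "Here's 2nd sting, like it, not a lot.")

# Each element of caps is c(full match, group 1, group 2),
# or character(0) when the string has no comma.
caps <- regmatches(y, regexec("^([^,]*),\\s*(.*)$", y))

# Keep only the two groups and arrange them as one row per string.
res <- t(vapply(caps, function(m) m[2:3], character(2)))
res
```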

library(stringr)

str_sub(x, end = str_locate(x, ",")[, "start"] - 1)

This will get the first bit you want, and because str_locate() and str_sub() are vectorized it also works on a vector like y. Change the start= and end= arguments of str_sub to get whatever else you want.

Such as:

str_sub(x, start = str_locate(x, ",")[, "end"] + 1)

and wrap in str_trim to get rid of the leading space:

str_trim(str_sub(x, start = str_locate(x, ",")[, "end"] + 1))
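A base-R sketch of the same locate-then-substring idea for the whole vector y: regexpr() gives the position of the first comma in every string at once, and substr()/substring() are vectorized over those positions, so every string is handled in one call. pos, first and rest are illustrative names.

```r
y <- c("Here's comma 1, and 2, see?", "Here's 2nd sting, like it, not a lot.")

# Position of the first comma in each string (-1 where there is none).
pos <- regexpr(",", y, fixed = TRUE)

first <- substr(y, 1, pos - 1)          # text before the first comma
rest  <- trimws(substring(y, pos + 1))  # text after it, leading space removed
# For a string with no comma, first would be "" and rest the whole string.

data.frame(first, rest, stringsAsFactors = FALSE)
```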

This works, but I like Josh O'Brien's better:

y <- strsplit(x, ",")
sapply(y, function(x) data.frame(x = x[1],
    z = paste(x[-1], collapse = ",")), simplify = FALSE)

Inspired by Chase's response.

A number of people gave non-base approaches, so I figured I'd add the one I usually use (though in this case I needed a base response):

y <- c("Here's comma 1, and 2, see?", "Here's 2nd sting, like it, not a lot.")
library(reshape2)
colsplit(y, ",", c("x","z"))
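Since colsplit() isn't base, here is a base-R sketch that produces a similar data frame with columns x and z, using utils::strcapture() (available in recent R versions) with a regex that captures the text before and after the first comma. proto only tells strcapture() the column names and types; proto and df are illustrative names.

```r
y <- c("Here's comma 1, and 2, see?", "Here's 2nd sting, like it, not a lot.")

# Zero-row template: two character columns named x and z.
proto <- data.frame(x = character(), z = character(), stringsAsFactors = FALSE)

# Group 1: everything before the first comma; group 2: everything after it
# (minus the whitespace right after the comma).
df <- utils::strcapture("^([^,]*),\\s*(.*)$", y, proto = proto)
df
```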

1 Comment

In your first part I don't see why you would use sapply over the seq_along(y) instead of just y. You don't look like you ever actually need the index explicitly. It also looks like you're removing all the commas even though you wanted them to be kept in the other strings?
