My string pattern is as follows:

1233 fox street, omaha NE ,69131-7233
Jeffrey Jones, 666 Church Street, Omaha NE ,69131-72339
Betty Davis, LLC, 334 Aloha Blvd., Fort Collins CO ,84444-00333
,1233 Decker street, omaha NE ,69131-7233

I need to separate each of the above strings into four variables: name, address, city_state, zipcode.

Since the strings contain a varying number of commas (two to four in the examples above), I am starting from the right and splitting the field into multiple fields one piece at a time.

rubular.com says that the pattern "(,\\d.........)$" or the pattern ",\d.........$" will match the zip code at the end of the string.

regex101.com, however, finds no match for either pattern.

When I try to separate with:

# need to load pkg:tidyr for the `separate` function
library(tidyr)
separate(street_add, c("street_add2", "zip", sep= ("(,\d.........)$")))

or with:

separate(street_add, c("street_add2", "zip", sep=  (",\d.........$"))) 

In both scenarios, R splits at the first comma in the string.
How do I split the string into segments?
Thank you.

  • Re: citing rubular.com: You should realize that regular-expression syntax in R differs from that of some other languages. Maybe Ruby has the same weirdness as R (I don't know whether it does), but if so you should say so. Otherwise, rely on R-specific authorities for example R regex patterns. Commented Aug 11, 2021 at 4:52

1 Answer

Use

sep=",(?=[^,]*$)"

See regex proof.

EXPLANATION

--------------------------------------------------------------------------------
  ,                        ','
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    [^,]*                    any character except: ',' (0 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    $                        before an optional \n, and the end of
                             the string
--------------------------------------------------------------------------------
  )                        end of look-ahead
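
As a minimal sketch of how this pattern might be applied to the sample strings from the question: the data frame name `addresses` and the variable `result` are assumptions made for illustration, while `street_add`, `street_add2`, and `zip` come from the question. Note that `sep` is passed as its own argument to `separate()` rather than placed inside the `c(...)` vector of new column names.

```r
library(tidyr)

# Sample strings from the question; the data frame name `addresses` is hypothetical.
addresses <- data.frame(
  street_add = c(
    "1233 fox street, omaha NE ,69131-7233",
    "Jeffrey Jones, 666 Church Street, Omaha NE ,69131-72339",
    "Betty Davis, LLC, 334 Aloha Blvd., Fort Collins CO ,84444-00333",
    ",1233 Decker street, omaha NE ,69131-7233"
  ),
  stringsAsFactors = FALSE
)

# Split each string at its last comma only:
# the lookahead requires that no further comma follows before the end.
result <- separate(
  addresses,
  street_add,
  into = c("street_add2", "zip"),
  sep = ",(?=[^,]*$)"
)

print(result)
```

Everything before the last comma lands in `street_add2`, so the same idea could presumably be repeated on that column to peel off the remaining fields from the right.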

3 Comments

Ryszard: Thank you for the code and the explanation. :)
Thank you Ryszard. I'm still learning my way around here.
That website is only useful if the person using it understands that R needs modifications for escape sequences.
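
To illustrate the escaping point made in the comments, here is a small base R sketch. It assumes nothing beyond base R; the variable names `x`, `pattern`, `has_zip`, and `parts` are illustrative only.

```r
# In an R string literal the backslash must itself be escaped,
# so the regex \d is written "\\d" (a bare "\d" is a parse error in R).
x <- "1233 fox street, omaha NE ,69131-7233"

pattern <- ",\\d.........$"   # the regex engine receives: ,\d.........$
has_zip <- grepl(pattern, x)

# Splitting at the last comma with base R; perl = TRUE enables the lookahead.
parts <- strsplit(x, ",(?=[^,]*$)", perl = TRUE)[[1]]

print(has_zip)
print(parts)
```

A pattern copied from a tester for another language may therefore need its backslashes doubled before it is pasted into R code.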
