2

I have a table with the following column structure :

Name                                                Type
Urgent Care (Revenue Code: 0456)                    Per Case
IV Therapy (Revenue Codes 0260, 0269)               Per Visit
Oncology Treatment (Revenue Codes: 0280, 0289)      Per Visit

and I want to extract the numeric revenue codes from the names column so the table would look like :

Name                     Rev Code      Type
Urgent Care              0456          Per Case
IV Therapy               0260, 0269    Per Visit
Oncology Treatment       0280, 0289    Per Visit

the original data is inconsistent in the Name column as the word "Code" is followed by a ";" , blank space, "-" etc. So I was trying to use regular expression to search for the first number and then split the column there.

I have tried using regex to search for the first digit and separate() from the tidyr package :

library(tidyr)
separate(mydata, Name, into = c("Name", "Rev Code"), sep = "[[:digit:]]")

which splits the column at the right location but the column "Rev Code" ends up blank? I'm relatively new to R and would definitely appreciate any help!

Data:

structure(list(
Name = c("Urgent Care (Revenue Code: 0456)", "IV Therapy (Revenue Codes 0260, 0269)", 
"Oncology Treatment (Revenue Codes: 0280, 0289)"), 
Type = c("Per Case", "Per Visit", "Per Visit")), 
.Names = c("Name", "Type"), row.names = 1:3, class = "data.frame")
1
  • 1
    agreed. Just made that edit. Commented Oct 14, 2016 at 19:27

3 Answers 3

2
read.table(header=TRUE, stringsAsFactors=FALSE, sep=",", text='Name,Type
"Urgent Care (Revenue Code: 0456)", "Per Case"
"IV Therapy (Revenue Codes 0260, 0269)","Per Visit"
"Oncology Treatment (Revenue Codes: 0280, 0289)", "Per Visit"') -> df

library(stringi)
library(dplyr)
library(purrr)

extract_codes <- function(x) {
  stri_match_all_regex(x, "[[:digit:]]+") %>% # extract the numbers
    map(~paste0(as.vector(.), collapse=", ")) # paste them back together
}

mutate(df, `Rev Code`=extract_codes(Name))
Sign up to request clarification or add additional context in comments.

Comments

1

We can try with extract

library(tidyr)
extract(df1, Name, into = c("Name", "RevCode"), "([^(]+)\\s*[^0-9]+([0-9].*).")

#               Name    RevCode      Type
#1        Urgent Care       0456  Per Case
#2         IV Therapy 0260, 0269 Per Visit
#3 Oncology Treatment 0280, 0289 Per Visit

As the OP commented that other patterns exist,

extract(df2, Name, into = c("Name", "RevCode"), "([^(]+)\\s*[^0-9]+([0-9].*).")
#                 Name         RevCode      Type
#1        Urgent Care             0456  Per Case
#2         IV Therapy       0260, 0269 Per Visit
#3 Oncology Treatment       0280, 0289 Per Visit
#4     Speech Therapy  0440-0444, 0449 Per Visit

data

df2 <- structure(list(Name = c("Urgent Care (Revenue Code: 0456)", 
 "IV Therapy (Revenue Codes 0260, 0269)", 
"Oncology Treatment (Revenue Codes: 0280, 0289)", 
"Speech Therapy (Revenue Codes: 0440-0444, 0449)"
), Type = c("Per Case", "Per Visit", "Per Visit", "Per Visit"
)), .Names = c("Name", "Type"), class = "data.frame", row.names = c(NA, 
-4L))

7 Comments

That definitely helped. There are couple rows in the data which leave the closing ")" in the RevCode column. Couldn't figure out why. Couple examples are :
Speech Therapy (Revenue Codes: 0440-0444, 0449)
@user6340762 The pattern is different there, as it also has -
you're right. It is, my data had some white-space which I overlooked. Thanks!
@user6340762 In the extract, we capture as a group. So, whichever is inside the capture group ((...)) is the one got extracted. i.e. in the first case (([^(]+)) it is one or more characters that are not a ( and in the second it is (([0-9].*)) it is number followed by other characters.
|
1

Without extra package:

> data.frame(Name=gsub("\\(.*\\)", "", df$Name),
            RevCode=regmatches(df$Name, regexpr("[[:digit:]]+(\\,[[:space:]][[:digit:]]+)?", df$Name)),
            Type=df$Type)
                 Name    RevCode      Type
1        Urgent Care        0456  Per Case
2         IV Therapy  0260, 0269 Per Visit
3 Oncology Treatment  0280, 0289 Per Visit

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.