
I have a large data frame with a column of strings, each ending in either 1 or 2 digits. I'd like to extract those digits and place them in a new column. Here is how I am doing it with the stringr and dplyr packages:

library(tidyverse)

df <- structure(list(row = 1:9, Sensor = c("Inclin_01_A1", "Inclin_01_A2", "Inclin_01_A10", "Inclin_01_A25", "Inclin_01_B1", "Inclin_01_B2", "Inclin_01_B36", "Temp_F1", "Temp_F14")), row.names = c(NA, -9L), class = "data.frame")

df1 <- df %>% 
    mutate(newcol = if_else(str_sub(Sensor, -2, -2) == "A" | str_sub(Sensor, -2, -2) == "B" | str_sub(Sensor, -2, -2) =="F",
           str_sub(Sensor, -1,-1),
           str_sub(Sensor, -2)))

It seems clunky and takes a long time for my dataframe. Is there a less clunky and faster way to do this? Note that the character before the numbers will always be A, B, or F.

  • Oops, as posted this is a duplicate. Head over there to see the most concise answer yet. Commented Jul 2 at 23:08
  • You can certainly write shorter versions than those posted that will work for this example, but when you have a specific pattern in mind (one of A, B, or F, then 1 or 2 digits, then the end of the string), I think you should look for that, and that only. What if the data inadvertently contained 'missing' values like "99" that you hadn't noticed? One option will give an NA; the other will return a potentially nonsensical value that might quietly blow up your analysis. Commented Jul 3 at 0:25
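
The point of this comment can be illustrated on a small made-up vector. This is only a sketch: the vector `x` and its stray "99" entry are assumptions for illustration, not real data, while the two patterns are the ones used in the answers.

```r
library(stringr)

# Hypothetical input: two well-formed sensor names plus a stray "99"
x <- c("Inclin_01_A10", "Temp_F1", "99")

# Strict pattern: A, B or F, then 1-2 digits, then end of string.
# Elements that do not fit the pattern come back as NA.
strict <- str_extract(x, "(?<=[ABF])\\d{1,2}$")

# Loose pattern: any trailing digits. The stray "99" passes through
# unchanged and looks like a legitimate sensor number.
loose <- sub(".*?(\\d+)$", "\\1", x)

print(strict)
print(loose)
```

With the strict pattern the unexpected entry surfaces as a missing value that can be counted with `sum(is.na(strict))`; with the loose one it goes unnoticed.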

2 Answers

4

I'd use a regex with str_extract (see ?str_extract). A lookbehind enforces that an A, B, or F comes immediately before the digits, and the pattern then matches 1 or 2 digits anchored at the end of the string with $.

df %>% mutate(newcol = str_extract(Sensor, "(?<=[ABF])\\d{1,2}$"))
#  row        Sensor newcol
#1   1  Inclin_01_A1      1
#2   2  Inclin_01_A2      2
#3   3 Inclin_01_A10     10
#4   4 Inclin_01_A25     25
#5   5  Inclin_01_B1      1
#6   6  Inclin_01_B2      2
#7   7 Inclin_01_B36     36
#8   8       Temp_F1      1
#9   9      Temp_F14     14
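
If loading stringr is not an option, roughly the same result should be obtainable in base R. The sketch below assumes the same pattern and uses `regexpr()` with `perl = TRUE`, which the lookbehind requires; the helper name `extract_suffix` is made up for illustration.

```r
# Hypothetical helper: a base R stand-in for the str_extract() call above
extract_suffix <- function(x) {
  # perl = TRUE enables the (?<=...) lookbehind
  m <- regexpr("(?<=[ABF])\\d{1,2}$", x, perl = TRUE)
  hit <- !is.na(m) & m > -1L
  out <- rep(NA_character_, length(x))
  # regmatches() returns only the matched pieces, so fill them in by position
  out[hit] <- regmatches(x, m)
  out
}

df <- data.frame(Sensor = c("Inclin_01_A1", "Inclin_01_B36", "Temp_F14"))
df$newcol <- extract_suffix(df$Sensor)
print(df)
```

As with `str_extract()`, strings that do not fit the pattern yield `NA` rather than being passed through.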

3

You can try sub, as shown below:

> transform(df, newcol = sub(".*?(\\d+)$", "\\1", Sensor))
  row        Sensor newcol
1   1  Inclin_01_A1      1
2   2  Inclin_01_A2      2
3   3 Inclin_01_A10     10
4   4 Inclin_01_A25     25
5   5  Inclin_01_B1      1
6   6  Inclin_01_B2      2
7   7 Inclin_01_B36     36
8   8       Temp_F1      1
9   9      Temp_F14     14

4 Comments

I tend towards simpler tools like sub; the only risk is that when it finds no match, it returns the whole string unchanged, which requires extra steps to handle.
@r2evans gsub('(\\d+)$|.', '\\1', c('Inclin_01_A10', 'Temp_F1', 'Nothing'))
@r2evans thanks for the feedback. I think rawr's comment resolves your question.
Yup, I've seen that trick before, forgotten it, and keep coming back to it. This time I'm writing it down :-)
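
For reference, the gsub trick from rawr's comment can be run as follows. This is a sketch on the comment's own example vector; converting the resulting empty strings to `NA` is an added assumption about how non-matches should be represented, not part of the original comment.

```r
x <- c("Inclin_01_A10", "Temp_F1", "Nothing")

# Each match is either the trailing digits (kept via \\1) or a single
# other character (dropped, because group 1 is empty for that match).
out <- gsub("(\\d+)$|.", "\\1", x)

# Assumption: represent "no trailing digits" as NA rather than ""
out[!nzchar(out)] <- NA_character_

print(out)
```

Unlike the plain `sub()` call, a string with no trailing digits does not come back whole, so no separate check against the input is needed.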
