Extract last numbers from text strings [duplicate]

Question

I have a large data frame with a column of strings that each end in a 1- or 2-digit number. I'd like to extract that number and place it in a new column. Here is how I am doing it with the stringr and dplyr packages:

library(tidyverse)

df <- structure(list(row = 1:9, Sensor = c("Inclin_01_A1", "Inclin_01_A2", "Inclin_01_A10", "Inclin_01_A25", "Inclin_01_B1", "Inclin_01_B2", "Inclin_01_B36", "Temp_F1", "Temp_F14")), row.names = c(NA, -9L), class = "data.frame")

df1 <- df %>% 
    mutate(newcol = if_else(str_sub(Sensor, -2, -2) == "A" | str_sub(Sensor, -2, -2) == "B" | str_sub(Sensor, -2, -2) =="F",
           str_sub(Sensor, -1,-1),
           str_sub(Sensor, -2)))

It seems clunky and takes a long time for my dataframe. Is there a less clunky and faster way to do this? Note that the character before the numbers will always be A, B, or F.

Oops, as posted this is a duplicate. Head over there to see the most concise answer yet.
– Eric Krantz, Commented Jul 2 at 23:08
You can certainly write shorter versions than the ones posted that will work for this example, but when you have a specific pattern in mind, such as A, B or F followed by 1 or 2 digits at the end of the string, I think you should look for that, and that only. What if the data inadvertently contained 'missing' values like "99" that you hadn't noticed? One option will give an NA; the other will return a potentially nonsensical value that might quietly blow up your analysis.
– thelatemail, Commented Jul 3 at 0:25

thelatemail · Accepted Answer · 2025-07-02 21:59:58Z

4

I'd use a regex with ?str_extract to enforce the A, B or F coming before, and then look for 1 or 2 digits afterwards at the $ end of the string.

df %>% mutate(newcol = str_extract(Sensor, "(?<=[ABF])\\d{1,2}$"))
#  row        Sensor newcol
#1   1  Inclin_01_A1      1
#2   2  Inclin_01_A2      2
#3   3 Inclin_01_A10     10
#4   4 Inclin_01_A25     25
#5   5  Inclin_01_B1      1
#6   6  Inclin_01_B2      2
#7   7 Inclin_01_B36     36
#8   8       Temp_F1      1
#9   9      Temp_F14     14
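
To make the difference concrete, here is a minimal sketch comparing the anchored pattern with a looser "any trailing digits" pattern. The inputs "99" and "Inclin_01_C7" are made up for illustration and are not in the question's data; the names x, strict and loose are likewise hypothetical.

```r
library(stringr)

# Hypothetical inputs: the last two values are NOT from the question's data
x <- c("Inclin_01_A10", "Temp_F1", "99", "Inclin_01_C7")

# Anchored pattern: 1 or 2 digits at the end, only if preceded by A, B or F
strict <- str_extract(x, "(?<=[ABF])\\d{1,2}$")

# Looser pattern: any run of digits at the end of the string
loose <- str_extract(x, "\\d+$")

strict  # "10" "1" NA NA
loose   # "10" "1" "99" "7"

# Both results are character vectors; convert if numbers are needed
as.integer(strict)
```

With the anchored pattern, unexpected values surface as NA instead of passing through as plausible-looking numbers.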

answered Jul 2 at 21:59

thelatemail

ThomasIsCoding · Accepted Answer · 2025-07-02 21:55:43Z

3

You can try sub, as below:

> transform(df, newcol = sub(".*?(\\d+)$", "\\1", Sensor))
  row        Sensor newcol
1   1  Inclin_01_A1      1
2   2  Inclin_01_A2      2
3   3 Inclin_01_A10     10
4   4 Inclin_01_A25     25
5   5  Inclin_01_B1      1
6   6  Inclin_01_B2      2
7   7 Inclin_01_B36     36
8   8       Temp_F1      1
9   9      Temp_F14     14
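
One thing to keep in mind with sub is that a string with no match is returned unchanged. A minimal sketch of one way to guard against that, assuming you want NA for strings without trailing digits (the input "Nothing" and the names x, raw and guarded are hypothetical):

```r
# "Nothing" is a made-up value with no trailing digits
x <- c("Inclin_01_A10", "Temp_F1", "Nothing")

# sub() leaves non-matching strings untouched
raw <- sub(".*?(\\d+)$", "\\1", x)
raw  # "10" "1" "Nothing"

# Guard: keep the result only where trailing digits actually exist
guarded <- ifelse(grepl("\\d+$", x), raw, NA_character_)
guarded  # "10" "1" NA
```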

answered Jul 2 at 21:55

ThomasIsCoding

4 Comments

r2evans Jul 2 at 23:34

I tend towards simpler like sub, the only risk is that if it finds nothing, it returns the whole string, requiring more steps.

rawr Jul 3 at 7:41

@r2evans gsub('(\\d+)$|.', '\\1', c('Inclin_01_A10', 'Temp_F1', 'Nothing'))

ThomasIsCoding Jul 3 at 9:36

@r2evans thanks for the feedback. I think rawr's comment resolves your question.

r2evans Jul 3 at 10:24

Yup, I've seen that trick before, forgotten it, and keep coming back to it. This time I'm writing it down :-)
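
For readers who want to see why the gsub trick works, here is a short sketch. The pattern has two alternatives: trailing digits, which are captured and kept via \\1, or any single character, which is replaced by an empty capture group and therefore deleted. The conversion of empty strings to NA at the end is an added assumption about what you want, not part of the original comment.

```r
x <- c("Inclin_01_A10", "Temp_F1", "Nothing")

# Either capture the trailing digits, or match (and drop) any one character
out <- gsub("(\\d+)$|.", "\\1", x)
out  # "10" "1" ""

# Assumption: treat "no trailing digits" as missing
out[!nzchar(out)] <- NA
as.integer(out)  # 10 1 NA
```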
