How to filter rows from various string patterns in R

Question

I have a large dataframe, in which there is a column with number and letter codes. Something like this:

ID	death_cause
1	K703
2	N19X
3	C069
4	C07X
5	D181
6	R99X
7	D371
8	E117
9	D489
10	D500

I need to keep all codes starting with the letter C, plus codes starting with the letter D whose next two digits are between 00 and 48 (e.g. D00, D10, D20, D48). Codes from D49 onwards are not needed.

I have managed to keep the C codes, since it is easy to match strings starting with the letter C using dplyr and stringr.

df_filtered <- df %>% 
  filter(str_detect(death_cause, "^C"))

However, I need to keep the specific D-codes as well. One idea I had is to create a vector with the characters of the D-codes

D_codes <- paste("D", 00:48, sep = "")

My question is how to filter on those other character patterns alongside the C codes with dplyr and stringr (tidyverse, in general) functions.

I tried:

df_filtered <- df %>% 
  filter(str_detect(death_cause, "^C") | str_detect(death_cause, D_codes))

Any help you can give me, I would appreciate it.

I think you can probably get away with df %>% filter(grepl("^C|^D", death_cause), death_cause < "D49").
– iroha, Commented Oct 7, 2023 at 1:53
alternatively df %>% filter(str_starts(death_cause, 'C') | (str_starts(death_cause, 'D') & between(as.numeric(str_sub(death_cause, 2, 3)), 0, 48)))
– jkatam, Commented Oct 7, 2023 at 3:51

zephryl · Accepted Answer · 2023-10-07 01:48:18Z

You’re on the right track. You’ll want to zero-pad the single-digit numbers for your D codes:

library(stringr)
library(dplyr)

D_codes <- str_c("D", str_pad(0:48, 2, pad = "0"))
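To see why the padding matters, here is a small check comparing the unpadded and padded vectors. The variable names are only for illustration, and the sprintf() line is a base R alternative that is not part of the answer above.

```r
library(stringr)

# 00:48 is read as the integer sequence 0:48, so leading zeros are lost
unpadded <- paste0("D", 0:48)
head(unpadded, 3)
# "D0" "D1" "D2"

# Padding each number to width 2 gives the three-character prefixes
padded <- str_c("D", str_pad(0:48, 2, pad = "0"))
head(padded, 3)
# "D00" "D01" "D02"

# Base R alternative (an assumption, not from the answer): same result
padded_base <- sprintf("D%02d", 0:48)
identical(padded, padded_base)
```

Either form yields 49 prefixes, D00 through D48.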

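And just use %in% rather than str_detect(). Because the codes in your sample are four characters long (e.g. D181), compare only their first three characters against D_codes:

df %>% 
  filter(str_starts(death_cause, "C") | str_sub(death_cause, 1, 3) %in% D_codes)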

(Also note str_starts() as an alternative to str_detect() in this case.)
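As a rough end-to-end sketch, the sample data from the question can be rebuilt and filtered in two ways. It assumes every code is a letter followed by two digits and one more character, as in the sample, so only the first three characters are compared against D_codes. The single-regex variant and the names by_prefix and by_regex are illustrative additions, not part of the original answer.

```r
library(dplyr)
library(stringr)

# Sample data copied from the question
df <- data.frame(
  ID = 1:10,
  death_cause = c("K703", "N19X", "C069", "C07X", "D181",
                  "R99X", "D371", "E117", "D489", "D500")
)

# Three-character prefixes D00 to D48
D_codes <- str_c("D", str_pad(0:48, 2, pad = "0"))

# Option 1: C codes, or codes whose first three characters are in D_codes
by_prefix <- df %>%
  filter(str_starts(death_cause, "C") |
           str_sub(death_cause, 1, 3) %in% D_codes)

# Option 2 (assumption): one regex covering C, D00-D39 and D40-D48
by_regex <- df %>%
  filter(str_detect(death_cause, "^(C|D([0-3][0-9]|4[0-8]))"))

by_prefix$death_cause
# "C069" "C07X" "D181" "D371" "D489"
```

Both options keep the same five rows here. The lookup vector is easier to read and to adjust if the range changes, while the regex avoids building D_codes at all.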
