I'm trying to filter a character vector created with pdf_ocr_text() using a pattern that combines two regular expressions. Specifically, I want to keep the elements that either (1) start with a digit or (2) start with two spaces followed by a digit. I also want to keep the leading spaces in the strings. Here's a reproducible example.
df <- c(" 065074 10/1/91 10/1/96 8 10 5 ",
"060227 10/1/93 10/1/93 9 5 5 ",
" 060178 10/1/95 10/1/98 8 10 5 ", "060294 10/1/91 10/1/98 8 10 5 ",
"060212 10/1/91 10/1/93 8 10 5 ", " 060228 10/1/92 10/1/92 9 5 5 ",
" 060257 10/1/92 10/1/92 9 5 5 ",
"060348 10/1/91 10/1/93 8 10 5 ", " 080379 10/1/91 10/1/96 6 20 5 ",
" 060239 10/1/91 10/1/98 8 10 5 ", " 060012 10/1/92 10/1/92 9 5 5 ",
" 060360 10/1/96 10/1/96 9 5 5 ", " 060035 10/1/95 10/1/95 9 5 5 ",
" 060243 10/1/92 10/1/93 8 10 5 ", " 060262 10/1/92 ; 10/1/94 7 15 5 ",
" = = ", " 40097 2 4 40097 _"
)
I've tried the following, but the combined pattern doesn't work. If I use only one of the two conditions, it works.
library(stringr)  # provides str_detect() and re-exports the %>% pipe
df[df %>% str_detect(., "^\\s{2}\\d | ^\\d")]  # This fails
df[df %>% str_detect(., "^\\d")]                # With only one condition, it works
[1] "060227 10/1/93 10/1/93 9 5 5 " "060294 10/1/91 10/1/98 8 10 5 "
[3] "060212 10/1/91 10/1/93 8 10 5 " "060348 10/1/91 10/1/93 8 10 5 "
How can I combine both conditions into a single regex pattern?
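A minimal sketch of the likely cause, assuming base R's grepl() with perl = TRUE treats these patterns the same way str_detect() does, and using a small hypothetical sample `x` rather than the full vector above. The spaces around `|` are part of the pattern: the first alternative requires a space right after the first digit, and the second requires a space before the start of the string, which can never match.

```r
# Hypothetical sample (not the full data above): no leading space,
# two leading spaces, one leading space, and a non-data line.
x <- c("060227 10/1/93", "  060178 10/1/95", " 065074 10/1/91", " = = ")

# Original pattern: the spaces around `|` are literal characters.
bad <- "^\\s{2}\\d | ^\\d"

# Same two conditions, with no stray spaces around `|`.
good <- "^\\s{2}\\d|^\\d"

# Equivalent form: an optional two-space prefix, then a digit.
good2 <- "^(\\s{2})?\\d"

grepl(bad, x, perl = TRUE)    # FALSE FALSE FALSE FALSE
grepl(good, x, perl = TRUE)   # TRUE TRUE FALSE FALSE
grepl(good2, x, perl = TRUE)  # TRUE TRUE FALSE FALSE
```

The strings in `df` as printed above begin with a single space, so `\\s{2}` would not match them. If that is how the OCR output really looks, a pattern such as `^\\s*\\d` (any leading whitespace, then a digit) may be closer to the intent.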