My data contains text messages like the ones below, and I want to extract the age of the block from each of them.
The messages are stored in a character vector x:
my block is 8 years old and I am happy with it. I had been travelling since 2 years and that’s fun too…..
He invested in my 1 year block and is happy with the returns
He re-invested in my 1.5 year old block
i had come to U.K for 4 years and when I reach Germany my block will be of 5 years
I extracted every number followed by the word "year" or "years", but I realised I should be picking the number closest to the word "block".
> library(stringr)
> str_extract_all(x, "[0-9.]{1,3} years?")
[[1]]
[1] "8 years" "2 years"
[[2]]
[1] "1 year"
[[3]]
[1] "1.5 year"
[[4]]
[1] "4 years" "5 years"
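One way to pick "the number closest to block" is sketched below, assuming stringr is installed. It locates every "<number> year(s)" match with str_locate_all(), measures the character distance from the start of each match to the start of the first "block", and keeps the nearest match. The helper name closest_to_block and the start-to-start distance measure are illustrative choices, not an established method.

```r
library(stringr)

# Sample messages from the question
x <- c(
  "my block is 8 years old and I am happy with it. I had been travelling since 2 years and that’s fun too…..",
  "He invested in my 1 year block and is happy with the returns",
  "He re-invested in my 1.5 year old block",
  "i had come to U.K for 4 years and when I reach Germany my block will be of 5 years"
)

# Hypothetical helper: return the "<number> year(s)" match whose start is
# closest (in characters) to the start of the first "block" in the message.
closest_to_block <- function(msg) {
  ages  <- str_locate_all(msg, "[0-9.]{1,3} years?")[[1]]  # matrix: start, end
  block <- str_locate(msg, "block")                         # 1 x 2 matrix, NA if absent
  if (nrow(ages) == 0 || is.na(block[1, "start"])) return(NA_character_)
  dist <- abs(ages[, "start"] - block[1, "start"])
  best <- which.min(dist)                                   # first match wins ties
  str_sub(msg, ages[best, "start"], ages[best, "end"])
}

res <- vapply(x, closest_to_block, character(1), USE.NAMES = FALSE)
res
```

For the four sample messages this returns "8 years", "1 year", "1.5 year" and "5 years". Character distance is only a proxy for which number actually describes the block, so a message that mentions another duration right next to "block" can still pick the wrong one.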
I want the output to be a list with one age per message:
8 years
1 year
1.5 year
5 years
I was thinking of extracting the part of each sentence that contains the words "block" or "old", but I am not sure how to implement this. Any ideas or suggestions for improving this process would be helpful.
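The clause-based idea can be sketched as follows, again assuming stringr. Each message is split into rough clauses at a comma, at a period followed by whitespace (so "1.5" and "U.K" stay intact), and at the word "and"; the first clause mentioning "block" is kept and the "<number> year(s)" is extracted from it. The splitting pattern and the helper name age_from_block_clause are assumptions for illustration; real messages will likely need a more careful set of clause boundaries.

```r
library(stringr)

# Sample messages from the question
x <- c(
  "my block is 8 years old and I am happy with it. I had been travelling since 2 years and that’s fun too…..",
  "He invested in my 1 year block and is happy with the returns",
  "He re-invested in my 1.5 year old block",
  "i had come to U.K for 4 years and when I reach Germany my block will be of 5 years"
)

# Hypothetical helper: split a message into rough clauses, keep the first
# clause that mentions "block", and pull the age out of that clause only.
age_from_block_clause <- function(msg) {
  # Clause boundaries (an assumption): ". " , "," and " and "
  clauses <- str_split(msg, "\\.\\s+|,\\s*|\\s+and\\s+")[[1]]
  block_clause <- clauses[str_detect(clauses, "block")]
  if (length(block_clause) == 0) return(NA_character_)
  str_extract(block_clause[1], "[0-9.]{1,3} years?")  # NA if the clause has no age
}

res <- vapply(x, age_from_block_clause, character(1), USE.NAMES = FALSE)
res
```

On the four sample messages this gives "8 years", "1 year", "1.5 year" and "5 years". It fails when the age sits in a different clause from "block": for "my block, bought 3 years ago" the comma split leaves "my block" on its own, so the result is NA.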
Thanks!