sorry an absolute beginner so have some very basic questions!
I have a very large data set that lists individual transactions by a household. Example is below.
# hh_id trans_type transaction_value
# 1 hh1 food 4
# 2 hh1 water 5
# 3 hh1 transport 4
# 4 hh2 water 3
# 5 hh3 transport 1
# 6 hh3 food 10
# 7 hh4 food 5
# 8 hh4 transport 15
# 9 hh4 water 10
I want to to create a new data frame that has all transactions listed for ONLY the households that have transactions in the "water" category. (Eg, I would want a df without hh3 above because they have not had any expenses in "water")
as a first step, I have a data frame with one column (hh_ids) that only has the household IDs of the ones that I want. How do I then subset my larger dataframe to remove all rows of transactions that are not from a household that have expenses in the "water" category?
Data
## data from @gung
d <- read.table(text="hh_id trans_type transaction_value
hh1 food 4
hh1 water 5
hh1 transport 4
hh2 water 3
hh3 transport 1
hh3 food 10
hh4 food 5
hh4 transport 15
hh4 water 10", header=T)
df2 <- df1[df1$trans_type != "water",]library(data.table);setDT(df1)[,if(any(trans_type=="water")) .SD , by = hh_id]orlibrary(dplyr); df1 %>% group_by(hh_id) %>% filter(any(trans_type=="water"))I am sure there would be dupes, but unfortunately couldn't find it now. (It took almost 10 mins to search google and still couldn't find it. Isn't it more efficient to type an answer instead of searching the dupes?)d[with(d, ave(trans_type %in% 'water', hh_id, FUN = sum)) > 0, ]