I have two data frames, one containing the predictors and one containing the categories I want to predict. Both data frames contain a column named geoid. Some rows of the predictors data frame contain NA values, and I need to remove them. After extracting the geoid values of the rows containing NA values and removing those rows from the predictors data frame, I need to remove the corresponding rows from the categories data frame as well. It seems like a rather basic operation, but the code won't work.

categories <- as.data.frame(read.csv("files/cat_df.csv"))
predictors <- as.data.frame(read.csv("files/radius_100.csv"))
NA_rows <- predictors[!complete.cases(predictors),]
geoids <- NA_rows['geoid']
clean_categories <- categories[!(categories$geoid %in% geoids),]

None of the rows in categories/clean_categories are removed.

A typical geoid value is US06140231. typeof(categories$geoid) returns integer.

1 Answer

I can't say for certain that this is the cause, but a very basic typo would stop that line from doing what you want. Try this correction:

clean_categories <- categories[!(categories$geoid %in% geoids),]

Almost certainly this is what you meant to happen in that line: you want to negate the result of the %in% operator. You didn't include a reproducible example, so I can't say whether the rest of the code will do what you want.
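Since there is no reproducible example, here is a minimal sketch with made-up data (the geoid values and the columns x1, x2 and label are assumptions, not taken from the real CSV files). It illustrates one other likely culprit: NA_rows['geoid'] returns a one-column data frame rather than a vector, so %in% has no individual geoid values to match against. Extracting the column with $ (or [[ ]]) gives a plain vector. Separately, typeof() returning integer for values like US06140231 suggests the column was read as a factor, which is why the sketch uses character columns.

```r
# Toy stand-ins for the two CSV files (all values are invented for illustration)
predictors <- data.frame(
  geoid = c("US01", "US02", "US03", "US04"),
  x1    = c(1.2, NA, 3.4, 4.1),
  x2    = c(10, 20, NA, 40),
  stringsAsFactors = FALSE
)
categories <- data.frame(
  geoid = c("US01", "US02", "US03", "US04"),
  label = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# Rows of predictors that contain at least one NA
NA_rows <- predictors[!complete.cases(predictors), ]

# Single brackets keep the data frame structure: this is a one-column data frame
geoids_df <- NA_rows["geoid"]
is.data.frame(geoids_df)

# $ extracts the column itself as a plain vector
geoids <- NA_rows$geoid
is.data.frame(geoids)

# With a vector on the right-hand side, %in% compares value by value
clean_predictors <- predictors[complete.cases(predictors), ]
clean_categories <- categories[!(categories$geoid %in% geoids), ]
```

If the geoid columns turn out to be factors in the real data, wrapping both sides in as.character() before the %in% comparison is one way to make sure the labels, not the internal integer codes, are compared.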

2 Comments

That definitely makes sense. It still won't remove the rows though. Do you have any other idea how I can remove the rows without hardcoding them?
Why don't you join the data frames first and then do a single filter? Also, read.csv() returns a data.frame so no need to wrap with as.data.frame().
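The join-then-filter idea from the comment could look roughly like the sketch below. The data is again made up (x1, x2 and label are hypothetical columns), and it assumes geoid is the only column name the two data frames share. Note that complete.cases() on the merged frame also drops rows with an NA in a categories column, which may or may not be what you want.

```r
# Toy stand-ins for the two CSV files (all values are invented for illustration)
predictors <- data.frame(
  geoid = c("US01", "US02", "US03", "US04"),
  x1    = c(1.2, NA, 3.4, 4.1),
  x2    = c(10, 20, NA, 40),
  stringsAsFactors = FALSE
)
categories <- data.frame(
  geoid = c("US01", "US02", "US03", "US04"),
  label = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# Inner join on geoid, then keep only the rows without any NA
merged   <- merge(categories, predictors, by = "geoid")
complete <- merged[complete.cases(merged), ]

# Split back into the two original sets of columns, now with matching rows
clean_categories <- complete[, names(categories), drop = FALSE]
clean_predictors <- complete[, names(predictors), drop = FALSE]
```

Because both results come from the same filtered data frame, their rows stay aligned by geoid without a separate lookup step.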
