
I have a 200k-row data frame with a character column named "departament_name". Some of the values in this column contain a stray character, "?", for example "GENERAL SAN MART?N", " UNI?N", etc. I want to replace those values using another 750k-row data frame that also contains a column named "departament_name", in which the values are correct. Following the example, the corrected values would be "GENERAL SAN MARTIN", "UNION", and so on.

Can I do this automatically using pattern matching, without building a dictionary by hand (several values have this problem)? My objective is a unified dataset that combines the two data frames, with a single consistent value for each of those problematic rows in "departament_name". I would prefer a tidyverse solution (mutate, stringr, etc.) if possible.

  • Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. Commented Sep 4, 2021 at 16:09

1 Answer


You can try the stringdist_*_join functions from the fuzzyjoin package.

fuzzyjoin::stringdist_left_join(df1, df2, by = 'departament_name')

#  departament_name.x departament_name.y
#1 GENERAL SAN MART?N GENERAL SAN MARTIN
#2              UNI?N              UNION

This works for the simple example you have shared, but it might not give a 100% correct result for every entry in your actual data. You can tweak the max_dist and method parameters to suit your data. See ?fuzzyjoin::stringdist_join for more information about them.
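As a rough sketch of that tuning, using the same two toy data frames: the call below assumes each "?" stands in for exactly one letter, so it allows at most one edit (max_dist = 1) under Levenshtein distance (method = "lv"), and it uses distance_col to keep the computed distance so that doubtful matches can be inspected. The column name "dist" is just an illustrative choice.

```r
library(fuzzyjoin)

df1 <- data.frame(departament_name = c("GENERAL SAN MART?N", "UNI?N"))
df2 <- data.frame(departament_name = c("GENERAL SAN MARTIN", "UNION"))

# Allow at most one edit between the two names and record the distance
# in an extra column so that borderline matches can be reviewed.
matched <- stringdist_left_join(
  df1, df2,
  by = "departament_name",
  method = "lv",          # Levenshtein distance
  max_dist = 1,
  distance_col = "dist"   # illustrative column name
)

matched
```

A stricter max_dist reduces false matches between similar names, but it will miss values with more than one damaged character, so the threshold is worth checking against a sample of the real data.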

data

df1 <- data.frame(departament_name = c("GENERAL SAN MART?N", "UNI?N"))
df2 <- data.frame(departament_name = c("GENERAL SAN MARTIN", "UNION"))

1 Comment

I think I need a full join to unify the data frames. Another problem is that the data frames hold different time series (one covers 2018~2019, the other 2020~2021), so I must keep all the data from both of them (put one "under" the other). Left joins are problematic in these cases.
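One possible way to reconcile the two needs, sketched under assumptions: use the fuzzy join only to repair the names in the first data frame, then stack the two data frames with dplyr::bind_rows so that no rows are lost. The year column and the helper column correct_name are hypothetical, added here only to make the example self-contained.

```r
library(dplyr)
library(fuzzyjoin)

# Hypothetical stand-ins for the two periods; `year` is an invented column.
df1 <- data.frame(departament_name = c("GENERAL SAN MART?N", "UNI?N"),
                  year = c(2018, 2019))
df2 <- data.frame(departament_name = c("GENERAL SAN MARTIN", "UNION"),
                  year = c(2020, 2021))

# Lookup table holding each correct name once.
correct <- df2 %>% distinct(correct_name = departament_name)

# Repair df1: use the fuzzy match when one exists, otherwise keep the
# original value, then drop the helper column.
df1_fixed <- df1 %>%
  stringdist_left_join(correct,
                       by = c("departament_name" = "correct_name"),
                       max_dist = 1) %>%
  mutate(departament_name = coalesce(correct_name, departament_name)) %>%
  select(-correct_name)

# Stack the two periods; every row from both data frames is kept.
unified <- bind_rows(df1_fixed, df2)
unified
```

If a damaged name is within max_dist of more than one correct name, the left join returns one row per candidate, so it is worth comparing nrow(df1_fixed) with nrow(df1) before stacking.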
