Replacing values by values of a different dataframe in R

Question

I have this kind of dataframe (df1):

     Code    ID_CODE     value_1   value_2    value_3
     HA09U    98_ 10      57.80      NA          NA
     HA09U    98_ 11      57.80      NA          NA
     HA09U    98_ 12      57.80      NA          NA 
     MB03L    99_ 10        NA      90.77        NA
     MB03L    99_ 11        NA      90.77        NA
     MB03L    99_ 12        NA      90.77        NA
     KE17P    100_ 10      68.44    76.05        72.01
     KE17P    100_ 11      68.44    76.05        72.01
     KE17P    100_ 12      68.44    76.05        72.01

I want to replace the values of ALL variables (value_1, value_2 and value_3) for every person whose rows contain NAs with the values from my second dataframe (df2):

     Code    value_1   value_2    value_3
    HA09U     90.35     89.84      90.07
    MB03L     66.94     79.62      73.77

The result should be:

      Code    ID_CODE     value_1    value_2    value_3
     HA09U    98_ 10       90.35      89.84      90.07
     HA09U    98_ 11       90.35      89.84      90.07
     HA09U    98_ 12       90.35      89.84      90.07 
     MB03L    99_ 10       66.94      79.62      73.77
     MB03L    99_ 11       66.94      79.62      73.77
     MB03L    99_ 12       66.94      79.62      73.77
     KE17P    100_ 10      68.44      76.05      72.01
     KE17P    100_ 11      68.44      76.05      72.01
     KE17P    100_ 12      68.44      76.05      72.01

I tried several merge and bind_rows calls. This attempt got me the closest:

df1$value_1[df1$Code == df2$Code] <- df2$value_1

But it did not work and somehow messed up the dataframe.

Using this code

d1 <- left_join(df1, df2, by = "Code" )

Adds three new columns with the values of df2, but the values for KE17P (the person that already has complete data) are missing. So I am really looking for a replacing function.

I tried df3 <- left_join(df1, df2, by = "Code") and it worked so far, but it adds the values from df2 as three new columns instead of replacing the old values in the original columns.
– psycho95, Commented Mar 23, 2021 at 15:35
So drop the original columns from df1 before you do the left_join...
– Limey, Commented Mar 23, 2021 at 15:40
Yeah, but the original dataframe is much more complex and has multiple IDs and rows that don't exist in df2. So if I drop them they would be lost, because in the new columns they all have NAs.
– psycho95, Commented Mar 23, 2021 at 15:42
Then rewrite your question so that it includes all salient features of the problem.
– Limey, Commented Mar 23, 2021 at 15:44

akrun · Accepted Answer · 2021-03-23 15:58:05Z

We can use an update join in data.table (here we assume the NA columns are of numeric class, not logical):

library(data.table)
nm1 <- grep('^value_\\d+$', names(df1), value = TRUE)
setDT(df1)[df2, (nm1) := mget(paste0('i.', nm1)), on = .(Code)]

-output

df1
#    Code ID_CODE value_1 value_2 value_3
#1: MB03L  99_ 10   66.94   79.62   73.77
#2: MB03L  99_ 11   66.94   79.62   73.77
#3: MB03L  99_ 12   66.94   79.62   73.77
#4: MB03L  99_ 13   66.94   79.62   73.77
#5: MB03L  99_ 14   66.94   79.62   73.77
#6: MB03L  99_ 15   66.94   79.62   73.77
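
As a minimal illustration of what `mget(paste0('i.', nm1))` relies on: inside `x[i, ...]`, a column of the table in the `i` position that shares its name with a column of `x` can be reached with the `i.` prefix. The tables `x` and `upd` below are made-up toy data, not taken from the question.

```r
library(data.table)

# Toy data (hypothetical): "A" should be updated, "B" has no match in upd
x   <- data.table(Code = c("A", "A", "B"), value_1 = c(NA, NA, 3))
upd <- data.table(Code = "A", value_1 = 10)

# Update join: value_1 on the left of := is x's column,
# i.value_1 is the column of the same name coming from upd
x[upd, value_1 := i.value_1, on = .(Code)]

print(x)
```

Rows of `x` without a matching `Code` in `upd` are left as they were, which is the behaviour asked for in the question.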

Or using tidyverse

library(dplyr)
library(stringr)
left_join(df1, df2, by = 'Code') %>%
  transmute(Code, ID_CODE, across(ends_with('.x'), ~ 
     coalesce(get(str_replace(cur_column(),"\\.x", ".y")), .))) %>% 
  rename_with(~ str_remove(., '\\.x'), starts_with('value_'))

-output

#    Code ID_CODE value_1 value_2 value_3
#1 MB03L  99_ 10   66.94   79.62   73.77
#2 MB03L  99_ 11   66.94   79.62   73.77
#3 MB03L  99_ 12   66.94   79.62   73.77
#4 MB03L  99_ 13   66.94   79.62   73.77
#5 MB03L  99_ 14   66.94   79.62   73.77
#6 MB03L  99_ 15   66.94   79.62   73.77
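
The same coalesce idea can also be sketched without `get()`, by giving the df2 columns an explicit suffix and looping over the value columns. This is only a sketch under assumptions: the data below is a shortened stand-in for the question's data (one complete person, KE17P, with no row in df2), and the `.new` suffix is an arbitrary choice.

```r
library(dplyr)

# Shortened stand-ins for the question's data (assumption, for illustration)
df1 <- data.frame(
  Code    = c("MB03L", "MB03L", "KE17P"),
  ID_CODE = c("99_ 10", "99_ 11", "100_ 10"),
  value_1 = c(NA, NA, 68.44),
  value_2 = c(90.77, 90.77, 76.05),
  value_3 = c(NA, NA, 72.01)
)
df2 <- data.frame(
  Code    = c("HA09U", "MB03L"),
  value_1 = c(90.35, 66.94),
  value_2 = c(89.84, 79.62),
  value_3 = c(90.07, 73.77)
)

nm1 <- grep("^value_\\d+$", names(df1), value = TRUE)

# Keep df1's column names unchanged and tag df2's columns with ".new"
joined <- left_join(df1, df2, by = "Code", suffix = c("", ".new"))

# Prefer the df2 value when there is one, otherwise keep the df1 value
for (nm in nm1) {
  joined[[nm]] <- coalesce(joined[[paste0(nm, ".new")]], joined[[nm]])
}

# Drop the helper columns again
out <- select(joined, -ends_with(".new"))
out
```

Because `coalesce()` falls back to the df1 value, persons that do not occur in df2 keep their original values.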

data

df1 <- structure(list(Code = c("MB03L", "MB03L", "MB03L", "MB03L", "MB03L", 
"MB03L"), ID_CODE = c("99_ 10", "99_ 11", "99_ 12", "99_ 13", 
"99_ 14", "99_ 15"), value_1 = c(NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_), value_2 = c(90.77, 90.77, 90.77, 
90.77, 90.77, 90.77), value_3 = c(NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_)), row.names = c(NA, -6L), class = "data.frame")


df2 <- structure(list(Code = c("HA09U", "MB03L"), value_1 = c(90.35, 
66.94), value_2 = c(89.84, 79.62), value_3 = c(90.07, 73.77)),
class = "data.frame", row.names = c(NA, 
-2L))
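
If neither package is available, a base R sketch with `match()` gives the same kind of update. It is an alternative to the two approaches above, not a restatement of them, and it rebuilds the question's full data (including KE17P, which has no row in df2) so that the untouched rows are visible.

```r
# The question's data, rebuilt by hand
df1 <- data.frame(
  Code    = rep(c("HA09U", "MB03L", "KE17P"), each = 3),
  ID_CODE = c("98_ 10", "98_ 11", "98_ 12",
              "99_ 10", "99_ 11", "99_ 12",
              "100_ 10", "100_ 11", "100_ 12"),
  value_1 = rep(c(57.80, NA, 68.44), each = 3),
  value_2 = rep(c(NA, 90.77, 76.05), each = 3),
  value_3 = rep(c(NA, NA, 72.01), each = 3)
)
df2 <- data.frame(
  Code    = c("HA09U", "MB03L"),
  value_1 = c(90.35, 66.94),
  value_2 = c(89.84, 79.62),
  value_3 = c(90.07, 73.77)
)

nm1 <- grep("^value_\\d+$", names(df1), value = TRUE)

# For each row of df1, the matching row number in df2 (NA if the Code is absent)
idx <- match(df1$Code, df2$Code)
hit <- !is.na(idx)

# Overwrite all value columns, but only in rows that have a match in df2
df1[hit, nm1] <- df2[idx[hit], nm1]
df1
```

Unlike `df1$Code == df2$Code`, which compares the two vectors element by element with recycling, `match()` looks up each Code of df1 in df2, so the row counts of the two dataframes do not need to agree.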

@psycho95 as the column names are the same in 'df1' and 'df2', the df2 columns are identified with the prefix i., i.e. i.value_1, i.value_2, etc.
@psycho95 have you loaded library(data.table)? Please use the data in my post and try it again. Thanks.
It worked with your data, but somehow it doesn't with my dataframe.
I restarted R and tried it again, and now it worked. I don't know why it didn't before... Thank you very much!
