
I have this kind of dataframe (df1):

     Code    ID_CODE     value_1   value_2    value_3
     HA09U    98_ 10      57.80      NA          NA
     HA09U    98_ 11      57.80      NA          NA
     HA09U    98_ 12      57.80      NA          NA 
     MB03L    99_ 10        NA      90.77        NA
     MB03L    99_ 11        NA      90.77        NA
     MB03L    99_ 12        NA      90.77        NA
     KE17P    100_ 10      68.44    76.05        72.01
     KE17P    100_ 11      68.44    76.05        72.01
     KE17P    100_ 12      68.44    76.05        72.01

For every person whose rows contain NAs, I want to replace the values of ALL variables (value_1, value_2 and value_3) with the values from my second dataframe (df2):

     Code    value_1   value_2    value_3
    HA09U     90.35     89.84      90.07
    MB03L     66.94     79.62      73.77

The result should be:

      Code    ID_CODE     value_1    value_2    value_3
     HA09U    98_ 10       90.35      89.84      90.07
     HA09U    98_ 11       90.35      89.84      90.07
     HA09U    98_ 12       90.35      89.84      90.07 
     MB03L    99_ 10       66.94      79.62      73.77
     MB03L    99_ 11       66.94      79.62      73.77
     MB03L    99_ 12       66.94      79.62      73.77
     KE17P    100_ 10      68.44      76.05      72.01
     KE17P    100_ 11      68.44      76.05      72.01
     KE17P    100_ 12      68.44      76.05      72.01

I tried merge and bind_rows; this attempt came closest:

df1$value_1[df1$Code == df2$Code] <- df2$value_1

But it did not work and somehow messed up the dataframe.

Using this code

d1 <- left_join(df1, df2, by = "Code" )

This adds three new columns with the values of df2, but the values of KE17P (the person that already has complete data) are missing. So I am really looking for a replacing function.

  • left_join() from dplyr will give you what you want. Commented Mar 23, 2021 at 15:27
  • I tried df3<- left_join(df1, df2, by = "Code" ) and it worked so far... but adds the values from df2 by adding three new columns and not replacing the old values in the original columns Commented Mar 23, 2021 at 15:35
  • So drop the original columns from df1 before you do the left_join... Commented Mar 23, 2021 at 15:40
  • Yeah, but the original data frame is much more complex and has multiple IDs and rows that don't exist in df2. So if I drop them they would be lost, because in the new columns they all have NAs. Commented Mar 23, 2021 at 15:42
  • Then rewrite your question so that it includes all salient features of the problem. Commented Mar 23, 2021 at 15:44

1 Answer


We can use an update join in data.table. This assumes the NA columns are of class numeric and not logical.

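library(data.table)
# Names of the value columns to overwrite
nm1 <- grep('^value_\\d+$', names(df1), value = TRUE)
# Update join: rows of df1 whose Code matches df2 get the df2 values (i. prefix)
setDT(df1)[df2, (nm1) := mget(paste0('i.', nm1)), on = .(Code)]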

-output

df1
#    Code ID_CODE value_1 value_2 value_3
#1: MB03L  99_ 10   66.94   79.62   73.77
#2: MB03L  99_ 11   66.94   79.62   73.77
#3: MB03L  99_ 12   66.94   79.62   73.77
#4: MB03L  99_ 13   66.94   79.62   73.77
#5: MB03L  99_ 14   66.94   79.62   73.77
#6: MB03L  99_ 15   66.94   79.62   73.77
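
To make the `i.` prefix concrete, here is a minimal sketch for a single column. The objects `dt1` and `dt2` are small made-up tables for illustration only, not the data from the question. Inside the join, `value_1` refers to the column of the table being updated and `i.value_1` to the column of the table passed in the first argument.

```r
library(data.table)

# Hypothetical toy tables: dt1 is updated, dt2 supplies the new values
dt1 <- data.table(Code = c("MB03L", "MB03L", "KE17P"),
                  value_1 = c(NA, NA, 68.44))
dt2 <- data.table(Code = c("HA09U", "MB03L"),
                  value_1 = c(90.35, 66.94))

# Update join by reference: only rows of dt1 with a matching Code are changed
dt1[dt2, value_1 := i.value_1, on = "Code"]
dt1
```

Rows with a Code that does not occur in `dt2` (here KE17P) keep their original values, and codes that occur only in `dt2` (here HA09U) are ignored.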

Or using tidyverse

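library(dplyr)
library(stringr)
# The join suffixes the df1 columns with .x and the df2 columns with .y
left_join(df1, df2, by = 'Code') %>%
  # For each .x column, take the matching .y value when present, else keep .x
  transmute(Code, ID_CODE, across(ends_with('.x'), ~ 
     coalesce(get(str_replace(cur_column(), "\\.x$", ".y")), .))) %>% 
  # Drop the .x suffix to restore the original column names
  rename_with(~ str_remove(., '\\.x$'), starts_with('value_'))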

-output

#    Code ID_CODE value_1 value_2 value_3
#1 MB03L  99_ 10   66.94   79.62   73.77
#2 MB03L  99_ 11   66.94   79.62   73.77
#3 MB03L  99_ 12   66.94   79.62   73.77
#4 MB03L  99_ 13   66.94   79.62   73.77
#5 MB03L  99_ 14   66.94   79.62   73.77
#6 MB03L  99_ 15   66.94   79.62   73.77
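
A base R option is also possible with match(). This is only a sketch: it rebuilds the example data from the question by hand, and the helper names `cols`, `idx` and `hit` are made up for illustration. It also shows why the attempt in the question goes wrong: `df1$Code == df2$Code` compares the two vectors position by position (recycling the shorter one) instead of looking up each Code, whereas match() returns, for every row of df1, the row of df2 with the same Code.

```r
# Example data as in the question
df1 <- data.frame(
  Code    = rep(c("HA09U", "MB03L", "KE17P"), each = 3),
  ID_CODE = paste0(rep(c("98_ ", "99_ ", "100_ "), each = 3), 10:12),
  value_1 = rep(c(57.80, NA, 68.44), each = 3),
  value_2 = rep(c(NA, 90.77, 76.05), each = 3),
  value_3 = rep(c(NA, NA, 72.01), each = 3)
)
df2 <- data.frame(
  Code    = c("HA09U", "MB03L"),
  value_1 = c(90.35, 66.94),
  value_2 = c(89.84, 79.62),
  value_3 = c(90.07, 73.77)
)

cols <- c("value_1", "value_2", "value_3")

# For each row of df1: the row number in df2 with the same Code, NA if none
idx <- match(df1$Code, df2$Code)
hit <- !is.na(idx)

# Overwrite all value columns, but only for rows that have a match in df2
df1[hit, cols] <- df2[idx[hit], cols]
df1
```

Rows whose Code is not in df2 (KE17P) are left as they are, so nothing is lost when df1 contains IDs that df2 does not know about.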

data

df1 <- structure(list(Code = c("MB03L", "MB03L", "MB03L", "MB03L", "MB03L", 
"MB03L"), ID_CODE = c("99_ 10", "99_ 11", "99_ 12", "99_ 13", 
"99_ 14", "99_ 15"), value_1 = c(NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_), value_2 = c(90.77, 90.77, 90.77, 
90.77, 90.77, 90.77), value_3 = c(NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_)), row.names = c(NA, -6L), class = "data.frame")


df2 <- structure(list(Code = c("HA09U", "MB03L"), value_1 = c(90.35, 
66.94), value_2 = c(89.84, 79.62), value_3 = c(90.07, 73.77)),
class = "data.frame", row.names = c(NA, 
-2L))

6 Comments

@psycho95 as the column names are the same in 'df1' and 'df2', the df2 columns are identified with the prefix i., i.e. i.value_1, i.value_2, etc.
I receive value for ‘i.’ not found
@psycho95 have you loaded library(data.table)? Please use the data in my post and try it again. Thanks
It worked with your data, but somehow it doesn't with my dataframe
I restarted R and tried it again, now it worked, don't know why it didn't before... Thank you very much!