I want to combine two data frames: df1 with 15,000 observations and df2 with 2.3 million. I'm trying to match values: if df1$col1 == df2$c1 AND df1$col2 == df2$c2, then insert the value from df2$dummy into df1$col3. If both do not match, do nothing. All values are 8 digits, except df2$dummy, which is a dummy of 0 or 1.

df1  col1       col2      col3
1    25382701   65352617  -
2    22363658   45363783  -
3    20019696   23274747  -

df2  c1         c2        dummy
1    17472802   65548585  1
2    20383829   24747473  0
3    20019696   23274747  0
4    01382947   21930283  1
5    22123425   65382920  0

In the example, the only match is row 3 of both data frames, so the value 0 from the dummy column should be inserted in col3, row 3. I've tried building a look-up table and a function with for and if, but have not found a solution that requires matching on two columns at once. (No need to say I guess, but I'm new to R and programming.)

1 Answer

We can use an update join in data.table. Rows of df1 without a match in df2 get NA in col3:

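library(data.table)
# Drop the placeholder column so the join creates col3 fresh
df1$col3 <- NULL
# Update join: where col1 == c1 and col2 == c2, assign df2's dummy
# (referred to as i.dummy) to col3 by reference; unmatched rows stay NA
setDT(df1)[df2, col3 := i.dummy, on = .(col1 = c1, col2 = c2)]
df1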
#       col1     col2 col3
#1: 25382701 65352617   NA
#2: 22363658 45363783   NA
#3: 20019696 23274747    0

data

df1 <- structure(list(col1 = c(25382701L, 22363658L, 20019696L), col2 = c(65352617L, 
45363783L, 23274747L), col3 = c("-", "-", "-")), class = "data.frame", row.names = c("1", 
"2", "3"))

df2 <- structure(list(c1 = c(17472802L, 20383829L, 20019696L, 1382947L, 
22123425L), c2 = c(65548585L, 24747473L, 23274747L, 21930283L, 
65382920L), dummy = c(1L, 0L, 0L, 1L, 0L)), class = "data.frame", 
row.names = c("1", "2", "3", "4", "5"))
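
For comparison, the same two-column lookup can be sketched in base R without data.table. This is not part of the answer above, only an alternative under the assumption that each (c1, c2) pair occurs at most once in df2; the helper names key1, key2 and idx are made up for illustration.

```r
# Toy data mirroring the example above (col3 placeholder omitted)
df1 <- data.frame(col1 = c(25382701L, 22363658L, 20019696L),
                  col2 = c(65352617L, 45363783L, 23274747L))
df2 <- data.frame(c1 = c(17472802L, 20383829L, 20019696L, 1382947L, 22123425L),
                  c2 = c(65548585L, 24747473L, 23274747L, 21930283L, 65382920L),
                  dummy = c(1L, 0L, 0L, 1L, 0L))

# Build one composite key per row; the separator keeps the two fields apart
key1 <- paste(df1$col1, df1$col2, sep = "_")
key2 <- paste(df2$c1, df2$c2, sep = "_")

# match() gives, for each df1 row, the position of the first df2 row
# with the same key, or NA when there is none
idx <- match(key1, key2)

# Indexing with NA yields NA, so unmatched rows get NA in col3
df1$col3 <- df2$dummy[idx]
df1
```

If a key pair can appear several times in df2, match() silently uses the first occurrence, so duplicates would need to be checked separately.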

2 Comments

Works with your code, but not with my own data frames. I have changed the column names but still get: "argument specifying columns specify non existing column(s): cols[1]='col1'" or "Incompatible join types: x.col1 (character) and i.c1 (double)". The last one I solved with structure(list(col1 = df1$col1 ....). Any ideas?
@Fjellbekken I think your column types differ. Can you convert one side so they match, either df1$col1 <- as.numeric(df1$col1) or df2$c1 <- as.character(df2$c1)?
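
A minimal sketch of the check suggested in the comment, assuming the key columns in df1 were read as character while those in df2 are numeric. The data here is invented to reproduce that situation, and key_cols is a made-up helper name. Note that converting to numeric drops leading zeros (such as the one in 01382947), so if those matter it may be safer to keep both sides as character from the start.

```r
library(data.table)

# Hypothetical data: df1 keys are character, df2 keys are numeric (double)
df1 <- data.frame(col1 = c("25382701", "22363658", "20019696"),
                  col2 = c("65352617", "45363783", "23274747"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(c1 = c(17472802, 20383829, 20019696),
                  c2 = c(65548585, 24747473, 23274747),
                  dummy = c(1L, 0L, 0L))

# Confirm the column names and types used in on = ... actually exist and agree
names(df1)
sapply(df1, class)
sapply(df2, class)

# Convert the df1 key columns so both sides of the join are numeric
key_cols <- c("col1", "col2")
df1[key_cols] <- lapply(df1[key_cols], as.numeric)

# Same update join as in the answer, now with compatible types
setDT(df1)
setDT(df2)
df1[df2, col3 := i.dummy, on = .(col1 = c1, col2 = c2)]
df1
```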
