2

I have two data frames:

df1
Syllable Duration Pitch
@         0.08    93
@         0.05    107
@         0.13    56
@         0.07    95
@         0.07    123

df2
Syllable Duration 
@        0.08 
@        0.05 
@        0.07
@        0.07 

I want to merge them into another data frame:

df3
Syllable Duration Pitch
@        0.08     93
@        0.05     107
@        0.07     95
@        0.07     123

The problem is that I have repeated Syllable and Duration values. I've tried this code, but it gives me incorrect Pitch:

df3 <- merge(df2, df1[!duplicated(df1$Syllable),], by="Syllable")

df3
Syllable Duration Pitch
@        0.08     93
@        0.05     93
@        0.07     93
@        0.07     93
6
  • This is really somewhat difficult. How do you know which value pair is correct? Do you just want to keep the entries which are in df2 Syllable & Duration columns and filter out all values from df1 which are not in df2? Commented Jun 5, 2018 at 11:33
  • @rashid Yes, exactly. I want to filter out the rows that are not in df2. And the needed rows are going from top to bottom the way they are, so I guess it is somehow possible to do that. Commented Jun 5, 2018 at 11:38
  • Are you looking for an anti-join, i.e. exclude from df1 those rows for which the combination of Syllable and Duration is not in df2 ? Commented Jun 5, 2018 at 11:40
  • @jogo Yes, and I've tried a default anti-join as well. But it gave me repeated values, too. Commented Jun 5, 2018 at 11:42
  • imho it is a inner join of df1 and unique(df2) stackoverflow.com/questions/1299871/… Commented Jun 5, 2018 at 11:56

3 Answers 3

4

With data.table you can do:

library("data.table")
df1 <- fread(
"Syllable Duration Pitch
@ 0.08 93
@ 0.05 107
@ 0.13 56
@ 0.07 95
@ 0.07 123")
df2 <- fread(
"Syllable Duration 
@ 0.08 
@ 0.05 
@ 0.07
@ 0.07")
merge(df1, unique(df2))
# > merge(df1, unique(df2))
#    Syllable Duration Pitch
# 1:        @     0.05   107
# 2:        @     0.07    95
# 3:        @     0.07   123
# 4:        @     0.08    93

or without sorting:

merge(df1, unique(df2), sort=FALSE)
# > merge(df1, unique(df2), sort=FALSE)
#    Syllable Duration Pitch
# 1:        @     0.08    93
# 2:        @     0.05   107
# 3:        @     0.07    95
# 4:        @     0.07   123

this last is the same as:

df1[unique(df2), on=c("Syllable", "Duration")]
# > df1[unique(df2), on=c("Syllable", "Duration")]
#    Syllable Duration Pitch
# 1:        @     0.08    93
# 2:        @     0.05   107
# 3:        @     0.07    95
# 4:        @     0.07   123

With base R:

df1 <- read.table(header=TRUE, text=
"Syllable Duration Pitch
@         0.08    93
@         0.05    107
@         0.13    56
@         0.07    95
@         0.07    123")

df2 <- read.table(header=TRUE, text=
"Syllable Duration 
@        0.08 
@        0.05 
@        0.07
@        0.07 ")
merge(df1, unique(df2))
merge(df1, unique(df2), sort=FALSE)
Sign up to request clarification or add additional context in comments.

Comments

1

I would propose using the dplyr package. If you use it, you can select the columns which you want to join by. When you join, you should use an semi_join instead of an inner_join. The difference is that inner_join keeps all combinations and possibly duplicates rows ("If there are multiple matches between x and y, all combination of the matches are returned.")

semi_joinon the other hand does: "A semi join differs from an inner join because an inner join will return one row of x for each matching row of y, where a semi join will never duplicate rows of x."

For your case, you can use semi_join(df1, df2, by = c("Syllable", "Duration"))to merge dataframes. The byvector defines the column names you want to join by.

This gives you what you wanted:

  Syllable Duration Pitch
1        @     0.08    93 
2        @     0.05   107
3        @     0.07    95
4        @     0.07   123

Comments

1
#now keeps unique values for Syllable and the Pitch Values

df1 <- df1[order(df1$Syllable),]

df4<-merge(df2,df1)

df5<-df4[!duplicated(df4$Syllable),]

4 Comments

I can not get the desired result with your code. Did you test it?
Yes, I tested using the dataframe defined in this question. For the Pitch values I used (1,0,1,0,1). And it worked.
Can you put the used data in your answer (the question gave only a print of the dataframes) and can you please use the same Pitch values as in the question?
I just did with your same dataframes. I get for df5, 3 rows,with the values are join the 2 dataframes (syllable :0.5,0.7,and 0.8) with their corresponding pithc values. For me it's working. if you can give me further details we may be able to correct the code.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.