I am trying to merge two data frames that have different numbers of rows. I show only the head and tail of each data frame because both have more than a hundred rows. Both come from a miRNA expression analysis and are shown below. The two data frames differ by only 2 rows.

data frame 1 (miRNAs_256c)

   miRNA             read_count  precursor  total   seq X256c_norm
1      dre-miR-21      13309   dre-mir-21-2 13309 13309  550709,65
2 dre-miR-181a-5p       1004 dre-mir-181a-1  1004  1004   41544,25
3 dre-miR-181a-5p        927 dre-mir-181a-2   927   927   38358,09
4    dre-miR-181c        592   dre-mir-181c   592   592   24496,21
5    dre-miR-181b        579 dre-mir-181b-2   579   579   23958,29
6    dre-miR-181b        561 dre-mir-181b-1   561   561   23213,47          
160 dre-miR-7149-5p        0   dre-mir-7149     0   0          0
161  dre-miR-723-5p        0    dre-mir-723     0   0          0
162  dre-miR-727-5p        0    dre-mir-727     0   0          0
163     dre-miR-730        0    dre-mir-730     0   0          0
164     dre-miR-735        0    dre-mir-735     0   0          0
165     dre-miR-740        0    dre-mir-740     0   0          0

data frame 2 (miRNAs_shield)

  miRNA        read_count precursor   total seq shield_norm
1 dre-let-7a        424  dre-let-7a-1   424 424    72939,96
2 dre-let-7a        397  dre-let-7a-6   397 397     68295,2
3 dre-let-7a        371  dre-let-7a-5   371 371    63822,47
4 dre-let-7a        367  dre-let-7a-3   367 367    63134,35
5 dre-miR-21        345  dre-mir-21-2   345 345    59349,73
6 dre-let-7a        343  dre-let-7a-2   343 343    59005,68
162 dre-miR-723-5p    0  dre-mir-723     0   0           0
163 dre-miR-727-5p    0  dre-mir-727     0   0           0
164    dre-miR-730    0  dre-mir-730     0   0           0
165    dre-miR-731    0  dre-mir-731     0   0           0
166    dre-miR-735    0  dre-mir-735     0   0           0
167    dre-miR-740    0  dre-mir-740     0   0           0

I need to merge both into a single data frame, keeping the column headers, and I also want to keep the rows with a read count of 0.

I tried several approaches, but none works as I expect. First I tried cbind, but got an error because of the difference in the number of rows.

prueba <- cbind (miRNAs_256c, miRNAs_shield)
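
For reference, a minimal sketch of why cbind fails in this situation. The data frames small_a and small_b are made-up stand-ins (not the real miRNA data); cbind puts data frames side by side, so it needs compatible row counts and does not match rows by miRNA name.

```r
# Hypothetical toy data: 3 rows versus 2 rows
small_a <- data.frame(miRNA = c("m1", "m2", "m3"), read_count = c(10, 5, 0))
small_b <- data.frame(miRNA = c("m1", "m4"), read_count = c(7, 0))

# cbind() only glues columns together; with 3 and 2 rows it stops with an error.
# tryCatch() captures the error message instead of aborting the script.
res <- tryCatch(cbind(small_a, small_b),
                error = function(e) conditionMessage(e))
print(res)
```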

Second, I tried the code proposed by markheckmann at http://ryouready.wordpress.com/2009/01/23/r-combining-vectors-or-data-frames-of-unequal-length-into-one-data-frame/

myList <- list(miRNAs_256c, miRNAs_shield)
dat <- data.frame()
for (i in seq(along = myList)) {
  for (j in names(myList[[i]])) dat[i, j] <- myList[[i]][j]
}
dat

This code runs without the cbind error, but I only get two rows.

Third, I tried the merge function, which seems to work, but I only get 34 rows and my data frames have more than a hundred.

dat <- merge (miRNAs_256c, miRNAs_shield, ALL=TRUE)

Afterwards, I tried the join function from plyr with every value of the type argument.

dat_join <- join(miRNAs_256c, miRNAs_shield, type = "full")

This merges both data frames, but instead of combining them it just appends the second data frame to the end of the first.

dat_join <- join(miRNAs_256c, miRNAs_shield, type = "left")

This variant joins the data frames but keeps only the rows in miRNAs_256c, adding the matching columns from miRNAs_shield. The same kind of thing happens with the other values of type, such as inner or right.

I have tried all the possibilities listed above and looked for help on the web, including here on Stack Overflow, but I could not find a solution. I cannot get all the information from both data frames stored in a single one. Can someone please give some useful help, or advice on where to find more help? I have exhausted all the possibilities I know.

1 Answer

I think it's a bit unclear exactly how you want to do the merge and how your data.frames are alike. For example, can the same miR be found in both data.frames?

Anyway, I first read the data you provided:

df1 <- read.table(text = "
miRNA             read_count  precursor  total   seq X256c_norm
1      dre-miR-21      13309   dre-mir-21-2 13309 13309  550709,65
2 dre-miR-181a-5p       1004 dre-mir-181a-1  1004  1004   41544,25
3 dre-miR-181a-5p        927 dre-mir-181a-2   927   927   38358,09
4    dre-miR-181c        592   dre-mir-181c   592   592   24496,21
5    dre-miR-181b        579 dre-mir-181b-2   579   579   23958,29
6    dre-miR-181b        561 dre-mir-181b-1   561   561   23213,47          
160 dre-miR-7149-5p        0   dre-mir-7149     0   0          0
161  dre-miR-723-5p        0    dre-mir-723     0   0          0
162  dre-miR-727-5p        0    dre-mir-727     0   0          0
163     dre-miR-730        0    dre-mir-730     0   0          0
164     dre-miR-735        0    dre-mir-735     0   0          0
165     dre-miR-740        0    dre-mir-740     0   0          0")

and

df2 <- read.table(text = "
miRNA        read_count precursor   total seq shield_norm
1 dre-let-7a        424  dre-let-7a-1   424 424    72939,96
2 dre-let-7a        397  dre-let-7a-6   397 397     68295,2
3 dre-let-7a        371  dre-let-7a-5   371 371    63822,47
4 dre-let-7a        367  dre-let-7a-3   367 367    63134,35
5 dre-miR-21        345  dre-mir-21-2   345 345    59349,73
6 dre-let-7a        343  dre-let-7a-2   343 343    59005,68
162 dre-miR-723-5p    0  dre-mir-723     0   0           0
163 dre-miR-727-5p    0  dre-mir-727     0   0           0
164    dre-miR-730    0  dre-mir-730     0   0           0
165    dre-miR-731    0  dre-mir-731     0   0           0
166    dre-miR-735    0  dre-mir-735     0   0           0
167    dre-miR-740    0  dre-mir-740     0   0           0")
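
A side note: the normalised columns use a decimal comma (e.g. 550709,65), so with the calls above they are read as text rather than numbers. If numeric values are wanted, a minimal sketch, assuming the comma really is a decimal separator, is to pass dec = "," to read.table. The name df1_num and the three-row excerpt are only for illustration.

```r
# Excerpt of df1; dec = "," tells read.table to parse "550709,65" as 550709.65
df1_num <- read.table(text = "
miRNA read_count precursor total seq X256c_norm
1 dre-miR-21 13309 dre-mir-21-2 13309 13309 550709,65
2 dre-miR-181a-5p 1004 dre-mir-181a-1 1004 1004 41544,25
3 dre-miR-740 0 dre-mir-740 0 0 0", dec = ",", stringsAsFactors = FALSE)

# The normalised column is now numeric instead of character/factor
str(df1_num$X256c_norm)
```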

We can then do a merge on the columns we specify. Does the following give what you want?

merge(df1, df2, 
      by = c("miRNA", "read_count", "precursor", "total", "seq"),
      all = TRUE)
#             miRNA read_count      precursor total   seq X256c_norm shield_norm
#1  dre-miR-181a-5p        927 dre-mir-181a-2   927   927   38358,09        <NA>
#2  dre-miR-181a-5p       1004 dre-mir-181a-1  1004  1004   41544,25        <NA>
#3     dre-miR-181b        561 dre-mir-181b-1   561   561   23213,47        <NA>
#4     dre-miR-181b        579 dre-mir-181b-2   579   579   23958,29        <NA>
#5     dre-miR-181c        592   dre-mir-181c   592   592   24496,21        <NA>
#6       dre-miR-21        345   dre-mir-21-2   345   345       <NA>    59349,73
#7       dre-miR-21      13309   dre-mir-21-2 13309 13309  550709,65        <NA>
#8  dre-miR-7149-5p          0   dre-mir-7149     0     0          0        <NA>
#9   dre-miR-723-5p          0    dre-mir-723     0     0          0           0
#10  dre-miR-727-5p          0    dre-mir-727     0     0          0           0
#11     dre-miR-730          0    dre-mir-730     0     0          0           0
#12     dre-miR-735          0    dre-mir-735     0     0          0           0
#13     dre-miR-740          0    dre-mir-740     0     0          0           0
#14      dre-let-7a        343   dre-let-7a-2   343   343       <NA>    59005,68
#15      dre-let-7a        367   dre-let-7a-3   367   367       <NA>    63134,35
#16      dre-let-7a        371   dre-let-7a-5   371   371       <NA>    63822,47
#17      dre-let-7a        397   dre-let-7a-6   397   397       <NA>     68295,2
#18      dre-let-7a        424   dre-let-7a-1   424   424       <NA>    72939,96
#19     dre-miR-731          0    dre-mir-731     0     0       <NA>           0

As you can see, the two data.frames are merged into one, and the differing columns shield_norm and X256c_norm both appear in the result. An <NA> is filled in where the information is not available. For example, dre-miR-727-5p (row 10) was present in both df1 and df2 and thus has both shield_norm and X256c_norm filled in.

If this is not what you want, can you please explain your expected output in greater detail?
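
Note that rows are only matched when every column listed in by agrees, which is why dre-miR-21 shows up twice in the output (rows 6 and 7): its read_count differs between the two data.frames. If one row per miRNA/precursor pair is preferred, a possible sketch is to merge on those two columns only and let suffixes keep the sample-specific values apart. The toy data.frames a and b and the suffix names below are assumptions for illustration, not your real data.

```r
# Toy stand-ins for the two samples (hypothetical excerpt)
a <- data.frame(miRNA = c("dre-miR-21", "dre-miR-740"),
                precursor = c("dre-mir-21-2", "dre-mir-740"),
                read_count = c(13309, 0),
                stringsAsFactors = FALSE)
b <- data.frame(miRNA = c("dre-miR-21", "dre-miR-731"),
                precursor = c("dre-mir-21-2", "dre-mir-731"),
                read_count = c(345, 0),
                stringsAsFactors = FALSE)

# Match on miRNA and precursor only; the non-key column read_count exists in
# both inputs, so merge() appends the suffixes to tell the two copies apart.
combined <- merge(a, b, by = c("miRNA", "precursor"), all = TRUE,
                  suffixes = c("_256c", "_shield"))
print(combined)
```

With this layout dre-miR-21 occupies a single row carrying both read counts, and miRNAs seen in only one sample get NA in the other sample's column.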

Edit: This actually amounts to what you have (almost) tried yourself. But I can see that you wrote ALL in upper case. The argument should be lower case (all), as R is case sensitive, so perhaps that was why you only got 34 rows in the result.
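
A small sketch of the difference, using made-up data.frames x and y. As far as I can tell, merge does not complain about the unknown ALL argument (it is swallowed by `...`), so all keeps its default of FALSE and only rows present in both inputs survive.

```r
# Hypothetical toy data sharing the key column "id"
x <- data.frame(id = c("a", "b", "c"), v1 = 1:3)
y <- data.frame(id = c("b", "c", "d"), v2 = 4:6)

# Upper-case ALL is not recognised, so this behaves like all = FALSE:
# only ids found in both x and y are kept.
inner <- merge(x, y, ALL = TRUE)

# Lower-case all = TRUE keeps every id from either input, filling NA as needed.
full <- merge(x, y, all = TRUE)

print(nrow(inner))
print(nrow(full))
```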

2 Comments

@user3523842 OK. So let's suppose you want output rows 6 and 7 to be a single row. What number do you want in the read_count column then? As it is now, R only merges the rows when read_count also agrees in both data.frames. Edit: Sadly, you deleted your comment. Can you undo it? It was helpful for determining what you want.
Yes, that was really why I only got 34 rows. It is also true, as you noticed, that some miRs are present in both data frames. The output you got looks very good and useful to me, and I think I can work with it perfectly. Thank you very much for your help and quick answer. Leonardo.