
I have a question about using awk in Unix to merge multiple tables that share a common key column.

Tab1

Geneid  Chr Start   End Strand  Length Sample_1
ENSG00000278267 1   17369   17436   -   68  0
ENSG00000243485 1;1;1   29554;30267;30976   30039;30667;31109   +;+;+   1021    0

Tab2

Geneid  Chr Start   End Strand  Length Sample_2
ENSG00000278267 1   17369   17436   -   68  0
ENSG00000243485 1;1;1   29554;30267;30976   30039;30667;31109   +;+;+   1021    0

Tab3

Geneid  Chr Start   End Strand  Length Sample_3
ENSG00000278267 1   17369   17436   -   68  0
ENSG00000243485 1;1;1   29554;30267;30976   30039;30667;31109   +;+;+   1021    0

As you can see, the Geneid column is the same in all of these tables. I would like to merge the files into one table containing the Geneid column followed by each file's "Sample_n" column.

awk 'NR==FNR {h[$1] = $7; next} {print $1,$7,h[$1]}' Sample_1.txt Sample_2.txt | head

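If I understand it correctly, it means:
NR==FNR: this condition is true only while awk is reading the first file, which serves as the template for the output.
{h[$1] = $7; next}: the array h maps each GeneID of file 1 to the value in its 7th column; next skips the rest of the program for that line.
{print $1,$7,h[$1]}: for each line of the second file, print its 1st and 7th columns, followed by the value stored in h for that GeneID.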
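To see why the two-file idiom behaves this way, it can help to print NR and FNR directly. This is a small sketch with two made-up files (a.txt and b.txt) and a hypothetical helper, show_nr_fnr: NR keeps counting across all input files while FNR restarts at 1 for each new file, so NR==FNR is true only for lines of the first file.

```shell
#!/bin/sh
# Hypothetical demo files: a.txt has two lines, b.txt has one
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'x\ny\n' > "$tmp/a.txt"
printf 'z\n' > "$tmp/b.txt"

# show_nr_fnr is a made-up helper: for every input line, print the line,
# NR (line count across all files), FNR (line count within the current file)
# and whether the NR==FNR test is true for that line
show_nr_fnr() {
    awk '{ print $0, NR, FNR, (NR == FNR ? "first" : "later") }' "$@"
}

show_nr_fnr "$tmp/a.txt" "$tmp/b.txt"
# x 1 1 first
# y 2 2 first
# z 3 1 later
```

One caveat worth knowing: if the first file is empty, NR==FNR is also true for every line of the second file, so the idiom quietly treats the second file as the template.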

This works for 2 files, but not for 3 or more. With two files, I get:

Geneid Sample_1 Sample_2
ENSG00000278267 0 0 
ENSG00000243485 0 0 
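With three files, NR==FNR is still true only for the first file, so the lines of the second and of the third file are each printed separately next to the stored file-1 value instead of being combined into one row. Below is a minimal sketch of one way to extend the same idea to any number of files, assuming every file has the same Geneids in column 1, the sample value in column 7, and one header line. The helper name merge_col7 is hypothetical, and the Tab1 to Tab3 files are rebuilt from the tables above.

```shell
#!/bin/sh
# Rebuild the three example tables from the question (whitespace-separated)
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
for n in 1 2 3; do
    printf '%s\n' \
        "Geneid Chr Start End Strand Length Sample_$n" \
        "ENSG00000278267 1 17369 17436 - 68 0" \
        "ENSG00000243485 1;1;1 29554;30267;30976 30039;30667;31109 +;+;+ 1021 0" \
        > "$tmp/Tab$n"
done

# merge_col7: hypothetical helper that accepts any number of files
merge_col7() {
    awk '
        # Only while reading the first file: remember the order of the Geneids
        NR == FNR { order[++n] = $1 }
        # For every line of every file: append column 7 to that Geneid row
        { row[$1] = row[$1] OFS $7 }
        # After the last file: print the rows in first-file order
        END { for (i = 1; i <= n; i++) print order[i] row[order[i]] }
    ' "$@"
}

merge_col7 "$tmp/Tab1" "$tmp/Tab2" "$tmp/Tab3"
# Geneid Sample_1 Sample_2 Sample_3
# ENSG00000278267 0 0 0
# ENSG00000243485 0 0 0
```

The design choice here is the order array: a plain for (key in array) loop visits keys in an unspecified order, while walking order[1..n] keeps the rows in the same order as the first file.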

I searched this site and found complete solutions, but I don't really understand the commands. Does anybody know how to merge these files, and could you explain the parameters in the command?

  • Welcome to U&L. I haven't tried the awk code, but a simpler way is to just use join. Commented Sep 7, 2018 at 12:42
  • Yes, I tried it, but since I have a lot of files, I looked into awk instead. @glen jackman's comment solved it. Thanks for the help Commented Sep 10, 2018 at 12:34
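The join approach mentioned in the comment can also handle many files. join merges only two inputs at a time, and both must be sorted on the join field, so this sketch (key_col7 and merge_with_join are hypothetical helpers) reduces each table to Geneid plus column 7, sorts it, and folds the files into an accumulator one at a time. It assumes whitespace-separated columns with Geneid in column 1 and the sample value in column 7.

```shell
#!/bin/sh
# sort and join must agree on collation order, so pin the locale
LC_ALL=C
export LC_ALL

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
for n in 1 2 3; do
    printf '%s\n' \
        "Geneid Chr Start End Strand Length Sample_$n" \
        "ENSG00000278267 1 17369 17436 - 68 0" \
        "ENSG00000243485 1;1;1 29554;30267;30976 30039;30667;31109 +;+;+ 1021 0" \
        > "$tmp/Tab$n"
done

# Reduce one table to "Geneid Sample_n", sorted on Geneid
key_col7() { awk '{ print $1, $7 }' "$1" | sort -k1,1; }

# Fold every file into an accumulator, joining two inputs at a time
merge_with_join() {
    acc=$(mktemp)
    key_col7 "$1" > "$acc"
    shift
    for f in "$@"; do
        # "-" tells join to read the second input from stdin
        key_col7 "$f" | join "$acc" - > "$acc.new" && mv "$acc.new" "$acc"
    done
    cat "$acc"
    rm -f "$acc"
}

merge_with_join "$tmp"/Tab*
# ENSG00000243485 0 0 0
# ENSG00000278267 0 0 0
# Geneid Sample_1 Sample_2 Sample_3
```

Two side effects to be aware of: the header line is sorted like any other row (it lands last in the C locale), and by default join drops any Geneid that is missing from one of the inputs.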

1 Answer

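awk '
    # For every line of every file, append the last field ($NF, the Sample_n
    # column) to the string stored under that line's Geneid ($1)
    {samples[$1] = samples[$1] OFS $NF}
    # END runs once, after the last file has been read
    END {
        # print the header first
        print "Geneid", samples["Geneid"]
        delete samples["Geneid"]
        # and then the rest of the data (for-in visits keys in no particular order)
        for (geneid in samples) print geneid, samples[geneid]
    }
' Tab*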

Pipe the output into column -t (append | column -t) if you want the columns lined up.
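Because the for (geneid in samples) loop visits keys in an unspecified order, the data rows may come out shuffled. Below is a small sketch, assuming you want the header first and the remaining rows sorted by Geneid before column -t aligns them. merge_sorted is a hypothetical wrapper around the answer's awk program, and the Tab1 to Tab3 files are rebuilt from the question's tables.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
for n in 1 2 3; do
    printf '%s\n' \
        "Geneid Chr Start End Strand Length Sample_$n" \
        "ENSG00000278267 1 17369 17436 - 68 0" \
        "ENSG00000243485 1;1;1 29554;30267;30976 30039;30667;31109 +;+;+ 1021 0" \
        > "$tmp/Tab$n"
done

# The answer's awk program, then a step that keeps the first output line
# (the header) in place and sorts only the remaining lines
merge_sorted() {
    awk '
        { samples[$1] = samples[$1] OFS $NF }
        END {
            print "Geneid", samples["Geneid"]
            delete samples["Geneid"]
            for (geneid in samples) print geneid, samples[geneid]
        }
    ' "$@" |
    { IFS= read -r header; printf '%s\n' "$header"; LC_ALL=C sort; }
}

merge_sorted "$tmp"/Tab* | column -t
```

The read/printf pair consumes just the header line from the pipe, so sort only sees the data rows.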

  • Thanks a lot for the code and explanations, it works fine :D Commented Sep 10, 2018 at 12:32
