Skip to main content
3 of 3
Corrected grammar and cleaned up sentence structure and formatting.
Anna1364
  • 1.1k
  • 3
  • 21
  • 33

intersection between 2 files

I have a file containing SNP data called snp.bed, which looks like this:

head snp.bed

    Chr17   214708483   214708484   Chr17:214708484
    Chr17   214708507   214708508   Chr17:214708508
    Chr17   214708573   214708574   Chr17:214708574

I also have a file called intersect.bed, which looks like this:

head intersect.bed

    Chr17   214708483   214708484   Chr17:214708484 Chr17   214706266   214710783   gene50573
    Chr17   214708507   214708508   Chr17:214708508 Chr17   214706266   214710783   gene50573
    Chr17   214708587   214708588   Chr17:214708580 Chr17   214706266   214710783   gene50573

I want to print out a modified version of snp.bed which contains an extra column appended to each row. If a row in snp.bed matches the first 4 columns of a row in intersect.bed, then I want to print the entire row from snp.bed with an extra column obtained by adjoining the last column from the corresponding row in intersect.bed (the gene name). Alternatively, if a row from snp.bed does not match any row from intersect.bed then adjoin an extra column consisting of the string "NA" instead of the gene name.

This is my desired output:

head snp.matched.bed

    Chr17   214708483   214708484   Chr17:214708484   gene50573
    Chr17   214708507   214708508   Chr17:214708508   gene50573
    Chr17   214708573   214708574   Chr17:214708574   NA

How can I do this?

Anna1364
  • 1.1k
  • 3
  • 21
  • 33