I have a file containing SNP data called snp.bed, which looks like this:
head snp.bed
Chr17 214708483 214708484 Chr17:214708484
Chr17 214708507 214708508 Chr17:214708508
Chr17 214708573 214708574 Chr17:214708574
I also have a file called intersect.bed, which looks like this:
head intersect.bed
Chr17 214708483 214708484 Chr17:214708484 Chr17 214706266 214710783 gene50573
Chr17 214708507 214708508 Chr17:214708508 Chr17 214706266 214710783 gene50573
Chr17 214708587 214708588 Chr17:214708580 Chr17 214706266 214710783 gene50573
I want to produce a modified version of snp.bed with one extra column appended to each row. If a row in snp.bed matches the first 4 columns of a row in intersect.bed, the extra column should be the last column of that intersect.bed row (the gene name). If a row in snp.bed matches no row in intersect.bed, the extra column should be the string "NA" instead.
This is my desired output:
head snp.matched.bed
Chr17 214708483 214708484 Chr17:214708484 gene50573
Chr17 214708507 214708508 Chr17:214708508 gene50573
Chr17 214708573 214708574 Chr17:214708574 NA
How can I do this?
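One possible approach is a two-file awk lookup. The sketch below is not the only way to do it, and it rests on some assumptions: both files are whitespace-separated, the gene name is always the last field of intersect.bed, and intersect.bed is not empty (the `NR == FNR` test misfires on an empty first file). It first recreates the three sample rows of each file shown above so that it can be run as-is; with real data, skip that part.

```shell
# Recreate the sample files from the question (assumed whitespace-separated).
cat > snp.bed <<'EOF'
Chr17 214708483 214708484 Chr17:214708484
Chr17 214708507 214708508 Chr17:214708508
Chr17 214708573 214708574 Chr17:214708574
EOF

cat > intersect.bed <<'EOF'
Chr17 214708483 214708484 Chr17:214708484 Chr17 214706266 214710783 gene50573
Chr17 214708507 214708508 Chr17:214708508 Chr17 214706266 214710783 gene50573
Chr17 214708587 214708588 Chr17:214708580 Chr17 214706266 214710783 gene50573
EOF

# Pass 1 (intersect.bed): remember the gene name, keyed on the first 4 columns.
# Pass 2 (snp.bed): print each row plus the remembered gene name, or "NA".
awk '
    NR == FNR {                       # true only while reading the first file
        gene[$1, $2, $3, $4] = $NF    # $NF is the last field (gene name)
        next
    }
    {
        key = $1 SUBSEP $2 SUBSEP $3 SUBSEP $4
        print $0, (key in gene ? gene[key] : "NA")
    }
' intersect.bed snp.bed > snp.matched.bed

cat snp.matched.bed
```

On the sample rows this gives the desired output shown above. If a SNP overlaps more than one gene, this sketch keeps only the gene from the last matching row in intersect.bed. The appended column is separated by a single space; if the real files are tab-delimited, adding `-v OFS='\t'` to the awk call should keep the output consistent.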