Tweeted twitter.com/StackUnix/status/927790658328432640

occurred Nov 7, 2017 at 6:51

Corrected grammar and cleaned up sentence structure and formatting.

edited Nov 6, 2017 at 23:33

I have a file containing SNP data called snp.bed, which looks like this:

head snp.bed

    Chr17   214708483   214708484   Chr17:214708484
    Chr17   214708507   214708508   Chr17:214708508
    Chr17   214708573   214708574   Chr17:214708574

I also have a file called intersect.bed, which looks like this:

head intersect.bed

    Chr17   214708483   214708484   Chr17:214708484 Chr17   214706266   214710783   gene50573
    Chr17   214708507   214708508   Chr17:214708508 Chr17   214706266   214710783   gene50573
    Chr17   214708587   214708588   Chr17:214708580 Chr17   214706266   214710783   gene50573

I want to print out a modified version of snp.bed which contains an extra column appended to each row. If a row in snp.bed matches the first 4 columns of a row in intersect.bed, then I want to print the entire row from snp.bed with an extra column obtained by adjoining the last column from the corresponding row in intersect.bed (the gene name). Alternatively, if a row from snp.bed does not match any row from intersect.bed then adjoin an extra column consisting of the string "NA" instead of the gene name.

This is my desired output:

head snp.matched.bed

    Chr17   214708483   214708484   Chr17:214708484   gene50573
    Chr17   214708507   214708508   Chr17:214708508   gene50573
    Chr17   214708573   214708574   Chr17:214708574   NA

How can I do this?
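
One possible approach, offered as a minimal sketch rather than a definitive answer: read intersect.bed first and remember the gene name (last field) under a key built from its first four fields, then read snp.bed and append either the remembered gene name or "NA". The sketch recreates the sample rows from above in a scratch directory; the tab separators and the `workdir` variable are assumptions for illustration and are not part of the original files.

```shell
# Work in a scratch directory so the sample files do not overwrite real data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample rows copied from the question (tab-separated here by assumption).
printf '%s\t%s\t%s\t%s\n' \
    Chr17 214708483 214708484 Chr17:214708484 \
    Chr17 214708507 214708508 Chr17:214708508 \
    Chr17 214708573 214708574 Chr17:214708574 > snp.bed

printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    Chr17 214708483 214708484 Chr17:214708484 Chr17 214706266 214710783 gene50573 \
    Chr17 214708507 214708508 Chr17:214708508 Chr17 214706266 214710783 gene50573 \
    Chr17 214708587 214708588 Chr17:214708580 Chr17 214706266 214710783 gene50573 > intersect.bed

awk '
    # First file (intersect.bed): store the last field (gene name)
    # under a key made of the first four fields.
    NR == FNR { gene[$1, $2, $3, $4] = $NF; next }

    # Second file (snp.bed): append the stored gene name, or NA if
    # no row of intersect.bed had the same first four fields.
    {
        key = $1 SUBSEP $2 SUBSEP $3 SUBSEP $4
        print $0 "\t" ((key in gene) ? gene[key] : "NA")
    }
' intersect.bed snp.bed > snp.matched.bed

cat snp.matched.bed
```

Two caveats with this sketch: if the same SNP appears in several rows of intersect.bed, only the gene from the last such row is kept, and the `NR == FNR` test misbehaves when intersect.bed is empty, because snp.bed is then treated as the first file.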

Corrected grammar and cleaned up sentence structure and formatting.

edit approved Nov 6, 2017 at 23:33

igal

I have a file containing SNP data called snp.bed, which looks like this:

head snp.bed

    Chr17   214708483   214708484   Chr17:214708484
    Chr17   214708507   214708508   Chr17:214708508
    Chr17   214708573   214708574   Chr17:214708574

I also have a file called intersect.bed, which looks like this:

head intersect.bed

    Chr17   214708483   214708484   Chr17:214708484 Chr17   214706266   214710783   gene50573
    Chr17   214708507   214708508   Chr17:214708508 Chr17   214706266   214710783   gene50573
    Chr17   214708587   214708588   Chr17:214708580 Chr17   214706266   214710783   gene50573

I want to print out a modified version of snp.bed which contains an extra column appended to each row. If a row in snp.bed matches the first 4 columns of a row in intersect.bed, then I want to print the entire row from snp.bed with an extra column obtained by adjoining the last column from the corresponding row in intersect.bed (the gene name). Alternatively, if a row from snp.bed does not match any row from intersect.bed then adjoin an extra column consisting of the string "NA" instead of the gene name.

This is my desired output:

head snp.matched.bed

    Chr17   214708483   214708484   HanXRQChr17:214708484   gene50573
    Chr17   214708507   214708508   HanXRQChr17:214708508   gene50573
    Chr17   214708573   214708574   HanXRQChr17:214708574   NA

How can I do this?

asked Nov 6, 2017 at 22:35

Anna1364

intersection between 2 files

I have a SNP file that looks like this:

head snp.bed

    Chr17   214708483   214708484   Chr17:214708484
    Chr17   214708507   214708508   Chr17:214708508
    Chr17   214708573   214708574   Chr17:214708574

and an intersect file

head intersect.bed

    Chr17   214708483   214708484   Chr17:214708484 Chr17   214706266   214710783   gene50573
    Chr17   214708507   214708508   Chr17:214708508 Chr17   214706266   214710783   gene50573
    Chr17   214708587   214708588   Chr17:214708580 Chr17   214706266   214710783   gene50573

If a row in snp.bed matches the first 4 columns of a row in intersect.bed, print the entire row of snp.bed with an extra column containing the gene name (the last column of intersect.bed). If the row does not match, put NA in the extra column instead.

This is my desired output:

head snp.matched.bed

    Chr17   214708483   214708484   HanXRQChr17:214708484   gene50573
    Chr17   214708507   214708508   HanXRQChr17:214708508   gene50573
    Chr17   214708573   214708574   HanXRQChr17:214708574   NA

How can I do it? Thanks

text-processing awk
