Revisions to Filter csv file based on extended column values

added 14 characters in body

edited Feb 11, 2015 at 12:58

I have the following csv file:

ID,PDBID,FirstResidue,SecondResidue,ThirdResidue,FourthResidue,Pattern
RZ_AUTO_1,4tov,1404,1405,1518,1519,CG/AA Canonical ribose-zipper
RZ_AUTO_2,4tov,1405,1406,1517,1518,GU/AA Naked ribose-zipper
RZ_AUTO_3,4tov,1043,1044,1047,1048,CC/GA Naked ribose-zipper
RZ_AUTO_4,4tov,1556,1557,1514,1515,CC/GA Naked ribose-zipper
RZ_AUTO_5,4tow,130,131,99,100,AU/CA Canonical ribose-zipper
RZ_AUTO_6,4tow,766,767,1524,1525,AA/CG Canonical ribose-zipper
RZ_AUTO_7,4tow,131,132,98,99,UC/AC Canonical ribose-zipper

I need to go through each row and print the rows whose FirstResidue and SecondResidue values can be extended, meaning that the SecondResidue of one row is the FirstResidue of a different row with the same PDBID. For example, rows RZ_AUTO_1 and RZ_AUTO_2 form such a pair, as do rows RZ_AUTO_5 and RZ_AUTO_7. The output should look something like this:

RZ_AUTO_1,4tov,1404,1405,1518,1519,CG/AA Canonical ribose-zipper
RZ_AUTO_2,4tov,1405,1406,1517,1518,GU/AA Naked ribose-zipper
RZ_AUTO_5,4tow,130,131,99,100,AU/CA Canonical ribose-zipper
RZ_AUTO_7,4tow,131,132,98,99,UC/AC Canonical ribose-zipper

Is it possible to do this using awk or other unix methods? I'm using OSX.
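
One way the filter described above might be approached is a two-pass awk script, sketched here under some assumptions: the file has a single header line, no field contains a quoted comma, and the function name `extendable_rows` is purely illustrative. The first pass records every (PDBID, FirstResidue) and (PDBID, SecondResidue) pair; the second pass prints a row when its SecondResidue is another row's FirstResidue, or its FirstResidue is another row's SecondResidue, within the same PDBID. It uses only standard awk features, so it should also work with the awk shipped on OS X, though that has not been checked here.

```shell
# Hypothetical helper: print the data rows that take part in an extension.
# Assumes a comma-separated file with one header line and no quoted commas.
extendable_rows() {
    awk -F, '
        # First pass: remember every (PDBID, FirstResidue) and
        # (PDBID, SecondResidue) combination, skipping the header.
        NR == FNR {
            if (FNR > 1) {
                first[$2, $3] = 1
                second[$2, $4] = 1
            }
            next
        }
        # Second pass: print a row if its SecondResidue is the FirstResidue
        # of a row with the same PDBID, or its FirstResidue is the
        # SecondResidue of such a row. The header is skipped again.
        FNR > 1 && ((($2, $4) in first) || (($2, $3) in second))
    ' "$1" "$1"
}
```

Called as `extendable_rows input.csv`, where `input.csv` is a placeholder name for the file shown above, this should print rows RZ_AUTO_1, RZ_AUTO_2, RZ_AUTO_5 and RZ_AUTO_7 without the header line. The file is read twice, so it has to be a regular file rather than a pipe. A row whose FirstResidue equals its own SecondResidue would match itself; that case does not occur in the sample data.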

deleted 6 characters in body; edited tags

edited Feb 11, 2015 at 6:51

Anthon

I have the following csv file:

ID,PDBID,FirstResidue,SecondResidue,ThirdResidue,FourthResidue,Pattern
RZ_AUTO_1,4tov,1404,1405,1518,1519,CG/AA Canonical ribose-zipper
RZ_AUTO_2,4tov,1405,1406,1517,1518,GU/AA Naked ribose-zipper
RZ_AUTO_3,4tov,1043,1044,1047,1048,CC/GA Naked ribose-zipper
RZ_AUTO_4,4tov,1556,1557,1514,1515,CC/GA Naked ribose-zipper
RZ_AUTO_5,4tow,130,131,99,100,AU/CA Canonical ribose-zipper
RZ_AUTO_6,4tow,766,767,1524,1525,AA/CG Canonical ribose-zipper
RZ_AUTO_7,4tow,131,132,98,99,UC/AC Canonical ribose-zipper

I need to go through each row and print the rows whose FirstResidue and SecondResidue values can be extended, meaning that the SecondResidue of one row is the FirstResidue of a different row with the same PDBID. For example, rows RZ_AUTO_1 and RZ_AUTO_2 form such a pair, as do rows RZ_AUTO_5 and RZ_AUTO_7. The output should look something like this:

RZ_AUTO_1,4tov,1404,1405,1518,1519,CG/AA Canonical ribose-zipper
RZ_AUTO_2,4tov,1405,1406,1517,1518,GU/AA Naked ribose-zipper
RZ_AUTO_5,4tow,130,131,99,100,AU/CA Canonical ribose-zipper
RZ_AUTO_7,4tow,131,132,98,99,UC/AC Canonical ribose-zipper

Is it possible to do this using awk or other unix methods? Thanks

I have the following csv file:

ID,PDBID,FirstResidue,SecondResidue,ThirdResidue,FourthResidue,Pattern
RZ_AUTO_1,4tov,1404,1405,1518,1519,CG/AA Canonical ribose-zipper
RZ_AUTO_2,4tov,1405,1406,1517,1518,GU/AA Naked ribose-zipper
RZ_AUTO_3,4tov,1043,1044,1047,1048,CC/GA Naked ribose-zipper
RZ_AUTO_4,4tov,1556,1557,1514,1515,CC/GA Naked ribose-zipper
RZ_AUTO_5,4tow,130,131,99,100,AU/CA Canonical ribose-zipper
RZ_AUTO_6,4tow,766,767,1524,1525,AA/CG Canonical ribose-zipper
RZ_AUTO_7,4tow,131,132,98,99,UC/AC Canonical ribose-zipper

I need to go through each row and print the rows whose FirstResidue and SecondResidue values can be extended, meaning that the SecondResidue of one row is the FirstResidue of a different row with the same PDBID. For example, rows RZ_AUTO_1 and RZ_AUTO_2 form such a pair, as do rows RZ_AUTO_5 and RZ_AUTO_7. The output should look something like this:

RZ_AUTO_1,4tov,1404,1405,1518,1519,CG/AA Canonical ribose-zipper
RZ_AUTO_2,4tov,1405,1406,1517,1518,GU/AA Naked ribose-zipper
RZ_AUTO_5,4tow,130,131,99,100,AU/CA Canonical ribose-zipper
RZ_AUTO_7,4tow,131,132,98,99,UC/AC Canonical ribose-zipper

Is it possible to do this using awk or other unix methods?

edited tags

edited Feb 11, 2015 at 0:46

terdon ♦

text-processing awk csv

asked Feb 11, 2015 at 0:39

Sri
