Extract lines that have a specific ending and use those to extract from another file

I am in need of some help extracting lines.

I have two files, and I need to extract lines from both. Each line in the first file contains a barcode and ends with an OTU number. I need to extract the lines that end with specific OTU numbers.

Once I have the extracted lines, I need to extract the entries from my second file that match the barcodes from the first file.

For example, let's say I want to extract all of the lines that contain OTU_1 from this file:

[image of the first file]

A unique barcode is associated with each line that contains OTU_1; in this example, five are shown:

907.2::M02542:207:000000000-AWDAH:1:1115:18838:201661:N:0:GTGAAA
905.2::M02542:207:000000000-AWDAH:1:1101:24324:103291:N:0:GTGAAA
1205.2::M02542:207:000000000-AWDAH:1:2115:22195:238121:N:0:GTGAAA
906.2::M02542:207:000000000-AWDAH:1:1115:24086:126561:N:0:GTGAAA
910.2::M02542:207:000000000-AWDAH:1:1112:26236:215801:N:0:GTGAAA
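The first step can be sketched in Python as a single streaming pass, so the file never has to fit in memory. This is only a sketch under assumptions the post does not confirm: the barcode is taken to be the first whitespace-separated field of each line and the OTU label the last one. The function name `barcodes_for_otu` and the sample lines (including the `OTU_10` and `OTU_2` labels) are made up for illustration.

```python
import io

def barcodes_for_otu(lines, otu):
    """Yield the barcode of every line whose last field equals `otu`.

    Assumption: fields are whitespace-separated, barcode first, OTU label last.
    """
    for line in lines:
        fields = line.split()
        # Compare the whole last field so that OTU_1 does not also match OTU_10.
        if len(fields) >= 2 and fields[-1] == otu:
            yield fields[0]

# Hypothetical sample standing in for the first file; in practice, pass an
# open file object instead so that lines are read one at a time.
sample = io.StringIO(
    "907.2::M02542:207:000000000-AWDAH:1:1115:18838:201661:N:0:GTGAAA\tOTU_1\n"
    "905.2::M02542:207:000000000-AWDAH:1:1101:24324:103291:N:0:GTGAAA\tOTU_10\n"
    "906.2::M02542:207:000000000-AWDAH:1:1115:24086:126561:N:0:GTGAAA\tOTU_1\n"
    "910.2::M02542:207:000000000-AWDAH:1:1112:26236:215801:N:0:GTGAAA\tOTU_2\n"
)

otu1_barcodes = list(barcodes_for_otu(sample, "OTU_1"))
```

Matching on the complete last field rather than on a line suffix is a deliberate choice: a plain "ends with OTU_1" test would also accept labels such as `OTU_11` if the label were preceded by other text.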

I will need to use these barcodes to extract sequences from my next file:

[image of the second file]

As you can see, each barcode starts right after a > and I need everything between that > and the next one (i.e. my sequences).
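The second step could look roughly like the sketch below, which streams through the sequence file and keeps each record whose header matches a wanted barcode. It assumes a FASTA-like layout where a record starts at a line beginning with > and runs until the next such line, and that the barcode is the text after > up to the first whitespace. The name `filter_fasta` and the sequence data are invented for the example.

```python
import io

def filter_fasta(lines, wanted):
    """Yield the header and sequence lines of records whose ID is in `wanted`.

    Assumption: a record starts at a '>' line and ends before the next one.
    """
    keep = False
    for line in lines:
        if line.startswith(">"):
            # Record ID: the text after '>' up to the first whitespace.
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in wanted
        if keep:
            yield line

# A set gives constant-time lookups for each header that is checked.
wanted = {"907.2::M02542:207:000000000-AWDAH:1:1115:18838:201661:N:0:GTGAAA"}

# Hypothetical sample standing in for the second file (sequences are made up).
fasta = io.StringIO(
    ">907.2::M02542:207:000000000-AWDAH:1:1115:18838:201661:N:0:GTGAAA\n"
    "ACGTACGT\n"
    "TTGACCA\n"
    ">999.9::some-other-read\n"
    "GGCCGGCC\n"
)

kept = "".join(filter_fasta(fasta, wanted))
```

Only the set of wanted barcodes is held in memory; the sequence file itself is read line by line. If the barcode set is itself too large for memory, this approach would need to be replaced by something like a sorted merge of the two files.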

I have tried the obvious thing, which is to load the file into spreadsheet software and sort by OTU number, but my files are too big (several billion lines long).

I would greatly appreciate help! Thank you
