Skip to main content
added 227 characters in body
Source Link
xhienne
  • 18.2k
  • 2
  • 58
  • 71

That's exactly what the join command is made for: it joins two files based on a common field:

$ awk '{print $1}' File1 | join - File2

M01605:153:000000000-B55NK:1:1101:10003:14536 2:N:0:1 GTTTGCGCCGATGTA 
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC 
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10010:14111 2:N:0:1 CTGCGTACTGATAGC 
M01605:153:000000000-B55NK:1:1101:10010:19939 2:N:0:1 TCCGTGGTGCCGGCA 
M01605:153:000000000-B55NK:1:1101:10011:22802 1:N:0:1 TGAGTTCGGATAAAG 
M01605:153:000000000-B55NK:1:1101:10011:22802 1:N:0:1 TGAGTTCGGATAAAG 

Your files may be sorted numerically but not alphabetically, as expected by join. If join is complaining, slightly modify the command above to sort the input with GNU sort:

$ awk '{print $1}' File1 | sort | join - <(sort -k1,1 --stable File2)

As your second file seems to have duplicated lines (see coments), you may want to change the second sort command to sort -k1,1 --stable --unique File2 (still assuming you are using GNU sort, use uniq).

That's exactly what the join command is made for: it joins two files based on a common field:

$ awk '{print $1}' File1 | join - File2

M01605:153:000000000-B55NK:1:1101:10003:14536 2:N:0:1 GTTTGCGCCGATGTA 
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC 
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10010:14111 2:N:0:1 CTGCGTACTGATAGC 
M01605:153:000000000-B55NK:1:1101:10010:19939 2:N:0:1 TCCGTGGTGCCGGCA 
M01605:153:000000000-B55NK:1:1101:10011:22802 1:N:0:1 TGAGTTCGGATAAAG 
M01605:153:000000000-B55NK:1:1101:10011:22802 1:N:0:1 TGAGTTCGGATAAAG 

Your files may be sorted numerically but not alphabetically, as expected by join. If join is complaining, slightly modify the command above to sort the input:

$ awk '{print $1}' File1 | sort | join - <(sort -k1,1 --stable File2)

That's exactly what the join command is made for: it joins two files based on a common field:

$ awk '{print $1}' File1 | join - File2

M01605:153:000000000-B55NK:1:1101:10003:14536 2:N:0:1 GTTTGCGCCGATGTA 
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC 
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10010:14111 2:N:0:1 CTGCGTACTGATAGC 
M01605:153:000000000-B55NK:1:1101:10010:19939 2:N:0:1 TCCGTGGTGCCGGCA 
M01605:153:000000000-B55NK:1:1101:10011:22802 1:N:0:1 TGAGTTCGGATAAAG 
M01605:153:000000000-B55NK:1:1101:10011:22802 1:N:0:1 TGAGTTCGGATAAAG 

Your files may be sorted numerically but not alphabetically, as expected by join. If join is complaining, slightly modify the command above to sort the input with GNU sort:

$ awk '{print $1}' File1 | sort | join - <(sort -k1,1 --stable File2)

As your second file seems to have duplicated lines (see coments), you may want to change the second sort command to sort -k1,1 --stable --unique File2 (still assuming you are using GNU sort, use uniq).

added 241 characters in body
Source Link
xhienne
  • 18.2k
  • 2
  • 58
  • 71

That's exactly what the join command is made for: it joins two files based on a common field:

$ awk '{print $1}' File1 | join - File2

M01605:153:000000000-B55NK:1:1101:10003:14536 2:N:0:1 GTTTGCGCCGATGTA 
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC 
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10010:14111 2:N:0:1 CTGCGTACTGATAGC 
M01605:153:000000000-B55NK:1:1101:10010:19939 2:N:0:1 TCCGTGGTGCCGGCA 
M01605:153:000000000-B55NK:1:1101:10011:22802 1:N:0:1 TGAGTTCGGATAAAG 
M01605:153:000000000-B55NK:1:1101:10011:22802 1:N:0:1 TGAGTTCGGATAAAG 

Your files may be sorted numerically but not alphabetically, as expected by join. If join is complaining, slightly modify the command above to sort the input:

$ awk '{print $1}' File1 | sort | join - <(sort -k1,1 --stable File2)

That's exactly what the join command is made for: it joins two files based on a common field:

$ awk '{print $1}' File1 | join - File2

M01605:153:000000000-B55NK:1:1101:10003:14536 2:N:0:1 GTTTGCGCCGATGTA 
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC 
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10010:14111 2:N:0:1 CTGCGTACTGATAGC 
M01605:153:000000000-B55NK:1:1101:10010:19939 2:N:0:1 TCCGTGGTGCCGGCA 
M01605:153:000000000-B55NK:1:1101:10011:22802 1:N:0:1 TGAGTTCGGATAAAG 
M01605:153:000000000-B55NK:1:1101:10011:22802 1:N:0:1 TGAGTTCGGATAAAG 

That's exactly what the join command is made for: it joins two files based on a common field:

$ awk '{print $1}' File1 | join - File2

M01605:153:000000000-B55NK:1:1101:10003:14536 2:N:0:1 GTTTGCGCCGATGTA 
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC 
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10010:14111 2:N:0:1 CTGCGTACTGATAGC 
M01605:153:000000000-B55NK:1:1101:10010:19939 2:N:0:1 TCCGTGGTGCCGGCA 
M01605:153:000000000-B55NK:1:1101:10011:22802 1:N:0:1 TGAGTTCGGATAAAG 
M01605:153:000000000-B55NK:1:1101:10011:22802 1:N:0:1 TGAGTTCGGATAAAG 

Your files may be sorted numerically but not alphabetically, as expected by join. If join is complaining, slightly modify the command above to sort the input:

$ awk '{print $1}' File1 | sort | join - <(sort -k1,1 --stable File2)
Source Link
xhienne
  • 18.2k
  • 2
  • 58
  • 71

That's exactly what the join command is made for: it joins two files based on a common field:

$ awk '{print $1}' File1 | join - File2

M01605:153:000000000-B55NK:1:1101:10003:14536 2:N:0:1 GTTTGCGCCGATGTA 
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC 
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10010:14111 2:N:0:1 CTGCGTACTGATAGC 
M01605:153:000000000-B55NK:1:1101:10010:19939 2:N:0:1 TCCGTGGTGCCGGCA 
M01605:153:000000000-B55NK:1:1101:10011:22802 1:N:0:1 TGAGTTCGGATAAAG 
M01605:153:000000000-B55NK:1:1101:10011:22802 1:N:0:1 TGAGTTCGGATAAAG