Revisions to Grep not returning identical matches from awk pipe

added 227 characters in body

Source Link

edited Sep 14, 2017 at 22:32

18.2k
2
58
71

That's exactly what the join command is made for: it joins two files based on a common field:

$ awk '{print $1}' File1 | join - File2

M01605:153:000000000-B55NK:1:1101:10003:14536 2:N:0:1 GTTTGCGCCGATGTA 
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC 
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10010:14111 2:N:0:1 CTGCGTACTGATAGC 
M01605:153:000000000-B55NK:1:1101:10010:19939 2:N:0:1 TCCGTGGTGCCGGCA 
M01605:153:000000000-B55NK:1:1101:10011:22802 1:N:0:1 TGAGTTCGGATAAAG 
M01605:153:000000000-B55NK:1:1101:10011:22802 1:N:0:1 TGAGTTCGGATAAAG

Your files may be sorted numerically but not alphabetically, as expected by join. If join is complaining, slightly modify the command above to sort the input with GNU sort:

$ awk '{print $1}' File1 | sort | join - <(sort -k1,1 --stable File2)

As your second file seems to have duplicated lines (see coments), you may want to change the second sort command to sort -k1,1 --stable --unique File2 (still assuming you are using GNU sort, use uniq).

That's exactly what the join command is made for: it joins two files based on a common field:

$ awk '{print $1}' File1 | join - File2

M01605:153:000000000-B55NK:1:1101:10003:14536 2:N:0:1 GTTTGCGCCGATGTA 
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC 
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10010:14111 2:N:0:1 CTGCGTACTGATAGC 
M01605:153:000000000-B55NK:1:1101:10010:19939 2:N:0:1 TCCGTGGTGCCGGCA 
M01605:153:000000000-B55NK:1:1101:10011:22802 1:N:0:1 TGAGTTCGGATAAAG 
M01605:153:000000000-B55NK:1:1101:10011:22802 1:N:0:1 TGAGTTCGGATAAAG

Your files may be sorted numerically but not alphabetically, as expected by join. If join is complaining, slightly modify the command above to sort the input:

$ awk '{print $1}' File1 | sort | join - <(sort -k1,1 --stable File2)

That's exactly what the join command is made for: it joins two files based on a common field:

$ awk '{print $1}' File1 | join - File2

M01605:153:000000000-B55NK:1:1101:10003:14536 2:N:0:1 GTTTGCGCCGATGTA 
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC 
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10010:14111 2:N:0:1 CTGCGTACTGATAGC 
M01605:153:000000000-B55NK:1:1101:10010:19939 2:N:0:1 TCCGTGGTGCCGGCA 
M01605:153:000000000-B55NK:1:1101:10011:22802 1:N:0:1 TGAGTTCGGATAAAG 
M01605:153:000000000-B55NK:1:1101:10011:22802 1:N:0:1 TGAGTTCGGATAAAG

Your files may be sorted numerically but not alphabetically, as expected by join. If join is complaining, slightly modify the command above to sort the input with GNU sort:

$ awk '{print $1}' File1 | sort | join - <(sort -k1,1 --stable File2)

As your second file seems to have duplicated lines (see coments), you may want to change the second sort command to sort -k1,1 --stable --unique File2 (still assuming you are using GNU sort, use uniq).

added 241 characters in body

Source Link

edited Sep 14, 2017 at 21:39

xhienne

18.2k
2
58
71

That's exactly what the join command is made for: it joins two files based on a common field:

$ awk '{print $1}' File1 | join - File2

M01605:153:000000000-B55NK:1:1101:10003:14536 2:N:0:1 GTTTGCGCCGATGTA 
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC 
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10010:14111 2:N:0:1 CTGCGTACTGATAGC 
M01605:153:000000000-B55NK:1:1101:10010:19939 2:N:0:1 TCCGTGGTGCCGGCA 
M01605:153:000000000-B55NK:1:1101:10011:22802 1:N:0:1 TGAGTTCGGATAAAG 
M01605:153:000000000-B55NK:1:1101:10011:22802 1:N:0:1 TGAGTTCGGATAAAG

Your files may be sorted numerically but not alphabetically, as expected by join. If join is complaining, slightly modify the command above to sort the input:

$ awk '{print $1}' File1 | sort | join - <(sort -k1,1 --stable File2)

That's exactly what the join command is made for: it joins two files based on a common field:

$ awk '{print $1}' File1 | join - File2

M01605:153:000000000-B55NK:1:1101:10003:14536 2:N:0:1 GTTTGCGCCGATGTA 
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC 
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10010:14111 2:N:0:1 CTGCGTACTGATAGC 
M01605:153:000000000-B55NK:1:1101:10010:19939 2:N:0:1 TCCGTGGTGCCGGCA 
M01605:153:000000000-B55NK:1:1101:10011:22802 1:N:0:1 TGAGTTCGGATAAAG 
M01605:153:000000000-B55NK:1:1101:10011:22802 1:N:0:1 TGAGTTCGGATAAAG

That's exactly what the join command is made for: it joins two files based on a common field:

$ awk '{print $1}' File1 | join - File2

M01605:153:000000000-B55NK:1:1101:10003:14536 2:N:0:1 GTTTGCGCCGATGTA 
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC 
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10010:14111 2:N:0:1 CTGCGTACTGATAGC 
M01605:153:000000000-B55NK:1:1101:10010:19939 2:N:0:1 TCCGTGGTGCCGGCA 
M01605:153:000000000-B55NK:1:1101:10011:22802 1:N:0:1 TGAGTTCGGATAAAG 
M01605:153:000000000-B55NK:1:1101:10011:22802 1:N:0:1 TGAGTTCGGATAAAG

Your files may be sorted numerically but not alphabetically, as expected by join. If join is complaining, slightly modify the command above to sort the input:

$ awk '{print $1}' File1 | sort | join - <(sort -k1,1 --stable File2)

Source Link

answered Sep 14, 2017 at 21:23

xhienne

18.2k
2
58
71

That's exactly what the join command is made for: it joins two files based on a common field:

$ awk '{print $1}' File1 | join - File2

M01605:153:000000000-B55NK:1:1101:10003:14536 2:N:0:1 GTTTGCGCCGATGTA 
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC 
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10010:14111 2:N:0:1 CTGCGTACTGATAGC 
M01605:153:000000000-B55NK:1:1101:10010:19939 2:N:0:1 TCCGTGGTGCCGGCA 
M01605:153:000000000-B55NK:1:1101:10011:22802 1:N:0:1 TGAGTTCGGATAAAG 
M01605:153:000000000-B55NK:1:1101:10011:22802 1:N:0:1 TGAGTTCGGATAAAG

Stack Exchange Network

Return to Answer