I'm trying to identify all lines in common based on the first column of one file. I'm using the following command:
awk '{print $1}' File1 | fgrep -wf - File2 >Out
File1:
M01605:153:000000000-B55NK:1:1101:10003:14536   chr1    150129998   A   Rev 18
M01605:153:000000000-B55NK:1:1101:10007:14573   chr17   44166311    C   38  44166311
M01605:153:000000000-B55NK:1:1101:10007:14573   chr17   44166500    G   Rev 34
M01605:153:000000000-B55NK:1:1101:10009:9160    chr8    16716272    G   35  16716395
M01605:153:000000000-B55NK:1:1101:10009:9160    chr8    16716336    A   37  16716337
M01605:153:000000000-B55NK:1:1101:10009:9160    chr8    16716336    A   38  16716459
M01605:153:000000000-B55NK:1:1101:10010:14111   chr8    89574844    A   38  89574844
M01605:153:000000000-B55NK:1:1101:10010:19939   chr3    181151945   T   36  181151945
M01605:153:000000000-B55NK:1:1101:10011:22802   chr17   43984669    A   34  43984765
M01605:153:000000000-B55NK:1:1101:10011:22802   chr17   43984669    A   38  43984689
File2:
M01605:153:000000000-B55NK:1:1101:10003:14536   2:N:0:1 GTTTGCGCCGATGTA 
M01605:153:000000000-B55NK:1:1101:10003:4882    2:N:0:1 GCACTGTAAAAAGTA 
M01605:153:000000000-B55NK:1:1101:10007:14573   2:N:0:1 GGGGATAAGCGTTGC 
M01605:153:000000000-B55NK:1:1101:10007:5336    2:N:0:1 GTGTTTGTGTAGCTA 
M01605:153:000000000-B55NK:1:1101:10008:14477   2:N:0:1 GGGCGGAGGTGAAGA 
M01605:153:000000000-B55NK:1:1101:10009:18543   2:N:0:1 AGTTCGAGCGCAGTG 
M01605:153:000000000-B55NK:1:1101:10009:9160    2:N:0:1 CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10010:14111   2:N:0:1 CTGCGTACTGATAGC 
M01605:153:000000000-B55NK:1:1101:10010:19939   2:N:0:1 TCCGTGGTGCCGGCA 
M01605:153:000000000-B55NK:1:1101:10011:22802   1:N:0:1 TGAGTTCGGATAAAG 
Out:
M01605:153:000000000-B55NK:1:1101:10003:14536 2:N:0:1   GTTTGCGCCGATGTA 
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1   GGGGATAAGCGTTGC 
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1    CAGAAGAGGTAATGT 
M01605:153:000000000-B55NK:1:1101:10010:14111 2:N:0:1   CTGCGTACTGATAGC 
M01605:153:000000000-B55NK:1:1101:10010:19939 2:N:0:1   TCCGTGGTGCCGGCA 
M01605:153:000000000-B55NK:1:1101:10011:22802 1:N:0:1   TGAGTTCGGATAAAG 
Expected Out:
M01605:153:000000000-B55NK:1:1101:10003:14536 2:N:0:1 GTTTGCGCCGATGTA
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT
M01605:153:000000000-B55NK:1:1101:10010:14111 2:N:0:1 CTGCGTACTGATAGC
M01605:153:000000000-B55NK:1:1101:10010:19939 2:N:0:1 TCCGTGGTGCCGGCA
M01605:153:000000000-B55NK:1:1101:10011:22802 1:N:0:1 TGAGTTCGGATAAAG
Note that the repeated lines are missing from the actual output, and they are what I want in the output file.
It seems like grep is running correctly, but then condensing all identical lines down into only one output line. Any suggestions?
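For what it's worth, grep prints each matching line of File2 once, no matter how many patterns match it, so the repeats in File1 are lost. One possible workaround is to have awk do the lookup itself. The following is only a sketch under some assumptions: the sample is cut down to the 10003 and 10007 reads, fields are whitespace-separated, and each first-column key appears at most once in File2 (if it repeats, the last line wins). The array name `seen` and the temporary directory are just illustrative choices.

```shell
# Work in a scratch directory with a cut-down copy of the question's data.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

cat > File1 <<'EOF'
M01605:153:000000000-B55NK:1:1101:10003:14536 chr1 150129998 A Rev 18
M01605:153:000000000-B55NK:1:1101:10007:14573 chr17 44166311 C 38 44166311
M01605:153:000000000-B55NK:1:1101:10007:14573 chr17 44166500 G Rev 34
EOF

cat > File2 <<'EOF'
M01605:153:000000000-B55NK:1:1101:10003:14536 2:N:0:1 GTTTGCGCCGATGTA
M01605:153:000000000-B55NK:1:1101:10003:4882 2:N:0:1 GCACTGTAAAAAGTA
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC
M01605:153:000000000-B55NK:1:1101:10007:5336 2:N:0:1 GTGTTTGTGTAGCTA
EOF

# While reading File2 (NR == FNR): store each line under its first-column key.
# While reading File1: for every line whose key was stored, print the File2 line.
awk 'NR == FNR { seen[$1] = $0; next } $1 in seen { print seen[$1] }' File2 File1 > Out

cat Out
```

With this approach Out gets one File2 line per matching File1 line, in File1 order, so the 10007:14573 read appears twice. Unlike `fgrep -w`, the match is an exact comparison on the first field only.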


M01605:153:000000000-B55NK:1:1101:10011:22802, right? File1 and all ten lines of File2? I don’t think so; it seems to me that you can illustrate your problem with just the 10007 lines, and maybe also the 10003 lines, to avoid oversimplifying it.