I'm trying to extract every line of File2 whose first column matches the first column of a line in File1. I'm using the following command:
awk '{print $1}' File1 | fgrep -wf - File2 >Out
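For reference, fgrep behaves like grep -F, and grep prints each line of the file it searches at most once, however many of the supplied patterns match that line. The sketch below uses two lines from the File2 data in this question (the temporary directory is arbitrary) and feeds the 10007:14573 ID in twice, as happens when that ID occurs twice in File1:

```shell
#!/bin/sh
# Show that grep prints a matching File2 line only once, even when the
# pattern list (File1's first column) contains the same ID twice.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Two lines excerpted from File2 in the question.
printf '%s\n' \
  'M01605:153:000000000-B55NK:1:1101:10003:14536 2:N:0:1 GTTTGCGCCGATGTA' \
  'M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC' \
  > "$tmp/File2"

# The 10007:14573 ID appears twice in File1, so it is supplied twice here.
out=$(printf '%s\n' \
  'M01605:153:000000000-B55NK:1:1101:10007:14573' \
  'M01605:153:000000000-B55NK:1:1101:10007:14573' |
  grep -F -w -f - "$tmp/File2")

# Prints the 10007:14573 line once, not twice.
printf '%s\n' "$out"
```

So the duplicates are not removed after matching; grep simply never emits one output line per pattern.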
File1:
M01605:153:000000000-B55NK:1:1101:10003:14536 chr1 150129998 A Rev 18
M01605:153:000000000-B55NK:1:1101:10007:14573 chr17 44166311 C 38 44166311
M01605:153:000000000-B55NK:1:1101:10007:14573 chr17 44166500 G Rev 34
M01605:153:000000000-B55NK:1:1101:10009:9160 chr8 16716272 G 35 16716395
M01605:153:000000000-B55NK:1:1101:10009:9160 chr8 16716336 A 37 16716337
M01605:153:000000000-B55NK:1:1101:10009:9160 chr8 16716336 A 38 16716459
M01605:153:000000000-B55NK:1:1101:10010:14111 chr8 89574844 A 38 89574844
M01605:153:000000000-B55NK:1:1101:10010:19939 chr3 181151945 T 36 181151945
M01605:153:000000000-B55NK:1:1101:10011:22802 chr17 43984669 A 34 43984765
M01605:153:000000000-B55NK:1:1101:10011:22802 chr17 43984669 A 38 43984689
File2:
M01605:153:000000000-B55NK:1:1101:10003:14536 2:N:0:1 GTTTGCGCCGATGTA
M01605:153:000000000-B55NK:1:1101:10003:4882 2:N:0:1 GCACTGTAAAAAGTA
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC
M01605:153:000000000-B55NK:1:1101:10007:5336 2:N:0:1 GTGTTTGTGTAGCTA
M01605:153:000000000-B55NK:1:1101:10008:14477 2:N:0:1 GGGCGGAGGTGAAGA
M01605:153:000000000-B55NK:1:1101:10009:18543 2:N:0:1 AGTTCGAGCGCAGTG
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT
M01605:153:000000000-B55NK:1:1101:10010:14111 2:N:0:1 CTGCGTACTGATAGC
M01605:153:000000000-B55NK:1:1101:10010:19939 2:N:0:1 TCCGTGGTGCCGGCA
M01605:153:000000000-B55NK:1:1101:10011:22802 1:N:0:1 TGAGTTCGGATAAAG
Out:
M01605:153:000000000-B55NK:1:1101:10003:14536 2:N:0:1 GTTTGCGCCGATGTA
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT
M01605:153:000000000-B55NK:1:1101:10010:14111 2:N:0:1 CTGCGTACTGATAGC
M01605:153:000000000-B55NK:1:1101:10010:19939 2:N:0:1 TCCGTGGTGCCGGCA
M01605:153:000000000-B55NK:1:1101:10011:22802 1:N:0:1 TGAGTTCGGATAAAG
Expected Out:
M01605:153:000000000-B55NK:1:1101:10003:14536 2:N:0:1 GTTTGCGCCGATGTA
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT
M01605:153:000000000-B55NK:1:1101:10009:9160 2:N:0:1 CAGAAGAGGTAATGT
M01605:153:000000000-B55NK:1:1101:10010:14111 2:N:0:1 CTGCGTACTGATAGC
M01605:153:000000000-B55NK:1:1101:10010:19939 2:N:0:1 TCCGTGGTGCCGGCA
M01605:153:000000000-B55NK:1:1101:10011:22802 1:N:0:1 TGAGTTCGGATAAAG
Note that the repeated lines in the expected output are missing from the output I actually get; those repeats are what I want in the output file.
It seems like grep is matching correctly but collapsing identical lines into a single output line. Any suggestions?
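One way to get one output line per File1 line is to let awk drive the output from File1 instead of from File2. This is a minimal sketch, assuming the read ID is the first whitespace-separated field in both files; it uses a reduced sample (the 10003 and 10007 reads) and a throwaway temporary directory:

```shell
#!/bin/sh
# Minimal sketch: print the matching File2 line once for every File1 line
# that shares its ID, so IDs repeated in File1 are repeated in the output.
# Assumes the ID is the first whitespace-separated field in both files.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Reduced sample from the question: the 10003 and 10007 reads only.
cat > "$tmp/File1" <<'EOF'
M01605:153:000000000-B55NK:1:1101:10003:14536 chr1 150129998 A Rev 18
M01605:153:000000000-B55NK:1:1101:10007:14573 chr17 44166311 C 38 44166311
M01605:153:000000000-B55NK:1:1101:10007:14573 chr17 44166500 G Rev 34
EOF

cat > "$tmp/File2" <<'EOF'
M01605:153:000000000-B55NK:1:1101:10003:14536 2:N:0:1 GTTTGCGCCGATGTA
M01605:153:000000000-B55NK:1:1101:10003:4882 2:N:0:1 GCACTGTAAAAAGTA
M01605:153:000000000-B55NK:1:1101:10007:14573 2:N:0:1 GGGGATAAGCGTTGC
EOF

# While reading the first file (File2, where NR == FNR), store each line under its ID.
# While reading File1, print the stored File2 line for every ID that was seen,
# so the output follows File1's order and keeps File1's duplicates.
out=$(awk 'NR == FNR { line[$1] = $0; next }
           ($1 in line) { print line[$1] }' "$tmp/File2" "$tmp/File1")

printf '%s\n' "$out"
```

On the real files the same awk program would be given File2 first and File1 second, redirected to Out. Unlike fgrep -w, it compares only the first column, not any word on the line. If File2 contains the same ID more than once, only its last line is kept; if File2 is much larger than File1, a variant that counts each ID in File1 and then prints every matching File2 line that many times holds less in memory (and follows File2's order instead).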
M01605:153:000000000-B55NK:1:1101:10011:22802, right? […] File1 and all ten lines of File2? I don't think so; it seems to me that you can illustrate your problem with just the 10007 lines, and maybe also the 10003 lines, to avoid oversimplifying it.