I've found similar questions on Stack Overflow and related sites, but none of them answered my question exactly.
I have 2 files with a different number of lines BUT I have them both sorted. My original files are hundreds of lines long but for troubleshooting purposes, I made file1 have 12 lines and file2 have 5 lines. File2 is a subset of file1. What I want to do is run a command that outputs all the lines that are in file1 but are not in file2.
I tried using the Unix commands diff and comm but they both list the full contents of file1, which is not what I want.
A quick example of this would be:
File1  File2
A      B
B      E
C      I
E      N
G      O
I
L
M
N
O
X
So here, we can see that everything in file2 is also in file1. For some reason, diff and comm both showed the full contents of file1. I assume that's because they do a line-by-line comparison rather than searching through the whole file.
Is there another Unix command I can run that will output what I am expecting?
EDIT: The commands I used to attempt to get what I needed were:
a) diff file1 file2
This basically listed everything from file1 with a < in front of it, showing the content was from file1, and everything from file2 with a > in front of it. Definitely not what I needed.
b) comm -23 file1 file2
This showed the whole content of file1 again and not the difference I was expecting. I also tried:
c) comm -3 file1 file2
The help page for comm said this would print lines that are in file1 but not in file2, and vice versa, but this also didn't show what I wanted. In my example, B appears in both files but on different lines, yet the output treats it as being in one file but not the other and therefore prints it twice. So the output looked like this:
A
B
B
C
E
E
etc.
That wasn't what I was expecting. I was expecting:
A
C
G
L
M
X
comm -2 -3 File1 File2 should output A C G L M X. If it does not, there may be something unexpected going on with the data, e.g. CRLF (Windows) line terminators in one of the two files.
man diff. Everything you've asked is covered there.
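To illustrate the CRLF suggestion, here is a minimal sketch that recreates the example data and compares the files before and after stripping carriage returns. It assumes, purely for illustration, that file2 is the file with Windows line endings; the temporary directory and the names workdir, broken, result, and file2.unix are made up for this example and are not from the question.

```shell
# Recreate the example files from the question (both already sorted).
workdir=$(mktemp -d)
printf '%s\n' A B C E G I L M N O X > "$workdir/file1"
# Assumption: file2 has CRLF line endings, to mimic the suspected problem.
printf '%s\r\n' B E I N O > "$workdir/file2"

# With the stray carriage returns, "B\r" never equals "B",
# so comm -23 reports every line of file1 as unique.
broken=$(LC_ALL=C comm -23 "$workdir/file1" "$workdir/file2")

# Strip the carriage returns, then compare again.
tr -d '\r' < "$workdir/file2" > "$workdir/file2.unix"
result=$(LC_ALL=C comm -23 "$workdir/file1" "$workdir/file2.unix")

printf '%s\n' "$result"
rm -rf "$workdir"
```

With this data, the second comparison prints A, C, G, L, M, and X on separate lines, while the first one (held in broken) contains all of file1, which matches the symptom described in the question. Running `file file2` or `od -c file2 | head` on the real files is one way to check whether carriage returns are actually present.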