I have a sorted list of IDs in each of two files. I ran the comm command to compare them, but it seems to miss some lines that are common to both files. Why is that?

File1:

1
2
3
4
5
6
7
8
9
11
12
13
15
16
17
18
19
20
21
22

File2:

16
18
21
23
705
707
709
711
712
826
827
839
846
847
848
872
873
874
875
891

Comm output: $> comm file1 file2

1
    16  //exists in both files
    18  //exists in both files
2
    21
    23
3
4
5
6
7
    705
    707
    709
    711
    712
8
    826
    827
    839
    846
    847
    848
    872
    873
    874
    875
    891
9
11
12
13
15
16 //it's here!
17 
18 //...and here!
19
20
21
22

Both files are sorted numerically. My guess is that comm doesn't compare numerically and only looks at entries lexicographically. If so, what alternatives can I try?

1 Answer

comm should tell you that one of the files isn’t sorted:

comm: file 1 is not in sorted order

It expects the files to be sorted using the current locale’s collation order (as determined by LC_COLLATE); it won’t accept numerical order.
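
To see why numeric order doesn't satisfy comm, here is a minimal sketch comparing the two orderings on two IDs taken from file1. It assumes the C locale (forced with LC_ALL=C) so the result is reproducible; other locales may collate differently.

```shell
# Two IDs from file1, listed in numeric order.
ids=$(printf '%s\n' 9 11)

# Lexicographic sort: compares character by character, so "11" sorts before "9".
lexical=$(printf '%s\n' "$ids" | LC_ALL=C sort)

# Numeric sort: compares the values as numbers, so 9 sorts before 11.
numeric=$(printf '%s\n' "$ids" | LC_ALL=C sort -n)

printf 'lexical:\n%s\n' "$lexical"
printf 'numeric:\n%s\n' "$numeric"
```

In file1, 9 is followed by 11, which is correct numerically but out of order as far as comm is concerned.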

To compare the files, you can pre-sort them (lexicographically as you point out):

comm <(sort file1) <(sort file2)

If you want the result to be sorted numerically, sort it again:

comm <(sort file1) <(sort file2) | sort -n

This produces

1
2
3
4
5
6
7
8
9
11
12
13
15
        16
17
        18
19
20
        21
22
    23
    705
    707
    709
    711
    712
    826
    827
    839
    846
    847
    848
    872
    873
    874
    875
    891
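
If you only care about the IDs common to both files, comm's -1 and -2 options suppress the first two columns. A sketch, using short made-up excerpts of the two files written to a temporary directory (the paths and the shortened contents are placeholders, not your full data), and assuming the C locale for both sort and comm:

```shell
# Recreate small hypothetical excerpts of the two files in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 9 11 16 18 21 22 > "$tmpdir/file1"
printf '%s\n' 16 18 21 23 705 > "$tmpdir/file2"

# Sort each file lexicographically, using the same locale that comm will use.
LC_ALL=C sort "$tmpdir/file1" > "$tmpdir/file1.sorted"
LC_ALL=C sort "$tmpdir/file2" > "$tmpdir/file2.sorted"

# -12 drops the lines unique to either file, leaving only the common ones;
# the final sort -n puts the result back into numeric order.
common=$(LC_ALL=C comm -12 "$tmpdir/file1.sorted" "$tmpdir/file2.sorted" | sort -n)
printf '%s\n' "$common"

# Clean up the scratch directory.
rm -r "$tmpdir"
```

With these excerpts, the common IDs are 16, 18 and 21.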
  • The files ARE in sorted order...have a look at the snippets. Commented Apr 13, 2017 at 21:17
  • As far as comm is concerned, file1 isn’t sorted: it expects files to be sorted in lexicographical order, not in numerical order. Commented Apr 13, 2017 at 21:21
  • I had run the commands I gave you, as I always do. I’ve added the output. Note you missed 21 which is also common to both files. Commented Apr 13, 2017 at 21:24
  • I see - so even if they are sorted, it needs "lexicographical ordering" only to do its job correctly. Interesting nuance :) Commented Apr 13, 2017 at 21:25
  • Yes, as I mentioned in my comment above; I’ll add that to my answer, for clarity. Commented Apr 13, 2017 at 21:26
