65 votes
        Two files comparison in bash script?
                    To just test whether two files are the same, use cmp -s:
#!/bin/bash
file1="/home/vekomy/santhosh/bigfiles.txt"
file2="/home/vekomy/santhosh/bigfile2.txt"
if cmp -s "$file1" "$file2"; then
    ...
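A minimal, self-contained sketch of the cmp -s test described above. The helper name same_content is hypothetical and not part of the original answer; it only wraps the exit status of cmp.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption, not from the answer):
# succeeds (exit status 0) when both files contain exactly the same bytes.
# cmp -s prints nothing; it reports the result only through its exit status.
same_content() {
    cmp -s -- "$1" "$2"
}
```

It could then be used as `if same_content "$file1" "$file2"; then echo same; else echo different; fi`. Note that a non-zero status also covers the case where one of the files cannot be read.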
                
            
       
        
            
62 votes
        rsync compare directories?
                    Surprisingly, no answer in 6 years uses the -i option or gives nice output, so here I go:
TLDR - Just show me the commands
rsync -rin --ignore-existing "$LEFT_DIR"/ "$RIGHT_DIR"/|sed -e 's/^[^ ]* /...
                
            
       
        
            
29 votes
        Two files comparison in bash script?
                    The easiest way is to use the command diff.
Example:
Let's suppose the first file is file1.txt and it contains:
I need to buy apples.
I need to run the laundry.
I need to wash the dog.
I need to ...
                
            
       
        
            
16 votes (accepted)
        I want to compare values of two files, but not based on position or sequence
                    Compare the sorted files.
In bash (or ksh or zsh), with a process substitution:
diff <(sort File1.txt) <(sort File2.txt)
In plain sh:
sort File1.txt >File1.txt.sorted
sort File1.txt >...
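For a plain yes/no answer in a shell without process substitution, the same idea can be sketched with temporary files. This is only an illustration: the function name same_lines_any_order is made up, mktemp is assumed to be available (it is not required by POSIX), and cmp -s is swapped in for diff because only the exit status is wanted.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when both files hold the same lines,
# regardless of their order. Repeated lines still count, because sort keeps them.
same_lines_any_order() {
    sorted1=$(mktemp) || return 2
    sorted2=$(mktemp) || { rm -f -- "$sorted1"; return 2; }
    sort -- "$1" > "$sorted1"
    sort -- "$2" > "$sorted2"
    cmp -s -- "$sorted1" "$sorted2"   # silent byte-for-byte comparison
    status=$?
    rm -f -- "$sorted1" "$sorted2"
    return "$status"
}
```

Replacing cmp -s with diff would show which lines differ instead of only reporting whether they do.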
                
            
       
        
            
12 votes (accepted)
        Is there a tool or script that can very quickly find duplicates by only comparing filesize and a small fraction of the file contents?
                    czkawka is an open source tool which was created to find duplicate files (and images, videos or music) and present them through command-line or graphical interfaces, with an emphasis on speed. This ...
                
            
       
        
            
11 votes
        Is there a tool or script that can very quickly find duplicates by only comparing filesize and a small fraction of the file contents?
                    You'd probably want to make sure you do a full compare (or hash) on the first and last 1MiB or so, where metadata can live that might be edited without introducing offsets to the compressed data.  ...
                
            
       
        
            
10 votes
        Is there a tool or script that can very quickly find duplicates by only comparing filesize and a small fraction of the file contents?
                    Does GNU cmp help you?
You can use the -s option to suppress output and rely only on the return value.
It checks the file sizes first and skips the comparison when they differ.
With options -i (skip ...
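As a rough sketch of comparing only a slice of two files, assuming GNU cmp: -i skips a number of leading bytes in both files and -n limits how many bytes are compared. The helper name same_slice and its argument order are made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper, assuming GNU cmp.
# Usage: same_slice SKIP LIMIT file1 file2
# Skips SKIP bytes at the start of both files, then compares at most LIMIT bytes.
same_slice() {
    cmp -s -i "$1" -n "$2" -- "$3" "$4"
}
```

For example, `same_slice 0 1048576 a.bin b.bin` would compare only the first MiB of the two files.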
                
            
       
        
            
9 votes (accepted)
        ImageMagick compare without generating diff image
                    I was struggling with the same issue right now and found the answer: Yes!
TL; DR: You specify NULL: as the filename for the diff, i.e.
compare -metric rmse foo.png bar.png NULL:
ImageMagick's ...
                
            
       
        
            
9 votes (accepted)
        Compare the first 20 lines of two files
                    Using a shell with process substitutions (<(...)), e.g. bash or zsh:
diff <( head -n 20 file1 ) <( head -n 20 file2 )
This runs head -n 20 on each file to get the first 20 lines of each, in ...
                
            
       
        
            
8 votes
        Compare directories but not content of files
                    I've just discovered tree.
tree old_dir/ > tree_old
tree new_dir/ > tree_new
vimdiff tree_old tree_new
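If tree is not installed, a similar name-only comparison can be sketched with find and diff. This swaps find in for tree, so the output is a list of relative paths rather than a drawn tree. The function name diff_dir_names is hypothetical, and mktemp is assumed to be available.

```shell
#!/bin/sh
# Hypothetical helper: diff the relative path listings of two directory trees.
# File contents are never read; only names are compared.
diff_dir_names() {
    list_old=$(mktemp) || return 2
    list_new=$(mktemp) || { rm -f -- "$list_old"; return 2; }
    # cd into each tree so both listings use paths relative to "."
    (cd -- "$1" && find . | LC_ALL=C sort) > "$list_old"
    (cd -- "$2" && find . | LC_ALL=C sort) > "$list_new"
    diff -- "$list_old" "$list_new"
    status=$?
    rm -f -- "$list_old" "$list_new"
    return "$status"
}
```

Called as `diff_dir_names old_dir new_dir`, it prints paths that exist on only one side and returns the exit status of diff.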
                
            
       
        
            
8 votes
        Diff only words in files
                    diff -w ignores all horizontal whitespace changes, which takes care of indentation but doesn't help if lines have been wrapped to a different width or if lines have been wrapped after text changes.
...
                
            
       
        
            
6 votes
        "cmp -s file1 file2" doesn't produce any output
                    -s stands for silent: it tells cmp not to output anything¹ and to report whether the files are identical only through its exit status, so that it can be used, for instance, in a shell if statement:
...
                
            
       
        
            
6 votes
        I want to compare values of two files, but not based on position or sequence
                    Sort the files first (in bash):
diff <(sort file1) <(sort file2)
                
            
       
        
            
5 votes
        Compare files from a list
                    There are several tools that do much of this work for you, for example: fdupes, jdupes, rdfind, duff.
A few years ago I posted comparison runs of fdupes and rdfind at http://www.linuxforums.org/forum/...
                
            
       
        
            
5 votes
        How to know if a text file is a subset of another
                    With perl:
if perl -0777 -e '$n = <>; $h = <>; exit(index($h,$n)<0)' needle.txt haystack.txt
then echo needle.txt is found in haystack.txt
fi
-0octal defines the record delimiter. When ...
                
            
       
        
            
5 votes (accepted)
        Extract the indexes of rows that are swapped in order between two files
                    This is one of those rare occasions when I'd probably use getline due to the size of your input files so we only save a handful of lines in memory at a time instead of >10G:
$ cat tst.awk
BEGIN {
  ...
                
            
       
        
            
5 votes
        Is there a tool or script that can very quickly find duplicates by only comparing filesize and a small fraction of the file contents?
                    Shellscript implementation of the OP's, @vume's, idea
Background with the example rsync
Have a look at rsync. It has several levels of checking if files are identical. The manual man rsync is very ...
                
            
       
        
            
5 votes
        Is there a tool or script that can very quickly find duplicates by only comparing filesize and a small fraction of the file contents?
                    There is a tool called imosum that works similarly to, e.g., sha256sum, but it only uses three 16 kB blocks. The samples are taken from the beginning, middle and end of the file, and file size is included in ...
                
            
       
        
            
4 votes
        Converting number format and comparing the file
                    With the numfmt utility from GNU Coreutils:
numfmt --delimiter='|' --field=2-3 --format='%.2f' < file
1|2.30|2.30|34
1|0.00|0.00|34
1|0.00|0.00|34
1|11.00|11.00|34
1|0.31|0.31|34
1|0.00|0.00|34
1|...
                
            
       
        
            
4 votes (accepted)
        Compare text files skipping N symbols from each line
                    Using cut:
diff <(cut -c 20- file1) <(cut -c 20- file2)
Note: with GNU cut the -c character option actually works on bytes not characters, but this should be fine as long as your output starts ...
                
            
       
        
            
4 votes
        How to know if a text file is a subset of another
                    From http://www.catonmat.net/blog/set-operations-in-unix-shell/:
  Comm compares two sorted files line by line. It may be run in such a way that it outputs lines that appear only in the first ...
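A small sketch of a line-wise subset test built on comm, under the assumption that "subset" means every line of the first file occurs somewhere in the second. The helper name is_line_subset is hypothetical, and mktemp is assumed to be available.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when every line of file $1 also occurs in file $2.
is_line_subset() {
    sorted_super=$(mktemp) || return 2
    LC_ALL=C sort -u -- "$2" > "$sorted_super"
    # comm -23 keeps only the lines unique to its first input (here: stdin).
    # grep -q '^' succeeds as soon as any such line shows up.
    if LC_ALL=C sort -u -- "$1" | LC_ALL=C comm -23 - "$sorted_super" | grep -q '^'; then
        status=1    # at least one line of $1 is missing from $2
    else
        status=0
    fi
    rm -f -- "$sorted_super"
    return "$status"
}
```

Both inputs are sorted under the same locale because comm expects its inputs in the collation order it uses itself.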
                
            
       
        
            
4 votes (accepted)
        Compare 2 files based on the first column and print the not matched
                    You can use grep for this:
$ grep -vwf <(cut -d, -f1 file1) file2
test4
Explanation
grep options:
-v, --invert-match
      Invert the sense of matching, to select non-matching lines.
-w, --word-...
                
            
       
        
            
4 votes
        I want to compare values of two files, but not based on position or sequence
                    Using awk, you can make a hash index of every distinct input line text, using a command like:
awk 'The magic' Q=A fileA Q=B fileB Q=C fileC ...
'The magic' per input line is:
{ X[$0] = X[$0] Q; }
...
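To make the idea concrete, here is one possible complete version for two files. The per-line action is the one quoted above; the END block, the tab-separated output format, the final sort, and the function name tag_lines are assumptions added for illustration, since the rest of the answer is not shown here.

```shell
#!/bin/sh
# Hypothetical helper: tag every distinct line with the files (A and/or B) that contain it.
# Q=A and Q=B are awk variable assignments, applied just before the file that follows them.
tag_lines() {
    awk '{ X[$0] = X[$0] Q }
         END { for (line in X) print X[line] "\t" line }' Q=A "$1" Q=B "$2" |
        LC_ALL=C sort   # awk iterates arrays in unspecified order, so sort the result
}
```

A line tagged AB occurs in both files, while a line tagged only A or only B is unique to that file. A line repeated within one file gets its tag repeated as well, for example AA.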
                
            
       
        
            
4 votes (accepted)
        Comparison of N identical continuous characters from a set of two files with sequences
                    $ cat tst.awk
BEGIN { wid = 30 }
sub(/^>/,"") { hdr=$1; next }
NR == FNR { a[hdr]=$0; next }
{
    for ( hdrA in a ) {
        strA  = a[hdrA]
        lgthA = length(strA)
        for ( ...
                
            
       
        
            
4 votes
        Awk- Compare Numbers from Two Files and write Differences in New File
                    Using any POSIX awk:
$ cat tst.awk
BEGIN {
    FS = "[]=[]+"
    f1 = ARGV[1]
    f2 = ARGV[2]
}
{
    gsub(/[[:space:]]+/,"")
    gsub(/,/,"& ")
    key = $1 "[ ...
                
            
       
        
            
3 votes
        rsync compare directories?
                    It took me a few tries to get this to work. Nils' answer requires that $TARGET ends in a trailing /, as explained by ジョージ.
Here is a version that explicitly adds the trailing /:
rsync -avun --delete ...
                
            
       
        
            
3 votes
        Compare directories but not content of files
                    If you only need to know whether files in two file system branches differ (without looking inside the files), you can do something like this:
find /opt/branch1 -type f | sort | xargs -i md5sum {} >/tmp/...
                
            
       
        
            
3 votes
        Compare files from a list
                    You could do:
find foo* -name 'bar*Test.groovy' -type f -exec cksum {} + | sort
(assuming file paths don't contain newline characters) which would give you a checksum (and size) for each file, ...
                
            
       
        
            
3 votes
        Compare files from a list
                    Use the return value of
diff file1 file2 >/dev/null
as it returns zero when the files are the same and nonzero when they differ.
Compare the files in two nested for loops, something like:
for file1 in $...
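One way the nested-loop idea could look when the file list is passed as arguments, visiting each pair only once. The function name report_identical_pairs and the output format are made up for this sketch and do not come from the answer.

```shell
#!/bin/sh
# Hypothetical helper: print every pair of identical files among the arguments.
report_identical_pairs() {
    for file1 in "$@"; do
        shift   # drop file1, so the inner loop only sees the files after it
        for file2 in "$@"; do
            # diff exits with 0 when the files are the same, nonzero otherwise
            if diff -- "$file1" "$file2" >/dev/null 2>&1; then
                printf '%s == %s\n' "$file1" "$file2"
            fi
        done
    done
}
```

The number of comparisons grows quadratically with the number of files, so for long lists the checksum-based approaches from the other answers are likely to scale better.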
                
            
       
        
            
3 votes
        Compare two logs line by line and show differences and if the order of words from a line are not the same
                    Not exactly the format you're asking for, but wdiff is probably your best bet:
$ wdiff f1.txt f2.txt
She has [-132-] {+123+} apples
George [-is-] 18 years {+is+} old
{+Florin it's leaving+}
Michael it'...
                
            
       
        Only top scored, non community-wiki answers of a minimum length are eligible
Related Tags
file-comparison × 146
diff × 51
shell-script × 21
text-processing × 21
files × 16
linux × 14
awk × 14
bash × 11
rsync × 11
scripting × 8
shell × 7
command-line × 7
grep × 6
directory × 5
find × 4
backup × 4
csv × 4
columns × 4
numeric-data × 4
comm × 4
git × 3
file-copy × 3
bioinformatics × 3
centos × 2
sed × 2