65 votes

Two files comparison in bash script?

To just test whether two files are the same, use cmp -s: #!/bin/bash file1="/home/vekomy/santhosh/bigfiles.txt" file2="/home/vekomy/santhosh/bigfile2.txt" if cmp -s "$file1" "$file2"; then ...
Kusalananda's user avatar
  • 356k
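A minimal sketch of the cmp -s test from the excerpt above, wrapped in a function so it can be reused. The name same_file is hypothetical and does not come from the answer; the elided body of the original if statement is not reproduced here.

```bash
#!/bin/bash
# same_file (hypothetical name): succeed when both files hold identical bytes.
# cmp -s prints nothing and reports the result only through its exit status.
same_file() {
    cmp -s -- "$1" "$2"
}
```

Usage would look something like: if same_file a.txt b.txt; then echo "identical"; fi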
62 votes

rsync compare directories?

Surprisingly no answer in 6 years uses the -i option or gives nice output so here I'll go: TLDR - Just show me the commands rsync -rin --ignore-existing "$LEFT_DIR"/ "$RIGHT_DIR"/|sed -e 's/^[^ ]* /...
ndemou's user avatar
  • 3,029
29 votes

Two files comparison in bash script?

The easiest way is to use the diff command. Example: suppose the first file is file1.txt and it contains: I need to buy apples. I need to run the laundry. I need to wash the dog. I need to ...
Kingofkech's user avatar
  • 1,068
16 votes
Accepted

I want to compare values of two files, but not based on position or sequence

Compare the sorted files. In bash (or ksh or zsh), with a process substitution: diff <(sort File1.txt) <(sort File2.txt) In plain sh: sort File1.txt >File1.txt.sorted sort File2.txt >...
Gilles 'SO- stop being evil''s user avatar
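A rough sketch of the process-substitution variant as a yes/no test, assuming bash. The function name same_lines is made up for illustration.

```bash
#!/bin/bash
# same_lines (hypothetical name): succeed when both files contain the same
# lines regardless of order. Repeated lines still count, because sort keeps
# duplicates and diff then compares them one by one.
same_lines() {
    diff <(sort -- "$1") <(sort -- "$2") >/dev/null
}
```

Drop the >/dev/null redirection if you want to see which lines differ rather than only the exit status.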
12 votes
Accepted

Is there a tool or script that can very quickly find duplicates by only comparing filesize and a small fraction of the file contents?

czkawka is an open source tool which was created to find duplicate files (and images, videos or music) and present them through command-line or graphical interfaces, with an emphasis on speed. This ...
A.L's user avatar
  • 2,000
11 votes

Is there a tool or script that can very quickly find duplicates by only comparing filesize and a small fraction of the file contents?

You'd probably want to make sure you do a full compare (or hash) on the first and last 1MiB or so, where metadata can live that might be edited without introducing offsets to the compressed data. ...
Peter Cordes's user avatar
  • 6,690
10 votes

Is there a tool or script that can very quickly find duplicates by only comparing filesize and a small fraction of the file contents?

Does GNU cmp help you? You can use the -s option to suppress output and only use the return value. It checks the file size first to skip any comparison when the sizes differ. With options -i (skip ...
Philippos's user avatar
  • 13.7k
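As a sketch of the partial-comparison idea, assuming GNU cmp (the -n / --bytes option is a GNU extension), the following compares only the first N bytes. The name same_prefix is hypothetical. A match here is only a hint that the files might be duplicates, not proof.

```bash
#!/bin/bash
# same_prefix (hypothetical name): compare at most the first LIMIT bytes of
# two files. Relies on GNU cmp's -n (--bytes) option; -s keeps it silent.
same_prefix() {
    local limit=$1 file1=$2 file2=$3
    cmp -s -n "$limit" -- "$file1" "$file2"
}
```

For example, same_prefix 1048576 a.iso b.iso would look at the first 1 MiB only.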
9 votes
Accepted

ImageMagick compare without generating diff image

I was struggling with the same issue right now and found the answer: Yes! TL; DR: You specify NULL: as the filename for the diff, i.e. compare -metric rmse foo.png bar.png NULL: ImageMagick's ...
z-nexx's user avatar
  • 106
9 votes
Accepted

Compare the first 20 lines of two files

Using a shell with process substitutions (<(...)), e.g. bash or zsh: diff <( head -n 20 file1 ) <( head -n 20 file2 ) This runs head -n 20 on each file to get the first 20 lines of each, in ...
Kusalananda's user avatar
  • 356k
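The same command with the line count as a parameter might look like this (bash assumed; diff_head is a hypothetical name):

```bash
#!/bin/bash
# diff_head (hypothetical name): diff only the first N lines of two files.
# The exit status is diff's: 0 when those lines match, 1 when they differ.
diff_head() {
    local n=$1 file1=$2 file2=$3
    diff <(head -n "$n" -- "$file1") <(head -n "$n" -- "$file2")
}
```

Calling diff_head 20 file1 file2 reproduces the command from the answer.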
8 votes

Compare directories but not content of files

I've just discovered tree. tree old_dir/ > tree_old; tree new_dir/ > tree_new; vimdiff tree_old tree_new
Valentas's user avatar
  • 369
8 votes

Diff only words in files

diff -w ignores all horizontal whitespace changes, which takes care of indentation but doesn't help if lines have been wrapped to a different width or if lines have been wrapped after text changes. ...
Gilles 'SO- stop being evil''s user avatar
6 votes

"cmp -s file1 file2" doesn't produce any output

-s is for silent, it's to tell cmp not to output anything¹ but only to reflect whether the files are identical or not in its exit status so that it can be used for instance in an if shell statement: ...
Stéphane Chazelas's user avatar
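To illustrate how the exit status can be consumed, here is a small sketch that maps it to a message. The function name describe_cmp and the message strings are invented for this example; the status meanings (0 identical, 1 different, greater than 1 on error) are the standard ones for cmp.

```bash
#!/bin/bash
# describe_cmp (hypothetical name): turn cmp's exit status into a message.
# 0 = files identical, 1 = files differ, anything greater = cmp hit an error
# (for instance a missing or unreadable file).
describe_cmp() {
    local status=0
    cmp -s -- "$1" "$2" || status=$?
    case $status in
        0) echo "identical" ;;
        1) echo "different" ;;
        *) echo "error" ;;
    esac
}
```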
6 votes

I want to compare values of two files, but not based on position or sequence

Sort the files first (in bash): diff <(sort file1) <(sort file2)
Hauke Laging's user avatar
  • 94.6k
5 votes

Compare files from a list

There are several tools that do much of this work for you, for example: fdupes, jdupes, rdfind, duff. A few years ago I posted comparison runs of fdupes and rdfind at http://www.linuxforums.org/forum/...
drl's user avatar
  • 848
5 votes

How to know if a text file is a subset of another

With perl: if perl -0777 -e '$n = <>; $h = <>; exit(index($h,$n)<0)' needle.txt haystack.txt then echo needle.txt is found in haystack.txt fi -0octal defines the record delimiter. When ...
Stéphane Chazelas's user avatar
5 votes
Accepted

Extract the indexes of rows that are swapped in order between two files

This is one of those rare occasions when I'd probably use getline due to the size of your input files so we only save a handful of lines in memory at a time instead of >10G: $ cat tst.awk BEGIN { ...
Ed Morton's user avatar
  • 35.9k
5 votes

Is there a tool or script that can very quickly find duplicates by only comparing filesize and a small fraction of the file contents?

Shellscript implementation of the OP's (@vume's) idea. Background, with rsync as the example: have a look at rsync. It has several levels of checking if files are identical. The manual man rsync is very ...
sudodus's user avatar
  • 6,686
5 votes

Is there a tool or script that can very quickly find duplicates by only comparing filesize and a small fraction of the file contents?

There is a tool called imosum that works similarly to e.g. sha256sum, but it only uses three 16 kB blocks. The samples are taken from the beginning, middle and end of the file, and file size is included in ...
jpa's user avatar
  • 1,562
4 votes

Converting number format and comparing the file

With the numfmt utility from GNU Coreutils: numfmt --delimiter='|' --field=2-3 --format='%.2f' < file 1|2.30|2.30|34 1|0.00|0.00|34 1|0.00|0.00|34 1|11.00|11.00|34 1|0.31|0.31|34 1|0.00|0.00|34 1|...
steeldriver's user avatar
  • 83.8k
4 votes
Accepted

Compare text files skipping N symbols from each line

Using cut: diff <(cut -c 20- file1) <(cut -c 20- file2) Note: with GNU cut the -c character option actually works on bytes not characters, but this should be fine as long as your output starts ...
jesse_b's user avatar
  • 41.6k
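A parameterised sketch of the cut approach, assuming bash. The name diff_from_column is hypothetical. Note that cut -c START- keeps everything from position START onward, so it skips START-1 characters per line.

```bash
#!/bin/bash
# diff_from_column (hypothetical name): diff two files while ignoring the
# first START-1 characters of every line (cut -c START- keeps the rest).
diff_from_column() {
    local start=$1 file1=$2 file2=$3
    diff <(cut -c "$start"- -- "$file1") <(cut -c "$start"- -- "$file2")
}
```

With 19-character timestamps at the start of each line, diff_from_column 20 file1 file2 compares only what follows them.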
4 votes

How to know if a text file is a subset of another

From http://www.catonmat.net/blog/set-operations-in-unix-shell/: Comm compares two sorted files line by line. It may be run in such a way that it outputs lines that appear only in the first ...
alecbz's user avatar
  • 176
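One way to turn the comm idea into a subset test, sketched under the assumption that "subset" means every line of the first file occurs somewhere in the second (a set of lines, not a contiguous block). The name is_line_subset is hypothetical; bash is assumed for the process substitutions.

```bash
#!/bin/bash
# is_line_subset (hypothetical name): succeed when every line of the first
# file also appears in the second. comm -23 prints only the lines unique to
# its first input, so the subset holds exactly when it prints nothing.
# grep -q '^' succeeds if at least one line (even an empty one) was printed.
is_line_subset() {
    ! comm -23 <(sort -u -- "$1") <(sort -u -- "$2") | grep -q '^'
}
```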
4 votes
Accepted

Compare 2 files based on the first column and print the not matched

You can use grep for this: $ grep -vwf <(cut -d, -f1 file1) file2 test4 Explanation grep options: -v, --invert-match Invert the sense of matching, to select non-matching lines. -w, --word-...
terdon's user avatar
  • 252k
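The grep command above can be wrapped as follows (bash assumed; unmatched_keys is a hypothetical name). The first-column values are used as regular expressions, as in the original command, so keys containing regex metacharacters may match more than intended.

```bash
#!/bin/bash
# unmatched_keys (hypothetical name): print the lines of the second file that
# contain none of the first-column values of the first file (comma-separated)
# as a whole word. -f reads patterns from the cut output, -w matches whole
# words only, -v inverts the match.
unmatched_keys() {
    grep -vwf <(cut -d, -f1 -- "$1") -- "$2"
}
```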
4 votes

I want to compare values of two files, but not based on position or sequence

Using awk, you can make a hash index of every distinct input line text, using a command like: awk 'The magic' Q=A fileA Q=B fileB Q=C fileC ... 'The magic' per input line is: { X[$0] = X[$0] Q; } ...
Paul_Pedant's user avatar
  • 9,414
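A hedged sketch of how the per-line rule could be completed for two files. The per-line action is the one quoted in the excerpt; the END block, the sorting, and the name tag_lines are assumptions added here, since the rest of the answer is elided.

```bash
#!/bin/bash
# tag_lines (hypothetical name): for each distinct line across two files,
# print the tags of the files it was seen in, then the line itself.
# A line repeated within one file collects that file's tag once per
# occurrence (e.g. "AA").
tag_lines() {
    awk '
        { X[$0] = X[$0] Q }                          # append current file tag
        END { for (line in X) print X[line], line }  # assumed output step
    ' Q=A "$1" Q=B "$2" | LC_ALL=C sort
}
```

Lines tagged A or B alone exist in only one file; lines tagged AB exist in both.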
4 votes
Accepted

Comparison of N identical continuous characters from a set of two files with sequences

$ cat tst.awk BEGIN { wid = 30 } sub(/^>/,"") { hdr=$1; next } NR == FNR { a[hdr]=$0; next } { for ( hdrA in a ) { strA = a[hdrA] lgthA = length(strA) for ( ...
Ed Morton's user avatar
  • 35.9k
4 votes

Awk- Compare Numbers from Two Files and write Differences in New File

Using any POSIX awk: $ cat tst.awk BEGIN { FS = "[]=[]+" f1 = ARGV[1] f2 = ARGV[2] } { gsub(/[[:space:]]+/,"") gsub(/,/,"& ") key = $1 "[ ...
Ed Morton's user avatar
  • 35.9k
3 votes

rsync compare directories?

It took me a few tries to get this to work. Nils' answer requires that $TARGET ends in a trailing /, as explained by ジョージ. Here is a version that explicitly adds the trailing /: rsync -avun --delete ...
Orafu's user avatar
  • 133
3 votes

Compare directories but not content of files

If you only need to know whether the files in two file system branches differ (without looking inside the files), you can do something like this: find /opt/branch1 -type f | sort | xargs -i md5sum {} >/tmp/...
Chaky's user avatar
  • 31
3 votes

Compare files from a list

You could do: find foo* -name 'bar*Test.groovy' -type f -exec cksum {} + | sort (assuming file paths don't contain newline characters) which would give you a checksum (and size) for each file, ...
Stéphane Chazelas's user avatar
3 votes

Compare files from a list

Use the return value of diff file1 file2 >/dev/null, which is zero when the files are the same and nonzero when they differ. Compare the files in two nested for loops, something like: for file1 in $...
Adam Trhon's user avatar
  • 1,633
3 votes

Compare two logs line by line and show differences and if the order of words from a line are not the same

Not exactly the format you're asking for, but wdiff is probably your best bet: $ wdiff f1.txt f2.txt She has [-132-] {+123+} apples George [-is-] 18 years {+is+} old {+Florin it's leaving+} Michael it'...
Satō Katsura's user avatar

Only top scored, non community-wiki answers of a minimum length are eligible