I want to know the pairwise differences between n files, similar to this:

parallel --tag 'diff {1} {2} | wc -l' ::: * ::: *

A big problem here is binary files; also, a single very long line counts the same as a short one.

How do I generate a fuzzy diff over n files?

1 Answer

Use ssdeep to generate a hash file:

ssdeep `find .  -type f` > hash

This will give the pairs with 90% <= similarity < 100%:

ssdeep -m hash `find .  -type f` | grep -E '9[0-9].$'

This only works if long stretches (blocks of around 1% of the file size) are identical.
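If you want a different similarity range than the one hard-coded in the grep pattern, the score can be parsed numerically instead. This is only a sketch: `filter_score` is a made-up helper name, and it assumes each line of `ssdeep -m` output ends with the score in parentheses, e.g. `(95)`.

```shell
# Hypothetical helper: read ssdeep match lines on stdin and keep those whose
# trailing "(NN)" score lies in the inclusive range [min, max].
filter_score() {
    min=$1
    max=$2
    awk -v min="$min" -v max="$max" '
        match($0, /\([0-9]+\)$/) {
            # Strip the surrounding parentheses and force a numeric value.
            score = substr($0, RSTART + 1, RLENGTH - 2) + 0
            if (score >= min && score <= max) print
        }'
}

# Usage, assuming the hash file generated earlier:
#   ssdeep -m hash `find .  -type f` | filter_score 70 99
```

Using 99 as the upper bound leaves out the 100 matches, which include each file matching its own entry in the hash file.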
