This is a modified part of an answer I wrote yesterday:
$ cksum file* | awk '{ ck[$1$2] = ck[$1$2] ? ck[$1$2] ", " $3 : $3 } END { for (i in ck) print ck[i] }'
file3, file5
file1, file2, file4
In your case you would use *.txt or even * (if all you have in the directory are the files you'd like to compare) rather than file*.
The result tells us that file3 and file5 have the same contents, as do file1, file2, and file4 (in this example).
The standard cksum utility will output three columns for each file. The first is a checksum, the second is a file size, and the third is a filename.
The awk code will use the checksum and size as a key in the array ck and store the filenames that have the same key in a comma-separated string for that key. At the end, the filenames (comma-separated string) are printed out.
The funny-looking
ck[$1$2] = ck[$1$2] ? ck[$1$2] ", " $3 : $3
just means "if ck[$1$2] is set to anything, then assign ck[$1$2] ", " $3 to ck[$1$2] (appending a filename with a comma in-between), otherwise just assign $3 (it's the first filename with this key)".
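For readability, the same grouping can be sketched with the ternary spelled out as an if/else. The scratch directory and its three sample files are made up for the illustration, and the test `key in ck` is a slight substitution: it checks whether the key exists rather than whether its value is non-empty, which should behave the same for ordinary filenames.

```sh
# Scratch directory with hypothetical sample files (illustration only).
dir=$(mktemp -d)
printf 'a\n' > "$dir/file1"
printf 'a\n' > "$dir/file2"
printf 'b\n' > "$dir/file3"

# Same grouping as the one-liner, with the ternary written as if/else.
result=$(
    cd "$dir" &&
    cksum file* | awk '
        {
            key = $1 $2                     # checksum and size, concatenated
            if (key in ck)
                ck[key] = ck[key] ", " $3   # append to the existing list
            else
                ck[key] = $3                # first filename seen for this key
        }
        END { for (key in ck) print ck[key] }
    '
)
printf '%s\n' "$result"
rm -rf "$dir"
```

The order in which the groups come out is unspecified, since awk does not define the traversal order of `for (key in ck)`.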
To sort the output on the number of items in each list, pass the output through
awk -F, '{ print NF, $0 }' | sort -n | cut -d ' ' -f 2-
... as a post-processing stage. This will obviously break if any filename contains a comma.
Or use
cksum file* | awk '{ n[$1$2]++; ck[$1$2] = ck[$1$2] ? ck[$1$2] ", " $3 : $3 } END { for (i in ck) print n[i], ck[i] }' | sort -n | cut -d ' ' -f 2-
which does not have any issues with commas in filenames.
Leave the cut out if you'd like to see the number of filenames on each line of output.
For a huge number of files, you may want to use
find . -type f -exec cksum {} +
rather than just
cksum *
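As a rough sketch, the find variant can be fed into the same awk grouping. Because find may return pathnames containing spaces, this version does not rely on $3 but strips the first two columns from the whole line instead; that is an adaptation for the example, not part of the commands above. The scratch directory and file names are hypothetical, and pathnames containing newlines would still break it.

```sh
# Scratch tree with hypothetical sample files (illustration only).
dir=$(mktemp -d)
mkdir "$dir/sub"
printf 'x\n' > "$dir/one file.txt"
printf 'x\n' > "$dir/sub/copy.txt"
printf 'y\n' > "$dir/other.txt"

result=$(
    cd "$dir" &&
    find . -type f -exec cksum {} + | awk '
        {
            key = $1 $2                          # checksum and size
            name = $0
            sub(/^[0-9]+ +[0-9]+ +/, "", name)   # keep the full pathname, spaces included
            if (n[key]++)                        # old count is non-zero: key seen before
                ck[key] = ck[key] ", " name
            else
                ck[key] = name
        }
        END { for (key in ck) print n[key], ck[key] }
    ' | sort -n | cut -d " " -f 2-
)
printf '%s\n' "$result"
rm -rf "$dir"
```

The smallest groups are listed first, as with the earlier sort -n pipeline. The order of pathnames within a group follows whatever order find happens to produce.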
md5sum * | sort
is not quite what you are asking for, but it is simple and it brings groups of identical files together, which is often all one needs. It does need post-processing to do exactly what you want.
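One possible way to post-process the sorted md5sum output is sketched below. It assumes the usual md5sum line layout (a 32-character hash, two separator characters, then the filename) and filenames without newlines or backslashes. The scratch directory and sample files are made up for the illustration.

```sh
# Scratch directory with hypothetical sample files (illustration only).
dir=$(mktemp -d)
printf 'a\n' > "$dir/file1"
printf 'a\n' > "$dir/file2"
printf 'b\n' > "$dir/file3"

result=$(
    cd "$dir" &&
    md5sum * | sort | awk '
        {
            name = substr($0, 35)           # filename starts after hash + 2 separators
            if (($1 "") == (prev "")) {     # force a string comparison of the hashes
                line = line ", " name       # same hash as previous line: extend group
            } else {
                if (NR > 1) print line      # hash changed: flush the finished group
                line = name
            }
            prev = $1
        }
        END { if (NR > 0) print line }      # flush the last group
    '
)
printf '%s\n' "$result"
rm -rf "$dir"
```

Since sort has already placed identical hashes on adjacent lines, no array is needed here; each group is printed as soon as the hash changes.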