- Split the input into words, one per line.
- Sort the resulting list of words (lines).
- Squash multiple occurrences of each word, keeping a count.
- Sort by occurrence count.

To split the input into words, replace every character that you consider a word separator with a newline.
<input_file \
tr -sc '[:alpha:]' '[\n*]' |   # Add digits, -, ', ... if you consider
                               # them word constituents
sort |
uniq -c |
sort -nr
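
To try the steps above on a small input, here is a minimal sketch that wraps the same pipeline in a shell function. The function name `word_freq` and the sample sentence are made up for illustration, and the extra `-k2` sort key is an addition (not part of the pipeline above) that only serves to list words with equal counts in alphabetical order.

```shell
# word_freq: hypothetical helper name; reads text on stdin and prints
# "count word" lines, most frequent word first.
word_freq() {
  tr -sc '[:alpha:]' '[\n*]' |   # each run of non-letters becomes one newline
  sort |                         # bring identical words next to each other
  uniq -c |                      # collapse duplicates, prefixing each with its count
  sort -k1,1nr -k2               # highest count first; ties in alphabetical order
}

# Sample input, invented for this example.
printf 'the cat and the dog and the bird\n' | word_freq
```

This should list `the` with a count of 3, then `and` with 2, then `bird`, `cat` and `dog` with 1 each. The amount of padding in front of the counts differs between `uniq` implementations. Note that if the input starts with a non-letter, the first line produced by `tr` is empty and gets counted as an empty "word".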