I have written this script to print the distribution of words in one or more files:

cat "$@" | tr -cs '[:alpha:]' '\n' | 
tr '[:upper:]' '[:lower:]' | sort | 
uniq -c | sort -n

Which gives me an output such as:

1 the
4 orange
17 cat

However, I would like to change it so that the word is listed first, not the number (I'm assuming sort would be involved so that it's alphabetical), like so:

cat 17
orange 4
the 1

Is there a simple option I can use to switch this, or is it something more complicated?

1 Answer

Pipe the output to

awk '{print $2, $1}'
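
As a minimal sketch of how that fits into the script from the question: the pipeline below appends the awk swap and a final sort so the words come out alphabetically. The function name wordfreq is hypothetical and only there to make the example self-contained; the `sort -n` from the original is dropped on the assumption that ordering by count is no longer wanted.

```shell
# wordfreq is a hypothetical wrapper name, not part of the original script.
wordfreq() {
    cat "$@" |
    tr -cs '[:alpha:]' '\n' |      # one word per line, squeezing runs of non-letters
    tr '[:upper:]' '[:lower:]' |   # fold everything to lower case
    sort | uniq -c |               # count occurrences of each word
    awk '{print $2, $1}' |         # swap columns: word first, count second
    sort                           # order alphabetically by word
}
```

Called as `wordfreq file1 file2`, it should print lines such as `cat 17`, one word per line.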

or you can use awk for the complete task:

{
    $0 = tolower($0)    # remove case distinctions
    # remove punctuation
    gsub(/[^[:alnum:]_[:blank:]]/, "", $0)
    for (i = 1; i <= NF; i++)
        freq[$i]++
}

END {
    for (word in freq)
        printf "%s\t%d\n", word, freq[word]
}

Usage:

awk -f wordfreq.awk input
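
Note that awk's `for (word in freq)` visits the array in an unspecified order, so the output of the program above is not alphabetical by itself. A hedged sketch of one way to handle that, with the same program inlined and piped through sort; the function name wordfreq_awk is hypothetical, and an awk that supports POSIX character classes such as `[:alnum:]` is assumed:

```shell
# wordfreq_awk is a hypothetical wrapper name used only for this sketch.
wordfreq_awk() {
    awk '
    {
        $0 = tolower($0)                          # remove case distinctions
        gsub(/[^[:alnum:]_[:blank:]]/, "", $0)    # remove punctuation
        for (i = 1; i <= NF; i++)                 # fields are re-split after $0 changes
            freq[$i]++
    }
    END {
        for (word in freq)                        # iteration order is unspecified
            printf "%s\t%d\n", word, freq[word]
    }' "$@" | sort                                # so sort the lines by word afterwards
}
```

Unlike the tr pipeline, this version deletes punctuation instead of splitting on it, so a word like "don't" is counted as "dont".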

3 Comments

Thank you! I added a " " between $2 and $1 to make a space, then piped the awk output into another sort to make it alphabetical. It works great now.
+1, but need to sort the output of the larger awk program: awk -f wordfreq.awk input | sort
@glennjackman - praise from the master; you just made my day :-)
