Switching the format of this output?

Question

I have this script written to print the distribution of words in one or more files:

cat "$@" | tr -cs '[:alpha:]' '\n' | 
tr '[:upper:]' '[:lower:]' | sort | 
uniq -c | sort -n

Which gives me an output such as:

1 the
4 orange
17 cat

However, I would like to change it so that the word is listed first, not the number (I'm assuming a sort would be involved so that it's alphabetical), like so:

cat 17
orange 4
the 1

Is there a simple option I could switch to do this, or is it something more complicated?

glenn jackman · Accepted Answer · 2013-03-17 22:08:10Z

4

Pipe the output to

awk '{print $2, $1}'

or you can use awk for the complete task:

{
    $0 = tolower($0)    # remove case distinctions
    # remove punctuation
    gsub(/[^[:alnum:]_[:blank:]]/, "", $0)
    for (i = 1; i <= NF; i++)
        freq[$i]++
}

END {
    for (word in freq)
        printf "%s\t%d\n", word, freq[word]
}

usage:

awk -f wordfreq.awk input
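
For the first suggestion, a minimal sketch of the question's pipeline with the column swap appended might look like this. The function name `wordfreq` is hypothetical, and the trailing `sort` is an assumption that alphabetical order by word is wanted, as the question suggests:

```shell
#!/bin/sh
# wordfreq: print "word count" pairs, ordered alphabetically by word.
# The function name is illustrative only; it does not come from the question.
wordfreq() {
    cat "$@" |
        tr -cs '[:alpha:]' '\n' |      # put each run of letters on its own line
        tr '[:upper:]' '[:lower:]' |   # fold upper case to lower case
        sort |
        uniq -c |                      # prefix each distinct word with its count
        awk '{print $2, $1}' |         # swap the columns: word first, count second
        sort                           # assumed: alphabetical order by word
}
```

It can be called as `wordfreq file1 file2`, or with no arguments to read standard input.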

edited Mar 17, 2013 at 22:08

glenn jackman

answered Mar 17, 2013 at 20:37

Fredrik Pihl

3 Comments

zyxxwyz Over a year ago

Thank you! I added a " " between $2 and $1 to make a space, then piped the awk output into another sort to make it alphabetical, now works great.

glenn jackman Over a year ago

+1, but need to sort the output of the larger awk program: awk -f wordfreq.awk input | sort

Fredrik Pihl Over a year ago

@glennjackman - praise from the master; you just made my day :-)
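
As the comments point out, the awk-only version needs its output sorted, because the order in which `for (word in freq)` visits keys is unspecified in awk. A sketch that inlines the same program and pipes it through `sort` could look like the following. The wrapper name `wordfreq_awk` is hypothetical, and the awk in use is assumed to support POSIX character classes such as `[:alnum:]`:

```shell
#!/bin/sh
# wordfreq_awk: hypothetical wrapper around the awk program from the answer.
# The output is piped through sort because awk does not define the
# iteration order of "for (word in freq)".
wordfreq_awk() {
    awk '
    {
        $0 = tolower($0)                          # remove case distinctions
        gsub(/[^[:alnum:]_[:blank:]]/, "", $0)    # remove punctuation
        for (i = 1; i <= NF; i++)
            freq[$i]++
    }
    END {
        for (word in freq)
            printf "%s\t%d\n", word, freq[word]
    }' "$@" | sort
}
```

Unlike the `tr` pipeline, this version keeps digits and underscores as part of a word, since the `gsub` only strips characters outside `[:alnum:]`, `_` and blanks.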
