Hi all, I have a file that looks like this:

AAAA  5
BBBB  4
CCCC  12
...

(The file is tab-separated and has many thousands of lines.)

What I am interested in doing is summing the second column of values, which is straightforward:

awk '{sum +=$2}END{print sum}'

For these three rows that gives 21. What I want to do is first sum the whole second column of the file, then print col1, col2 and col2/sum for every row, so the output would look like this:

AAAA 5 0.2380
BBBB 4 0.1904
CCCC 12 0.5714

What I have tried is this:

awk '{sum +=$2}END{print $1,$2,$2/sum}'

But it doesn't work: all that gets printed is "CCCC 12 0.5714". I have been trying to figure this out but can't get it. Any help would be appreciated. Thanks.

1 Answer

$ awk '{val[$1]=$2; sum+=$2} END{for (key in val) print key, val[key], (sum?val[key]/sum:0)}' file
CCCC 12 0.571429
BBBB 4 0.190476
AAAA 5 0.238095
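
For context on why the attempt in the question printed only one line: the END block runs once, after all input has been read, and in POSIX-conforming awks $1 and $2 there still hold the fields of the last record (some older awks leave them empty instead). That is why the command above stores every row in val[] before dividing. A minimal sketch reproducing the symptom, assuming a POSIX awk and the three sample rows from the question; the function name is made up for illustration:

```shell
# Hypothetical helper: feeds the three sample rows to the command from the question.
last_record_in_end() {
    printf 'AAAA\t5\nBBBB\t4\nCCCC\t12\n' |
        # END fires a single time, so only the fields of the final record are printed.
        awk '{ sum += $2 } END { print $1, $2, $2 / sum }'
}
```

Running last_record_in_end should print just the CCCC row, matching what the question describes.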

To keep the input order:

$ awk '!($1 in val){keys[++numKeys]=$1} {val[$1]=$2; sum+=$2} END{for (keyNr=1; keyNr<=numKeys;keyNr++) { key=keys[keyNr]; print key, val[key], (sum?val[key]/sum:0)} }' file
AAAA 5 0.238095
BBBB 4 0.190476
CCCC 12 0.571429

and to format the number:

$ awk '!($1 in val){keys[++numKeys]=$1} {val[$1]=$2; sum+=$2} END{for (keyNr=1; keyNr<=numKeys;keyNr++) { key=keys[keyNr]; printf "%s %d %.4f\n", key, val[key], (sum?val[key]/sum:0)} }' file
AAAA 5 0.2381
BBBB 4 0.1905
CCCC 12 0.5714
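
Since the file has many thousands of lines, an alternative that may be worth considering is to read it twice instead of holding every row in an array. This is a sketch of the common two-pass idiom, not part of the commands above; it assumes the input is a regular file (not a pipe), and the function name ratios is made up for illustration:

```shell
# Hypothetical helper: pass the same file to awk twice.
ratios() {
    awk '
        # Pass 1: NR == FNR only while reading the first copy, so just total column 2.
        NR == FNR { sum += $2; next }
        # Pass 2: sum is complete, so print each row with its share of the total.
        { printf "%s %d %.4f\n", $1, $2, (sum ? $2 / sum : 0) }
    ' "$1" "$1"
}
```

Called as ratios file, this keeps the input order without any bookkeeping and, unlike the array version, prints repeated col1 values as separate rows. The trade-off is that the file is read from disk twice.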