Compute Differences of Two Columns

Question

I have a CSV file filename.csv with the following contents.

filename.csv:

"Afghanistan","94.0","81.1"
"Bahamas","42.9","43.2"
"Bolivia (Plurinational State of)","86.7","31.9"
"Brazil","76.7","0.0"

I want to compute the difference between two columns (Column 2 - Column 3), but I got an incorrect output. All of the data from the other columns disappeared and I got the wrong output from the subtraction command.

Command I used: awk '{ print $2 - $3 }' filename.csv

Output I got:

Expected output file:

"Afghanistan","94.0","81.1","12.9"
"Bahamas","42.9","43.2","-0.3"
"Bolivia (Plurinational State of)","86.7","31.9","54.8"
"Brazil","76.7","0.0","76.7"

Any help is appreciated. Thanks.

For next time, you could identify this sort of issue yourself with a simple piece of debug (even if you didn't know how to resolve it). My first question to myself would be, "if I'm getting 0 as the result, what are the source values?". For example, awk '{ print $2 }' filename.csv would have shown you what awk thought was column 2. You might then decide to try awk '{ print $1 }' filename.csv and then you would probably need to man awk and look for something to do with a field separator — Chris Davies
– Chris Davies, Commented Mar 16, 2022 at 9:43

user10489 · Accepted Answer · 2022-03-16 04:50:49Z

2

The default delimiter in awk is white space, but csv uses quotes and commas, so awk by default will see the csv file as one big column.

Also, you'll need to strip the quotes for it to recognize the numbers. So you can use -F'","' to change the delimiter to quoted strigns separated by comma, which will work as long as none of your quoted strings contain just a comma.

awk -F'","'  '{ print $2 - $3 }' filename.csv

Note that you have to quote the quotes for the shell.

edited Mar 16, 2022 at 4:50

answered Mar 16, 2022 at 4:46

user10489

10.9k1 gold badge15 silver badges37 bronze badges

Thanks for sharing, I understand your explanation. To display the rest of the columns for my expected output, I used the command awk -F'","' '{ print $0, $2 - $3 }' filename.csv | sort -k4 -n, but the fourth column (result) is not sorting. Do you happen to know the reason why it isn't sorting?

c200402
– c200402

2022-03-16 04:59:37 +00:00
Commented Mar 16, 2022 at 4:59
@c200402 look at the output of your awk statement before you pipe it into sort. (1) The last field is not separated from the third by a comma, (2) by default sort expects space-separated fields, not comma-separated fields

Chris Davies
– Chris Davies

2022-03-16 09:49:07 +00:00
Commented Mar 16, 2022 at 9:49

Add a comment |

fuzzydrawrings · Accepted Answer · 2022-03-28 04:08:39Z

0

This works:

$ awk 'BEGIN{FS=OFS="\""}{$7=",\""$4-$6"\""}1' filename.csv

Output:

"Afghanistan","94.0","81.1","12.9"
"Bahamas","42.9","43.2","-0.3"
"Bolivia (Plurinational State of)","86.7","31.9","54.8"
"Brazil","76.7","0.0","76.7"

As an aside, IF the number fields were unquoted, it would be a lot simpler: awk -F, '{$4=$2-$3}1' filename.csv.

edited Mar 28, 2022 at 4:08

answered Mar 16, 2022 at 6:32

fuzzydrawrings

1,7647 silver badges13 bronze badges

actually, I tried that, it doesn't, because "94.0"-"81.1" is 0; still need to strip the quotes. Oh wait, you said if they were unquoted. Right.

user10489
– user10489

2022-03-17 05:19:38 +00:00
Commented Mar 17, 2022 at 5:19
@user10489 did you try the first command? I mean, it actually works.

fuzzydrawrings
– fuzzydrawrings

2022-03-28 04:04:48 +00:00
Commented Mar 28, 2022 at 4:04
Looks like it is fixed now.

user10489
– user10489

2022-03-28 04:32:40 +00:00
Commented Mar 28, 2022 at 4:32

Add a comment |

jubilatious1 · Accepted Answer · 2022-04-20 03:00:47Z

Using Raku (formerly known as Perl_6)

raku -MText::CSV -e 'my @a = csv(in => "/Path/To/File");  \
      my @b = @a.map(*.[1]) Z- @a.map(*.[2]); \ 
     .put for @a Z @b;'

#OR

raku -MText::CSV -e 'my @a = csv(in => "/Path/To/File");  \
      .put for @a Z (@a.map(*.[1]) Z- @a.map(*.[2]));'

Sample Input:

"Afghanistan","94.0","81.1"
"Bahamas","42.9","43.2"
"Bolivia (Plurinational State of)","86.7","31.9"
"Brazil","76.7","0.0"

Sample Output:

Afghanistan 94.0 81.1 12.9
Bahamas 42.9 43.2 -0.3
Bolivia (Plurinational State of) 86.7 31.9 54.8
Brazil 76.7 0.0 76.7

While I'm sure you can 'get-away-with' parsing csv on your own (for small projects), it may make sense at some point for you to use a CSV parser (to handle embedded quotes, embedded newlines, detect/change comma/non-comma column separators, trim whitespace, etc.).

You might be surprised how easy it is to invoke a CSV parser at the bash command line in the Raku programming language. First load theText::CSV module with the -M (module) commandline flag, which gives you -MText::CSV. Then tell Raku to expect code to follow with the -e commandline flag.

To explain the first example, an @a array is declared with my and takes csv input. A @b array is declared with my that takes the difference between (zero-indexed) column 1 and column 2. This difference is accomplished with the Z zip metaoperator appended with the minus operator, to make Z-. Finally, the Z zip metaoperator is used to output the original @a csv, zipped row-by-row with the @b column.

https://github.com/Tux/CSV/blob/master/doc/Text-CSV.pdf (click the Download link)
https://docs.raku.org/language/operators#index-entry-Z_(zip_metaoperator)
https://raku.org

Kusalananda · Accepted Answer · 2022-10-13 18:52:50Z

0

Using Miller (mlr) to read the header-less CSV file, create a new fourth field by computing the difference between the 2nd and 3rd field, format the resulting floating-point value according to the standard %g format specifier, and output quoted CSV:

$ mlr --csv -N --quote-all put '$4 = fmtnum($2 - $3, "%g")' file
"Afghanistan","94.0","81.1","12.9"
"Bahamas","42.9","43.2","-0.3"
"Bolivia (Plurinational State of)","86.7","31.9","54.8"
"Brazil","76.7","0.0","76.7"

Note that the %g formatting seems to be broken in later releases of Miller, so you may need to use something like %.6g instead.

edited Oct 13, 2022 at 18:52

answered Oct 13, 2022 at 18:29

Kusalananda♦

356k42 gold badges735 silver badges1.1k bronze badges

Add a comment |

Stack Exchange Network

Compute Differences of Two Columns

4 Answers 4

You must log in to answer this question.

Hot Network Questions

Compute Differences of Two Columns

4 Answers 4

You must log in to answer this question.

Related

Hot Network Questions