1

I have a CSV file filename.csv with the following contents.

filename.csv:

"Afghanistan","94.0","81.1"
"Bahamas","42.9","43.2"
"Bolivia (Plurinational State of)","86.7","31.9"
"Brazil","76.7","0.0"

I want to compute the difference between two columns (Column 2 - Column 3), but I got an incorrect output. All of the data from the other columns disappeared and I got the wrong output from the subtraction command.

Command I used: awk '{ print $2 - $3 }' filename.csv

Output I got:

0
0
0
0

Expected output file:

"Afghanistan","94.0","81.1","12.9"
"Bahamas","42.9","43.2","-0.3"
"Bolivia (Plurinational State of)","86.7","31.9","54.8"
"Brazil","76.7","0.0","76.7"

Any help is appreciated. Thanks.

1
  • For next time, you could identify this sort of issue yourself with a simple piece of debug (even if you didn't know how to resolve it). My first question to myself would be, "if I'm getting 0 as the result, what are the source values?". For example, awk '{ print $2 }' filename.csv would have shown you what awk thought was column 2. You might then decide to try awk '{ print $1 }' filename.csv and then you would probably need to man awk and look for something to do with a field separator Commented Mar 16, 2022 at 9:43

4 Answers 4

2

The default delimiter in awk is white space, but csv uses quotes and commas, so awk by default will see the csv file as one big column.

Also, you'll need to strip the quotes for it to recognize the numbers. So you can use -F'","' to change the delimiter to quoted strigns separated by comma, which will work as long as none of your quoted strings contain just a comma.

awk -F'","'  '{ print $2 - $3 }' filename.csv

Note that you have to quote the quotes for the shell.

2
  • Thanks for sharing, I understand your explanation. To display the rest of the columns for my expected output, I used the command awk -F'","' '{ print $0, $2 - $3 }' filename.csv | sort -k4 -n, but the fourth column (result) is not sorting. Do you happen to know the reason why it isn't sorting? Commented Mar 16, 2022 at 4:59
  • @c200402 look at the output of your awk statement before you pipe it into sort. (1) The last field is not separated from the third by a comma, (2) by default sort expects space-separated fields, not comma-separated fields Commented Mar 16, 2022 at 9:49
0

This works:

$ awk 'BEGIN{FS=OFS="\""}{$7=",\""$4-$6"\""}1' filename.csv

Output:

"Afghanistan","94.0","81.1","12.9"
"Bahamas","42.9","43.2","-0.3"
"Bolivia (Plurinational State of)","86.7","31.9","54.8"
"Brazil","76.7","0.0","76.7"

As an aside, IF the number fields were unquoted, it would be a lot simpler: awk -F, '{$4=$2-$3}1' filename.csv.

3
  • actually, I tried that, it doesn't, because "94.0"-"81.1" is 0; still need to strip the quotes. Oh wait, you said if they were unquoted. Right. Commented Mar 17, 2022 at 5:19
  • @user10489 did you try the first command? I mean, it actually works. Commented Mar 28, 2022 at 4:04
  • Looks like it is fixed now. Commented Mar 28, 2022 at 4:32
0

Using Raku (formerly known as Perl_6)

raku -MText::CSV -e 'my @a = csv(in => "/Path/To/File");  \
      my @b = @a.map(*.[1]) Z- @a.map(*.[2]); \ 
     .put for @a Z @b;'

#OR

raku -MText::CSV -e 'my @a = csv(in => "/Path/To/File");  \
      .put for @a Z (@a.map(*.[1]) Z- @a.map(*.[2]));' 

Sample Input:

"Afghanistan","94.0","81.1"
"Bahamas","42.9","43.2"
"Bolivia (Plurinational State of)","86.7","31.9"
"Brazil","76.7","0.0"

Sample Output:

Afghanistan 94.0 81.1 12.9
Bahamas 42.9 43.2 -0.3
Bolivia (Plurinational State of) 86.7 31.9 54.8
Brazil 76.7 0.0 76.7

While I'm sure you can 'get-away-with' parsing csv on your own (for small projects), it may make sense at some point for you to use a CSV parser (to handle embedded quotes, embedded newlines, detect/change comma/non-comma column separators, trim whitespace, etc.).

You might be surprised how easy it is to invoke a CSV parser at the bash command line in the Raku programming language. First load theText::CSV module with the -M (module) commandline flag, which gives you -MText::CSV. Then tell Raku to expect code to follow with the -e commandline flag.

To explain the first example, an @a array is declared with my and takes csv input. A @b array is declared with my that takes the difference between (zero-indexed) column 1 and column 2. This difference is accomplished with the Z zip metaoperator appended with the minus operator, to make Z-. Finally, the Z zip metaoperator is used to output the original @a csv, zipped row-by-row with the @b column.

https://github.com/Tux/CSV/blob/master/doc/Text-CSV.pdf (click the Download link)
https://docs.raku.org/language/operators#index-entry-Z_(zip_metaoperator)
https://raku.org

0

Using Miller (mlr) to read the header-less CSV file, create a new fourth field by computing the difference between the 2nd and 3rd field, format the resulting floating-point value according to the standard %g format specifier, and output quoted CSV:

$ mlr --csv -N --quote-all put '$4 = fmtnum($2 - $3, "%g")' file
"Afghanistan","94.0","81.1","12.9"
"Bahamas","42.9","43.2","-0.3"
"Bolivia (Plurinational State of)","86.7","31.9","54.8"
"Brazil","76.7","0.0","76.7"

Note that the %g formatting seems to be broken in later releases of Miller, so you may need to use something like %.6g instead.

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.