I am using this awk command to process CSV files:

awk 'BEGIN {FS=OFS=";"}
(NR==1) {$9="TpmC"; print $0}
(NR>1 && NF) {a=$2$5; sum6[a]+=$6; sum7[a]+=$7; sum8[a]+=$8; other[a]=$0}
END {for(i in sum7) {$0=other[i]; $6=sum6[i]; $7=sum7[i]; $8=sum8[i]; $9=(sum8[i]?sum8[i]/sum6[i]:"NaN"); print}}' input.csv > output.csv

It sums columns 6, 7, and 8 and then divides sum8 by sum6, all for rows that share the same values in columns 2 and 5.

I have two questions about it:
1) I need the same functionality, but with all calculations done for rows that share the same values in columns 2, 3, and 5. I have tried to replace

a=$2$5;

with

b=$2$3; a=$b$5;

but it is giving me wrong numbers.

2) How can I delete all rows with the value:

Date;DBMS;Mode;Test type;W;time;TotalTPCC;NewOrder Tpm

except the first row?

Here is an example of input.csv:

Date;DBMS;Mode;Test type;W;time;TotalTPCC;NewOrder Tpm
Tue Jun 16 21:08:33 CEST 2015;sqlite;in-memory;TPC-C test;1;10;83970;35975
Tue Jun 16 21:18:43 CEST 2015;sqlite;in-memory;TPC-C test;1;10;83470;35790
Date;DBMS;Mode;Test type;W;time;TotalTPCC;NewOrder Tpm
Tue Jun 16 23:35:35 CEST 2015;hsql;in-memory;TPC-C test;1;10;337120;144526
Tue Jun 16 23:45:44 CEST 2015;hsql;in-memory;TPC-C test;1;10;310230;133271
Thu Jun 18 00:10:45 CEST 2015;derby;on-disk;TPC-C test;5;120;64720;27964
Thu Jun 18 02:41:27 CEST 2015;sqlite;on-disk;TPC-C test;1;120;60030;25705
Thu Jun 18 04:42:14 CEST 2015;hsql;on-disk;TPC-C test;1;120;360900;154828   

output.csv should be:

Date;DBMS;Mode;Test type;W;time;TotalTPCC;NewOrder Tpm;TpmC
Tue Jun 16 21:08:33 CEST 2015;sqlite;in-memory;TPC-C test;1;20;167440;71765;3588.25
Tue Jun 16 23:35:35 CEST 2015;hsql;in-memory;TPC-C test;1;20;647350;277797;13889.85
Thu Jun 18 00:10:45 CEST 2015;derby;on-disk;TPC-C test;5;120;64720;27964;233.03
Thu Jun 18 02:41:27 CEST 2015;sqlite;on-disk;TPC-C test;1;120;60030;25705;214.20
Thu Jun 18 04:42:14 CEST 2015;hsql;on-disk;TPC-C test;1;120;360900;154828;1290.23
Seeing (some lines of) input.csv may help us... Commented Jun 19, 2015 at 22:37

1 Answer

To group by columns 2, 3, and 5, use a=$2$3$5. To delete the extra header rows, add a match condition, ($1 !~ /^Date/), to the rule that processes the data rows.
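As a side note on why the attempt in the question gives wrong numbers: in awk, `$b` is a field reference ("field number b"), not the value of the variable `b`. A non-numeric string such as "sqlitein-memory" converts to 0, so `$b` ends up meaning `$0`, the whole record. The small check below is only an illustration on one sample row from the question; the shell variable names (`row`, `attempt`, `fixed`) are made up for the demo.

```sh
#!/bin/sh
# One sample row copied from the input.csv shown in the question.
row='Tue Jun 16 21:08:33 CEST 2015;sqlite;in-memory;TPC-C test;1;10;83970;35975'

# Attempted key: b holds "sqlitein-memory", so $b asks for "field number b".
# The string converts to 0, and $0 is the entire record.
attempt=$(printf '%s\n' "$row" | awk 'BEGIN {FS=";"} {b=$2$3; a=$b$5; print a}')

# Intended key: plain concatenation of fields 2, 3 and 5.
fixed=$(printf '%s\n' "$row" | awk 'BEGIN {FS=";"} {a=$2$3$5; print a}')

printf 'attempt: %s\nfixed:   %s\n' "$attempt" "$fixed"
```

Because the whole record includes the timestamp, the attempted key is different for every row, so no rows are ever summed together.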

So the whole awk script becomes:

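BEGIN {
  FS=OFS=";"
}
# First line: extend the header with the new column and print it.
(NR==1) {$9="TpmC"; print $0}
# Data rows: skip empty lines and repeated header rows, then sum per group.
(NR>1 && NF && ($1 !~ /^Date/)) {
  a=$2$3$5; sum6[a]+=$6; sum7[a]+=$7; sum8[a]+=$8; other[a]=$0
}
END {
  for(i in sum7) {
    # Guard on the divisor (sum6) so a zero total time cannot cause a division by zero.
    $0=other[i]; $6=sum6[i]; $7=sum7[i]; $8=sum8[i]; $9=(sum6[i]?sum8[i]/sum6[i]:"NaN"); print
  }
}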
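
If the output also needs to look like the expected output.csv in the question, a few details are worth handling: `for (i in array)` visits keys in an unspecified order, the expected rows carry the date of the first row in each group, and the expected TpmC values have two decimals. The following is a sketch under those assumptions, not the only way to do it. The names `key`, `first`, `order` and `n` are my own, and joining the key with FS is an extra precaution so that differently split fields cannot produce the same key. The sample rows are a subset of the question's input.

```sh
#!/bin/sh
# Sample rows copied from the question (note the repeated header on line 4).
input=$(cat <<'EOF'
Date;DBMS;Mode;Test type;W;time;TotalTPCC;NewOrder Tpm
Tue Jun 16 21:08:33 CEST 2015;sqlite;in-memory;TPC-C test;1;10;83970;35975
Tue Jun 16 21:18:43 CEST 2015;sqlite;in-memory;TPC-C test;1;10;83470;35790
Date;DBMS;Mode;Test type;W;time;TotalTPCC;NewOrder Tpm
Tue Jun 16 23:35:35 CEST 2015;hsql;in-memory;TPC-C test;1;10;337120;144526
Tue Jun 16 23:45:44 CEST 2015;hsql;in-memory;TPC-C test;1;10;310230;133271
Thu Jun 18 00:10:45 CEST 2015;derby;on-disk;TPC-C test;5;120;64720;27964
EOF
)

output=$(printf '%s\n' "$input" | awk '
BEGIN { FS = OFS = ";" }
NR == 1 { $9 = "TpmC"; print; next }      # keep only the first header
!NF || $1 ~ /^Date/ { next }              # skip blank lines and repeated headers
{
  key = $2 FS $3 FS $5                    # separator keeps "a"+"bc" apart from "ab"+"c"
  if (!(key in first)) {                  # remember the first row and arrival order
    first[key] = $0
    order[++n] = key
  }
  sum6[key] += $6; sum7[key] += $7; sum8[key] += $8
}
END {
  for (k = 1; k <= n; k++) {              # emit groups in first-seen order
    key = order[k]
    $0 = first[key]
    $6 = sum6[key]; $7 = sum7[key]; $8 = sum8[key]
    # Guard on the divisor; format the ratio with two decimals.
    $9 = (sum6[key] ? sprintf("%.2f", sum8[key] / sum6[key]) : "NaN")
    print
  }
}')

printf '%s\n' "$output"
```

To use it on a real file, the same awk program can be run as `awk '...' input.csv > output.csv` instead of reading from the here-document.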