
I have a CSV file with some database benchmark results. Here is an example:

Date;dbms;type;description;W;D;S;results;time;id
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;570;265;50
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;420;215;50
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;500;365;50
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;530;255;50

Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;870;265;99
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;620;215;99
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;700;365;99
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;530;255;99

I need to process all rows that share the same id (the value of the last column) and get this:

Date;dbms;type;description;W;D;S;time;results;results/time
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;sum column 8;sum column 9;(sum column 8 / sum column 9)
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;sum column 8;sum column 9;(sum column 8 / sum column 9)

For now I can only sum column 8 with this awk command:

awk -F";" '{print;sum+=$8 }END{print "sum " sum}' ./file.CSV

Edit:
I need help modifying a script I am already using. Here is the real input data:

Date;dbms;type;description;W;D;time;TotalTransactions;NOTransactions;id
Mon Jun 15 14:53:41 CEST 2015;sqlite;in-memory;TPC-C test results;2;1;10;272270;117508;50
Mon Jun 15 15:03:46 CEST 2015;sqlite;in-memory;TPC-C test results;2;1;10;280080;110063;50
Mon Jun 15 15:13:53 CEST 2015;sqlite;in-memory;TPC-C test results;5;1;10;144170;31815;60
Mon Jun 15 15:13:53 CEST 2015;sqlite;in-memory;TPC-C test results;5;1;10;137570;33910;60
Mon Jun 15 15:24:04 CEST 2015;hsql;in-memory;TPC-C test results;2;1;10;226660;97734;70
Mon Jun 15 15:34:08 CEST 2015;hsql;in-memory;TPC-C test results;2;1;10;210420;95113;70
Mon Jun 15 15:44:16 CEST 2015;hsql;in-memory;TPC-C test results;5;1;10;288360;119328;80
Mon Jun 15 15:44:16 CEST 2015;hsql;in-memory;TPC-C test results;5;1;10;270360;124328;80

I need to sum the values in the time, TotalTransactions and NOTransactions columns, and then add a column with the value (sum of NOTransactions / sum of time).

I am using this script:

awk 'BEGIN {FS=OFS=";"}
(NR==1) {$10="results/time"; print $0} 
(NR>1 && NF) {sum7[$10]+=$7; sum8[$10]+=$8; sum9[$10]+=$9;  other[$10]=$0} 
 END {for (i in sum8) 
        {$0=other[i]; $7=sum7[i];$8=sum8[i]; $9=sum9[i]; $10=sprintf("%.0f", sum9[i]/sum7[i]); print}}' ./logsFinal.csv

which gives me this output:

;;;;;;;;;results/time
Mon Jun 15 15:03:46 CEST 2015;sqlite;in-memory;TPC-C test results;2;1;20;552350;227571;11379
Mon Jun 15 15:13:53 CEST 2015;sqlite;in-memory;TPC-C test results;5;1;20;281740;65725;3286
Mon Jun 15 15:34:08 CEST 2015;hsql;in-memory;TPC-C test results;2;1;20;437080;192847;9642
Mon Jun 15 15:44:16 CEST 2015;hsql;in-memory;TPC-C test results;5;1;20;558720;243656;12183
Date;dbms;type;description;W;D;0;0;0;-nan

The values look good (except the header row). But I need these results without the id column (I want to delete it). So instead of identifying the rows to process by equal values in the id column, the rows must be grouped by equal values in the dbms AND W AND D columns.

Hint: awk -F';' '{col8[$NF]+=$8; col9[$NF]+=$9} END {for (i in col8) print i, col8[i]}' file. It is also best to post the exact expected output, so that it is clear we all understand the same thing. Commented Jun 15, 2015 at 15:29

2 Answers


You can use this awk:

awk 'BEGIN{ FS=OFS=";" }
NR>1 && NF {
        s=""
        for(i=1; i<=7; i++)
           s=s $i OFS;
        a[$NF]=s;
        sum8[$NF]+=$8
        sum9[$NF]+=$9
} END{
   for (i in a)
       print a[i] sum8[i], sum9[i], (sum9[i]?sum8[i]/sum9[i]:"NaN")
}' file

which outputs:

Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;2020;1100;1.83636
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;2720;1100;2.47273

1 Comment

Thanks. It was stupid of me not to use , as the field separator. However, I didn't notice that the average was also needed in the question.

This awk program will print the modified header and modify the output to contain the sums and their division:

awk 'BEGIN {FS=OFS=";"} 
     (NR==1) {$10="results/time"; print $0} 
     (NR>1 && NF) {sum8[$10]+=$8; sum9[$10]+=$9; other[$10]=$0} 
     END {for (i in sum8) 
            {$0=other[i]; $8=sum8[i]; $9=sum9[i]; $10=(sum9[i]?sum8[i]/sum9[i]:"NaN"); print}}'

which gives:

Date;dbms;type;description;W;D;S;results;time;results/time
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;2020;1100;1.83636
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;2720;1100;2.47273

You don't seem to care for the ID in the result, but if you do, just replace $10= with $11=.
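As a sketch of that variant (bench.csv and its rows here are just a cut-down copy of the first sample in the question, used for illustration): keep the id in field 10 and append the ratio as an eleventh field.

```shell
# Cut-down copy of the question's first sample file (assumed layout).
cat > bench.csv <<'EOF'
Date;dbms;type;description;W;D;S;results;time;id
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;570;265;50
Mon Jun 15 14:22:20 CEST 2015;sqlite;on-disk;text;2;1;1;420;215;50
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;870;265;99
Mon Jun 15 14:22:20 CEST 2015;hsql;on-disk;text;2;1;1;620;215;99
EOF

# Same program as above, but writing into $11 so the id in $10 survives.
awk 'BEGIN {FS=OFS=";"}
     (NR==1) {$11="results/time"; print}
     (NR>1 && NF) {sum8[$10]+=$8; sum9[$10]+=$9; other[$10]=$0}
     END {for (i in sum8)
            {$0=other[i]; $8=sum8[i]; $9=sum9[i]
             $11=(sum9[i]?sum8[i]/sum9[i]:"NaN"); print}}' bench.csv
```

Each output row then ends in ...;id;results/time, e.g. ...;990;480;50;2.0625 for the sqlite group.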

Also, if you need to sum things based on values of more than one column, you can create a temporary variable (a in the example below) which is a concatenation of two columns and use it as an index in the arrays, like this:

awk 'BEGIN {FS=OFS=";"}
     (NR==1) {$10="results/time"; print $0}
     (NR>1 && NF) {a=$5$6; sum8[a]+=$8; sum9[a]+=$9; other[a]=$0}
     END {for (i in sum8)
            {$0=other[i]; $8=sum8[i]; $9=sum9[i]; $10=(sum9[i]?sum8[i]/sum9[i]:"NaN"); print}}'
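One caveat with plain concatenation like $5$6: distinct groups can collide, since (W=1, D=23) and (W=12, D=3) both become the key "123". awk's built-in SUBSEP variable exists precisely for composite array subscripts and avoids this. A minimal illustration (the two-line input is made up):

```shell
# Two rows whose fields concatenate to the same string "123".
# Plain concatenation merges them into one key; SUBSEP keeps two.
printf '1;23\n12;3\n' |
awk -F';' '{plain[$1 $2]++; safe[$1 SUBSEP $2]++}
           END {for (k in plain) p++; for (k in safe) s++; print p, s}'
# prints "1 2": one collided key vs. two distinct keys
```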

4 Comments

Your awk code seems to work fine for me. I don't get the problem with the header you mentioned.
What I get is: Date;dbms;type;description;W;D;time;TotalTransactions;NOTransactions;results/time followed by Mon Jun 15 15:03:46 CEST 2015;sqlite;in-memory;TPC-C test results;2;1;20;552350;227571;11379
See my edit, and please clean up your question then.
It's almost what I need, but the sum is keyed on the W and D columns only; I also need the dbms column. I changed your script to sum the time column as well and tried to key the sums on the dbms value too: awk 'BEGIN {FS=OFS=";"} (NR==1) {$10="results/time"; print $0} (NR>1 && NF) {a=$2; b=$5$6; c=$b$a; sum7[c]+=$7; sum8[c]+=$8; sum9[c]+=$9; other[c]=$0} END {for (i in sum8) {$0=other[i]; $7=sum7[i]; $8=sum8[i]; $9=sum9[i]; $10=(sum9[i]?sum8[i]/sum9[i]:"NaN"); print}}' ./input.csv but this gives me bad output.
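If it helps, one way to sketch the requested change against the real-format data: group on dbms, W and D (fields 2, 5 and 6) with SUBSEP, sum fields 7 through 9, and overwrite the id field with the ratio so no id appears in the output. The sample.csv rows are copied from the question; beyond this sample the script is untested:

```shell
# Subset of the real input from the question (assumed layout).
cat > sample.csv <<'EOF'
Date;dbms;type;description;W;D;time;TotalTransactions;NOTransactions;id
Mon Jun 15 14:53:41 CEST 2015;sqlite;in-memory;TPC-C test results;2;1;10;272270;117508;50
Mon Jun 15 15:03:46 CEST 2015;sqlite;in-memory;TPC-C test results;2;1;10;280080;110063;50
Mon Jun 15 15:24:04 CEST 2015;hsql;in-memory;TPC-C test results;2;1;10;226660;97734;70
EOF

awk 'BEGIN {FS=OFS=";"}
     NR==1 {$10="results/time"; print; next}
     NF {
        k = $2 SUBSEP $5 SUBSEP $6        # composite key: dbms, W, D
        sum7[k]+=$7; sum8[k]+=$8; sum9[k]+=$9
        other[k]=$0                       # keep one representative row per group
     }
     END {
        for (k in sum7) {
            $0=other[k]
            $7=sum7[k]; $8=sum8[k]; $9=sum9[k]
            $10=(sum7[k] ? sprintf("%.0f", sum9[k]/sum7[k]) : "NaN")
            print
        }
     }' sample.csv
```

On this sample the two sqlite rows collapse into one row ending in ;20;552350;227571;11379, matching the values in the output shown in the question; the group order in the END loop is unspecified, so rows may come out in any order.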
