awk: appending columns from multiple csv files into a single csv file

Question

I have several CSV files (all have the same number of rows and columns). Each file follows this format:

1    100.23  1    102.03  1    87.65
2    300.56  2    131.43  2    291.32
.    .       .    .       .    . 
.    .       .    .       .    .
200  213.21  200  121.81  200  500.21

I need to extract columns 2, 4 and 6, and add them to a single CSV file. I have a loop in my shell script which goes through all the CSV files, extracts the columns, and appends these columns to a single file:

#output header column
awk -F"," 'BEGIN {OFS=","}{ print $1; }' "$input" > "$output"

for f in "$1"*.csv
do
   if [[ -f "$f" ]] #skips anything that is not a regular file (e.g. the unexpanded pattern when no .csv files match)
   then
       fname=$(basename "$f")
       arr+=("$fname") #array to store filenames
       paste -d',' "$output" <(awk -F',' '{ print $2","$4","$6; }' "$f") > temp.csv
       mv temp.csv "$output"
   fi
done

Running this produces this output:

1    100.23  102.03  87.65   219.42  451.45  903.1   ... 542.12  321.56  209.2
2    300.56  131.43  291.32  89.57   897.21  234.52      125.21  902.25  254.12
.    .       .       .       .       .       .           .       .       .    
.    .       .       .       .       .       .           .       .       .
200  213.23  121.81  500.21  231.56  5023.1  451.09  ... 121.09  234.45  709.1

My desired output is a single CSV file that looks something like this:

     1.csv   1.csv   1.csv   2.csv   2.csv   2.csv   ...  700.csv  700.csv  700.csv
1    100.23  102.03  87.65   219.42  451.45  903.1        542.12   321.56   209.2
2    300.56  131.43  291.32  89.57   897.21  234.52       125.21   902.25   254.12
.    .       .       .       .       .       .            .        .        .       
.    .       .       .       .       .       .            .        .        .
200  213.23  121.81  500.21  231.56  5023.1  451.09  ...  121.09   234.45   709.1

In other words, I need a header row containing the file names in order to identify which files the columns were extracted from. I can't seem to wrap my head around how to do this.

What is the easiest way to achieve this (preferably using awk)? I was thinking of storing the file names in an array, inserting a header row and then printing the array, but I can't figure out the syntax.

You might want to consider pasting all the files together first, then passing the result plus the list of filenames to awk.
– Mischa, Commented May 24, 2017 at 5:41
@Mischa If I understood correctly, I believe the loop I've written already does this and creates a single csv file. The problem comes afterwards, where I need to insert a header row to store the file names.
– Jason, Commented May 24, 2017 at 5:55

Mischa · Accepted Answer · 2017-05-28 01:03:16Z

So, based on a few assumptions:

The inputs are called "*.csv" but they're actually whitespace-separated, as they appear in the question.
The odd-numbered input columns just repeat the row number 3 times, and can be ignored.
The column headings are just the filenames, repeated 3 times each.
The output is fed to some other program, and the numbers are left-justified anyway, so you aren't particular about the column formatting (columns lining up, decimals aligned, ...).
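If you'd rather confirm the second assumption than take it on faith, a quick check along these lines might help. It is only a sketch: check_row_numbers is a name made up here, and the two demo files are invented sample data in the question's layout.

```shell
# Hypothetical helper: succeeds only if columns 1, 3 and 5 agree on every row
# of every file given, and reports the first kind of mismatch it sees per row.
check_row_numbers() {
  awk '$1 != $3 || $1 != $5 {
         printf "%s: line %d: row numbers differ\n", FILENAME, FNR
         bad = 1
       }
       END { exit bad }' "$@"
}

# Invented sample data: one consistent file, one with a mismatch on line 2.
demo=$(mktemp -d)
printf '1 100.23 1 102.03 1 87.65\n2 300.56 2 131.43 2 291.32\n' > "$demo/good.csv"
printf '1 100.23 1 102.03 1 87.65\n2 300.56 3 131.43 2 291.32\n' > "$demo/bad.csv"

if check_row_numbers "$demo/good.csv"; then good_result=consistent; else good_result=inconsistent; fi
if check_row_numbers "$demo/bad.csv" >/dev/null; then bad_result=consistent; else bad_result=inconsistent; fi
echo "good.csv: $good_result, bad.csv: $bad_result"

rm -rf "$demo"
```

On the real data you would call it as check_row_numbers *.csv and only go ahead with dropping the odd-numbered columns if it exits with status 0.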

Humble apologies because code PRE formatting is not working for me here

f=$(set -- *.csv; echo $*)

(echo $f; paste $f) |
awk 'NR==1 { for (i=1; i<=NF; i++) {x=x" "$i" "$i" "$i} }
     NR > 1 { x=$1; for (i=2; i<= NF; i+=2) {x=x" "$i} }
     {print x}'
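For what it's worth, here is a self-contained run of the same idea on two tiny made-up files, so you can see what comes out. It is a sketch under the assumptions above, with two changes that are not in the snippet above: the output separator is set to a comma via OFS (since the question asks for a CSV), and the file list is kept in the positional parameters instead of a string. The temporary directory and sample values are for illustration only.

```shell
# Scratch directory with two invented inputs in the question's layout
# (row number, value, row number, value, row number, value).
workdir=$(mktemp -d)
printf '1 100.23 1 102.03 1 87.65\n2 300.56 2 131.43 2 291.32\n' > "$workdir/1.csv"
printf '1 219.42 1 451.45 1 903.1\n2 89.57 2 897.21 2 234.52\n'  > "$workdir/2.csv"

merged=$(
  cd "$workdir" || exit 1
  set -- *.csv                # positional parameters = file names, in glob order
  # First line fed to awk: the file names. Remaining lines: all files pasted side by side.
  { echo "$*"; paste "$@"; } |
  awk -v OFS=',' '
    # Header: empty first cell, then each file name three times.
    NR == 1 { line = ""; for (i = 1; i <= NF; i++) line = line OFS $i OFS $i OFS $i }
    # Data: row number once, then every even-numbered field (the values).
    NR > 1  { line = $1; for (i = 2; i <= NF; i += 2) line = line OFS $i }
    { print line }'
)
printf '%s\n' "$merged"

rm -rf "$workdir"
```

This prints:

,1.csv,1.csv,1.csv,2.csv,2.csv,2.csv
1,100.23,102.03,87.65,219.42,451.45,903.1
2,300.56,131.43,291.32,89.57,897.21,234.52

Two things to keep in mind. The glob expands in lexical order, so with 700 files 10.csv comes before 2.csv; the header still labels the columns correctly because the names and the pasted data come from the same list. And like the snippet above, this breaks on file names containing whitespace, since awk splits the header line on it.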

hth
