I have several CSV files (all have the same number of rows and columns). Each file follows this format:

1    100.23  1    102.03  1    87.65
2    300.56  2    131.43  2    291.32
.    .       .    .       .    . 
.    .       .    .       .    .
200  213.21  200  121.81  200  500.21

I need to extract columns 2, 4 and 6, and add them to a single CSV file. I have a loop in my shell script which goes through all the CSV files, extracts the columns, and appends these columns to a single file:

#output header column
awk -F"," 'BEGIN {OFS=","}{ print $1; }' "$input" > "$output"

for f in "$1"*.csv
do
   if [[ -f "$f" ]] #skip anything that is not a regular file, such as an unmatched glob pattern
   then
       fname=$(basename "$f")
       arr+=("$fname") #array to store filenames
       paste -d',' "$output" <(awk -F',' '{ print $2","$4","$6; }' "$f") > temp.csv
       mv temp.csv "$output"
   fi
done

Running this produces this output:

1    100.23  102.03  87.65   219.42  451.45  903.1   ... 542.12  321.56  209.2
2    300.56  131.43  291.32  89.57   897.21  234.52      125.21  902.25  254.12
.    .       .       .       .       .       .           .       .       .    
.    .       .       .       .       .       .           .       .       .
200  213.23  121.81  500.21  231.56  5023.1  451.09  ... 121.09  234.45  709.1

My desired output is a single CSV file that looks something like this:

     1.csv   1.csv   1.csv   2.csv   2.csv   2.csv   ...  700.csv  700.csv  700.csv
1    100.23  102.03  87.65   219.42  451.45  903.1        542.12   321.56   209.2
2    300.56  131.43  291.32  89.57   897.21  234.52       125.21   902.25   254.12
.    .       .       .       .       .       .            .        .        .       
.    .       .       .       .       .       .            .        .        .
200  213.23  121.81  500.21  231.56  5023.1  451.09  ...  121.09   234.45   709.1

In other words, I need a header row containing the file names in order to identify which files the columns were extracted from. I can't seem to wrap my head around how to do this.

What is the easiest way to achieve this (preferably using awk)? I was thinking of storing the file names in an array, inserting a header row, and then printing the array, but I can't figure out the syntax.

  • You might want to consider pasting all the files together first, then passing the result plus the list of filenames to awk. Commented May 24, 2017 at 5:41
  • @Mischa If I understood correctly, I believe the loop I've written already does this and creates a single csv file. The problem comes afterwards where I need to insert a header row to store the file names. Commented May 24, 2017 at 5:55

1 Answer

So, based on a few assumptions:

  • the inputs are called "*.csv" but they're actually whitespace-separated, as they appear.
  • the odd-numbered input columns just repeat the row number 3 times, and can be ignored
  • the column headings are just the filenames, repeated 3 times each
  • the output is fed to some other program, and the numbers are left-justified anyway, so you aren't particular about the column formatting (columns lining up, decimals aligned, ...)

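# space-separated list of the input files (assumes no whitespace in the names)
f=$(set -- *.csv; echo $*)

# line 1 of the stream is the list of filenames; the rest is all files pasted side by side
(echo $f; paste $f) |
awk 'NR==1 { for (i=1; i<=NF; i++) {x=x" "$i" "$i" "$i} }   # header: every filename three times
NR > 1 { x=$1; for (i=2; i<= NF; i+=2) {x=x" "$i} }          # data: row number, then the even-numbered fields
{print x}'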
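
If the files really are comma-separated, as the -F',' in the question's script suggests, a single awk pass over all the files is another option. What follows is only a sketch under that assumption: merge_even_columns is a made-up helper name, the sample data is invented, and the row labels are taken from column 1 of the first file. It relies on awk's built-in FILENAME and FNR variables to build the header and to line the rows up across files.

```shell
# merge_even_columns is a hypothetical helper name, not something from the post.
# Assumption: every input is comma-separated and has the same number of rows.
merge_even_columns() {
    awk -F',' '
        FNR == 1 {                          # first line of each input file
            name = FILENAME
            sub(/.*\//, "", name)           # drop any directory part, like basename
            header = header "," name "," name "," name
        }
        {
            if (!(FNR in row)) row[FNR] = $1            # row label from the first file
            row[FNR] = row[FNR] "," $2 "," $4 "," $6    # append columns 2, 4 and 6
            if (FNR > nrows) nrows = FNR
        }
        END {
            print header                    # leading empty cell sits above the row labels
            for (i = 1; i <= nrows; i++) print row[i]
        }
    ' "$@"
}

# Small demonstration with invented two-row sample files.
tmp=$(mktemp -d)
printf '1,100.23,1,102.03,1,87.65\n2,300.56,2,131.43,2,291.32\n' > "$tmp/1.csv"
printf '1,219.42,1,451.45,1,903.1\n2,89.57,2,897.21,2,234.52\n' > "$tmp/2.csv"
result=$(merge_even_columns "$tmp"/*.csv)
printf '%s\n' "$result"
```

The shell expands *.csv in lexical order, so 10.csv comes before 2.csv. If the columns need to follow numeric order, the file list would have to be sorted before it is handed to the function.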

hth
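
To stay closer to the loop in the question, the header can also be accumulated as a plain string while the columns are being pasted, then written ahead of the data once the loop finishes. This is a rough sketch, again assuming comma-separated input; the sample files and the merged.out / merged.tmp names are invented for illustration.

```shell
# Sketch only: the sample data and paths are made up for illustration.
dir=$(mktemp -d)
printf '1,100.23,1,102.03,1,87.65\n2,300.56,2,131.43,2,291.32\n' > "$dir/1.csv"
printf '1,219.42,1,451.45,1,903.1\n2,89.57,2,897.21,2,234.52\n' > "$dir/2.csv"
output="$dir/merged.out"    # deliberately not *.csv, so the glob below skips it
tmp="$dir/merged.tmp"

header=""                   # starts empty: the first cell sits above the row labels
awk -F',' '{ print $1 }' "$dir/1.csv" > "$output"    # row-label column

for f in "$dir"/*.csv; do
    [ -f "$f" ] || continue                 # skip an unmatched glob pattern
    name=$(basename "$f")
    header="$header,$name,$name,$name"      # one header cell per extracted column
    awk -F',' '{ print $2 "," $4 "," $6 }' "$f" | paste -d',' "$output" - > "$tmp"
    mv "$tmp" "$output"
done

# Write the header first, then the merged data.
{ printf '%s\n' "$header"; cat "$output"; } > "$tmp"
mv "$tmp" "$output"
merged=$(cat "$output")
printf '%s\n' "$merged"
```

A plain string is enough here, so the arr array from the question is not strictly needed. The trade-off is that the output file is rewritten once per input file, which the single-pass awk approach avoids.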
