
I would like to transform the header of many csv files automatically using awk and bash scripts.

Currently, I am using the following code-block, which is working fine:

for FILE in *.csv
do
    awk 'FNR>1{print $0}' "$FILE" | awk 'NR == 1{print "aaa,bbb,ccc,ddd,eee,fff,ggg,hhh,iii,jjj,kkk,lll,mmm,nnn,...,zzz"}1' > "OUT_$FILE"
done

These commands first remove the old header from $FILE, then prepend a new (very long) comma-separated header aaa,bbb,ccc,ddd,eee,fff,ggg,hhh,iii,jjj,kkk,lll,mmm,nnn,...,zzz, and finally save the output to OUT_$FILE.

Currently, I am copying the part aaa,bbb,ccc,ddd,eee,fff,ggg,hhh,iii,jjj,kkk,lll,mmm,nnn,...,zzz manually from another csv file and pasting into this field to replace the header from $FILE. While it is working, it is getting tedious, repetitive and time-consuming for many csv files.

Instead of copying the header manually, I am trying to extract the header from another csv file new_headers.csv and save to a new variable $NEWHEAD.

NEWHEAD=$(awk 'NR==1{print $0}' new_headers.csv)

While I can view the extracted header $NEWHEAD, I am not sure how to merge this command into the previous workflow so that it replaces the old header of $FILE.

I will certainly appreciate any suggestions to resolve this problem. Thank you :)

2
  • Aside: To "append" something is to add it to the end; if you're putting a header at the beginning, you're prepending rather than appending. Commented Jan 20, 2022 at 17:41
  • Yes, you are right! Thank you for this suggestion, I have changed the word from 'append' to 'prepend'. Commented Jan 24, 2022 at 9:03

4 Answers

1

With GNU awk for "inplace" editing:

awk -i inplace 'NR==1{hdr=$0} {print (FNR>1 ? $0 : hdr)}' new_headers.csv *.csv
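
For awk implementations without the "inplace" extension, a similar effect can be sketched with one temporary file per input. This is only an illustration under assumptions: the scratch directory, the sample files a.csv and b.csv, and their contents are made up for the demo, and new_headers.csv is skipped explicitly so that it is not rewritten.

```shell
#!/bin/sh
# Demo in a throwaway directory; all sample files and contents are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'id,name,score\nignored,second,line\n' > new_headers.csv
printf 'c1,c2,c3\n1,alice,90\n2,bob,85\n' > a.csv
printf 'x,y,z\n3,carol,70\n' > b.csv

for f in *.csv; do
    # Leave the header source untouched.
    [ "$f" = new_headers.csv ] && continue
    # First input (new_headers.csv): remember its first line, print nothing.
    # Second input (the data file): print the remembered header instead of line 1.
    awk 'NR==FNR { if (FNR==1) hdr=$0; next }
         { print (FNR==1 ? hdr : $0) }' new_headers.csv "$f" > "$f.tmp" &&
        mv "$f.tmp" "$f"
done
```

The `&&` before `mv` means a data file is only replaced when awk exits successfully, so a failed run leaves the original in place.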

1 Comment

This is a neat solution! It reduces many lines into a single line to replace the header from *.csv using the input from new_headers.csv. Thank you!
0
newheader=$(head -n 1 new_headers.csv)

for file in *.csv
do
    {
        printf '%s\n' "$newheader"
        tail -n +2 "$file" 
    } > OUT_"$file"
done

notes:

  • head -n 1 outputs the first line of a file
  • tail -n +2 outputs all the lines but the first
  • { } is to group commands, so that you redirect their output as a whole
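
If you would rather keep awk in the loop, as in the question, the shell variable can be handed to awk with -v. The following is a minimal sketch, not a definitive version: the scratch directory and the sample a.csv are invented for the demo, and skipping new_headers.csv inside the loop is my own choice (the original loop would process it too).

```shell
#!/bin/sh
# Demo in a throwaway directory; the sample file and its contents are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'id,name,score\n' > new_headers.csv
printf 'c1,c2,c3\n1,alice,90\n' > a.csv

newheader=$(head -n 1 new_headers.csv)

for file in *.csv; do
    # Do not produce an OUT_ copy of the header source itself.
    [ "$file" = new_headers.csv ] && continue
    # -v copies the shell variable into the awk variable hdr.
    # Caution: awk interprets backslash escapes in values passed with -v.
    awk -v hdr="$newheader" 'FNR==1 { print hdr; next } { print }' "$file" > "OUT_$file"
done
```

Because the glob is expanded once when the loop starts, the OUT_ files created during the run are not picked up in that same run; a second run in the same directory would match them, though.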

1 Comment

Thank you for this suggestion! I tested the solution just now. It was easy to follow and worked!
0

You can read the header inside the awk script, like this:

awk '
  BEGIN{
    do {
      h = (h) ? (h "\n" line) : line
    } while ((getline line <"new_header.csv") > 0)
}

...
'

and h contains the new header.
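
To make the idea concrete, here is one possible complete script built on getline. It is a sketch under assumptions rather than the answer's own code: it reads only the first line of the header file, the hfile variable name, the scratch directory and the sample a.csv are hypothetical, and the error message goes to /dev/stderr, which assumes a system that provides that path.

```shell
#!/bin/sh
# Demo in a throwaway directory; the sample file and its contents are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'id,name,score\nextra,line,here\n' > new_headers.csv
printf 'c1,c2,c3\n1,alice,90\n2,bob,85\n' > a.csv

awk -v hfile=new_headers.csv '
  BEGIN {
    # getline returns 1 on success, 0 at end of file, -1 on error.
    if ((getline h < hfile) <= 0) {
      print "cannot read header from " hfile > "/dev/stderr"
      exit 1
    }
    close(hfile)
  }
  FNR == 1 { print h; next }   # replace the first line of each input file
  { print }                    # pass every other line through unchanged
' a.csv > OUT_a.csv
```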

2 Comments

This solution doesn't seem to work; it doesn't produce the expected output...
This does not produce any output by itself; it shows how to read a file in awk. You then need to replace the ... with whatever you want to do with the header h.
0
$ awk 'NR==FNR {header=$0; next} 
               {print (FNR==1?header:$0) > (FILENAME".updated")}' new_header.csv other files... 

This captures the header from the header file (assumed to contain only the header line) and uses it to replace the first line of each of the remaining files. The updated files get the suffix ".updated".

Caveat emptor: not tested.
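
A slightly more defensive variant of the same single-pass idea is sketched below. It is my own adaptation, not the answer's code: it keeps only the first line of the header file, closes each output file before opening the next one (which avoids running out of file descriptors when there are many inputs), and uses an "updated_" prefix. The scratch directory and the sample files are hypothetical.

```shell
#!/bin/sh
# Demo in a throwaway directory; all sample files and contents are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'id,name,score\nextra,line,here\n' > new_headers.csv
printf 'c1,c2,c3\n1,alice,90\n' > a.csv
printf 'x,y,z\n2,bob,85\n3,carol,70\n' > b.csv

awk 'NR==FNR { if (FNR==1) header=$0; next }     # header file: keep line 1 only
     FNR==1  { if (out) close(out)               # finished with the previous output
               out = "updated_" FILENAME }       # one output file per input file
             { print (FNR==1 ? header : $0) > out }' new_headers.csv a.csv b.csv
```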

2 Comments

Yes, right. I assumed the header file contains just the header, but it might be better to make it extract only the first line.
This solution also works. I made a minimal change to your suggestion, turning 'updated' from a suffix into a prefix, because the suffix changes the file extension from .csv to .csv.updated: awk 'NR==FNR {header=$0; next}{print (FNR==1?header:$0) > ("updated_"FILENAME)}' new_header.csv *.csv Thank you for your suggestion!
