Revisions to How to concatenate lines of several associated files into one line, and append that to an output file

Add provision for missing second file and generalization to OP's actual example

edited Sep 22, 2021 at 9:18

Assuming that your input doesn't contain edge cases, the following shell loop in combination with an awk program should do:

for f in BC*-tmp1.tsv
do
    f2="${f/%tmp1.tsv/tmp2.tsv}"
    if [[ ! -f $f2 ]]; then f2=""; fi
    awk 'BEGIN{FS=OFS="\t"}
         FNR==1{for (i=1;i<NF;i++) printf "%s%s",(NR==FNR&&i==1?"":OFS),$i}
         FNR<=3{printf "%s%s",OFS,$NF}
         END{printf "%s",ORS}' "$f" "$f2" >> template.tsv
done

This will loop over all tmp1.tsv files and generate the corresponding filename for the tmp2.tsv file. If the second file turns out not to exist, the filename will be set to the empty string.
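The filename derivation can be tried in isolation. The following is a minimal sketch, not the answer's exact code: it swaps the bash-specific `${f/%tmp1.tsv/tmp2.tsv}` for the POSIX suffix-removal form so it also runs in plain `sh`, and the file names `BC01-…` and `BC02-…` are made up for illustration.

```shell
# Scratch directory with hypothetical sample names; BC02 has no tmp2 partner.
tmpdir=$(mktemp -d)
: > "$tmpdir/BC01-tmp1.tsv"
: > "$tmpdir/BC01-tmp2.tsv"
: > "$tmpdir/BC02-tmp1.tsv"

pairs=""
for f in "$tmpdir"/BC*-tmp1.tsv
do
    # Strip the suffix "tmp1.tsv" and append "tmp2.tsv" (POSIX parameter expansion).
    f2="${f%tmp1.tsv}tmp2.tsv"
    # Fall back to the empty string if the partner file does not exist.
    if [ ! -f "$f2" ]; then f2=""; fi
    # Record the pair, showing only the base names.
    pairs="${pairs}${f##*/} -> [${f2##*/}]
"
done
printf '%s' "$pairs"
# prints:
# BC01-tmp1.tsv -> [BC01-tmp2.tsv]
# BC02-tmp1.tsv -> []
```

The empty brackets in the second line show the case where the partner file is missing and `f2` has been reset to the empty string.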

It will then call an awk program with both associated TSV files, which will print, all on the same line,

all fields, excluding the last one, of the first line of each input file (preceded by an additional OFS for the second input file, which is recognized by FNR, the per-file line counter, no longer being equal to NR, the global line counter),
the last field of each of the first three lines of each input file,
when no input file remains, a closing record separator (ORS, which defaults to newline)

and appends the output to template.tsv. This will also work if the second file doesn't exist, because awk will not treat the empty string as an input file in the first place, so the END section printing the newline will be reached right after the first file.
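To see what a single output line looks like, the awk program can be run on one pair of files. This is only a sketch under assumed input: the two three-column sample files below are invented, and the tabs are turned into `|` at the end purely to make the result readable.

```shell
# Two hypothetical input files; the first has a fourth line that should be ignored.
tmpdir=$(mktemp -d)
printf 'BC01\ts1\t10\nBC01\ts1\t11\nBC01\ts1\t12\nBC01\ts1\t13\n' > "$tmpdir/BC01-tmp1.tsv"
printf 'BC01\ts2\t20\nBC01\ts2\t21\nBC01\ts2\t22\n' > "$tmpdir/BC01-tmp2.tsv"

# Same awk program as in the loop above, applied to one pair of files.
line=$(awk 'BEGIN{FS=OFS="\t"}
     FNR==1{for (i=1;i<NF;i++) printf "%s%s",(NR==FNR&&i==1?"":OFS),$i}
     FNR<=3{printf "%s%s",OFS,$NF}
     END{printf "%s",ORS}' "$tmpdir/BC01-tmp1.tsv" "$tmpdir/BC01-tmp2.tsv")

# Show the tab-separated result with visible separators.
printf '%s\n' "$line" | tr '\t' '|'
# prints: BC01|s1|10|11|12|BC01|s2|20|21|22
```

The value 13 from the fourth line of the first file does not appear, because only lines with FNR<=3 contribute their last field.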

Add provision for missing second file

edited Sep 22, 2021 at 9:11

AdminBee

Assuming that your input doesn't contain edge cases, the following shell loop in combination with an awk program should do (I will refer to your "abridged" example input here; adjust field numbers as needed for the actual case):

for f in BC*-tmp1.tsv
do
    f2="${f/%tmp1.tsv/tmp2.tsv}"
    if [[ ! -f $f2 ]]; then f2=""; fi
    awk 'BEGIN{FS=OFS="\t"}
         FNR==1{printf "%s%s%s%s",(NR==FNR?"":OFS),$1,OFS,$2}
         FNR<=3{printf "%s%s",OFS,$NF}
         END{printf "%s",ORS}' "$f" "$f2" >> template.tsv
done

This will loop over all tmp1.tsv files and generate the corresponding filename for the tmp2.tsv file. If the second file turns out not to exist, the filename will be set to the empty string.

It will then call an awk program with both associated TSV files, which will print, all on the same line,

fields 1 and 2 of the first line of each input file (preceded by an additional OFS for the second input file, which is recognized by FNR, the per-file line counter, no longer being equal to NR, the global line counter),

the last field of each of the first three lines of each input file,

when no input file remains, a closing record separator (ORS, which defaults to newline)

and appends the output to template.tsv. This will also work if the second file doesn't exist, because awk will not treat the empty string as an input file in the first place.
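The missing-partner case can be checked the same way. In this sketch the sample file is invented and `f2` is set to the empty string by hand, as the loop would do when no tmp2.tsv file exists; it relies on awk skipping an empty operand, which is the behaviour described above.

```shell
# One hypothetical input file with no tmp2 partner.
tmpdir=$(mktemp -d)
printf 'BC02\ts1\t30\nBC02\ts1\t31\nBC02\ts1\t32\n' > "$tmpdir/BC02-tmp1.tsv"
f2=""   # what the loop assigns when the second file is missing

# Same awk program as above; the empty second operand is not opened as a file.
single=$(awk 'BEGIN{FS=OFS="\t"}
     FNR==1{printf "%s%s%s%s",(NR==FNR?"":OFS),$1,OFS,$2}
     FNR<=3{printf "%s%s",OFS,$NF}
     END{printf "%s",ORS}' "$tmpdir/BC02-tmp1.tsv" "$f2")

# Show the tab-separated result with visible separators.
printf '%s\n' "$single" | tr '\t' '|'
# prints: BC02|s1|30|31|32
```

Only the first file contributes to the line, and the END block still terminates it with a newline.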

answered Sep 22, 2021 at 9:03

AdminBee

Assuming that your input doesn't contain edge cases, the following shell loop in combination with an awk program should do:

for f in BC*-tmp1.tsv
do
    f2="${f/%tmp1.tsv/tmp2.tsv}"
    awk 'BEGIN{FS=OFS="\t"}
         FNR==1{printf "%s%s%s%s",(NR==FNR?"":OFS),$1,OFS,$2}
         FNR<=3{printf "%s%s",OFS,$3}
         END{printf "%s",ORS}' "$f" "$f2" >> template.tsv
done
