
I have file1.txt:

1|2022-09-29|03:15:00
2|2022-09-29|10:50:00
3|2022-09-29|07:15:00

and file2.txt:

1|red|info 1
2|blue
3|yellow|info 2

and I want to merge these files into one, file3.txt, so that it looks like this:

red|2022-09-29|03:15:00|info 1
blue|2022-09-29|10:50:00|
yellow|2022-09-29|07:15:00|info 2

So I tried writing a script:

#!/bin/bash

awk -F'|' 'NR==FNR {a[$1]=$2;next}  ($1 in a) {a[$1]=$2"|"a[$1]"|"a[$3]"|"$3; print a[$1]}' file1.txt file2.txt > file3.txt

but my output looks like this:

red|2022-09-29||info 1
blue|2022-09-29||
yellow|2022-09-29||info 2

As you can see, the third field of file1.txt is missing and I can't figure out why. I would be grateful if someone could point out what I am doing wrong.

1
  • Do the files always have the same number of lines, with the same $1 values as each other? Commented Sep 29, 2022 at 13:26

4 Answers

3

The answer is rather simple: You use a[$3] to refer to the third column of file1. However

  • you use the array a to store the second column of file1, and never the third column, and
  • you only ever use the first column (the numbers) as "keys", so attempting to access, say, a["info 1"] (as your a[$3] would do on the first line of processing file2) will return nothing.
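To see this concretely, here is a small sketch that recreates file1.txt from the question in a scratch directory, fills the array the same way the original script does, and then looks up one key that exists and one that was never stored. The temporary directory, the variable name result, and the angle-bracket output format are only for illustration.

```shell
#!/bin/sh
# Recreate file1.txt from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' \
    '1|2022-09-29|03:15:00' \
    '2|2022-09-29|10:50:00' \
    '3|2022-09-29|07:15:00' > "$tmp/file1.txt"

# Store only column 2 under the key in column 1, as the original script does,
# then look up an existing key ("1") and a key that was never stored ("info 1").
result=$(awk -F'|' '
    { a[$1] = $2 }
    END {
        printf "a[1]=<%s>\n", a["1"]
        printf "a[info 1]=<%s>\n", a["info 1"]
    }' "$tmp/file1.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

The first lookup shows the date, and the second shows nothing between the brackets, which is exactly the empty field in the output from the question.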

The following program would do:

awk 'BEGIN{FS=OFS="|"} NR==FNR{d[$1]=$2;t[$1]=$3;next} ($1 in d) {print $2,d[$1],t[$1],$3}' file1.txt file2.txt > file3.txt

This sets | as field separator for input and output.

  • When processing file1.txt, it stores the day in an array d and the time in an array t, with the first column (number) as key.
  • When processing file2.txt, it prints column 2, the date and time corresponding to column 1, and then the "info" value in column 3, using | as output separator.
0
1

You could rely on join:

join -t\| -j 1 -o 1.2,2.2,2.3,1.3 file2 file1

Here the output format (-o) is a list of FILE.FIELD entries, selecting which field to take from which input file; -t sets the field delimiter, and -j names the common field used for matching in both files.

Note that join expects both inputs to be sorted on the join field, so sorting might be needed:

join -t\| -j 1 -o 1.2,2.2,2.3,1.3 <(sort file2) <(sort file1)
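If process substitution is not available (it is a bash/ksh/zsh feature), the same thing can be done with intermediate files. The following is a sketch only: it uses deliberately shuffled copies of the sample data, sorts on the first |-separated field, and runs both sort and join under LC_ALL=C so that they agree on the ordering. The scratch directory and the .sorted file names are made up for the example.

```shell
#!/bin/sh
# Sample data from the question, deliberately out of order.
tmp=$(mktemp -d)
printf '%s\n' \
    '3|2022-09-29|07:15:00' \
    '1|2022-09-29|03:15:00' \
    '2|2022-09-29|10:50:00' > "$tmp/file1"
printf '%s\n' \
    '2|blue' \
    '3|yellow|info 2' \
    '1|red|info 1' > "$tmp/file2"

# Sort both files on the first field only, using the same collation as join.
LC_ALL=C sort -t'|' -k1,1 "$tmp/file1" > "$tmp/file1.sorted"
LC_ALL=C sort -t'|' -k1,1 "$tmp/file2" > "$tmp/file2.sorted"

# Same join invocation as above, now on the sorted copies.
joined=$(LC_ALL=C join -t'|' -j 1 -o 1.2,2.2,2.3,1.3 \
    "$tmp/file2.sorted" "$tmp/file1.sorted")

printf '%s\n' "$joined"
rm -rf "$tmp"
```

The output comes out in key order (1, 2, 3), not in the original order of either input file.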
1

awk 'BEGIN{FS=OFS="|"} NR==FNR {a[$1]=$2 OFS $3;next} ($1 in a) {print $2,a[$1],$3}' file1.txt file2.txt > file3.txt

For portability, I begin with BEGIN{FS=OFS="|"}, which lets you choose the input and output field separators in one place.

Then, while reading the first file (NR==FNR), you store the second and third fields, joined with the output field separator, {a[$1]=$2 OFS $3;next}, and skip to the next line without printing anything yet. Your script never stored the third field, which is why you couldn't output it.

When you reach the second file, NR and FNR differ, and the script checks whether the first field is in the array ($1 in a). Instead of assigning the second field, the array value and the third field back into the array and then printing that, I just print them immediately: {print $2,a[$1],$3}.

0
0

Assuming the data in the two input files are "simple" CSV records without fields containing embedded delimiters or newlines, that the | characters are the delimiters, and that the files match up row-to-row, as they appear to do in the question:

The two files may be presented side by side to awk using paste, and awk may be used to pick out the fields we want, in the order we need them:

paste -d '|' file1 file2 |
awk -F '|' 'BEGIN { OFS=FS } { print $5, $2, $3, $6 }' >file3

The result in file3 given the data in the question:

red|2022-09-29|03:15:00|info 1
blue|2022-09-29|10:50:00|
yellow|2022-09-29|07:15:00|info 2
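Since this approach depends on the rows lining up, it may be worth guarding against a mismatch. The sketch below is one possible way to do that and not part of the command above: it compares the key from file1 ($1) with the key from file2 ($4) on every pasted line, reports any difference on standard error, and makes awk exit with a non-zero status. The scratch directory, the variable merged, and the wording of the message are assumptions for the example.

```shell
#!/bin/sh
# Sample data from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' \
    '1|2022-09-29|03:15:00' \
    '2|2022-09-29|10:50:00' \
    '3|2022-09-29|07:15:00' > "$tmp/file1"
printf '%s\n' \
    '1|red|info 1' \
    '2|blue' \
    '3|yellow|info 2' > "$tmp/file2"

# After paste, $1 is the key from file1 and $4 is the key from file2.
# Skip and report any row where they differ; otherwise print as before.
merged=$(paste -d '|' "$tmp/file1" "$tmp/file2" |
    awk -F '|' '
        BEGIN { OFS = FS }
        $1 != $4 {
            printf "line %d: key mismatch (%s vs %s)\n", NR, $1, $4 | "cat >&2"
            bad = 1
            next
        }
        { print $5, $2, $3, $6 }
        END { exit bad }')

printf '%s\n' "$merged"
rm -rf "$tmp"
```

With the sample data the keys match on every row, so the output is the same as shown above.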
