Combine two files using common column

Question

I have 2 files.

File 1 is:

burnj01,Joe Burns
steves02,Santosh Steve

File 2 is:

burnj01,0001
burnj01,0002
burnj01,0010
burnj,0100
burnj01,2000
steves02,2048
steves02,2049
steves02,2091

and I would like to have a file 3 like this:

burnj01,Joe Burns,0001
burnj01,Joe Burns,0002
burnj01,Joe Burns,0010
burnj01,Joe Burns,0100
burnj01,Joe Burns,2000
steves02,Santosh Steve,2048
steves02,Santosh Steve,2049
steves02,Santosh Steve,2091

I'd like to combine these two files using col1 of first file and adding values of column 2 from File2 to the last column of File1 whenever there is a match.

I tried this :

$ awk 'FNR==NR{a[$1]=$2;next} {print $1,$2,a[$1]}' file2 file1 > file3

but this does not work. How do I achieve this ?

is burnj,0100 in line4 in file2 a typo? it should be burnj01,0100? or you want partial matching then? — αғsнιη
– αғsнιη, Commented Mar 2, 2021 at 10:53

αғsнιη · Accepted Answer · 2021-03-02 10:51:02Z

2

You should specify the field separator FS and OFS is used for output field separator.

awk 'BEGIN        { FS=OFS="," }
     NR==FNR      { saving[$1]=$2; next }
    ($1 in saving){ print $1, saving[$1], $2 }' file1  file2

answered Mar 2, 2021 at 10:51

αғsнιη

41.9k17 gold badges75 silver badges117 bronze badges

Add a comment |

Kusalananda · Accepted Answer · 2021-03-02 12:17:38Z

The issue with your awk code is primarily twofold:

You don't instruct awk to use , as a field delimiter. You can do this with -F ,.
You never test whether $1 is a key in the a array. You can do this with $1 in a as a condition before the last block of your code. It's ok to skip this if you expect that all names in the second file are present in the first (but you don't say anything about this).

You also seem to output the fields in the wrong order compared with your expected output, and you use the default output delimiter, which is a space, rather than a comma (OFS = "," or OFS = FS, in a BEGIN block would fix that).

$ awk -F , 'BEGIN { OFS = FS } FNR == NR { names[$1] = $2; next } ($1 in names) { print $1, names[$1], $2 }' file1 file2
burnj01,Joe Burns,0001
burnj01,Joe Burns,0002
burnj01,Joe Burns,0010
burnj01,Joe Burns,2000
steves02,Santosh Steve,2048
steves02,Santosh Steve,2049
steves02,Santosh Steve,2091

$ join -t, <( sort file1 ) <( sort file2 )
burnj01,Joe Burns,0001
burnj01,Joe Burns,0002
burnj01,Joe Burns,0010
burnj01,Joe Burns,2000
steves02,Santosh Steve,2048
steves02,Santosh Steve,2049
steves02,Santosh Steve,2091

This sorts the two files and passes the sorted contents to the join utility. The join utility does a relational JOIN operation between the two datasets on the first column (by default), essentially an INNER JOIN, if you are familiar with SQL. We use -t , with join to tell it that the columns are comma-delimited.

If your shell does not understand the <( ... ) process substitution, then pre-sort the data. The join utility expects sorted input.

sort -o file1.sorted file1
sort file2 | join -t, file1.sorted -
rm -f file1.sorted

Anicet Ebou · Accepted Answer · 2021-03-02 14:30:17Z

0

Join is the way to go:

join -t, <(sort file1) <(sort file2)

edited Mar 2, 2021 at 14:30

answered Mar 2, 2021 at 11:09

Anicet Ebou

133 bronze badges

Add a comment |

Stack Exchange Network

Combine two files using common column

3 Answers 3

You must log in to answer this question.

Hot Network Questions

Combine two files using common column

3 Answers 3

You must log in to answer this question.

Related

Hot Network Questions