0

I have 2 files.

File 1 is:

burnj01,Joe Burns
steves02,Santosh Steve

File 2 is:

burnj01,0001
burnj01,0002
burnj01,0010
burnj,0100
burnj01,2000
steves02,2048
steves02,2049
steves02,2091

and I would like to have a file 3 like this:

burnj01,Joe Burns,0001
burnj01,Joe Burns,0002
burnj01,Joe Burns,0010
burnj01,Joe Burns,0100
burnj01,Joe Burns,2000
steves02,Santosh Steve,2048
steves02,Santosh Steve,2049
steves02,Santosh Steve,2091

I'd like to combine these two files using col1 of first file and adding values of column 2 from File2 to the last column of File1 whenever there is a match.

I tried this :

$ awk 'FNR==NR{a[$1]=$2;next} {print $1,$2,a[$1]}' file2 file1 > file3

but this does not work. How do I achieve this ?

2
  • is burnj,0100 in line4 in file2 a typo? it should be burnj01,0100? or you want partial matching then? Commented Mar 2, 2021 at 10:53
  • I would look into the join command. Commented Mar 2, 2021 at 11:07

3 Answers 3

2

You should specify the field separator FS and OFS is used for output field separator.

awk 'BEGIN        { FS=OFS="," }
     NR==FNR      { saving[$1]=$2; next }
    ($1 in saving){ print $1, saving[$1], $2 }' file1  file2
2

The issue with your awk code is primarily twofold:

  1. You don't instruct awk to use , as a field delimiter. You can do this with -F ,.
  2. You never test whether $1 is a key in the a array. You can do this with $1 in a as a condition before the last block of your code. It's ok to skip this if you expect that all names in the second file are present in the first (but you don't say anything about this).

You also seem to output the fields in the wrong order compared with your expected output, and you use the default output delimiter, which is a space, rather than a comma (OFS = "," or OFS = FS, in a BEGIN block would fix that).

$ awk -F , 'BEGIN { OFS = FS } FNR == NR { names[$1] = $2; next } ($1 in names) { print $1, names[$1], $2 }' file1 file2
burnj01,Joe Burns,0001
burnj01,Joe Burns,0002
burnj01,Joe Burns,0010
burnj01,Joe Burns,2000
steves02,Santosh Steve,2048
steves02,Santosh Steve,2049
steves02,Santosh Steve,2091

$ join -t, <( sort file1 ) <( sort file2 )
burnj01,Joe Burns,0001
burnj01,Joe Burns,0002
burnj01,Joe Burns,0010
burnj01,Joe Burns,2000
steves02,Santosh Steve,2048
steves02,Santosh Steve,2049
steves02,Santosh Steve,2091

This sorts the two files and passes the sorted contents to the join utility. The join utility does a relational JOIN operation between the two datasets on the first column (by default), essentially an INNER JOIN, if you are familiar with SQL. We use -t , with join to tell it that the columns are comma-delimited.

If your shell does not understand the <( ... ) process substitution, then pre-sort the data. The join utility expects sorted input.

sort -o file1.sorted file1
sort file2 | join -t, file1.sorted -
rm -f file1.sorted
0

Join is the way to go:

join -t, <(sort file1) <(sort file2)
0

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.