Merge two files using awk in linux

Question

I have a 1.txt file:

[email protected]||o||0174686211||o||7880291304ca0404f4dac3dc205f1adf||o||Mario||o||Mario||o||Kawati
[email protected]||o||174732943.0174732943||o||e10adc3949ba59abbe56e057f20f883e||o||Tiziano||o||Tiziano||o||D'Intino
[email protected]||o||0174844404||o||8d496ce08a7ecef4721973cb9f777307||o||Melanie||o||Melanie||o||Kiesel
[email protected]||o||0174847613||o||536c1287d2dc086030497d1b8ea7a175||o||Sihem||o||Sihem||o||Sousou
[email protected]||o||174902297.0174902297||o||9893ac33a018e8d37e68c66cae23040e||o||Nabile||o||Nabile||o||Nassime
[email protected]||o||174912161.0174912161||o||0c770713436695c18a7939ad82bc8351||o||Donald||o||Donald||o||Duck
[email protected]||o||0174991962||o||d161dc716be5daf1649472ddf9e343e6||o||Dagmar||o||Dagmar||o||Cernakova
[email protected]||o||0175099675||o||d26005df3e5b416d6a39cc5bcfdef42b||o||Esmeralda||o||Esmeralda||o||Trogu
[email protected]||o||0175128896||o||2e9ce84389c3e2c003fd42bae3c49d12||o||Cat||o||Cat||o||Sou
[email protected]||o||0175228687||o||a7766a502e4f598c9ddb3a821bc02159||o||Anna||o||Anna||o||Beratsja
[email protected]||o||0175306898||o||297642a68e4e0b79fca312ac072a9d41||o||Celine||o||Celine||o||Jacinto
[email protected]||o||0175410459||o||a6565ca2bc8887cde5e0a9819d9a8ee9||o||Adem||o||Adem||o||Bulut

A 2.txt file:

9893ac33a018e8d37e68c66cae23040e:134:@a1
536c1287d2dc086030497d1b8ea7a175:~~@!:/92\
8d496ce08a7ecef4721973cb9f777307:demodemo

The field separator for 1.txt is "||o||" and for 2.txt it is ":". I want to merge the two files into a single file, result.txt: wherever the 3rd column of 1.txt matches the 1st column of 2.txt, that 3rd column should be replaced by the 2nd column of 2.txt.

The expected output contains all the matching lines; here is one of them:

[email protected]||o||174902297.0174902297||o||134:@a1||o||Nabile||o||Nabile||o||Nassime

I tried the script:

awk -F"||o||"  'NR==FNR{s=$0; sub(/:[^:]*$/, "", s); a[s]=$NF;next} {s = $5; for (i=6; i<=NF; ++i) s = s "," $i; if (s in a) { NF = 5; $5=a[s]; print } }' FS=: <(tr -d '\r' < 2.txt) FS="||o||" OFS="||o||" <(tr -d '\r' < 1.txt) > result.txt

But I am getting an empty file as the result. Any help would be highly appreciated.

You need to backslash the pipe in the -F argument because it is treated as a regex. On MacOS I get "illegal primary in regular expression" with this -F argument.
– tripleee, Commented Jan 4, 2018 at 6:13
You get an award for the craziest ad-hoc file format so far today. Are you really unable to find a better way to separate your columns?
– tripleee, Commented Jan 4, 2018 at 6:14
2nd column value is 134 but your expected output shows 134:@a1
– anubhava, Commented Jan 4, 2018 at 7:46
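
The separator issue raised in these comments can be checked in isolation. A multi-character FS is treated as a regular expression, in which an unescaped | means alternation. The following is a minimal sketch on a made-up sample line (not one of the question's records). It writes each pipe as the bracket expression [|], which avoids backslash-quoting questions and should behave the same across common awk implementations.

```shell
# Made-up sample line using the "||o||" separator from 1.txt.
line='user@example.com||o||0170000000||o||abc123||o||Ann'

# [|] is a bracket expression matching one literal pipe, so the whole
# FS regex matches the literal text "||o||".
third=$(printf '%s\n' "$line" | awk -F'[|][|]o[|][|]' '{ print $3 }')
echo "$third"
```

With the separator matched literally, the hash lands in $3, which is the column the question wants to compare against 2.txt.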

RavinderSingh13 · Accepted Answer · 2018-01-04 07:37:25Z

1

If your actual input files look like the samples shown, the following awk may help.

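awk -v s1="||o||" '
FNR==NR{                       # first file (1.txt), split on single "|"
  a[$9]=$1 s1 $5;              # hash -> email and phone
  b[$9]=$13 s1 $17 s1 $21;     # hash -> the three name columns
  next
}
($1 in a){                     # second file (2.txt), split on ":"
  print a[$1] s1 $2 FS $3 s1 b[$1]
}
' FS="|" 1.txt FS=":" 2.txt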
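
The field numbers 1, 5, 9, 13, 17 and 21 follow from splitting on a single "|": each "||o||" then contributes an empty field, an "o", and another empty field. A small sketch on a made-up line in the 1.txt layout makes the positions visible. The filter that hides empty and "o" fields is only for display and would also hide a real column whose value is exactly "o".

```shell
# Made-up sample line in the 1.txt layout.
line='user@example.com||o||0170000000||o||abc123||o||Ann||o||Ann||o||Lee'

# With FS="|" every single pipe is a boundary, so each "||o||" yields
# an empty field, an "o", and another empty field.
fields=$(printf '%s\n' "$line" |
  awk -F'|' '{ for (i = 1; i <= NF; i++) if ($i != "" && $i != "o") printf "%d=%s\n", i, $i }')
echo "$fields"
```

This approach relies on no column containing a "|" of its own, since any extra pipe would shift the positions.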

EDIT: Since the OP has changed the requirement a bit, here is code for the new ask. It also creates two extra files: one with the ids present in 1.txt but NOT in 2.txt, and the other vice versa.

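awk -v s1="||o||" '
FNR==NR{                        # first file (1.txt), split on single "|"
  a[$9]=$1 s1 $5;               # hash -> email and phone
  b[$9]=$13 s1 $17 s1 $21;      # hash -> the three name columns
  c[$9]=$0;                     # hash -> whole line, for the END report
  next
}
($1 in a){                      # 2.txt line whose hash occurs in 1.txt
  val=$1;
  $1="";                        # rebuild $0 without the hash ...
  sub(/:/,"");                  # ... and drop the leading ":" left behind
  print a[val] s1 $0 s1 b[val];
  d[val]=$0;                    # remember matched hashes
  next
}
{                               # 2.txt line with no matching hash in 1.txt
  print > "NOT_present_in_2.txt"
}
END{
for(i in d){                    # forget the 1.txt lines that were matched
  delete c[i]
};
for(j in c){                    # remaining 1.txt lines had no match in 2.txt
  print j,c[j] > "NOT_present_in_1.txt"
}}
' FS="|" 1.txt FS=":" OFS=":" 2.txt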
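
The pair $1=""; sub(/:/,"") in this version is what keeps a value such as 134:@a1 intact while dropping the key. Assigning to $1 makes awk rebuild $0 from the fields joined by OFS, and the sub then removes the single leading separator left behind. A minimal sketch on made-up 2.txt-style lines (the keys k1 and k2 are placeholders):

```shell
# Lines in the 2.txt layout: key, then a value that may itself contain ":".
rest=$(printf '%s\n' 'k1:134:@a1' 'k2:demodemo' |
  awk 'BEGIN { FS = OFS = ":" }
       {
         $1 = ""        # rebuilds $0 with OFS, leaving a leading ":"
         sub(/:/, "")   # removes only that first ":"
         print
       }')
echo "$rest"
```

Because FS and OFS are the same single character here, the rebuilt record keeps every colon that belonged to the value.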

edited Jan 4, 2018 at 7:37

answered Jan 4, 2018 at 6:14

RavinderSingh13


14 Comments

Bhawan Over a year ago

what is s1="||o||" doing?

RavinderSingh13 Over a year ago

@BhawandeepSingla, it is a variable. -v variable=value is how we define variables for an awk script.

Bhawan Over a year ago

I need a left.txt containing all the lines of 1.txt that have no match in 2.txt. Can you modify the script accordingly?

Bhawan Over a year ago

[email protected]||o||0174844404||o||demodemo:||o||Melanie||o||Melanie||o||Kiesel Why am I getting demodemo: in the output, though there is no ':' at the end of demodemo in 2.txt?

Bhawan Over a year ago

Or you can make another one for left.txt and fix the bug in this one for result.txt. It is up to you. Thanks for the help, you are awesome :)
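
On the two requests in these comments: the stray ':' comes from the expression $2 FS $3, which always emits the separator even when $3 is empty, as it is for the demodemo line. One way to handle both requests in a single pass is sketched below. It is only an illustration under stated assumptions: the sample records are made up, 2.txt is read first and is not empty, and the key is taken to be everything before the first ':' in each 2.txt line. The output names result.txt and left.txt come from the thread.

```shell
# Work in a scratch directory with small made-up stand-ins for the two files.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' \
  'a@example.com||o||111||o||h1||o||Ann||o||Ann||o||Lee' \
  'b@example.com||o||222||o||h2||o||Bob||o||Bob||o||Ray' \
  'c@example.com||o||333||o||h3||o||Cy||o||Cy||o||Poe' > 1.txt
printf '%s\n' 'h1:134:@a1' 'h3:demodemo' > 2.txt

awk '
FNR == NR {                       # first file: 2.txt
  i = index($0, ":")              # position of the first ":"
  if (i) val[substr($0, 1, i - 1)] = substr($0, i + 1)   # key -> rest of line
  next
}
{                                 # second file: 1.txt
  if ($3 in val) { $3 = val[$3]; print > "result.txt" }
  else           { print > "left.txt" }
}
' 2.txt FS='[|][|]o[|][|]' OFS='||o||' 1.txt

cat result.txt
cat left.txt
```

Taking the value with substr rather than by rejoining fields means nothing is appended to it, and setting FS between the two file names means 1.txt is split on the literal "||o||" from its first line on.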


anubhava · Accepted Answer · 2018-01-04 08:02:59Z

0

You can use this awk to get your output:

awk -F ':' 'NR==FNR{a[$1]=$2 FS $3; next} FNR==1{FS=OFS="||o||"; gsub(/[|]/, "\\\\&", FS); $0=$0}
$3 in a{$3=a[$3]; print}' file2 file1 > result.txt

cat result.txt
[email protected]||o||0174844404||o||demodemo:||o||Melanie||o||Melanie||o||Kiesel
[email protected]||o||0174847613||o||~~@!:/92\||o||Sihem||o||Sihem||o||Sousou
[email protected]||o||174902297.0174902297||o||134:@a1||o||Nabile||o||Nabile||o||Nassime
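
One detail about changing FS inside a rule: in POSIX-conforming awks such as gawk, the new separator applies from the next record on, so the record that triggered the change has already been split with the old one. Assigning $0 to itself forces a re-split with the current FS. The sketch below shows both variants on made-up two-line input. Some older awk implementations split fields lazily and may treat the first variant differently, so only the second should be relied on.

```shell
# Made-up two-line input; the goal is to print field 2 using "-" as separator.
input=$(printf '%s\n' 'a-b' 'c-d')

# FS is changed while the first record is being processed. On a
# POSIX-conforming awk that record was already split on the old FS (":").
plain=$(printf '%s\n' "$input" | awk -F: 'NR == 1 { FS = "-" } { print $2 }')

# Assigning $0 to itself re-splits the current record with the new FS.
resplit=$(printf '%s\n' "$input" | awk -F: 'NR == 1 { FS = "-"; $0 = $0 } { print $2 }')

printf 'without re-split:\n%s\n' "$plain"
printf 'with re-split:\n%s\n' "$resplit"
```

The same concern applies to the first line of the second input file whenever FS is switched in an FNR==1 rule.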

edited Jan 4, 2018 at 8:02

answered Jan 4, 2018 at 7:54

anubhava

