Post Migrated Here from stackoverflow.com (revisions)
Andy K

Awk - matching on 2 columns for different lines

Given this file:

92157768877;Sof_deme_Fort_Email_am_%yyyy%%mm%%dd%;EMAIL;20/02/2015;1;0;0
92157768877;Sof_trav_Fort_Email_am_%yyyy%%mm%%dd%;EMAIL;20/02/2015;1;0;0

91231838895;Sof_deme_faible_Email_am;EMAIL;26/01/2015;1;0;0
91231838895;Sof_nais_faible_Email_am;EMAIL;26/01/2015;1;0;0
91231838895;Sof_deme_Faible_Email_Relance_am;EMAIL;28/01/2015;1;0;0
91231838895;Sof_nais_faible_Email_Relance_am;EMAIL;28/01/2015;1;0;0
91231838895;Sof_deme_Faible_Email_Relance_am;EMAIL;30/01/2015;1;0;0

92100709652;Sof_voya_Faible_Email_am_%yyyy%%mm%%dd%;EMAIL;11/02/2015;1;0;0
92100709652;Sof_voya_Fort_Email_am_%yyyy%%mm%%dd%;EMAIL;11/02/2015;1;0;0
92100709652;Export Voya_Fort Postal;EXPORT;13/02/2015;1;0;0

92100709634;Export Voya_Fort Postal;EXPORT;15/02/2015;1;0;0
92100709634;Export Voya_Fort Postal;EXPORT;15/02/2015;1;0;0
92100709635;Deme_Voya_Fort Postal;EXPORT;16/02/2015;1;0;0

I want to get those lines that fulfill the following conditions:

  • 1st field is the same as the 1st field of the next line
  • 4th field is the same as the 4th field of the next line
  • every remaining line whose 1st field equals the 1st field of that first line is selected as well.

So that the output is like this:

92157768877;Sof_deme_Fort_Email_am_%yyyy%%mm%%dd%;EMAIL;20/02/2015;1;0;0
92157768877;Sof_trav_Fort_Email_am_%yyyy%%mm%%dd%;EMAIL;20/02/2015;1;0;0
91231838895;Sof_deme_faible_Email_am;EMAIL;26/01/2015;1;0;0
91231838895;Sof_nais_faible_Email_am;EMAIL;26/01/2015;1;0;0
91231838895;Sof_deme_Faible_Email_Relance_am;EMAIL;28/01/2015;1;0;0
91231838895;Sof_nais_faible_Email_Relance_am;EMAIL;28/01/2015;1;0;0
91231838895;Sof_deme_Faible_Email_Relance_am;EMAIL;30/01/2015;1;0;0
92100709652;Sof_voya_Faible_Email_am_%yyyy%%mm%%dd%;EMAIL;11/02/2015;1;0;0
92100709652;Sof_voya_Fort_Email_am_%yyyy%%mm%%dd%;EMAIL;11/02/2015;1;0;0
92100709652;Export Voya_Fort Postal;EXPORT;13/02/2015;1;0;0

I tried the awk command below, but something is wrong: I cannot work out how to add the condition on the fourth field. And how should I select the subsequent lines?

awk -F";" 'FNR==NR{a[$1]++; next} && FNR==NR{a[$4]++; next} a[$1]==2  a[$4]==2' filetestv2.txt filetestv2.txt
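
For comparison, here is a minimal sketch of how the three conditions could be expressed with the same two-pass idea (the file is read twice). It assumes every record has seven `;`-separated fields and that blank lines are only separators. The function name `select_groups` and the variable names `key`, `prev` and `keep` are made up for illustration. Read literally, the conditions also select the two identical 92100709634 lines, which the expected output leaves out, so excluding exact duplicates would need an extra rule.

```shell
# select_groups FILE  (hypothetical helper name, not part of the original attempt)
select_groups() {
  awk -F';' '
    !NF { next }                 # skip empty separator lines in both passes
    FNR == NR {                  # first pass over the file
      key = $1 FS $4             # fields 1 and 4 joined into one string
      if (key == prev)           # same 1st and 4th field as the previous record
        keep[$1] = 1             # remember this 1st field
      prev = key
      next
    }
    $1 in keep                   # second pass: print every record of a remembered 1st field
  ' "$1" "$1"
}
```

It would be called as `select_groups filetestv2.txt`. Joining fields 1 and 4 into a single string keeps the test to one string comparison and avoids awk comparing the long ids numerically.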