3

I am trying to compare 2 Tab Delimited files and output differences with Column Header and No Primary Key in File.

I am very near to it but what I am facing the problem is the Snippet i have reached into is working if and only if it has a Primary Key-

awk '
NR==1 { 
  for (i=1; i<=NF; i++)
    header[i] = $i
}
NR==FNR {
  for (i=1; i<=NF; i++) {
    A[i,NR] = $i
  }
  next
}
{
  for (i=1; i<=NF; i++)
    if (A[i,FNR] != $i)
      print "ID#-" $1 ": " header[i] "- " ARGV[1] " value= ", A[i,FNR]" / " ARGV[2] " value= "$i
}' t1.csv t2.csv

Can anyone please help me how to achive it

  1. When I don't have any primary key in it
  2. When Number of Rows are not same and one file has missing records

t1.csv

Month   ClientSegment   ClientType  IssuerClientSegment NetworkID   VD
2020-12 COMMUNITY   EXEMPT  COMMUNITY   0   OTHER   
2020-12 COMMUNITY   EXEMPT  COMMUNITY   2   OTHER   
2020-12 COMMUNITY   EXEMPT  COMMUNITY   5   OTHER

t2.csv

Month   ClientSegment   ClientType  IssuerClientSegment NetworkID   VD  
2020-12 COMMUNITY   EXEMPT  COMMUNITY   0   OTHER
2020-12 COMMUNITY   EXEMPT  COMMUNITY   2   OTHER1
2020-13 COMMUNITY   EXEMPT  COMMUNITY   2   PUSH
2020-13 COMMUNITY   EXEMPT  COMMUNITY   3   OTHER

Expecting Output Like:

Row 2, Column: VD- t1.csv value=  OTHER / t2.csv value= OTHER1

Missing in t2.csv
Month   Client Segment  Client Type Issuer Client Segment   Network ID  VD
2020-12 COMMUNITY   EXEMPT  COMMUNITY   5   OTHER

Missing in t1.csv
Month   Client Segment  Client Type Issuer Client Segment   Network ID  VD 
2020-13 COMMUNITY   EXEMPT  COMMUNITY   2   PUSH
2020-13 COMMUNITY   EXEMPT  COMMUNITY   3   OTHER
7
  • edited for better understanding Commented Dec 16, 2021 at 7:06
  • cuurent logic i am not able to incorporate Missing Items, also it is only working when i am having a Valid Key Column at the First Column Commented Dec 16, 2021 at 7:07
  • 1
    What do you mean by OTHER / OTHER1 ?! Can you please give us all information needed how to compare? Commented Dec 16, 2021 at 7:19
  • 1
    If you See Row 2 , the Column named VD has Different Value OTHER in one FIle and OTHER1 in another file, so basically it should point out the Row , and Column name and the Values in files where it is differing Commented Dec 16, 2021 at 7:24
  • But why is OTHER vs. OTHER1 not in the Missing section while OTHER vs. PUSH is? Commented Dec 16, 2021 at 7:27

2 Answers 2

1

Use daff:

daff --input-format tsv t1.csv t2.csv
@@  Month   ClientSegment   ClientType  IssuerClientSegment NetworkID   VD
    2020-12 COMMUNITY       EXEMPT      COMMUNITY           0           OTHER
→   2020-12 COMMUNITY       EXEMPT      COMMUNITY           2           OTHER→OTHER1
+++ 2020-13 COMMUNITY       EXEMPT      COMMUNITY           2           PUSH
+++ 2020-13 COMMUNITY       EXEMPT      COMMUNITY           3           OTHER
--- 2020-12 COMMUNITY       EXEMPT      COMMUNITY           5           OTHER

Install using pip install daff (you migh also need sudo apt install python-pip).

1
  • can u plzz help me to have this achieved in more programmatic way rather than a package? Commented Dec 16, 2021 at 13:39
1
awk '
{ key = $1 OFS $2 OFS $3 OFS $4 OFS $5 }
! secondInput {
        file1[key] = $6
        NRfile1[key] = NR
        next
}
(key in file1) {
        if (file1[key] != $NF) { print "diff-line#:", NRfile1[key] "|" FNR, $0 }
        delete file1[key]
        next
}
{ print "missing in file1: ", $0 }
END {
        for (key in file1) {
                print "missing in file2: ", key, file1[key]
        }
}' file1 secondInput=1 file2

Outputs:

diff-line#: 3|3 2020-12 COMMUNITY   EXEMPT  COMMUNITY   2   OTHER1
missing in file1:  2020-13 COMMUNITY   EXEMPT  COMMUNITY   2   PUSH
missing in file1:  2020-13 COMMUNITY   EXEMPT  COMMUNITY   3   OTHER
missing in file2:  2020-12 COMMUNITY EXEMPT COMMUNITY 5 OTHER
0

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.