Data Comparison in 2 csv files

Question

I have 3 files:

In file1 I've data like this:

id,name,age
1,jj,60
2,kk,45
3,ss,56

In file2 I've data like this:

id,name,age
1,jj,60
2,kk,44
3,ss,55
4,tt,66

In file3 I've only one field i.e. id:

id
1
2
3

Now I want to compare file1 & file2 for those ID which are in file3.

Output should be like this:

id  file1   file2
2   age=45  age=44
3   age=56  age=55

Because id 1 having same values for all fields in both files. Only id 2 & 3 having differences which i need to capture from file1 & file2 — Priyanka
– Priyanka, Commented Feb 12, 2018 at 8:39
File1 & File2 having more than 50 columns and difference may be in any of those column except ID column — Priyanka
– Priyanka, Commented Feb 13, 2018 at 10:18

RomanPerekhrest · Accepted Answer · 2018-02-12 09:38:15Z

2

Extended paste + awk solution:

awk -F',' -v ids=$(paste -s -d'|' <(tail -n+2 file3)) \
'BEGIN{ OFS="\t"; print "id", ARGV[1], ARGV[2] }
 FNR==1{ next }
 NR==FNR{ if ($1 ~ "^("ids")$") a[$1$2]=$3; next }
 ($1$2 in a) && a[$1$2] != $3{ 
     printf "%d\tage=%d\tage=%d\n", $1, a[$1$2], $3 
 }' file1 file2

The output:

id  file1   file2
2   age=45  age=44
3   age=56  age=55

answered Feb 12, 2018 at 9:38

RomanPerekhrest

30.8k5 gold badges47 silver badges68 bronze badges

But when i am adding more columns to file1 & file2, output remain same, Its not checking other columns. My file1 & File2 having more than 50 columns.

Priyanka
– Priyanka

2018-02-13 10:17:28 +00:00
Commented Feb 13, 2018 at 10:17
@Anand, that does not relate to your initial question with given input files. Consider creating another question with clean description

RomanPerekhrest
– RomanPerekhrest

2018-02-13 10:23:51 +00:00
Commented Feb 13, 2018 at 10:23
Thanks Roman, created new question with clear details. Apologize for the inconvenience.

Priyanka
– Priyanka

2018-02-13 12:14:20 +00:00
Commented Feb 13, 2018 at 12:14

Add a comment |

cas · Accepted Answer · 2018-02-12 12:00:53Z

Roman's answer is good and gives you the exact output you asked for.

If, however, you only needed the raw data (i.e. without CSV headers and without the printf pretty-printed formatting), you could use paste + comm + grep + process substitution:

$ paste <(comm -1 -3 file1 file2) <(comm -2 -3 file1 file2) | 
    grep -f <(sed -e 's/^/^/; s/$/,/;' file3)
2,kk,44 2,kk,45
3,ss,55 3,ss,56

Explanation:

comm -1 -3 file1 file2 outputs the lines that are unique to file2. comm -2 -3 file1 file2 outputs the lines unique to file1. The output of both these commands are given to paste via process substitution, and joined together.

paste's output is then piped into grep to output only the lines that match the ids listed in file3. Process substitution is also used here to transform the IDs listed in file3 into anchored regular expressions that match the ID numbers ONLY when they are at the start of a line and followed by a comma.

Doing something like this would mostly be useful if you wanted/needed to do further processing before pretty-printing the output.

(posted mostly because I'd almost finished writing this answer before I noticed the pretty-printed output requirement and didn't want to throw it away - someone with a similar question - or perhaps even the OP - might find this alternative approach useful) — cas
– cas, Commented Feb 12, 2018 at 12:04

Stack Exchange Network

Data Comparison in 2 csv files

2 Answers 2

You must log in to answer this question.

Linked

Hot Network Questions

Data Comparison in 2 csv files

2 Answers 2

You must log in to answer this question.

Linked

Related

Hot Network Questions