2

I have 3 files:

In file1 I've data like this:

id,name,age
1,jj,60
2,kk,45
3,ss,56

In file2 I've data like this:

id,name,age
1,jj,60
2,kk,44
3,ss,55
4,tt,66

In file3 I've only one field i.e. id:

id
1
2
3

Now I want to compare file1 & file2 for those ID which are in file3.

Output should be like this:

id  file1   file2
2   age=45  age=44
3   age=56  age=55
4
  • Why is id 1 missing? Commented Feb 12, 2018 at 8:16
  • Because id 1 having same values for all fields in both files. Only id 2 & 3 having differences which i need to capture from file1 & file2 Commented Feb 12, 2018 at 8:39
  • Differ in column3 or in whole line? Commented Feb 12, 2018 at 9:25
  • File1 & File2 having more than 50 columns and difference may be in any of those column except ID column Commented Feb 13, 2018 at 10:18

2 Answers 2

2

Extended paste + awk solution:

awk -F',' -v ids=$(paste -s -d'|' <(tail -n+2 file3)) \
'BEGIN{ OFS="\t"; print "id", ARGV[1], ARGV[2] }
 FNR==1{ next }
 NR==FNR{ if ($1 ~ "^("ids")$") a[$1$2]=$3; next }
 ($1$2 in a) && a[$1$2] != $3{ 
     printf "%d\tage=%d\tage=%d\n", $1, a[$1$2], $3 
 }' file1 file2

The output:

id  file1   file2
2   age=45  age=44
3   age=56  age=55
3
  • But when i am adding more columns to file1 & file2, output remain same, Its not checking other columns. My file1 & File2 having more than 50 columns. Commented Feb 13, 2018 at 10:17
  • @Anand, that does not relate to your initial question with given input files. Consider creating another question with clean description Commented Feb 13, 2018 at 10:23
  • Thanks Roman, created new question with clear details. Apologize for the inconvenience. Commented Feb 13, 2018 at 12:14
1

Roman's answer is good and gives you the exact output you asked for.

If, however, you only needed the raw data (i.e. without CSV headers and without the printf pretty-printed formatting), you could use paste + comm + grep + process substitution:

$ paste <(comm -1 -3 file1 file2) <(comm -2 -3 file1 file2) | 
    grep -f <(sed -e 's/^/^/; s/$/,/;' file3)
2,kk,44 2,kk,45
3,ss,55 3,ss,56

Explanation:

comm -1 -3 file1 file2 outputs the lines that are unique to file2. comm -2 -3 file1 file2 outputs the lines unique to file1. The output of both these commands are given to paste via process substitution, and joined together.

paste's output is then piped into grep to output only the lines that match the ids listed in file3. Process substitution is also used here to transform the IDs listed in file3 into anchored regular expressions that match the ID numbers ONLY when they are at the start of a line and followed by a comma.

Doing something like this would mostly be useful if you wanted/needed to do further processing before pretty-printing the output.

1
  • (posted mostly because I'd almost finished writing this answer before I noticed the pretty-printed output requirement and didn't want to throw it away - someone with a similar question - or perhaps even the OP - might find this alternative approach useful) Commented Feb 12, 2018 at 12:04

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.