Compare two files based on a column and print the matching lines

I have two big files of 400,000 lines each. I want to compare column 1 of each line of the second file with column 1 of the first file. If they match, I would like to print the whole line from the first file. The files are sorted.

file1:
  name   values
  aaa    10
  aab    acc
  aac    30
  aac    abc

file2:
  aaa
  aac
  aac
  aad

Since each file contains 400,000 lines, the comparison takes a long time.

My current solution looks like this:

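#!/bin/ksh
# For every line of file2, look for it anywhere in file1.
while read -r line
do
    # grep matches the string anywhere on a line, not only in column 1
    if grep "$line" file1 > /dev/null
    then
        grep "$line" file1 >> present
    else
        # append, so that earlier entries are not overwritten
        echo " $line missing " >> missing
    fi
done < "file2"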

Since I am using grep here, the value may match somewhere in file1 other than the intended column 1, and I don't want that to happen.
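One possible way to restrict the match to column 1 is to let awk compare the first field exactly instead of grepping the whole line. The following is only a sketch: the function names `match_col1` and `missing_col1` are made up for illustration, and it assumes the columns are separated by whitespace and that file2 is not empty. Unlike the loop above, it prints each matching file1 line once, even if the key occurs several times in file2.

```shell
#!/bin/sh
# Sketch only: function names are illustrative, not from the original post.

# Print every line of file1 ($1) whose first column equals a key in file2 ($2).
match_col1() {
    awk 'NR == FNR { want[$1] = 1; next }   # first file read (file2): remember each key
         $1 in want                         # second file read (file1): print on exact column 1 match
        ' "$2" "$1"
}

# Print each distinct file2 ($2) key that never occurs in column 1 of file1 ($1).
missing_col1() {
    awk 'NR == FNR { seen[$1] = 1; next }   # first file read (file1): remember column 1
         !($1 in seen) && !reported[$1]++ { print " " $1 " missing " }
        ' "$1" "$2"
}
```

It could be called as `match_col1 file1 file2 > present` and `missing_col1 file1 file2 > missing`. Each file is read once per call, so the cost no longer grows with one grep per line of file2.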

My expected solution:

  1. Compare the second file only against column 1 of the first file (although even this way it takes a long time).
  2. Use a Perl script with file pointers to compare the first columns of the two files. If the strings match, print the line. Otherwise, if column 1 of file 1 is greater than that of file 2, advance file 2 and compare again; in the opposite case, advance file 1 and compare again.
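The pointer-based walk described in the second item can be sketched with awk instead of Perl, since the comparison logic is the same. This is a rough sketch under several assumptions: both files are sorted on column 1 in byte order (as produced by `LC_ALL=C sort`), file1 has no header row, and the function name `merge_col1` and its third argument (the path of the "missing" report) are invented for this example.

```shell
#!/bin/sh
# Sketch of a two-pointer (merge-style) walk over two sorted files.
# Assumptions: both files are sorted on column 1 in byte order (LC_ALL=C),
# file1 has no header row, and the function name is illustrative only.
merge_col1() {
    # $1 = file1 ("name value" lines), $2 = file2 (one key per line),
    # $3 = report file for unmatched keys (defaults to "missing")
    LC_ALL=C awk -v keys="$2" -v miss="${3:-missing}" '
        # Read the next key from file2 into "key"; set have = 0 at end of file.
        function next_key(   line, f) {
            have = (getline line < keys) > 0
            if (have) { split(line, f, " "); key = f[1] "" }
        }
        # Report a file2 key that never matched, then move on to the next one.
        function skip_key() {
            if (key != matched) print " " key " missing " > miss
            next_key()
        }
        BEGIN { next_key() }
        {
            k1 = $1 ""                              # force string comparison
            while (have && key < k1) skip_key()     # file2 is behind: advance it
            if (have && key == k1) { print; matched = key }
            # otherwise file1 is behind: awk advances it by reading the next line
        }
        END { while (have) skip_key() }             # keys left after file1 ends
    ' "$1"
}
```

A call such as `merge_col1 file1 file2 missing > present` reads each file once and keeps only the current line of each in memory, which is the main attraction of the merge approach for sorted input. If the files were sorted under a different locale, the comparison order would have to match that locale for the walk to stay in step.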
user68365