Revisions to Comparing two files using Unix and Awk

added 83 characters in body

edited Jul 29, 2013 at 0:39

Here's a solution in Perl. You should save the following code in a file and run it as a script (see below):

#!/usr/bin/perl
$file1 = '/path/to/file1';
$file2 = '/path/to/file2';
open $f1,'<',$file1 or die "Cannot open $file1: $!";
open $f2,'<',$file2 or die "Cannot open $file2: $!";
while(<$f1>){
    ($c1,$c2,$c4,$c5) = (split / /)[0,1,3,4]; #get relevant columns in file 1
    $lines_dictionary{"$c1 $c2 $c4"}="$c5---$_"; #create a hash entry keyed by the relevant columns
}
while(<$f2>){
    ($c1,$c2,$c4,$c5) = (split / /)[0,1,3,4]; #get relevant columns in file 2
    if(exists $lines_dictionary{"$c1 $c2 $c4"}){ #if a line with the same key columns was seen in file 1
        ($file1_c5,$file1_line) = split /---/,$lines_dictionary{"$c1 $c2 $c4"},2; #split the hash entry into column 5 and the full line from file 1
        if($file1_c5 ne $c5){ #string comparison (Perl's operator is ne, not -ne): column 5 differs between the files
            print "${file1_line}$_\n"; #we only need one extra newline as the lines read from the files have trailing ones.
        }
    }
}
close $f1;
close $f2;

Use any text editor to paste this script into a file, modify the $file1 and $file2 variables to reflect the true locations of your files, then make the script executable by doing:

$ chmod +x /path/to/script

Finally, call the script:

$ /path/to/script

Disclaimer

This code is untested.
This code assumes the pattern '---' is unlikely to occur in the 5th column.
This code assumes the lines in file 1 are unique (i.e. that each line has a different combination of "column1 column2 column4"). If there are multiple lines (not necessarily consecutive) containing the same data in the relevant columns, the script will use the last one (bottom-most in the file) of these lines.
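
Since the question asks about Awk, here is a rough equivalent as a sketch rather than a tested replacement. It keeps column 5 and the full line in two separate arrays, so it does not depend on '---' being absent from the data. The function name `compare_col5` is hypothetical. Two assumptions differ from the Perl script: awk's default field splitting treats any run of blanks as one separator, and the `NR == FNR` idiom assumes file 1 is not empty. As with the Perl version, if several lines in file 1 share the same key columns, the last one wins.

```shell
#!/bin/sh
# compare_col5 FILE1 FILE2
# Hypothetical helper: print each pair of lines whose columns 1, 2 and 4
# match across the two files but whose column 5 differs.
compare_col5() {
    awk '
        # First file: remember column 5 and the whole line, keyed by columns 1, 2 and 4.
        NR == FNR {
            key = $1 SUBSEP $2 SUBSEP $4
            col5[key] = $5
            line[key] = $0
            next
        }
        # Second file: report pairs whose key matches but whose column 5 differs.
        {
            key = $1 SUBSEP $2 SUBSEP $4
            # Appending "" forces a string comparison, as ne does in Perl.
            if ((key in col5) && (col5[key] "") != ($5 "")) {
                print line[key]
                print $0
                print ""
            }
        }
    ' "$1" "$2"
}
```

It could be called as `compare_col5 /path/to/file1 /path/to/file2`; each mismatching pair is printed as the line from file 1, then the line from file 2, then a blank line.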

edited body

edited Jul 29, 2013 at 0:30

Joseph R.

Here's a solution in Perl. You should save the following code in a file and run it as a script (see below):

#!/usr/bin/perl
$file1 = '/path/to/file1';
$file2 = '/path/to/file2';
open $f1,'<',$file1;
open $f2,'<',$file2;
while(<$f1>){
    ($c1,$c2,$c4,$c5) = (split / /)[0,1,3,4]; #get relevant columns in file 1
    $lines_dictionary{"$c1 $c2 $c4"}="$c5---$_"; #create a hash entry keyed by the relevant columns
}
while(<$f2>){
    ($c1,$c2,$c4,$c5) = (split / /)[0,1,3,4]; #get relevant columns in file 2
    if(exists $lines_dictionary{"$c1 $c2 $c4"}){ #if a line with similar columns was seen in file 1
        ($file1_c5,$file1_line) = split /---/,$lines_dictionary{"$c1 $c2 $c4"}; #parse the hash entry this line in file 1
        if($file1_c5 ne $c5){ #if column 5 of file 2 doesn't match column 5 of file 1
            print "$file1_line\n$_\n\n";
        }
    }
}
close $f1;
close $f2;

Use any text editor to paste this script into a file, modify the $file1 and $file2 variables to reflect the true locations of your files, then make the script executable by doing:

$ chmod +x /path/to/script

Finally, call the script:

$ /path/to/script

Disclaimer

This code is untested
This code assumes the pattern '---' is unlikely to occur in the 5th column.
This code assumes the lines in file 1 are unique (i.e. that each line has a different combination of "column1 column2 column4"). If there are multiple lines (not necessarily consecutive) containing the same data in the relevant columns, the script will use the last one (bottom-most in the file) of these lines.

answered Jul 29, 2013 at 0:23

Joseph R.

Here's a solution in perl. You should save the following code in a file and run it as a script (see below):

#!/usr/bin/perl
$file1 = '/path/to/file1';
$file2 = '/path/to/file2';
open $f1,'<',$file1;
open $f2,'<',$file2;
while(<$f1>){
    ($c1,$c2,$c4,$c5) = (split / /)[0,1,3,4]; #get relevant columns in file 1
    $lines_dictionary{"$c1 $c2 $c4"}="$c5---$_"; #create a hash entry keyed by the relevant columns
}
while(<$f2>){
    ($c1,$c2,$c4,$c5) = (split / /)[0,1,3,4]; #get relevant columns in file 2
    if(exists $lines_dictionary{"$c1 $c2 $c4"}){ #if a line with similar columns was seen in file 1
        ($file1_c5,$file1_line) = split /---/,$lines_dictionary{"$c1 $c2 $c4"}; #parse the hash entry this line in file 1
        if($file1_c5 ne $c5){ #if column 5 of file 2 doesn't match column 5 of file 1
            print "$file1_line\n$_\n\n";
        }
    }
}
close $f1;
close $f2;

Use any text editor to paste this script into a file, modify the $file1 and $file2 variables to reflect the true locations of your files, then make the script executable by doing:

$ chmod +x /path/to/script

Finally, call the script:

$ /path/to/script

Disclaimer

This code is untested
This code assumes the pattern '---' is unlikely to occur in the 5th column.
This code assumes the lines in file 1 are unique (i.e. that each line has a different combination of "column1 column2 column4"). If there are multiple lines (not necessarily consecutive) containing the same data in the relevant columns, the script will use the last of these lines.
