I have two files, each with more than 1000 rows.

Head of file 1:


3.3    6.6    10    0    0.6    0.33    "Ha1_00044004__C"
0      0       0   10     0       1     "Ha1_00043486__A"
3.3    6.6    10    0    0.6    0.3     "Ha1_00045379__C"
3      6       9    1    0.6    0.4     "Ha1_00045316__C"

Head of file 2:
0    0    0    10    0    1     "Ha1_00043486__A"
0    0    0    10    0    1     "Ha1_00043840__A"
0    0    0    10    0    1     "Ha1_00043671__A"
0    0    0    10    0    1     "Ha1_00044403__A"
3.3    6.6    10    0    0.6    0.3     "Ha1_00045379__C"
3      6       9    1    0.6    0.4     "Ha1_00045316__C"

I want to keep only those rows of file1 whose last column does NOT match the last column of any row in file2. I would appreciate any help.

My desired output:

3.3    6.6    10    0    0.6    0.33    "Ha1_00044004__C"
  • 1
    Can you give an example result? Should "file 1" be edited in place as described? Should there be a new file created with unique content from both files? Commented Apr 16, 2018 at 20:35
  • This is probably a duplicate of stackoverflow.com/questions/4544709/… Commented Apr 16, 2018 at 20:57
  • 1
    It is Not the same question! Commented Apr 16, 2018 at 21:10
  • What does "intersect" mean in this context? There don't appear to be any strict matches between the last fields in the file fragments you posted Commented Apr 16, 2018 at 21:15
  • 2
    @Anna1364 So to get this straight, you want to keep data from file1 if the data in its last column does not match the data in the last column of file2? In your examples, nothing will match. Even if it did, what do you want to do with the changes? Write them to another file? Change the first file to mirror the second one? Commented Apr 16, 2018 at 22:02

2 Answers

2

You can create an associative array (or hash) keyed on the last field of each line of file2, which awk reads first because it is listed first on the command line, then print each line of file1 whose last field is NOT a key in that array:

$ awk 'NR==FNR {a[$NF]++; next} !($NF in a)' file2 file1
3.3    6.6    10    0    0.6    0.33    "Ha1_00044004__C"
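
For anyone who wants to check this against the rows posted in the question, here is a minimal reproducible sketch of the same two-file awk idiom. The temporary directory, the array name seen, and the result variable are assumptions made for illustration only, and mktemp -d is assumed to be available (it is common but not required by POSIX).

```shell
#!/bin/sh
# Hypothetical scratch directory; the rows below are copied from the question.
tmp=$(mktemp -d)

cat > "$tmp/file1" <<'EOF'
3.3    6.6    10    0    0.6    0.33    "Ha1_00044004__C"
0      0       0   10     0       1     "Ha1_00043486__A"
3.3    6.6    10    0    0.6    0.3     "Ha1_00045379__C"
3      6       9    1    0.6    0.4     "Ha1_00045316__C"
EOF

cat > "$tmp/file2" <<'EOF'
0    0    0    10    0    1     "Ha1_00043486__A"
0    0    0    10    0    1     "Ha1_00043840__A"
0    0    0    10    0    1     "Ha1_00043671__A"
0    0    0    10    0    1     "Ha1_00044403__A"
3.3    6.6    10    0    0.6    0.3     "Ha1_00045379__C"
3      6       9    1    0.6    0.4     "Ha1_00045316__C"
EOF

result=$(awk '
    # NR == FNR holds only while the first named file (file2) is being read:
    # remember its last field as an array key and move on to the next line.
    NR == FNR { seen[$NF]; next }
    # For file1: print the line only if its last field was never seen in file2.
    !($NF in seen)
' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

One caveat with this idiom: if file2 is empty, NR == FNR is also true while file1 is read, so nothing is printed at all.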
0

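Try the following, which uses the seventh column of file2 as a list of patterns and prints the lines of file1 that match none of them (the <(...) process substitution needs bash, ksh, or zsh):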

grep -vf <(awk '{print $7}' < file2) file1
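
If process substitution is not available, roughly the same thing can be sketched in plain sh with an intermediate key file. This variant is not the command above: it adds -F so the keys are compared as fixed strings rather than regular expressions, and it takes the last field ($NF) instead of assuming exactly seven columns. The keys file name, the temporary directory, and the result variable are hypothetical, and the rows are a subset of those in the question.

```shell
#!/bin/sh
# Hypothetical scratch directory; mktemp -d is assumed to be available.
tmp=$(mktemp -d)

cat > "$tmp/file1" <<'EOF'
3.3    6.6    10    0    0.6    0.33    "Ha1_00044004__C"
0      0       0   10     0       1     "Ha1_00043486__A"
3.3    6.6    10    0    0.6    0.3     "Ha1_00045379__C"
3      6       9    1    0.6    0.4     "Ha1_00045316__C"
EOF

cat > "$tmp/file2" <<'EOF'
0    0    0    10    0    1     "Ha1_00043486__A"
0    0    0    10    0    1     "Ha1_00043840__A"
3.3    6.6    10    0    0.6    0.3     "Ha1_00045379__C"
3      6       9    1    0.6    0.4     "Ha1_00045316__C"
EOF

# Step 1: write the last field of every file2 row to a key file, one per line.
awk '{ print $NF }' "$tmp/file2" > "$tmp/keys"

# Step 2: -f reads patterns from the key file, -F treats them as fixed strings,
# -v inverts the match so only file1 rows containing none of the keys remain.
result=$(grep -vFf "$tmp/keys" "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Note that grep matches a key anywhere in the line, not only in the last column. With these quoted identifiers that is unlikely to matter, but the awk approach compares the last field exactly.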
