I have two CSV files:

first file:

"ACCOUNT_CODE","FK_CLIENT_CODE","ENVIRONMENT","HHID"  
"13445319","V8571485","SAT","IT00000000000005676070"  
"10580347","V6559553","SAT","IT00000000000003952833"   
"22124274","V11943127","DTT","IT00000000000008535651"   
"11896497","V7524852","SAT","IT00000000000005652668"  

second file:

IT00000000000005676070   
IT00000000000000060265           
IT00000000000008535651   
IT00000000000000060267         

Both files have millions of lines. I want to match the values in the last column of the first file (HHID) against the values in the second file. The values are not sorted, so each HHID has to be searched for in the whole second file.

If a value is not found, the corresponding line should be written to a third file, so that the third file contains only the lines whose HHID is not present in the second file. Example:

third file:

"ACCOUNT_CODE","FK_CLIENT_CODE","ENVIRONMENT","HHID" 
"10580347","V6559553","SAT","IT00000000000003952833"       
"11896497","V7524852","SAT","IT00000000000005652668"

Could you please help me?

1 Answer

$ awk 'NR==FNR{a[$1];next} !($8 in a)' file2 FS='"' file1
"ACCOUNT_CODE","FK_CLIENT_CODE","ENVIRONMENT","HHID"
"10580347","V6559553","SAT","IT00000000000003952833"
"11896497","V7524852","SAT","IT00000000000005652668"
    Very very useful :D Commented Jun 9, 2020 at 8:48
