
I have two files, File A and File B. The structure of File A is shown below:

3314530275|76|1|20240422045006|
3335984469|64|2|20150804235959|
3367892381|203|3|20141025235959|
3369039388|203|4|20131219235959|

The contents of the second File B are given below:

3314530275|2000|999000000073101614|0|20370101000000|76|
3314530275|2000|999000000073101614|0|20370101000000|76|
3369039388|2000|812000002628721|-112|20360101235959|203|
3335984469|5037|5210367877660|180|20150213000000|64|
3335984469|5048|5210367877661|6|20150213000000|64|
3335984469|2000|812000002629182|1913|20360101235959|64|
3367892381|5014|5210365185964|419430400|20150308000000|203|
3367892381|5044|5210365185965|226020|20150308000000|203|
3367892381|2000|817000102009605|0|20360101235959|203|

The script should first check File A: if the third field ($3) is equal to 2, it should store the values of the first ($1) and fourth ($4) fields.

It should then check whether the $1 values of the second file are present among the values stored in the first step.

  1. If the value is present and the second field is equal to 2000, it should print $1, $2, $4 and the fourth-field value that was stored from the first file.

  2. If the value is present and the second field is not equal to 2000, it should print $1, $2, $4, $5.

Sample Output in the above mentioned case:

3335984469|5037|180|20150213000000|
3335984469|5048|6|20150213000000|
3335984469|2000|1913|20150804235959|

This is what I have so far:

awk -F \| 'FNR==NR {if($3 == 2) a[$1] = $4; next} ($1 in a) {if($2==2000) print$1"|"$2"|"$4"|"a[$1]"|"} ($1 in a) {if($2!=2000) print$1"|"$2"|"$4"|"$5"|"} ' FileA FileB > Output_File

Any help will be greatly appreciated.

  • What have you got so far? Commented Feb 13, 2015 at 13:27
  • This is what I have come up with so far, but I am not sure I am using the code correctly, because the output seems to be missing a lot of values that should be present: awk -F \| 'FNR==NR {if($3 == 2) a[$1] = $4; next} ($1 in a) {if($2==2000) print$1"|"$2"|"$4"|"a[$1]"|"} ($1 in a) {if($2!=2000) print$1"|"$2"|"$4"|"$5"|"} ' FileA FileB > Output_File Commented Feb 13, 2015 at 13:43
  • What I am looking for is an alternative way to achieve the same thing! My script works fine for a sample of values, but when I use it on large files the result is not the same. Commented Feb 13, 2015 at 13:49
  • It looks like it should work, unless you have duplicate $1 in file A. Do you have duplicate first fields in file A? Commented Feb 13, 2015 at 14:18
  • @MuhammadAbdullah, looks right. The only change I'd make is to fold the if and else into the same block: $1 in a {if ($2 == 2000) print $1,$2,$4,a[$1],""; else print $1,$2,$4,$5,""} -- implies OFS="|" Commented Feb 13, 2015 at 14:28
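
The duplicate-key question raised in the comments can be checked directly. The sketch below is only an illustration: the three-row File A sample and the temporary path are hypothetical, not the asker's data. It lists every repeated occurrence of a key among the rows with $3 == 2. If it prints anything for the real file, the original script keeps only the last $4 stored for that key.

```shell
tmp=$(mktemp -d)
# Hypothetical File A sample in which key 3335984469 appears twice with $3 == 2.
cat > "$tmp/FileA" <<'EOF'
3314530275|76|1|20240422045006|
3335984469|64|2|20150804235959|
3335984469|64|2|20160101000000|
EOF

# seen[$1]++ is 0 (false) the first time a key appears and non-zero afterwards,
# so only the second and later occurrences are printed.
dups=$(awk -F'|' '$3 == 2 && seen[$1]++' "$tmp/FileA")
printf '%s\n' "$dups"
rm -rf "$tmp"
```

An empty result means every key with $3 == 2 is unique, so duplicates are not the cause of missing output.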

2 Answers

Your script will work as-is given correct contents of fileA (335984469 in FileA should be 3335984469, i.e. one more leading 3), but it can be simplified to:

$ cat tst.awk
BEGIN{ FS=OFS="|" }
FNR==NR { if ($3==2) a[$1] = $4; next }
$1 in a { print $1, $2, $4, ($2==2000 ? a[$1] : $5), "" }

$ awk -f tst.awk fileA fileB
3335984469|5037|180|20150213000000|
3335984469|5048|6|20150213000000|
3335984469|2000|1913|20150804235959|

Feel free to cram it all back onto one line if you find that useful.

If the above doesn't work, check for the presence of control characters in both of your input files, the most likely being control-Ms (carriage returns, displayed as ^M), as generously donated by Microsoft whenever their tools create files. You can check for them using cat -v and remove them with dos2unix or similar.
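
As a rough illustration of that check, the sketch below builds a one-line sample with DOS line endings (the file names are hypothetical), shows how cat -v exposes the carriage return, and strips it with tr as one possible stand-in for dos2unix when that tool is not installed.

```shell
tmp=$(mktemp -d)
# Hypothetical one-line sample written with a CRLF line ending.
printf '3335984469|64|2|20150804235959|\r\n' > "$tmp/fileA_dos"

# cat -v renders the carriage return as ^M at the end of the line.
visible=$(cat -v "$tmp/fileA_dos")
printf '%s\n' "$visible"

# tr -d '\r' deletes every carriage return; the cleaned copy shows no ^M.
tr -d '\r' < "$tmp/fileA_dos" > "$tmp/fileA_unix"
cleaned=$(cat -v "$tmp/fileA_unix")
printf '%s\n' "$cleaned"

rm -rf "$tmp"
```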


4 Comments

Thanks Ed Morton. There was a typo in file A, as you pointed out, and I have corrected it. Can you kindly explain how this part of the code works: ($2==2000 ? a[$1] : $5)?
If $2==2000 is true, the value a[$1] is returned, otherwise the value $5
@tripleee is correct and it's just a ternary expression, common to many languages - google "ternary expression".
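
A minimal sketch of that ternary in isolation, using two made-up records and the literal string "STORED" as a stand-in for a[$1] (both are assumptions for illustration, not the asker's data):

```shell
# The first record has $2 == 2000, the second does not.
ternary_out=$(printf '%s\n' 'k1|2000|x|0|fifth|' 'k1|5037|x|180|fifth|' |
  awk 'BEGIN { FS = OFS = "|" }
       # (condition ? value_if_true : value_if_false) yields exactly one value
       { print $2, ($2 == 2000 ? "STORED" : $5) }')
printf '%s\n' "$ternary_out"
```

The first record prints the stand-in value and the second prints its own fifth field, which is the same choice the answer's print statement makes for its last output column.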
Thanks @tripleee and Ed Morton!

awk  'BEGIN{FS=OFS="|"};FNR==NR{if($3==2){a[$1]=$4};next};{if( $1 in a && $2==2000 ){print $1,$2,$4,a[$1]}else if ($1 in a && $2!=2000){print $1,$2,$4,$5}}' 'fileA'  'fileB'

These are the adjustments I made to your command line to get the one above:

if( $1 in a && $2==2000 ){print $1,$2,$4,a[$1]}

else if ($1 in a && $2!=2000){print $1,$2,$4,$5}}

Results:

3335984469|5037|180|20150213000000
3335984469|5048|6|20150213000000
3335984469|2000|1913|20150804235959

1 Comment

This works as well, but sorry @Xorg, I can only select one as the correct answer! :(
