
I have two files with varying numbers of columns.

File1:

pears   are fruits
apple   is  fruit
carrot  is  veg
celery  is  vegetable
oranges are fruits

File2:

fruits apple   mycode is #q123c# for apple
fruits pears   my code is #q432c# for juicy
veg    celery  my code value is #q989c# for vegetables
veg    spinach code is #q783c# and is a type of vegetable
fruits papaya  i have code #q346c#
vegie  lettuce code #q445c# is vege

Desired output file:

Q432C pears fruits
Q123C apple fruit
Q---C carrot veg
Q989C celery vegetable
Q---C oranges fruits

I need to compare column 1 of File1 with column 2 of File2. If there is a match, print the q-to-c code found between the two # characters in File2; otherwise print an empty code of q---c. The q-to-c codes should also be converted to upper case.

I expect the output to have the same number of lines as File1.

Ideally, each output line should start with the q-to-c code from File2, followed by the corresponding fields from File1. At the moment, though, I have only worked out how to cut the q-to-c codes out of the matching lines in File2 and convert them to upper case:

awk 'NR==FNR { a[$1]=1; next } ($2 in a) {print $0} ' File1 File2 | sed -e 's/.*#\(.*\)#.*/\1/' | tr '[a-z]' '[A-Z]' > outputFile

... could someone please help? I am new to awk and scripting.

I was going to do a join after getting the above results, but then I risk joining the wrong q-to-c codes to the lines, because my resulting output file does not have as many lines as File1.
I'm open to solutions other than awk.

If someone could help, I would really appreciate this. :)
Thanks in advance.

  • I usually try to edit newcomers' questions, but please first read the tour to get to know Stack Exchange a bit better. Second, edit your question so it's actually readable. You can break a line with two spaces at the end of the line, and you can quote and format text. Please show a little effort in the formatting; as is, I don't even want to read it. Edit: Lucky you, Kusalananda made it way better. Commented Apr 23, 2018 at 13:29
  • Thanks Kiwy. Yes, I realised the formatting was terrible, so I was also trying to edit it; I just couldn't do it as fast as you did. Thanks heaps, and please accept my sincere apologies. Commented Apr 23, 2018 at 13:32
  • I know no chats, just questions and answers. But thank you @Kusalananda for editing my original question so quickly. Commented Apr 24, 2018 at 14:42

1 Answer


With a single awk command:

awk 'NR == FNR{
         match($0, /#q[0-9]{3}c#/);
         fruits[$2] = substr($0, RSTART + 1, RLENGTH - 2);
         next
     }
     { print ($1 in fruits? toupper(fruits[$1]) : "Q---C"), $1, $3 }' file2 file1

The output:

Q432C pears fruits
Q123C apple fruit
Q---C carrot veg
Q989C celery vegetable
Q---C oranges fruits
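
To see how the code is cut out of each line, here is a minimal sketch of `match()` together with `RSTART` and `RLENGTH`, run on one sample line from File2. It is an illustration rather than part of the original answer: the shell variables `line` and `code` are made up for the example, and the digit class is written out three times instead of using `{3}`, since not every awk enables interval expressions by default.

```shell
# One sample line taken from File2 in the question.
line='fruits pears   my code is #q432c# for juicy'

code=$(printf '%s\n' "$line" | awk '{
    # match() sets RSTART (1-based start) and RLENGTH (length) of the match.
    if (match($0, /#q[0-9][0-9][0-9]c#/))
        # Start one character later and take two fewer, dropping both "#".
        print toupper(substr($0, RSTART + 1, RLENGTH - 2))
}')

echo "$code"
```

The matched text is `#q432c#` (7 characters), so `substr` keeps the 5 characters between the two `#` signs before `toupper` is applied.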
  • Thanks RomanPerekhrest. I had to replace the regexp in the match from /#q[0-9]{3}c#/ to /#q[0-9][0-9][0-9]c#/, and your answer then worked perfectly. Otherwise, the output was missing the QxxxC codes for rows 1, 2 and 4. Thanks heaps. Commented Apr 24, 2018 at 14:34
  • @achinghead Good! If this solves your issue, please consider accepting the answer. Commented Apr 24, 2018 at 14:46
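
Putting the answer and the regexp change from the comments together, the following is a sketch of a self-contained script that should behave the same on awks where `{3}` is not enabled by default (some versions of mawk and older gawk, as far as I know). The scratch directory, the array name `codes` and the shell variable `result` are choices made for this example, and `mktemp -d` is assumed to be available.

```shell
#!/bin/sh
# Recreate the sample data from the question in a scratch directory.
dir=$(mktemp -d)

cat > "$dir/File1" <<'EOF'
pears   are fruits
apple   is  fruit
carrot  is  veg
celery  is  vegetable
oranges are fruits
EOF

cat > "$dir/File2" <<'EOF'
fruits apple   mycode is #q123c# for apple
fruits pears   my code is #q432c# for juicy
veg    celery  my code value is #q989c# for vegetables
veg    spinach code is #q783c# and is a type of vegetable
fruits papaya  i have code #q346c#
vegie  lettuce code #q445c# is vege
EOF

result=$(awk '
    # First file on the command line (File2): remember the code per item name.
    NR == FNR {
        if (match($0, /#q[0-9][0-9][0-9]c#/))
            codes[$2] = toupper(substr($0, RSTART + 1, RLENGTH - 2))
        next
    }
    # Second file (File1): one output line per input line, with a placeholder
    # code when the item was not seen in File2.
    { print (($1 in codes) ? codes[$1] : "Q---C"), $1, $3 }
' "$dir/File2" "$dir/File1")

printf '%s\n' "$result"
rm -rf "$dir"
```

Because every line of File1 produces exactly one output line, the output always has as many lines as File1, which avoids the separate join step the question was worried about.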
