I want to extract the row of a CSV file where column 4 contains a certain number.

The CSV file's rows look like this:

Markus;Haltmeyer;ID;SomeIdentifier

I want to store the first column and the second column in separate variables if SomeIdentifier is found.

In the bash script I only have the first characters of SomeIdentifier in a variable firstPartOfID. But nevertheless the correct row is found with the following command:

result=$(awk -v pat="${firstPartOfID}" -F ";" '$0~pat{print $1, $2 }' MyFile.csv)
echo "${result}"

Unfortunately result contains both columns. I could try to split $result afterwards, but I want to do it with awk directly.

2 Answers

You can use read together with process substitution:

read -r var1 var2 < <(awk -v regexp="${firstPartOfID}" -F ";" '$0~regexp{print $1, $2 }' MyFile.csv)

I assume that the output does not contain whitespace (except for the space awk puts between the two columns). Otherwise you need to use a different output delimiter in awk and use the same one in read:

IFS=";" read -r var1 var2 < <(awk -v regexp="${firstPartOfID}" 'BEGIN{FS=OFS=";"}$0~regexp{print $1, $2 }' MyFile.csv)

I'm using ; as the output delimiter in the above example. That makes sense because it is also the input delimiter, so it is guaranteed not to appear inside a field.
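
To make the difference visible, here is a minimal sketch of the ; variant run against made-up data. The temporary file, the sample rows and the value of firstPartOfID are assumptions for illustration only; the real MyFile.csv is assumed to have the same four-column layout.

```shell
#!/usr/bin/env bash
# Hypothetical sample data standing in for MyFile.csv.
csv_file=$(mktemp)
printf '%s\n' \
    'Anna Maria;Schmidt;ID;98765XYZ' \
    'Markus;Haltmeyer;ID;12345ABC' > "$csv_file"

firstPartOfID='98765'

# Keep ";" as awk's output separator and split on it in read,
# so the space inside "Anna Maria" does not break the assignment.
IFS=';' read -r var1 var2 < <(
    awk -v regexp="$firstPartOfID" \
        'BEGIN { FS = OFS = ";" } $0 ~ regexp { print $1, $2; exit }' "$csv_file"
)

echo "first: $var1"
echo "last:  $var2"
rm -f "$csv_file"
```

With the default space delimiter, var1 would receive only "Anna" and var2 would receive "Maria Schmidt".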


Btw, instead of a regular expression you can use awk's index() function. It does a plain substring search, so it should be more efficient, and it is not affected by regex metacharacters in the ID.

awk -v id_prefix="${firstPartOfID}" -F ";" 'index($4, id_prefix) == 1 {print $1, $2 }' MyFile.csv
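
As a rough illustration of why the literal match can matter, the sketch below compares a regex test with an index() test on column 4. The rows and the prefix 1.23 are invented for the example.

```shell
# Hypothetical sample rows; the identifiers are made up for illustration.
csv_file=$(mktemp)
printf '%s\n' \
    'Markus;Haltmeyer;ID;1x23-777' \
    'Petra;Lang;ID;1.23-456' > "$csv_file"

id_prefix='1.23'

# Regex match: "." matches any character, so both rows are selected.
regex_hits=$(awk -v regexp="$id_prefix" -F ';' \
    '$4 ~ ("^" regexp) { print $1 }' "$csv_file")

# Literal match: index() returns the 1-based position of the substring,
# so == 1 means column 4 starts with id_prefix. Only Petra's row qualifies.
literal_hits=$(awk -v id_prefix="$id_prefix" -F ';' \
    'index($4, id_prefix) == 1 { print $1 }' "$csv_file")

printf 'regex:\n%s\n' "$regex_hits"
printf 'literal:\n%s\n' "$literal_hits"
rm -f "$csv_file"
```

If the identifiers only ever contain digits and letters, both tests select the same rows.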

4 Comments

I think ; would be a more suitable delimiter, since it is the original delimiter. The columns (as unlikely as it is) may contain #.
I think it doesn't matter. This works great. Thank you.
index() takes 2 args, not 1. Also, I know you're just trying to show the OP how to save the awk output, but look at what she said she's trying to do: "I want to extract the row of a CSV file where column 4 contains a certain number." That's not what her awk script did, presumably due to a bug, and so it's not what yours does either. You could fix that for her with 'index($4,pat)==1{... or just $4~("^"pat) { if pat doesn't contain any RE metachars. (And please rename pat to regexp or string, whichever it really is, since "patterns" are for quilts and knitting, not software!)
pat changed to regexp. I should know better ;) Use of index() fixed. Thanks! About the remaining part: not sure if I'm missing something, but the question says "I want to store the first column and second column in different variables each." To me it looks like the filter in awk basically worked, but the result still had to be split using the shell.

You can also skip awk entirely if you want multiple values and let bash do the pattern matching:

while IFS=';' read -r first last idtype idfield; do
    # column 4 holds the identifier; match it as a literal prefix
    if [[ $idfield == "$firstPartOfID"* ]]; then
        first_name=$first
        last_name=$last
        break
    fi
done < MyFile.csv

Or, depending on what you want to do with those values afterwards, you might be able to do that within awk.
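
As one possible illustration of that last point, the sketch below lets awk both select the row and format the result, so no name variables are needed in the shell. The greeting text, the sample rows and the prefix are assumptions, not something from the question.

```shell
# Hypothetical sample file standing in for MyFile.csv.
csv_file=$(mktemp)
printf '%s\n' \
    'Markus;Haltmeyer;ID;12345ABC' \
    'Petra;Lang;ID;67890DEF' > "$csv_file"

firstPartOfID='678'

# awk filters on column 4 and formats the output itself,
# stopping at the first matching row.
greeting=$(awk -v id_prefix="$firstPartOfID" -F ';' '
    index($4, id_prefix) == 1 { printf "Hello %s %s\n", $1, $2; exit }
' "$csv_file")

echo "$greeting"
rm -f "$csv_file"
```

Whether this is preferable depends on how much further processing the shell script has to do with the two names.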
