I have a bit of a complex awk problem that I need to solve.
I am not sure whether it should be treated as a 2-part problem or whether there is a way to solve it in one step.
I have a large directory of files with the same format, each with 266 lines. The first 206 lines of each file are filled with attribute information. Then the following 60 lines consist of 202 values separated by commas. The first position in each of these sixty lines is a word (string value), and the last position in each of these sixty lines is a number (1 or 0). Is it possible to change the last slot ($202) numeric value of lines that contain certain strings that are indicated in a separate file?
To visualize the problem, my data file looks like this:
@RELATION relationData
@ATTRIBUTE att0 STRING
@ATTRIBUTE att1 NUMERIC
@ATTRIBUTE att2 NUMERIC
@ATTRIBUTE att3 NUMERIC
....
@ATTRIBUTE att200 NUMERIC
@ATTRIBUTE class {1,0}
@DATA
hall,1,2,3,...,201,0
cat,1,2,3,...,201,1
dog,1,2,3,...,201,1
feather,1,2,3,...,201,1
I have a second file with a list of words (1 per line):
cat
feather
I want to change the final numeric value on those lines that contain a word in the second file to 0, so that my file result is:
@RELATION relationData
@ATTRIBUTE att0 STRING
@ATTRIBUTE att1 NUMERIC
@ATTRIBUTE att2 NUMERIC
@ATTRIBUTE att3 NUMERIC
....
@ATTRIBUTE att200 NUMERIC
@ATTRIBUTE class {1,0}
@DATA
hall,1,2,3,...,201,0
cat,1,2,3,...,201,0
dog,1,2,3,...,201,1
feather,1,2,3,...,201,0
Any suggestions on how to go about solving the problem? For instance, can something like this:
awk -v ip1="$INPUT1" -v ip2="$INPUT2" '{gsub( /String1/, ip1);gsub( /String2/, ip2);print}' file
be modified to solve my problem?
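One direction that might work is a single awk pass over two files: read the word list first, then set the last field to 0 on data lines whose first field is in the list. The following is only a minimal sketch under some assumptions. The file names `words.txt`, `data.arff`, and `data.fixed.arff` are placeholders. The sample rows are shortened to three numeric fields instead of 201. "Contains a word" is taken to mean that the first field equals a listed word exactly, and the word list is assumed to have no trailing whitespace or carriage returns.

```shell
# Placeholder file names and shortened sample rows (assumptions for illustration).
printf '%s\n' cat feather > words.txt
printf '%s\n' '@ATTRIBUTE class {1,0}' '@DATA' \
    'hall,1,2,3,0' 'cat,1,2,3,1' 'dog,1,2,3,1' 'feather,1,2,3,1' > data.arff

awk -F, -v OFS=, '
    NR == FNR { words[$1]; next }        # first file: remember each listed word
    !/^@/ && ($1 in words) { $NF = 0 }   # data line whose first field is listed: zero the last field
    { print }                            # print every line, changed or not
' words.txt data.arff > data.fixed.arff
```

Using `$NF` rather than `$202` avoids depending on the exact field count, and header lines are printed untouched because none of their fields are assigned. For a whole directory, the same command could presumably be wrapped in a shell `for` loop over the files, writing each result to a new file.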
Thanks in advance for any help.