Search and replace with 2 separate files using AWK

Question

I have a bit of a complex awk problem that I need to solve.

It is essentially a two-part problem, but I am not sure whether there is a way to solve it in one step.

I have a large directory of files with the same format, each with 266 lines. The first 206 lines of each file hold attribute information. Each of the remaining 60 lines contains 202 comma-separated values: the first value is a word (a string) and the last value is a number (1 or 0). Is it possible to change the numeric value in the last slot ($202) of the lines that contain certain strings listed in a separate file?
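Because the flag is expected to be field 202 (that is, the last field, `$NF`) on every data line, a quick sanity check can help before editing anything. The sketch below is a hypothetical helper, `check_fields`, that assumes the data rows are exactly the non-blank lines after the line starting with `@DATA`. It prints any data row whose field count is not 202.

```shell
#!/bin/sh
# check_fields FILE...
# Hypothetical helper: prints FILE:LINE: N fields for every non-blank line
# after the @DATA marker that does not have exactly 202 comma-separated fields.
check_fields() {
  awk -F',' '
    FNR == 1 { in_data = 0 }              # reset at the start of each file
    /^@DATA/ { in_data = 1; next }        # data rows follow this marker
    in_data && NF > 0 && NF != 202 {      # skip blank lines, report odd rows
      printf "%s:%d: %d fields\n", FILENAME, FNR, NF
    }
  ' "$@"
}
```

Running `check_fields *.arff` (the extension is only a guess) prints nothing when every file matches the described layout.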

To visualize the problem, my data file looks like this:

@RELATION relationData

@ATTRIBUTE att0 STRING
@ATTRIBUTE att1 NUMERIC
@ATTRIBUTE att2 NUMERIC
@ATTRIBUTE att3 NUMERIC
....
@ATTRIBUTE att200 NUMERIC

@ATTRIBUTE class {1,0}

@DATA
hall,1,2,3,...,201,0
cat,1,2,3,...,201,1
dog,1,2,3,...,201,1
feather,1,2,3,...,201,1

I have a second file with a list of words (1 per line):

cat
feather

I want to change the final numeric value to 0 on every line that contains a word from the second file, so that the result is:

@RELATION relationData

@ATTRIBUTE att0 STRING
@ATTRIBUTE att1 NUMERIC
@ATTRIBUTE att2 NUMERIC
@ATTRIBUTE att3 NUMERIC
....
@ATTRIBUTE att200 NUMERIC

@ATTRIBUTE class {1,0}

@DATA
hall,1,2,3,...,201,0
cat,1,2,3,...,201,0
dog,1,2,3,...,201,1
feather,1,2,3,...,201,0

Any suggestions on how to go about solving this? For instance, can something like this:

awk -v ip1="$INPUT1" -v ip2="$INPUT2" '{gsub( /String1/, ip1);gsub( /String2/, ip2);print}' file

be modified to solve my problem?

Thanks in advance for any help.

anubhava · Accepted Answer · 2014-10-16 10:32:25Z

2

This awk should work:

awk 'BEGIN{FS=OFS=","} FNR==NR {a[$1];next} $1 in a{$NF=0} 1' list.txt file.txt

Output:

@RELATION relationData

@ATTRIBUTE att0 STRING
@ATTRIBUTE att1 NUMERIC
@ATTRIBUTE att2 NUMERIC
@ATTRIBUTE att3 NUMERIC
....
@ATTRIBUTE att200 NUMERIC

@ATTRIBUTE class {1,0}

@DATA
hall,1,2,3,...,201,0
cat,1,2,3,...,201,0
dog,1,2,3,...,201,1
feather,1,2,3,...,201,0

answered Oct 16, 2014 at 10:32
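The one-liner relies on a common awk idiom: FNR (the line number within the current file) equals NR (the line number across all input) only while the first file is being read. So the first block stores each word from the list as a key of array `a`, and `next` skips the rest of the program. For the second file, `$1 in a` tests the first field against those keys. Assigning to `$NF` makes awk rebuild the whole line using OFS, which defaults to a single space, so OFS has to be set to a comma to keep the comma-separated layout. The sketch below wraps this in a shell function; the name `flag_words` is hypothetical.

```shell
#!/bin/sh
# flag_words LIST DATA
# Hypothetical wrapper: sets the last field to 0 on every line of DATA whose
# first comma-separated field appears (exactly) as a line in LIST.
flag_words() {
  awk '
    BEGIN     { FS = OFS = "," }   # read and write comma-separated fields
    FNR == NR { a[$1]; next }      # first file: store each word as a key
    $1 in a   { $NF = 0 }          # second file: clear the flag on matches
    1                              # print every line of the second file
  ' "$1" "$2"
}
```

Usage: `flag_words list.txt file.txt > file.new`. One caveat of the FNR == NR idiom: if LIST is empty, the condition also holds for every line of DATA, so nothing at all is printed.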



choroba · Accepted Answer · 2014-10-16 10:40:21Z

2

Perl to the rescue:

#!/usr/bin/perl
use warnings;
use strict;

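# Build one alternation regex from the word list, anchored to the first field.
open my $LIST, '<', 'list-of-words' or die $!;
chomp(my @lines = <$LIST>);
my $regex = join '|', @lines;
# or, if the "words" can contain special characters:
# my $regex = join '|', map "\Q$_\E", @lines;
$regex = qr/^($regex),/;

open my $DATA, '<', 'data-file' or die $!;
while (<$DATA>) {
    # The flip-flop becomes true at the @DATA line and never turns false,
    # so only the data section is modified.
    if (/\@DATA/ .. undef) {
        # Replace the trailing number with 0 when the first field is a listed word.
        s/,[0-9]+$/,0/ if /$regex/;
    }
    print;
}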

edited Oct 16, 2014 at 10:40

answered Oct 16, 2014 at 10:35
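The Perl version only touches lines after the @DATA marker. The same restriction can be added to an awk solution with a flag variable. Since the question mentions a whole directory of files, the sketch below also loops over them, writing each result to a temporary file and moving it back over the original, because portable awk has no in-place editing option. The function name `flag_dir` and the `*.arff` glob are assumptions; adjust them to your file names, and keep a backup before overwriting anything.

```shell
#!/bin/sh
# flag_dir LIST DIR
# Hypothetical helper: for every *.arff file in DIR (the extension is an
# assumption), set the last field to 0 on data lines whose first field is
# listed in LIST. Lines before @DATA are copied unchanged.
flag_dir() {
  list=$1
  # An empty LIST would make FNR == NR true for the data file too,
  # which would empty every file, so refuse to run in that case.
  [ -s "$list" ] || return 1
  for f in "$2"/*.arff; do
    [ -f "$f" ] || continue                  # the glob matched nothing
    tmp=$f.tmp
    awk '
      BEGIN     { FS = OFS = "," }
      FNR == NR { a[$1]; next }              # load the word list
      /^@DATA/  { in_data = 1 }              # data rows start after this line
      in_data && ($1 in a) { $NF = 0 }       # clear the flag on listed words
      1
    ' "$list" "$f" > "$tmp" && mv "$tmp" "$f"
  done
}
```

Usage: `flag_dir list.txt ./data`. The `&&` before `mv` means a file is only replaced when awk exits successfully.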

