I'm filtering some data with awk (version 20070501, on macOS) but ran into a problem when applying multiple negative match conditions to values in a specific column.

Here's a generic example that I think captures my issue.

Input:

foo,bar
bar,foo
foo,bar
bar,foo

With this code I remove matches for foo in column 2:

awk 'BEGIN { FS=OFS="," } ; { if ($2 !~ /foo/ ) print $0}'

I get this output, which I expected:

foo,bar
foo,bar

Next, I add an additional condition to the if statement, to also remove all values matching bar in column 2:

awk 'BEGIN { FS=OFS="," } ; { if ($2 !~ /foo/ || $2 !~ /bar/) print $0}'

I get this output, which I did not expect:

foo,bar
bar,foo
foo,bar
bar,foo

I expected no rows to be returned, which was my aim. So what's going on?

Are the two conditions cancelling each other out? I read the GNU awk documentation for boolean expressions, which states:

The ‘&&’ and ‘||’ operators are called short-circuit operators because of the way they work. Evaluation of the full expression is “short-circuited” if the result can be determined partway through its evaluation.

From this snippet, I wasn't sure how to make progress. Or is the issue that the syntax isn't correct? Or both?

Update:

After comments and help from @wiktor-stribiżew here's a better representation of the problem:

1   2   3   4   5
foo bar foo bar FY 2008 Program Totals
foo bar foo bar FY 2009 Program Totals
foo bar foo bar Fiscal Year 2010 Program Totals
foo bar foo bar Fiscal Year 2011 Program Totals
foo bar foo bar Fiscal Year 2012 Program Totals
foo bar foo bar Fiscal Year 2013 Program Totals
foo bar foo bar Fiscal Year 2014 Program Totals
foo bar foo bar Fiscal Year 2015 Program Totals
foo bar foo bar Fiscal Year 2016 Program Totals
foo bar foo bar Fiscal Year 2017 Program Totals

My failing code would be:

awk 'BEGIN { FS=OFS="\t" } ; { if ($5 !~ /Fiscal.*Program Totals/ || $5 !~ /FY.*Program Totals/) print $0}'

The accepted answer below resolves this.

2 Answers

3

You want to filter out lines where Field 2 matches either foo or bar, which means keeping only the lines where that field matches neither foo nor bar. Both negative tests must hold at once, so you need the && operator:

awk -F',' '$2 !~ /foo/ && $2 !~ /bar/' file > newfile
#                      ^^

Note you may also use || if you group the conditions and negate the result:

awk -F\, '!($2 ~ /foo/ || $2 ~ /bar/)' file > newfile
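
To make the difference concrete, here is a small sketch that runs all three conditions against the sample input from the question. The shell variable names are only for illustration, and the input is piped in rather than read from a file. The original || test is false only when Field 2 matches both foo and bar, which never happens in this data, so every row passes.

```shell
# Sample input from the question, held in a variable instead of a file.
input='foo,bar
bar,foo
foo,bar
bar,foo'

# Original condition: true unless $2 matches BOTH patterns, so every row prints.
or_out=$(printf '%s\n' "$input" | awk -F',' '$2 !~ /foo/ || $2 !~ /bar/')

# Corrected condition: true only when $2 matches NEITHER pattern, so nothing prints.
and_out=$(printf '%s\n' "$input" | awk -F',' '$2 !~ /foo/ && $2 !~ /bar/')

# Negated group: equivalent to the && form by De Morgan's law.
neg_out=$(printf '%s\n' "$input" | awk -F',' '!($2 ~ /foo/ || $2 ~ /bar/)')

printf 'with ||:\n%s\n' "$or_out"
printf 'with &&:\n%s\n' "$and_out"
printf 'negated group:\n%s\n' "$neg_out"
```

The first command echoes all four rows back, while the other two print nothing, which is the result the question was after.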

Note that you need not set OFS, because you are only printing $0 (whole lines) unchanged. And since printing $0 is the default action, you do not need to write it out when the condition is used as a bare pattern, as shown above.
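
As a rough illustration of when OFS would matter: awk only rebuilds $0 using OFS after a field has been assigned. The one-line input and the | separator below are made up for this sketch.

```shell
# Hypothetical one-line input; only the separators matter here.
line='foo,bar'

# No field is modified, so $0 is printed exactly as read and OFS is never used.
unchanged=$(printf '%s\n' "$line" | awk 'BEGIN { FS=","; OFS="|" } { print $0 }')

# Assigning to a field makes awk rebuild $0, joining the fields with OFS.
rebuilt=$(printf '%s\n' "$line" | awk 'BEGIN { FS=","; OFS="|" } { $1 = $1; print $0 }')

echo "$unchanged"   # foo,bar
echo "$rebuilt"     # foo|bar
```

So for a pure filter like the ones above, setting OFS changes nothing in the output.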

4 Comments

Useful explanation, thanks, but it hasn't solved the issue on my production data. It's tab separated, so I'd have to set FS and OFS. The inputs I'm trying to filter out are below, in a specified column ($5 as it happens): ``` FY 2008 Program Totals FY 2009 Program Totals Fiscal Year 2010 Program Totals Fiscal Year 2011 Program Totals ``` awk -F'\t' '$5 !~ /Financial.*Program Totals/ && $5 !~ /FY.*Program Totals/' file doesn't remove them.
@tesolat Then use -F'\t', you still do not need to set the OFS
Sorry, I hit return on my comment accidentally. Edited - there should be a code block in there too, but it didn't render.
@tesolat It works well. Please do not set OFS, it is meaningless in this scenario. Unless you want to change the field separator, of course.
2

All you need is:

awk -F',' '$2 !~ /foo|bar/' file

Given your real failing code:

awk 'BEGIN { FS=OFS="\t" } ; { if ($5 !~ /Fiscal.*Program Totals/ || $5 !~ /FY.*Program Totals/) print $0}'

and assuming your fields really are tab-separated as your code implies, you'd write that as just:

awk -F'\t' '$5 !~ /F(iscal|Y).*Program Totals/'
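
A quick way to check this without the real file is sketched below. The three tab-separated rows are invented for the test (the placeholder columns a to d and the "keep me" value are not from the question); only the column 5 values on the first two rows mirror the question's data.

```shell
# Three made-up tab-separated rows; only column 5 resembles the question's data.
input=$(printf 'a\tb\tc\td\tFY 2008 Program Totals\na\tb\tc\td\tFiscal Year 2010 Program Totals\na\tb\tc\td\tkeep me\n')

# Rows whose 5th field matches "FY...Program Totals" or "Fiscal...Program Totals" are dropped.
result=$(printf '%s\n' "$input" | awk -F'\t' '$5 !~ /F(iscal|Y).*Program Totals/')

printf '%s\n' "$result"
```

Only the row ending in "keep me" should survive the filter.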

3 Comments

Good solution @ed-morton, thanks. I already chose Wiktor's answer as it addresses the use of conditions and logical operators, which I need to make more use of.
You're welcome. That's fine, it's just not how you'd actually solve the problem described in your question.
The regex you use is one way to solve the problem, but I should have been more clear that I would need to extend the use of conditions and operators, and that these would need to be a component of the solution.
