With grep:
cat poem.txt \
| grep -Evi -e '\<an\>.*\<the\>' -e '\<the\>.*\<an\>' \
| grep -Eci -e '\<(an|the)\>'
This counts the matching lines, not the individual matches. An alternative that counts the total number of matches is given in EDIT 2 below.
Breakdown:
The first grep command filters out all lines containing both 'an' and 'the'. The second grep command counts the remaining lines that contain either 'an' or 'the'.
If you remove the c from the second grep's -Eci, you will see the matching lines themselves instead of a count (with the matches highlighted if color output is enabled, e.g. via --color=auto).
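To make the two steps concrete, here is a small sketch that runs them separately on a made-up input. The temporary file and its five lines are assumptions for illustration only (they are not the poem from the question), and the `\<` / `\>` word anchors assume a grep that supports them, such as GNU grep.

```shell
# Hypothetical sample input; the lines are invented for illustration.
poem=$(mktemp)
printf '%s\n' \
  'The sun sets over an old hill' \
  'An owl calls at dusk' \
  'The river runs' \
  'A pan on the stove' \
  'No articles here' > "$poem"

# Step 1: drop every line that contains both words (only the first line here).
remaining=$(grep -Evi -e '\<an\>.*\<the\>' -e '\<the\>.*\<an\>' "$poem")
printf '%s\n' "$remaining"

# Step 2: count the remaining lines that contain either word.
# 'A pan on the stove' counts because of 'the', not because of 'pan'.
count=$(printf '%s\n' "$remaining" | grep -Eci -e '\<(an|the)\>')
echo "$count"

rm -f "$poem"
```

With this input, step 1 leaves four lines and step 2 prints 3, since 'No articles here' contains neither word.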
Details:
The -E option enables extended regular expression syntax (ERE) for grep.
The -i option tells grep to match case-insensitively.
The -v option tells grep to invert the result (i.e. select lines not matching the pattern).
The -c option tells grep to output the number of matching lines instead of the lines themselves.
The patterns:
1. \< matches the beginning of a word (thanks @glenn-jackman)
2. \> matches the end of a word (thanks @glenn-jackman)
--> That way we can make sure not to match words that merely contain 'the' or 'an' (like 'pan')
3. grep -Evi -e '\<an\>.*\<the\>' thus matches all lines not containing 'an ... the' ~(Note: I did purposefully not include the case "an the" ('the' directly following on 'an') because it is an unlikely case and I wanted to keep the pattern simple. It could, of course, easily be added)~.
4. Similarly, grep -Evi -e '\<the\>.*\<an\>' matches all lines not containing 'the ... an'
5. grep -Evi -e '\<an\>.*\<the\>' -e '\<the\>.*\<an\>' is the combination of 3. and 4.: it keeps only the lines that match neither pattern
6. grep -Eci -e '\<(an|the)\>' matches all lines containing either 'an' or 'the' as a whole word and prints the number of matching lines
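The effect of the word anchors can be checked with a quick sketch. The word list below is made up for illustration, and the example again assumes a grep that understands `\<` and `\>` (GNU grep does).

```shell
# With word anchors: only the standalone words 'an' and 'The' match.
words=$(printf '%s\n' pan an then The bathe | grep -Eio -e '\<(an|the)\>')
printf '%s\n' "$words"

# Without anchors: every line matches, because each word contains 'an' or 'the'.
loose=$(printf '%s\n' pan an then The bathe | grep -Eci -e 'an|the')
echo "$loose"
```

The first command prints 'an' and 'The' on separate lines. The second prints 5, which is the over-counting the anchors are there to prevent.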
EDIT 1: Use \< and \> instead of ( |^) and ( |$), as suggested by @glenn-jackman
EDIT 2: In order to count the number of matches instead of the number of matched lines, use the following expression:
cat poem.txt \
| grep -Evi -e '\<an\>.*\<the\>' -e '\<the\>.*\<an\>' \
| grep -Eio -e '\<(an|the)\>' \
| wc -l
This uses the -o option of grep, which prints every match on a separate line (and nothing else), and then wc -l to count those lines.
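The difference between the two counts only shows up when a line contains the same word more than once. A minimal sketch with an invented input line (an assumption, not taken from the poem):

```shell
# Made-up line with two matches ('The' and 'the') and no 'an',
# so it would survive the first filter of the pipeline.
line='The cat sat on the mat'

# -c counts matching lines.
lines=$(printf '%s\n' "$line" | grep -Eci -e '\<(an|the)\>')

# -o prints one match per line, so wc -l counts matches.
matches=$(printf '%s\n' "$line" | grep -Eio -e '\<(an|the)\>' | wc -l)

echo "matching lines: $lines"
# $((...)) strips the padding some wc implementations add.
echo "total matches: $((matches))"
```

Here the first count is 1 and the second is 2.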