edited Feb 3, 2021 at 14:14

With grep:

cat poem.txt \
  | grep -Evi -e '(^| )an .* the($| )' -e '(^| )the .* an($| )' \
  | grep -Eci -e '(^| )(an|the)($| )'

Breakdown:

The first grep command filters out all lines containing both 'an' and 'the'. The second grep command then counts the remaining lines that contain either 'an' or 'the'.
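To see the two stages at work, here is a minimal sketch on a made-up input. The sample lines and the temporary file are assumptions for illustration only; they are not taken from the original poem.txt.

```shell
# Hypothetical sample input (made up for illustration).
tmpfile=$(mktemp)
printf '%s\n' \
  'The cat sat on an old mat' \
  'An apple fell' \
  'the end' \
  'A pan on another stove' \
  'Nothing here' > "$tmpfile"

# Stage 1: drop lines that contain both words, in either order.
stage1=$(grep -Evi -e '(^| )an .* the($| )' -e '(^| )the .* an($| )' "$tmpfile")

# Stage 2: count the remaining lines that contain 'an' or 'the' as a separate word.
count=$(printf '%s\n' "$stage1" | grep -Eci -e '(^| )(an|the)($| )')

printf '%s\n' "$stage1"
echo "count: $count"
rm -f "$tmpfile"
```

Only the first line is removed by stage 1, and of the four lines that remain, 'An apple fell' and 'the end' are counted, so this prints "count: 2". 'pan' and 'another' do not count because 'an' and 'the' are not separate words there.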

Details:

The -E option enables extended regular expression syntax (ERE) for grep.
The -i option tells grep to match case-insensitively
The -v option tells grep to invert the result (i.e. match lines not containing the pattern)
The -c option tells grep to output the number of matched lines instead of the lines themselves
The patterns:
1. (^| ) matches either the beginning of the line or a space character
2. ($| ) matches either the end of the line or a space character
--> That way we avoid matching words that merely contain 'the' or 'an' (like 'pan')
3. grep -Evi -e '(^| )an .* the($| )' thus matches all lines not containing 'an ... the'. (Note: I purposely did not cover the case "an the", where 'the' directly follows 'an', because it is unlikely and I wanted to keep the pattern simple. It could, of course, easily be added.)
4. Similarly, grep -Evi -e '(^| )the .* an($| )' matches all lines not containing 'the ... an'
5. grep -Evi -e '(^| )an .* the($| )' -e '(^| )the .* an($| )' is the combination of 3. and 4., so it keeps only the lines that match neither pattern
6. grep -Eci -e '(^| )(an|the)($| )' matches all lines containing either 'an' or 'the' (surrounded by a space or the start/end of the line) and prints the number of matching lines
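For the adjacent case ("an the" or "the an"), one possible variation is to make the text in between optional. This is only a sketch of how it could be added, not part of the command above; the variable names and test lines are made up for illustration.

```shell
# Original patterns: require some text plus two spaces between the words.
original='(^| )an .* the($| )|(^| )the .* an($| )'
# Variation (assumption, not from the answer): (.* )? makes the middle part optional,
# so a single space between 'an' and 'the' is enough.
both='(^| )an (.* )?the($| )|(^| )the (.* )?an($| )'

samples=$(printf '%s\n' 'an the' 'the an' 'an old the' 'pan theme')

original_count=$(printf '%s\n' "$samples" | grep -Eci -e "$original")
adjacent_count=$(printf '%s\n' "$samples" | grep -Eci -e "$both")

echo "original: $original_count, with adjacent case: $adjacent_count"
```

With these four sample lines, the original patterns match only 'an old the', while the variation also matches 'an the' and 'the an'. 'pan theme' matches neither.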

answered Feb 3, 2021 at 14:05

theCalcaholic