4

I know that | is the logical "OR" operator inside a RegExp expression. But what is the equivalent "AND" operator (again, inside a RegExp)?

Note:

  • This is not about the multiple expressions' operator of "AND", which is just &&.
  • For example, something like /A&B/ to match both A and B.
4
  • Related: grep with logic operators Commented Feb 3, 2023 at 12:36
  • Also: How to run grep with multiple AND patterns? Commented Feb 3, 2023 at 12:52
  • 1
    The correct regexp for A&B is A.*B|B.*A, that is, A followed by B or B followed by A, which is exactly what your A&B asks for. Commented Feb 4, 2023 at 9:56
  • Yes, I think this is a perfect Q&A post, so that should be all right! Commented Feb 5, 2023 at 10:50

2 Answers

11

There is no such operator in any of the regular expression flavors I am familiar with. If you want to match inputs that contain both A and B, you can write A.*B or B.*A, each of which requires that particular order, or combine the two with alternation to accept either order: A.*B|B.*A.

Alternatively, do two separate matches. For example, in awk:

awk '/A/ && /B/' file

or manually with two grep instances:

grep A file | grep B
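As a minimal sketch of how the three approaches compare, the snippet below runs them on a small made-up input (the sample lines and variable names are assumptions for illustration, not from the question). For literal, non-overlapping A and B, all three should select the same lines:

```shell
# Hypothetical sample input: only the first two lines contain both A and B.
sample=$(printf '%s\n' 'xAyBz' 'xByAz' 'only A' 'only B')

# Single regex that accepts either order.
single=$(printf '%s\n' "$sample" | grep -E 'A.*B|B.*A')

# Two separate matches joined by awk's && operator.
with_awk=$(printf '%s\n' "$sample" | awk '/A/ && /B/')

# Two grep processes in a pipeline.
with_pipe=$(printf '%s\n' "$sample" | grep A | grep B)

printf '%s\n' "$single"
```

Here each of the three variables ends up holding the lines xAyBz and xByAz.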

You don't really need an AND operator in regular expressions. The idea of a regex is that it describes a string: by definition, you put in the regex the thing you are trying to match. An OR is needed to allow matching either A or B, but AND is basically built in. Anything you write in a regex has to be matched, so everything is implicitly joined by AND, which makes a dedicated AND operator kind of pointless.

3
  • Sometimes that "and" might be useful, though. E.g. you could have input where the lines contain lists of strings, and you want to find the ones where the list has both foo and bar and just need the quick solution instead of parsing properly. Especially if you want the filenames and line numbers too, piping greps would have the second match on those, and with AWK you'd need more work to print them in the first place. Commented Feb 3, 2023 at 12:06
  • 1
    With Perl regexes, you could do grep -P '(?=.*A)(?=.*B)' or something like that, not that that's very pretty either. Commented Feb 3, 2023 at 12:08
  • 2
    Of course, A.*B|B.*A won't work the same as && if there's possible overlap between A and B - for example, if you're searching for lines containing both "alpha" and "beta" then according to that specification you should match the string "betalpha". Commented Feb 3, 2023 at 20:20
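The overlap caveat in the last comment can be checked with a short sketch. The input below is made up for illustration: it contains the overlapping string from the comment plus a non-overlapping control line.

```shell
# "betalpha" contains both words, but they share the letter "a".
input=$(printf '%s\n' 'betalpha' 'alpha beta')

# Ordered alternation: the two words must not share any characters.
alternation=$(printf '%s\n' "$input" | grep -E 'alpha.*beta|beta.*alpha')

# Two independent tests: each word only has to occur somewhere in the line.
independent=$(printf '%s\n' "$input" | awk '/alpha/ && /beta/')

printf 'alternation:\n%s\n' "$alternation"
printf 'independent:\n%s\n' "$independent"
```

The alternation selects only "alpha beta", while the awk version with && also selects "betalpha".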
3

Note: As comments by Stéphane Chazelas suggest, this answer is somewhat invalidated by the existence of RegEx implementations that do allow an AND-Operator. The reasoning below is still correct in that such an operator only makes sense if you ensure that the imposed conditions are mutually compatible.


I think the answer is that there cannot be an "AND" equivalent of the | operator in RegExes, because in the end, regular expressions perform matching on the character level of the input string (albeit sometimes implicitly via repetition operators), and every part of the expression is thereby tied to a particular position in the string (see e.g. this Q&A for a similar discussion).

The point is that if you have an expression of the form (I'm explicitly using awk syntax here because of your question title)

$0 ~ /something(A|B)somethingelse/

this requires the string to have either A or B at the specific position immediately after something and before somethingelse in order to match. The position requirement can be more dynamic if you have patterns with repetition operators, such as

$0 ~ /[a-f]+(A|B)[0-9]+/

but still, the point is that the occurrence of either A or B is tied specifically to the position after the pattern consisting of only lowercase a ... f (1) and before the pattern consisting of only digits 0 ... 9.

There cannot be a corresponding "AND" condition

$0 ~ /something(A&B)somethingelse/

because that would mean that the input string would have to contain A as well as B at the very same position - which obviously wouldn't work.
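To make the positional point concrete, here is a minimal sketch that applies the [a-f]+(A|B)[0-9]+ pattern to a few made-up lines (the sample strings are assumptions chosen for illustration). LC_ALL=C is set so that the a-f range behaves as described in footnote (1):

```shell
# Hypothetical input: A or B appears in every line, but in different positions.
lines=$(printf '%s\n' 'abcA123' 'abcB123' 'A abc123' 'abc123B')

# A or B must sit directly between the lowercase letters and the digits.
matched=$(printf '%s\n' "$lines" | LC_ALL=C awk '$0 ~ /[a-f]+(A|B)[0-9]+/')

printf '%s\n' "$matched"
```

Only abcA123 and abcB123 are printed. The other two lines contain A or B as well, but not at the position the pattern requires.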

The only use case where an "AND" operator is useful is therefore in describing general properties of the string, where each of the desired properties can be expressed by a single RegEx, e.g. "the string must contain at least one A and at least one B, regardless of their exact absolute and relative position". That would again leave us with the && operator for combining multiple expressions, which you said you are not interested in, and of course with the various alternative formulations of this workaround in @terdon's answer.


(1) in C collating order, at least
