I need a logical AND in regex, something like
jack AND james
that matches both of the following strings:
'hi jack here is james'
'hi james here is jack'
You can do checks using positive lookaheads. Here is a summary from the indispensable regular-expressions.info:
Lookahead and lookbehind, collectively called “lookaround”, are zero-length assertions...lookaround actually matches characters, but then gives up the match, returning only the result: match or no match. That is why they are called “assertions”. They do not consume characters in the string, but only assert whether a match is possible or not.
It then goes on to explain that positive lookaheads are used to assert that what follows matches a certain expression without taking up characters in that matching expression.
So here is an expression using two consecutive positive lookaheads to assert that the phrase contains jack and james in either order:
^(?=.*\bjack\b)(?=.*\bjames\b).*$
The expressions in parentheses starting with ?= are the positive lookaheads. I'll break down the pattern:
^ asserts the start of the expression to be matched.
(?=.*\bjack\b) is the first positive lookahead, saying that what follows must match .*\bjack\b.
.* means any character zero or more times.
\b means any word boundary (white space, start of expression, end of expression, etc.).
jack is literally those four characters in a row (the same for james in the next positive lookahead).
$ asserts the end of the expression to be matched.
So the first lookahead says "what follows (and is not itself a lookahead or lookbehind) must be an expression that starts with zero or more of any characters followed by a word boundary and then jack and another word boundary," and the second lookahead says "what follows must be an expression that starts with zero or more of any characters followed by a word boundary and then james and another word boundary." After the two lookaheads is .* which simply matches any characters zero or more times and $ which matches the end of the expression.
"start with anything then jack or james then end with anything" satisfies the first lookahead because there are a number of characters then the word jack, and it satisfies the second lookahead because there are a number of characters (which just so happens to include jack, but that is not necessary to satisfy the second lookahead) then the word james. Neither lookahead asserts the end of the expression, so the .* that follows can go beyond what satisfies the lookaheads, such as "then end with anything".
I think you get the idea, but just to be absolutely clear, here is with jack and james reversed, i.e. "start with anything then james or jack then end with anything"; it satisfies the first lookahead because there are a number of characters then the word james, and it satisfies the second lookahead because there are a number of characters (which just so happens to include james, but that is not necessary to satisfy the second lookahead) then the word jack. As before, neither lookahead asserts the end of the expression, so the .* that follows can go beyond what satisfies the lookaheads, such as "then end with anything".
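As a quick check of the walkthrough above, here is a minimal JavaScript sketch that runs the two-lookahead pattern against the strings from the question. The variable names and the two non-matching strings are my own additions, purely for illustration.

```javascript
// Pattern from above: both words must appear somewhere, in any order.
const bothNames = /^(?=.*\bjack\b)(?=.*\bjames\b).*$/;

const samples = [
  "hi jack here is james", // both words, jack first
  "hi james here is jack", // both words, james first
  "hi jack here is jim",   // only jack is present
  "hi jackson and james",  // "jackson" has no word boundary after "jack"
];

// One boolean per sample: does the whole AND condition hold?
const results = samples.map((s) => bothNames.test(s));
samples.forEach((s, i) => console.log(JSON.stringify(s), "->", results[i]));
```

The last sample shows what the \b anchors buy you: without them, jackson would count as jack.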
This approach has the advantage that you can easily specify multiple conditions.
^(?=.*\bjack\b)(?=.*\bjames\b)(?=.*\bjason\b)(?=.*\bjules\b).*$
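Since every condition is just one more lookahead, the pattern can also be assembled from a list of words. The following is only a sketch: allWordsRegex and escapeRegex are hypothetical helper names, not part of any library, and the escaping step is an assumption I added so that words containing regex metacharacters are matched literally.

```javascript
// Escape characters that have a special meaning in a regex so that each
// word is matched literally. (Hypothetical helper, not from the answer.)
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build ^(?=.*\bw1\b)(?=.*\bw2\b)....*$ from a list of required words.
function allWordsRegex(words) {
  const lookaheads = words.map((w) => `(?=.*\\b${escapeRegex(w)}\\b)`);
  return new RegExp(`^${lookaheads.join("")}.*$`);
}

const fourNames = allWordsRegex(["jack", "james", "jason", "jules"]);
console.log(fourNames.source);
console.log(fourNames.test("jules, jason, james and jack")); // true
console.log(fourNames.test("jules, jason and james"));       // false, jack is missing
```

Note that \b only helps when the words start and end with word characters, so this helper is meant for plain words like the names above.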
vim syntax: ^\(.*\<jack\>\)\@=\(.*\<james\>\)\@=.*$ or \v^(.*<jack>)@=(.*<james>)@=.*$
^(?=.*\b#friday\b)(?=.*\b#tgif\b).*$ fails to match blah #tgif blah #friday blah but ^(?=.*\bfriday\b)(?=.*\btgif\b).*$ works fine. (\b needs a word character on one side of the boundary, and neither the space nor the # is one.)
[...] $ symbol from the pattern or remove the newline character from the test string; other languages (Python, PHP) on this website work perfectly. Also, you can remove .*$ from the end: the regexp will still match the test string, just without selecting the whole test string as the match.
(?i) can also make it case insensitive: ^(?i)(?=.*\bjack\b)(?=.*\bjames\b).*$
Try:
james.*jack
If you want both orders to match, then OR them:
james.*jack|jack.*james
james.*?jack|jack.*?james. This will help on large texts.
Explanation of the command that I am going to write:
. means any character; a letter, digit, or any other character can appear in place of the .
* means zero or more occurrences of the thing written just before it.
| means 'or'.
So,
james.*jack
would search for james, then any number of characters, until jack comes.
Since you want either jack.*james or james.*jack,
the command is:
jack.*james|james.*jack
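With alternation you need one branch per ordering of the words, so two words give two branches and three words give six. Here is a rough sketch that generates those branches. The function names are made up for this example, and it assumes the words contain no regex metacharacters.

```javascript
// Return every ordering of the given array (n! results for n items).
function permutations(items) {
  if (items.length <= 1) return [items];
  return items.flatMap((item, i) => {
    const rest = [...items.slice(0, i), ...items.slice(i + 1)];
    return permutations(rest).map((tail) => [item, ...tail]);
  });
}

// Join each ordering with ".*" and OR the orderings together.
// Assumes the words contain no regex metacharacters.
function anyOrderRegex(words) {
  const branches = permutations(words).map((order) => order.join(".*"));
  return new RegExp(branches.join("|"));
}

const twoWords = anyOrderRegex(["jack", "james"]);
console.log(twoWords.source);                        // jack.*james|james.*jack
console.log(twoWords.test("hi james here is jack")); // true
console.log(twoWords.test("hi james"));              // false

// Three words already need 3! = 6 branches.
const threeWordBranches = anyOrderRegex(["a", "b", "c"]).source.split("|").length;
console.log(threeWordBranches); // 6
```

The factorial growth is the reason this stays pleasant only for two or three words.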
It's short and sweet:
(?=.*jack)(?=.*james)
[
"xxx james xxx jack xxx",
"jack xxx james ",
"jack xxx jam ",
" jam and jack",
"jack",
"james",
]
.forEach(s => console.log(/(?=.*james)(?=.*jack)/.test(s)) )
element (?=.*jack) result will be element, for (?=.*jack) there will be no result. Also tried on example string here: regex101.com
The expression in this answer does that for one jack and one james in any order.
Here, we'd explore other scenarios.
One jack and One james
In case two jack or two james should not be allowed, and only exactly one jack and one james is valid, we can design an expression similar to:
^(?!.*\bjack\b.*\bjack\b)(?!.*\bjames\b.*\bjames\b)(?=.*\bjames\b)(?=.*\bjack\b).*$
Here, we would exclude those instances using these statements:
(?!.*\bjack\b.*\bjack\b)
and,
(?!.*\bjames\b.*\bjames\b)
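We can also merge the two negative lookaheads into one. The two positive lookaheads have to stay separate, because an alternation inside a single lookahead would mean jack OR james rather than jack AND james:
^(?!.*\bjack\b.*\bjack\b|.*\bjames\b.*\bjames\b)(?=.*\bjames\b)(?=.*\bjack\b).*$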
If you wish to simplify, update, or explore the expression, it is explained in the top right panel of regex101.com. You can watch the matching steps or modify them in this debugger link, if you are interested. The debugger demonstrates how a regex engine might consume some sample input strings step by step and perform the matching process.
jex.im visualizes regular expressions:
// The two positive lookaheads are kept separate so that both jack AND james are required.
const regex = /^(?!.*\bjack\b.*\bjack\b|.*\bjames\b.*\bjames\b)(?=.*\bjames\b)(?=.*\bjack\b).*$/gm;
const str = `hi jack here is james
hi james here is jack
hi james jack here is jack james
hi jack james here is james jack
hi jack jack here is jack james
hi james james here is james jack
hi jack jack jack here is james
`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }

    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
One jack and One james in a specific order
The expression can also be designed to require first a james and then a jack, similar to the following one:
^(?!.*\bjack\b.*\bjack\b|.*\bjames\b.*\bjames\b)(?=.*\bjames\b.*\bjack\b).*$
and vice versa:
^(?!.*\bjack\b.*\bjack\b|.*\bjames\b.*\bjames\b)(?=.*\bjack\b.*\bjames\b).*$
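To see the ordered variant in action, here is a small sketch using the james-then-jack expression from above. The test strings and the variable name are my own.

```javascript
// Exactly one jack and one james, with james first (pattern from above).
const jamesThenJack =
  /^(?!.*\bjack\b.*\bjack\b|.*\bjames\b.*\bjames\b)(?=.*\bjames\b.*\bjack\b).*$/;

const orderedResults = [
  jamesThenJack.test("hi james here is jack"),       // right order, one of each
  jamesThenJack.test("hi jack here is james"),       // wrong order
  jamesThenJack.test("hi james here is jack, jack"), // jack appears twice
];
console.log(orderedResults); // [ true, false, false ]
```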
There is no need for two lookaheads; one of the substrings can be matched normally.
^(?=.*?\bjack\b).*?\bjames\b.*
Lookarounds are zero-length assertions (conditions). The lookahead here checks at ^ start if jack occurs later in the string and on success matches up to james and .* the rest (could be removed). Lazy dot is used before words (enclosed in \b word boundaries). Use the i-flag for ignoring case.
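Here is a short sketch of what the trailing .* changes, assuming JavaScript and the i-flag. The sample sentence and variable names are invented for the example.

```javascript
// Same pattern with and without the trailing ".*".
const withTail = /^(?=.*?\bjack\b).*?\bjames\b.*/i;
const withoutTail = /^(?=.*?\bjack\b).*?\bjames\b/i;

const text = "Hi James, here is Jack!";
const fullMatch = withTail.exec(text)[0];
const shortMatch = withoutTail.exec(text)[0];

console.log(fullMatch);  // Hi James, here is Jack!
console.log(shortMatch); // Hi James

// Without jack anywhere in the string, the lookahead fails at the start.
console.log(withTail.test("hi james only")); // false
```

Both variants agree on whether a string matches; they only differ in how much of it is returned as the match.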
[...] .* after last \b or will that work without it also?
.* in the end is indeed useless, it's just needed if the full match is wanted.
You can make use of regex's quantifier feature since lookaround may not be supported all the time.
(\bjames\b){1,}.*(\bjack\b){1,}|(\bjack\b){1,}.*(\bjames\b){1,}
\b(word1|word2|word3|word4|etc)\b I've tested it here: rubular.com/r/Pgn2d6dXXXHoh7
Vim has a branch operator \& that is useful when searching for a line containing a set of words, in any order. Moreover, extending the set of required words is trivial.
For example,
/.*jack\&.*james
will match a line containing jack and james, in any order.
Likewise,
/\<\(\w*jack\&\w*james\)
will match a variable name containing jack and james, in any order.
See this answer for more information on usage. I am not aware of any other regex flavor that implements branching; the operator is not even documented on the Regular Expression wikipedia entry.
All of the answers so far work for finding a match, but they don't all work for highlighting that match. For example: if you want to use grep's "--only-matching" or "--color" options, then there's only one kind of answer (so far) that will work: james.*jack|jack.*james. I'll call this the permutations technique, and the other ones the lookaround technique and the vim branch technique.
The lookaround technique won't do any highlighting at all, because it will always match a zero-length string, because - by definition - that's what lookarounds do. That is, for this input text:
hi jack here is james
hi james here is jack
a (perl) regex of (?=.*jack)(?=.*james) won't highlight anything. You can test this by running this command in most any unix shell:
printf 'hi jack here is james\nhi james here is jack\n' | grep --color --perl '(?=.*jack)(?=.*james)'
Some of the answers here add .* to the beginning and the end. That will highlight something - the whole line - but that doesn't help if our goal is to highlight the words we're looking for, and what's in between those words, and nothing more.
The vim branch technique (AKA \&) will highlight something that might look useful at a glance, but it's probably not what you want. For the same input text,
a vim search for /.*james\&.*jack will highlight hi jack and hi james here is jack.
To test this from the shell, run this:
printf 'hi jack here is james\nhi james here is jack\n' | vim -R - '+/.*james\&.*jack'
Only the permutations technique will highlight the most useful things: jack here is james and james here is jack. To test this from the shell:
printf 'hi jack here is james\nhi james here is jack\n' | grep --color --perl 'james.*jack|jack.*james'
All of what I've written here assumes that you want a technique that will generalize to three or more words.
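The zero-length point is easy to reproduce outside of grep as well. Here is a minimal JavaScript sketch; the engine is not the same as grep's, so treat it only as an illustration of what each technique actually matches.

```javascript
const line = "hi jack here is james";

// Lookaheads alone succeed, but the matched text is empty.
const lookaround = line.match(/(?=.*jack)(?=.*james)/);
console.log(JSON.stringify(lookaround[0])); // ""

// The permutations pattern matches from the first word to the last.
const permutation = line.match(/james.*jack|jack.*james/);
console.log(JSON.stringify(permutation[0])); // "jack here is james"
```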
I looked at the other solutions and thought they were a little unnecessarily long, complex, and wordy. Basically you want a regex that will match
"firstword ANYTHING secondword"
And that same regex will match
"secondword ANYTHING firstword"
So I did a slight variation on this solution which worked for me
jack.*james|james.*jack
Only instead of using one regular expression, I ran two and did an OR on the results:
#!/usr/bin/perl -w
# Match two words in any order

my @testStrings;
push(@testStrings, "firstword secondword");
push(@testStrings, "secondword firstword");
push(@testStrings, "firstword some filler in the middle secondword");
push(@testStrings, "secondword match either word coming first firstword");
push(@testStrings, "filler in beginning firstword some filler in the middle secondword filler in the end");
push(@testStrings, "filler in beginning secondword some filler in the middle firstword filler in the end");
push(@testStrings, "doh is it matching anything?");
push(@testStrings, "firstword alone");
push(@testStrings, "secondword alone");

for (@testStrings) {
    # Run one regex per ordering and OR the results
    my $matched        = $_ =~ /firstword.*secondword/;
    my $matchedReverse = $_ =~ /secondword.*firstword/;
    # /(?:firstword.*secondword)|(?:secondword.*firstword)/ as a single regex also works
    print "string: $_\n";
    if ($matched || $matchedReverse) {
        print "regex: Matched\n";
    } else {
        print "regex: Did not match\n";
    }
    print "\n";
}
Output looks like this
perl forwardAndBackwardRegex.pl
string: firstword secondword
regex: Matched
string: secondword firstword
regex: Matched
string: firstword some filler in the middle secondword
regex: Matched
string: secondword match either word coming first firstword
regex: Matched
string: filler in beginning firstword some filler in the middle secondword filler in the end
regex: Matched
string: filler in beginning secondword some filler in the middle firstword filler in the end
regex: Matched
string: doh is it matching anything?
regex: Did not match
string: firstword alone
regex: Did not match
string: secondword alone
regex: Did not match
I do this same sort of technique in bash when I am searching for a filename with two words in any order. I will just run two greps. Something like this
ls | grep -i firstword | grep -i secondword
It will match as long as both words are present.
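The chained greps amount to running one simple pattern per word and requiring all of them to succeed. A rough JavaScript equivalent could look like this; matchesAll and the file names are hypothetical and only there for illustration.

```javascript
// Hypothetical helper: true when every pattern matches the text.
// Patterns must not use the g flag, since test() would then keep state.
function matchesAll(text, patterns) {
  return patterns.every((p) => p.test(text));
}

const files = [
  "FirstWord-secondword.txt",
  "secondword.txt",
  "notes-SECONDWORD-firstword.md",
];

// Case-insensitive, like grep -i.
const hits = files.filter((name) => matchesAll(name, [/firstword/i, /secondword/i]));
console.log(hits); // [ 'FirstWord-secondword.txt', 'notes-SECONDWORD-firstword.md' ]
```

Adding a third required word is just one more entry in the array, with no change to the individual patterns.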