2

I need a regex that will match several strings in a specific order separated by anything including newlines.

So, if the 3 strings are cat, <dog, </bird> then:

cat abcd abc <dog abc </bird>

matches, but

cat abcd abc </bird> abc <dog

does not.

EDIT: one more example:

catabcd abc <dog abc </bird>

and any such variation where the search terms are not separated by word boundaries should also match.

One final example, it should be greedy in that:

cat abcd
</bird>
<dog
<dog
cat
</bird>

Does NOT match.

I have tried lookahead: (?=.*?cat)(?=.*?dog)(?=.*?bird).* but this does not enforce order (and this particular example only works on one line).
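To illustrate why the lookahead attempt does not enforce order, here is a minimal sketch. It assumes Python's `re` module as a stand-in for the Notepad++ engine, with `re.DOTALL` playing the role of ". matches newline"; the variable names are made up for illustration.

```python
import re

# The lookahead-only attempt from the question; re.DOTALL lets "." cross newlines.
lookahead_only = re.compile(r"(?=.*?cat)(?=.*?dog)(?=.*?bird).*", re.DOTALL)

in_order = "cat abcd abc <dog abc </bird>"
out_of_order = "cat abcd abc </bird> abc <dog"

# Each lookahead starts at the same position and scans independently,
# so the relative order of the three terms is never checked.
matches_in_order = lookahead_only.search(in_order) is not None
matches_out_of_order = lookahead_only.search(out_of_order) is not None

print(matches_in_order, matches_out_of_order)  # True True
```

Both inputs match, which is the problem: the second one has the terms in the wrong order.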

Note: I am using Notepad++, but can resort to Perl if necessary.

6
  • Do you want to match the entire line containing "cat" and "bird" or just use them as start/endpoints? Commented Sep 8, 2017 at 3:46
  • What about: cat[.\n]+?dog[.\n]+?bird Commented Sep 8, 2017 at 3:47
  • Mateen Ulhaq, Matching either entire line or just start/endpoints is ok, also cat[.\n]+?dog[.\n]+?bird yields no matches on regexr.com Commented Sep 8, 2017 at 12:51
  • Your last example should not match, am I right? Commented Sep 8, 2017 at 16:45
  • @k-five, that is correct, last example should not match. Commented Sep 8, 2017 at 16:49

3 Answers

1

You may need something like this:

cat(?:(?!bird|cat).)*dog(?:(?!dog|bird).)*bird

With the help of negative lookahead assertions, it matches only one cat, followed by only one dog, and then only one bird.
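As a rough check, the pattern above can be run against the four examples from the question. This is only a sketch: it assumes Python's `re` module rather than Notepad++, uses `re.DOTALL` so that `.` also matches newlines, and the sample labels are invented for illustration.

```python
import re

# Pattern from the answer above; re.DOTALL is an assumption standing in for
# Notepad++'s ". matches newline" option.
tempered = re.compile(r"cat(?:(?!bird|cat).)*dog(?:(?!dog|bird).)*bird", re.DOTALL)

# The four examples given in the question (labels are hypothetical).
samples = {
    "in order": "cat abcd abc <dog abc </bird>",
    "bird before dog": "cat abcd abc </bird> abc <dog",
    "no word boundary": "catabcd abc <dog abc </bird>",
    "multi-line, bird first": "cat abcd\n</bird>\n<dog\n<dog\ncat\n</bird>",
}

results = {name: tempered.search(text) is not None for name, text in samples.items()}
for name, matched in results.items():
    print(f"{name}: {matched}")
```

Under these assumptions the first and third samples match and the other two do not, because the `(?!bird|cat)` guard stops the scan from a cat before it can step over a bird.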


7 Comments

This is getting closer, however there are 2 additional cases that don't work: cat dog cat bird and cat dog dog bird. Perhaps more clearly, I can't have a dog (or multiple dogs) between a cat and a bird, regardless of the existence of other cats between the dog and the bird. Please note, I've pretty much given up on a regex solution for this and am going to write code instead.
What are they? Please comment here.
To clarify my above comment, when I say "I can't have a dog (or multiple dogs) between a cat and a bird...", I mean I want to detect this condition and thus it should be a match.
@GlenYates I am not sure I understand you correctly. So it should match multiple dogs between only one cat and only one bird? If that is not the case, please add all the possibilities you have to your question.
Basically yes, but additionally the existence of 1-n cats between the dog and the bird should still result in a match, so we have: cat dog bird, cat dog dog bird, and cat dog cat bird should all match.
1

can resort to Perl if necessary

Here is the way to do it with Perl.

separated by anything including newlines

In Perl, the s modifier makes . match any character, including a newline (the modifier means the string is treated as a single line).

Thus, you can match your input this way: m/.*cat.*dog.*bird.*/s.

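Here is the source code; it prints matches:

#!/usr/bin/perl
use strict;
use warnings;

# Sample input with the three terms on separate lines.
my $content = " cat abcd
abc dog abc
bird";

# The /s modifier lets . match newlines, so the terms may span lines.
print "matches\n" if ($content =~ m/.*cat.*dog.*bird.*/s);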

Comments

1

I'm not sure where you found lookaheads, since they are usually more complex to understand than the basic features in regex... which are what I would use for your task given the info you provided:

\bcat\b.*?\bdog\b.*?\bbird\b


Make sure that 'Regular expression' and '. matches newline' are both checked, and that your cursor is at the beginning of the file.

The \b anchors I used ensure that the words match exactly as you stated them: each word must be neither preceded nor followed by another word character (so cat will match, but cats will not).
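The effect of the \b anchors can be seen with a small sketch. It assumes Python's `re` module, with `re.DOTALL` in place of the '. matches newline' checkbox; the names `bounded` and `unbounded` are just for illustration.

```python
import re

# Same pattern with and without word boundaries; re.DOTALL makes "." match newlines.
bounded = re.compile(r"\bcat\b.*?\bdog\b.*?\bbird\b", re.DOTALL)
unbounded = re.compile(r"cat.*?dog.*?bird", re.DOTALL)

spaced = "cat abcd abc <dog abc </bird>"
glued = "catabcd abc <dog abc </bird>"  # "cat" runs straight into "abcd"

bounded_spaced = bounded.search(spaced) is not None
bounded_glued = bounded.search(glued) is not None
unbounded_glued = unbounded.search(glued) is not None

print(bounded_spaced, bounded_glued, unbounded_glued)  # True False True
```

With the anchors, "catabcd" is rejected because there is no word boundary after "cat"; dropping them accepts it.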

4 Comments

Jerry, thanks for the answer, but please see the edit, in that the search terms might not be bounded by word boundaries. Also, is there a way to get this to work across lines on regexr.com which is where I was testing this?
@GlenYates Ok, then I think you can remove them. I added that description at the end so that you could change it yourself if you didn't need them ^^ And I don't like regexr, it doesn't have all the things I believe it should have. I use regex101.com where I can have the s flag to make . match newlines, and there's a new feature there called unit tests that was added since I used it last time that looks useful! xD regex101.com/r/hpQ8Jl/1/tests
Please see my hopefully last edit, I can't have specifically the last search term in the middle of what would otherwise be a match, i.e. cat </bird> <dog </bird> should NOT match.
@GlenYates Ah, I just saw your comment now. Welp, too late now.
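For the constraint raised in the last comments (the final term must not appear between the first two), one possible approach is to guard the gap between the first and second terms with a negative lookahead. This is only a sketch, not a tested Notepad++ solution: it assumes Python's `re` module with `re.DOTALL`, and the pattern and names below are illustrative.

```python
import re

# Hypothetical pattern: between "cat" and "<dog", no character may start "</bird>".
# After "<dog", anything (including more dogs or cats) is allowed up to "</bird>".
ordered = re.compile(r"cat(?:(?!</bird>).)*?<dog.*?</bird>", re.DOTALL)

should_match = [
    "cat abcd abc <dog abc </bird>",
    "catabcd abc <dog abc </bird>",
    "cat <dog <dog </bird>",
    "cat <dog cat </bird>",
]
should_not_match = [
    "cat abcd abc </bird> abc <dog",
    "cat </bird> <dog </bird>",
    "cat abcd\n</bird>\n<dog\n<dog\ncat\n</bird>",
]

positives = [ordered.search(s) is not None for s in should_match]
negatives = [ordered.search(s) is not None for s in should_not_match]
print(positives, negatives)
```

Under these assumptions every entry in `positives` is True and every entry in `negatives` is False.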
