Specific regex to detect error string

Question

I am parsing a text log where each line contains an id enclosed in parentheses and one or more (possibly hundreds of) chunks of data (alphanumeric, always 20 characters), such as this:

id=(702831), data1=(Ub9fS97Hkc570Vvqkdy1), data2=(Hd7t553df8mnOa84wTcF)
id=(702832), data1=(Ba6FGoP5Dzxwmb6JhJ5a)

At this point in the program I am not interested in the data, just in quickly fetching all the ids. The problem is that, due to a noisy communication channel, an error may appear, denoted by the string Error, which can be anywhere on the line. The goal is to ignore these lines.

What worked for me so far was a simple negative lookahead:

^id=\((\d+)\),(?!.*Error)

But I forgot that there is a tiny probability that the string Error may actually appear as a valid sequence of characters somewhere in the data, which has just backfired on me.

The only way to distinguish between a valid and an invalid appearance of the Error string in a data chunk is to check the length. If the chunk is 20 characters long, it is the rare valid occurrence and I want to keep the line (unless Error also appears elsewhere on the line); if it is longer, I want to discard the line.

Is it still possible to treat this situation with a regular expression or is it already too much for the regex monster?

Thanks a lot.

Edit: Adding examples of error lines - all these should be ignored.

iErrord=(702831), data1=(Ub9fS97Hkc570Vvqkdy1), data2=(Hd7t553df8mnOa84wTcF)
id=(7028Error32), data1=(Ba6FGoP5Dzxwmb6JhJ5a)
id=(702833), daErrorta1=(hF6eDpLxbnFS5PfKaCds)
id=(702834), data1=(bx5EsH7BCsk6dMzpQDErrorKA)

However, this one should not be ignored: the Error is just incidentally contained in the data part, but it currently is ignored:

id=(702834), data1=(bx5EsH6dMzpQDErrorKA)
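
To make the failure concrete, here is a minimal sketch that runs the negative-lookahead pattern from the question against three of the sample lines. Python's re module is an assumption (the question does not name a language), and the names NAIVE and naive_results are only illustrative.

```python
import re

# The pattern from the question: capture the id, then reject the line
# if "Error" appears anywhere after the id field.
NAIVE = re.compile(r"^id=\((\d+)\),(?!.*Error)")

lines = [
    "id=(702831), data1=(Ub9fS97Hkc570Vvqkdy1), data2=(Hd7t553df8mnOa84wTcF)",
    "id=(702834), data1=(bx5EsH7BCsk6dMzpQDErrorKA)",  # corrupted: 25 characters
    "id=(702834), data1=(bx5EsH6dMzpQDErrorKA)",       # valid: exactly 20 characters
]

# True means the line is kept, False means it is dropped.
naive_results = [bool(NAIVE.match(line)) for line in lines]

for line, keep in zip(lines, naive_results):
    print("keep" if keep else "drop", line)
```

The last two lines are both dropped, because the lookahead only asks whether Error occurs, not how long the surrounding chunk is.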

Is there a reason you need this in one regex? Can't you just first ignore all lines that have Error in them and then only allow the ones having a valid id=([0-9]+) in them?
– csl, Commented Jan 23, 2014 at 15:23
No, it can be in two. But you still face the same problem, imho: you cannot simply ignore all lines containing Error.
– Erlik, Commented Jan 23, 2014 at 15:27

Theox · Accepted Answer · 2014-01-23 15:36:37Z

Alright, it's not exactly what you were thinking about, but here's a suggestion:

Can't you simply match only the lines that follow the expected pattern, undisturbed by an Error anywhere?

Here's the regexp that will do it:

^id=\((\d+)\), (data\d+=\([a-zA-Z\d]{20}\)(, )?)+$

If Error is anywhere on the line (except inside a 20-character chunk of data), the regexp will not match, so the line is ignored, which is the wanted result.
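
As a rough sketch of how this whole-line pattern could be used to collect the ids, assuming Python's re module (the thread does not name a language): the helper extract_ids and the last sample line with a trailing Error are hypothetical additions for illustration, while the other lines come from the question.

```python
import re

# Whole-line pattern from the answer: an id, then one or more
# 20-character alphanumeric chunks, and nothing else up to end of line.
STRICT = re.compile(r"^id=\((\d+)\), (data\d+=\([a-zA-Z\d]{20}\)(, )?)+$")

def extract_ids(log_lines):
    """Return the ids of the lines that fit the expected layout exactly."""
    ids = []
    for line in log_lines:
        m = STRICT.match(line)
        if m:
            ids.append(m.group(1))
    return ids

sample = [
    "id=(702831), data1=(Ub9fS97Hkc570Vvqkdy1), data2=(Hd7t553df8mnOa84wTcF)",
    "id=(702832), data1=(Ba6FGoP5Dzxwmb6JhJ5a)",
    "iErrord=(702831), data1=(Ub9fS97Hkc570Vvqkdy1), data2=(Hd7t553df8mnOa84wTcF)",
    "id=(7028Error32), data1=(Ba6FGoP5Dzxwmb6JhJ5a)",
    "id=(702833), daErrorta1=(hF6eDpLxbnFS5PfKaCds)",
    "id=(702834), data1=(bx5EsH7BCsk6dMzpQDErrorKA)",
    "id=(702834), data1=(bx5EsH6dMzpQDErrorKA)",        # Error inside a valid 20-char chunk
    "id=(702835), data1=(Ba6FGoP5Dzxwmb6JhJ5a)Error",   # hypothetical: Error at end of line
]

print(extract_ids(sample))  # ['702831', '702832', '702834']
```

The {20} quantifier does the length check for every chunk, however many there are, and the trailing $ is what rejects the line with Error appended after the last chunk.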

If this doesn't suit you, you'll have to add more lookahead and lookbehind groups. I'll try to do that and edit if I come up with a good regexp.

edited Jan 23, 2014 at 15:36

answered Jan 23, 2014 at 15:28

2 Comments

Erlik Over a year ago

No, no, this works perfectly! Only when the Error is at the end of the line is it still accepted, but that is easily fixed by adding $ at the end of the regex. I was also thinking about somehow checking the length, but was unable to do it for an arbitrary number of chunks, and yet it's this simple. Thanks!

Theox Over a year ago

You are right about the end of the line! I added the $ to my answer.

Robin · Accepted Answer · 2014-01-23 15:50:59Z

Since your chunks of data are always 20 characters long, a chunk of 25 characters means there is an Error in it. You could therefore check whether there is a chunk of that length, and then check whether there is an Error outside the parentheses. If either is the case, you shouldn't match the line. If not, it is valid.

Something like

(?![^)]*Error)id=\((\d+)(?!.*(?:\(.{25}\)|\)[^(]*Error))

might do the trick.
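
A small sketch of how this pattern behaves on the sample lines from the question, again assuming Python's re module; the names ROBIN and kept are illustrative. Because the pattern is not anchored, the sketch uses search rather than match.

```python
import re

# Pattern from the answer: no Error before the first ")", then after the id
# neither a 25-character parenthesized chunk nor an Error outside parentheses.
ROBIN = re.compile(r"(?![^)]*Error)id=\((\d+)(?!.*(?:\(.{25}\)|\)[^(]*Error))")

sample = [
    "id=(702831), data1=(Ub9fS97Hkc570Vvqkdy1), data2=(Hd7t553df8mnOa84wTcF)",
    "id=(702832), data1=(Ba6FGoP5Dzxwmb6JhJ5a)",
    "iErrord=(702831), data1=(Ub9fS97Hkc570Vvqkdy1), data2=(Hd7t553df8mnOa84wTcF)",
    "id=(7028Error32), data1=(Ba6FGoP5Dzxwmb6JhJ5a)",
    "id=(702833), daErrorta1=(hF6eDpLxbnFS5PfKaCds)",
    "id=(702834), data1=(bx5EsH7BCsk6dMzpQDErrorKA)",
    "id=(702834), data1=(bx5EsH6dMzpQDErrorKA)",  # Error inside a valid 20-char chunk
]

# Collect the captured id of every line the pattern accepts.
kept = [m.group(1) for m in map(ROBIN.search, sample) if m]
print(kept)  # ['702831', '702832', '702834']
```

Note that the length check is written for exactly 25 characters, that is, a 20-character chunk with a single Error inserted, as the answer assumes.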

answered Jan 23, 2014 at 15:50

4 Comments

Robin Over a year ago

Just saw Theox's answer, probably way more elegant if the pattern of your string is indeed fixed :)

Erlik Over a year ago

Still, I appreciate it, thanks! Actually I was trying to construct something like that (forgetting I can make things simpler, as Theox shows), but my regex skills are way too limited for that. So if I got it right, you are ensuring that after the id=(123) part there is NOT a sequence of 25 chars enclosed in () and there is NOT an Error string outside the parentheses. Correct?

Robin Over a year ago

Yep, ensuring that there is no chunk of 25 characters and no Error outside the parentheses. But as usual in regex, half the work is asking oneself the right question!

Erlik Over a year ago

Nice and straightforward. Thanks!
