I am parsing a text log where each line contains an id enclosed in parentheses and one or more (possibly hundreds of) chunks of data (alphanumeric, always 20 characters), such as this:
id=(702831), data1=(Ub9fS97Hkc570Vvqkdy1), data2=(Hd7t553df8mnOa84wTcF)
id=(702832), data1=(Ba6FGoP5Dzxwmb6JhJ5a)
At this point in the program, I am not interested in the data, just in quickly fetching all the ids. The problem is that, due to the noisy communication channel, an error may appear, denoted by the string Error, which can be anywhere on the line. The goal is to ignore these lines.
What worked for me so far was a simple negative lookahead:
^id=\((\d+)\),(?!.*Error)
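For illustration, here is a minimal sketch of how this pattern behaves when applied in Python. The language, the variable names, and the three-line sample log are assumptions made for the example, not part of my actual program.

```python
import re

# The pattern from above. re.MULTILINE makes ^ match at the start of every
# line; "." does not match a newline, so the lookahead stays on one line.
PATTERN = re.compile(r"^id=\((\d+)\),(?!.*Error)", re.MULTILINE)

# Hypothetical sample log built from the lines shown in this question.
log = (
    "id=(702831), data1=(Ub9fS97Hkc570Vvqkdy1), data2=(Hd7t553df8mnOa84wTcF)\n"
    "id=(702832), data1=(Ba6FGoP5Dzxwmb6JhJ5a)\n"
    "id=(702833), daErrorta1=(hF6eDpLxbnFS5PfKaCds)\n"
)

ids = PATTERN.findall(log)
print(ids)  # ['702831', '702832']
```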
But I forgot that there is a tiny probability that the string Error may actually appear as a valid sequence of characters somewhere in the data, which has just backfired on me.
The only way to distinguish between a valid and an invalid appearance of the Error string in a data chunk is to check the chunk's length. If it is exactly 20 characters, then it was this rare valid occurrence and I want to keep the line (unless Error also appears elsewhere on the line); if it is longer, I want to discard the line.
Is it still possible to treat this situation with a regular expression or is it already too much for the regex monster?
Thanks a lot.
Edit: Adding examples of error lines - all these should be ignored.
iErrord=(702831), data1=(Ub9fS97Hkc570Vvqkdy1), data2=(Hd7t553df8mnOa84wTcF)
id=(7028Error32), data1=(Ba6FGoP5Dzxwmb6JhJ5a)
id=(702833), daErrorta1=(hF6eDpLxbnFS5PfKaCds)
id=(702834), data1=(bx5EsH7BCsk6dMzpQDErrorKA)
However, this one should not be ignored. Here Error is just incidentally contained in the data part (the chunk is still 20 characters long), but the line is currently ignored:
id=(702834), data1=(bx5EsH6dMzpQDErrorKA)
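To make the intended rule concrete, here is a rough sketch in Python of one way it could be expressed: match the whole line against a strict format instead of looking for Error. It assumes every valid line looks exactly like the samples above (chunks named dataN, separated by a comma and a space), which may not hold for the real log. The names LINE and extract_ids are made up for the example.

```python
import re

# Assumed strict format: id=(digits), then one or more chunks of the form
# ", dataN=(exactly 20 alphanumeric characters)", and nothing else.
LINE = re.compile(r"id=\((\d+)\)(?:, data\d+=\([A-Za-z0-9]{20}\))+")

def extract_ids(lines):
    """Return the ids of the lines that match the assumed format in full."""
    ids = []
    for line in lines:
        match = LINE.fullmatch(line.strip())
        if match:
            ids.append(match.group(1))
    return ids

samples = [
    "id=(702831), data1=(Ub9fS97Hkc570Vvqkdy1), data2=(Hd7t553df8mnOa84wTcF)",
    "iErrord=(702831), data1=(Ub9fS97Hkc570Vvqkdy1), data2=(Hd7t553df8mnOa84wTcF)",
    "id=(7028Error32), data1=(Ba6FGoP5Dzxwmb6JhJ5a)",
    "id=(702833), daErrorta1=(hF6eDpLxbnFS5PfKaCds)",
    "id=(702834), data1=(bx5EsH7BCsk6dMzpQDErrorKA)",
    "id=(702834), data1=(bx5EsH6dMzpQDErrorKA)",
]

print(extract_ids(samples))  # ['702831', '702834']
```

An Error inserted by the channel makes a chunk longer than 20 characters or breaks the id or the field name, so such lines fail the full match, while an Error that is part of a 20-character chunk passes. Note that this sketch also rejects any other deviation from the assumed format, which is stricter than only ignoring lines with Error.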
Error in them and then only allow the ones having valid id=([0-9]+) in them? Error.