
I am new to Regex. My input is:

2233    0 0     20180405    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

A line may contain only tabs, integers, floats, and the end-of-line/newline.

I read line content in C#:

using (var sourceStream = new StreamReader(sourceFilePath))
{
    string iteratedLine;
    while ((iteratedLine = sourceStream.ReadLine()) != null)
    {
        // iteratedLine = 2233\t0 0\t\t20180405\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0
        // (iteratedLine is validated here)
    }
}

Then I pass iteratedLine to a validation function.
I only allow the following in the string:
1. tab
2. new line/end of line
3. number
4. float (0.123)

The following validation does not work. What am I missing?

bool isValid = Regex.IsMatch(inputLine, @"(\d+\.{1}\d*)|(\d)|(\\t)|(\\n)|(\\r)");

If I take the regex (\d+.{1}\d*)|(\d)|(\t)|(\n)|(\r) and use it on regex101.com, it is supposed to fail any line that contains a character other than these 4 allowed kinds.

Thanks

  • Can the data in the input be in any order? Commented Jul 29, 2018 at 15:29
  • Yes, as long as the values are separated by tabs. Basically, I read a file in which each line contains tab-separated numbers. I want to validate that the line does not have spaces or weird characters. Commented Jul 29, 2018 at 15:37
  • So the input you gave should fail because there are not only tabs as delimiters, right? Commented Jul 29, 2018 at 15:42
  • Yes, that's true. Commented Jul 29, 2018 at 15:50
  • Can you please try ^(?:(?:\d+\.\d+|\d+)(?: {4}|$)\n?)+$ ? Commented Jul 29, 2018 at 16:05

2 Answers


You use Regex.IsMatch, which returns true if the pattern matches anywhere in the input string. Your regex is an alternation, so a single alternative, for example one digit, is enough to produce a match. If the string also contains unwanted characters, the alternation still matches the digits and simply skips over the unwanted characters, so IsMatch still returns true.
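To make the difference concrete, here is a minimal sketch contrasting an unanchored and an anchored pattern. The class and method names (AnchorDemo, ContainsDigit, IsAllDigits) are hypothetical and only serve the illustration; the patterns are simplified stand-ins, not the full validation regex.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, used only to illustrate anchoring.
public static class AnchorDemo
{
    // Unanchored: true as soon as ANY digit occurs somewhere in the input.
    public static bool ContainsDigit(string input) =>
        Regex.IsMatch(input, @"\d");

    // Anchored with ^ and $: true only if the input consists of digits from start to end.
    public static bool IsAllDigits(string input) =>
        Regex.IsMatch(input, @"^\d+$");
}
```

For an input such as "12 abc!", ContainsDigit reports a match while IsAllDigits does not, which is the same effect the unanchored alternation has on a line with stray characters.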


You could use the anchor ^ to assert the start of the line, then match one or more digits \d+ followed by an optional part (?:\.\d+)? that matches a dot and one or more digits.

Then match a tab \t followed by one or more digits and the same optional part (a dot and one or more digits), and assert the end of the line with $.

Repeat that tab-plus-value group one or more times so that there are at least 2 values separated by a tab.

^\d+(?:\.\d+)?(?:\t\d+(?:\.\d+)?)+$
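As a sketch of how this pattern might be wired into the validation step, assuming each line comes from ReadLine (so the line terminator is already stripped); TabLineValidator and IsValidLine are hypothetical names, not part of the question's code:

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the anchored pattern.
public static class TabLineValidator
{
    // Integer or decimal value, followed by one or more tab-prefixed values.
    private static readonly Regex LinePattern =
        new Regex(@"^\d+(?:\.\d+)?(?:\t\d+(?:\.\d+)?)+$");

    // Returns true only if the whole line is tab-separated numbers (at least two values).
    public static bool IsValidLine(string line) =>
        line != null && LinePattern.IsMatch(line);
}
```

With this in place, a line such as "2233\t0 0\t\t20180405" is rejected because of the space and the empty field, while "2233\t0\t20180405\t0.5" is accepted.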




You are missing a few points:

  1. You need to anchor both ends of the Regex to both ends of the string, so it needs ^ at the start and $ at the end. Without that, it can return true if any part of the line matches; but we only want it to return true if the whole line matches the pattern.
  2. StreamReader's ReadLine strips off the end of line, so you don't need to worry about that.
  3. You need to enforce that there is a value between the tabs, otherwise a line of just tabs would pass.

This should do the trick...

^\d+(?:\.\d+)?(?:\t\d+(?:\.\d+)?)+$

If you are expecting a particular number of values on each row, you could replace the final + with {x}, where x is the number of items minus one.
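A minimal sketch of building such a fixed-count pattern, assuming the expected number of values per row is known up front; FixedCountLineValidator and BuildPattern are hypothetical names:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper that builds the pattern for an exact number of values per row.
public static class FixedCountLineValidator
{
    public static Regex BuildPattern(int valueCount)
    {
        if (valueCount < 1)
            throw new ArgumentOutOfRangeException(nameof(valueCount));

        // The first value is matched on its own, so the tab-prefixed group repeats valueCount - 1 times.
        int repeats = valueCount - 1;
        return new Regex(@"^\d+(?:\.\d+)?(?:\t\d+(?:\.\d+)?){" + repeats + "}$");
    }
}
```

For example, BuildPattern(3) accepts "1\t2.5\t3" but rejects rows with two or four values.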

An alternative approach would be to use string.Split and use Linq to check that all the items return true from double.TryParse.
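A rough sketch of that alternative follows. SplitLineValidator is a hypothetical name, and the choice of NumberStyles.AllowDecimalPoint with the invariant culture is an assumption made here so that signs, exponents, whitespace and culture-specific separators are rejected. Note that this version is not identical to the regex: it accepts a line with a single value, and TryParse may be more lenient about where the decimal point appears.

```csharp
using System.Globalization;
using System.Linq;

// Hypothetical alternative to the regex: split on tabs and parse every field.
public static class SplitLineValidator
{
    public static bool IsValidLine(string line)
    {
        if (string.IsNullOrEmpty(line))
            return false;

        // Every tab-separated field must parse as a plain number (digits with an optional decimal point).
        return line.Split('\t').All(field =>
            double.TryParse(field, NumberStyles.AllowDecimalPoint,
                            CultureInfo.InvariantCulture, out _));
    }
}
```

Empty fields and fields containing spaces fail to parse, so lines like "1\t\t3" or "1\t0 0\t3" are rejected.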

2 Comments

Thanks, I will try it tomorrow. An empty value may appear between tabs; I have seen it in the past. I don't want to reject that, since I receive the input from an external component and have no control over it.
If the values can be missing, you would need to tweak the expression I've given by wrapping both occurrences of \d+(?:\.\d+)? in a group followed by a question mark, giving ^(?:\d+(?:\.\d+)?)?(?:\t(?:\d+(?:\.\d+)?)?)+$
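A small sketch of the variant that tolerates empty fields, with each value wrapped in an optional group; OptionalValueLineValidator is a hypothetical name. One consequence worth noting is that a line consisting only of tabs now passes.

```csharp
using System.Text.RegularExpressions;

// Hypothetical variant of the validator in which any value may be empty.
public static class OptionalValueLineValidator
{
    // Each value is wrapped in (?: ... )? so it may be absent; at least one tab is still required.
    private static readonly Regex LinePattern =
        new Regex(@"^(?:\d+(?:\.\d+)?)?(?:\t(?:\d+(?:\.\d+)?)?)+$");

    public static bool IsValidLine(string line) =>
        line != null && LinePattern.IsMatch(line);
}
```

With this pattern "1\t\t3" is accepted, while "1\t0 0" is still rejected because of the space.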
