I'm fairly new to regular expressions and, despite reading a few tutorials, I can't get this step of my Regex.Replace calls to produce the format I want.

Here's the scenario: when I pull my data from the listbox, I want to convert it into a CSV-like format and then save the file. Is Regex.Replace an ideal solution for this?

Example before the regular expression formatting:

FirstName LastName Salary    Position
-------------------------------------
John      Smith    $100,000.00  M

Proposed format after the regular expression replace:

John Smith,100000,M

Current formatting status output:

John,Smith,100000,M

Note: is there a way I can replace the first comma with a space?

Snippet of my code

// Requires: using System.IO; using System.Text.RegularExpressions;
using (var fs = new FileStream(filepath, FileMode.OpenOrCreate, FileAccess.Write))
{
    using (var sw = new StreamWriter(fs))
    {
        foreach (string stw in listBox1.Items)
        {
            // Piecing the list back to the original format
            // 1. Strip the dollar sign and the thousands separators
            string sb_trim = Regex.Replace(stw, @"[$,]", "");
            // 2. Drop the decimal point and the cents
            sb_trim = Regex.Replace(sb_trim, @"[.][0-9]+", "");
            // 3. Turn each run of whitespace into a single comma
            sb_trim = Regex.Replace(sb_trim, @"\s+", ",");
            sw.WriteLine(sb_trim);
        }
    }
}
  • With your regex 44.66 would be replaced as 44 Commented Apr 20, 2013 at 5:49
  • Just Re-edited my statement... I'm converting 100,000.00 to 100000. Yeah I'm leaving cents out of this equation when I'm writing back to the csv format. Commented Apr 20, 2013 at 5:49
  • Please try not to use so much bold fonts - I've removed all and also fixed your incomplete Dispose calls so code looks ok. Commented Apr 20, 2013 at 5:52
  • @Curtis you should use [.]0+(?=\s) then Commented Apr 20, 2013 at 5:56
  • @Anirudh Wouldn't \.\d+ be better? Maybe it's not always going to be .00. Commented Apr 20, 2013 at 6:08

5 Answers

You can do this with two replacements:

//let stw be "John Smith $100,000.00 M"

sb_trim = Regex.Replace(stw, @"\s+\$|\s+(?=\w+$)", ",");
//sb_trim becomes "John Smith,100,000.00,M"

sb_trim = Regex.Replace(sb_trim, @"(?<=\d),(?=\d)|[.]0+(?=,)", "");
//sb_trim becomes "John Smith,100000,M"

sw.WriteLine(sb_trim);

4 Comments

This really does a lot of unnecessary work, and is probably not great for performance. If you're going to do that, at least set a timeout.
@Anirudh I understand what he wanted to do. I have a one-line answer below, though I'm not sure that it works yet.
@Zenexer, Whenever one uses regular expressions, performance is impacted - whether it matters or not is entirely situational.
@Moo-Juice Certainly true, which is why I prefer to avoid them. When they are used, it's a good idea to precompile them.
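
The timeout and precompilation suggestions in these comments could be applied to the two-step answer roughly as follows. This is a minimal sketch, assuming .NET Framework 4.5 or later (where the Regex constructor accepts a match timeout); the class name, method name, and the 250 ms budget are hypothetical choices, not something the commenters specified.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CompiledRowCleaner
{
    // Hypothetical budget: a match that runs longer throws RegexMatchTimeoutException.
    private static readonly TimeSpan Timeout = TimeSpan.FromMilliseconds(250);

    // Both patterns are taken from the answer; they are built once and reused for every row.
    private static readonly Regex Separators =
        new Regex(@"\s+\$|\s+(?=\w+$)", RegexOptions.Compiled, Timeout);

    private static readonly Regex SalaryNoise =
        new Regex(@"(?<=\d),(?=\d)|[.]0+(?=,)", RegexOptions.Compiled, Timeout);

    public static string Clean(string row)
    {
        // Step 1: turn the whitespace before the salary and before the last field into commas.
        string withSeparators = Separators.Replace(row, ",");
        // Step 2: remove thousands separators and a trailing run of zero cents.
        return SalaryNoise.Replace(withSeparators, "");
    }
}
```

Whether compiling pays off depends on how many rows are processed; for a handful of listbox items the difference is unlikely to be noticeable.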
Try this:

sb_trim = Regex.Replace(stw, @"(\D+)\s+\$([\d,]+)\.\d+\s+(.)",
    m => string.Format(
        "{0},{1},{2}",
        m.Groups[1].Value,
        m.Groups[2].Value.Replace(",", string.Empty),
        m.Groups[3].Value));

This is about as clean an answer as you'll get, at least with regexes.

  • (\D+): First capture group. One or more non-digit characters.
  • \s+\$: One or more whitespace characters, then a literal dollar sign ($).
  • ([\d,]+): Second capture group. One or more digits and/or commas.
  • \.\d+: Decimal point, then at least one digit.
  • \s+: One or more whitespace characters.
  • (.): Third capture group. Any single character except a line break.

The second capture group additionally needs to have its commas stripped. You could do this with another regex, but that's unnecessary and bad for performance. This is why we use a lambda expression and string.Format to piece together the replacement. If it weren't for that, we could just use this as the replacement, in place of the lambda expression:

"$1,$2,$3"

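As a self-contained illustration of the lambda-based replacement, here is a hedged sketch that also copes with the column padding shown in the question's table. It departs from the pattern above in two ways, both of which are assumptions on my part: the first group is made lazy (`\D+?`) so it does not swallow the padding before the dollar sign, and the padding between the names is collapsed with `Split`/`Join` rather than another regex. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RowFormatter
{
    // Same structure as the pattern above, anchored, with a lazy first group.
    private static readonly Regex RowPattern =
        new Regex(@"^(\D+?)\s+\$([\d,]+)\.\d+\s+(.)$");

    public static string ToCsv(string row)
    {
        return RowPattern.Replace(row.Trim(), m => string.Format(
            "{0},{1},{2}",
            // Collapse the padding between first and last name to a single space.
            string.Join(" ", m.Groups[1].Value.Split(
                (char[])null, StringSplitOptions.RemoveEmptyEntries)),
            // Strip the thousands separators from the salary.
            m.Groups[2].Value.Replace(",", string.Empty),
            m.Groups[3].Value));
    }
}
```

Calling `RowFormatter.ToCsv("John      Smith    $100,000.00  M")` yields `John Smith,100000,M`. A row that does not match the pattern is returned unchanged (apart from the trim), so callers may want to check for that case.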
6 Comments

Thanks, yeah I attempted to group my regular expressions, however I seemed to complicate my scenario more than anything, so I reverted back to the basics. I'll give this a shot as well.
There's probably some way to avoid the commas in the group, but it escapes me. There are certainly people here who are more familiar with .NET-specific regexes, so perhaps they'll know.
@Anirudh Not according to MSDN.
it is non-capturing group but you are still capturing it in another group i.e it would still be captured in group2...
Ah. What about using (?<=)?

Add the following two lines:

var regex = new Regex(Regex.Escape(","));
sb_trim = regex.Replace(sb_trim, " ", 1);

If sb_trim is "John,Smith,100000,M", the code above returns "John Smith,100000,M".
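
Since the comma is a fixed character, the same first-occurrence replacement can also be done without a regex. The following is a minimal sketch of that alternative, not part of the original answer; the class and method names are hypothetical.

```csharp
using System;

public static class FirstComma
{
    // Replaces only the first comma in the line with a space.
    public static string ReplaceWithSpace(string line)
    {
        int index = line.IndexOf(',');
        if (index < 0)
        {
            return line; // No comma present, nothing to replace.
        }
        return line.Substring(0, index) + " " + line.Substring(index + 1);
    }
}
```

For example, `FirstComma.ReplaceWithSpace("John,Smith,100000,M")` returns `John Smith,100000,M`.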


For simplicity, you only need to extract the number from the currency string:

Regex.Replace(yourcurrency, "[^0-9]","")
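
One caveat worth illustrating: `[^0-9]` also removes the decimal point, so the cents digits are kept and run into the whole amount. The sketch below contrasts the one-liner with a variant that drops the cents first, on the assumption that the asker wants whole units only (as stated in the question's comments). The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CurrencyDigits
{
    // The one-liner from this answer: keeps every digit, including the cents.
    public static string DigitsOnly(string currency)
    {
        return Regex.Replace(currency, "[^0-9]", "");
    }

    // Variant: remove a trailing ".dd" part first, then strip the remaining non-digits.
    public static string WholeUnits(string currency)
    {
        string withoutCents = Regex.Replace(currency, @"\.\d+$", "");
        return Regex.Replace(withoutCents, "[^0-9]", "");
    }
}
```

With the input `$100,000.00`, `DigitsOnly` returns `10000000`, while `WholeUnits` returns `100000`.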


This should do the job:

var result=Regex.Replace("John      Smith    $100,000.00  M", @"^(\w+)\s+(\w+)\s+\$([\d,\.]+)\s+(\w+)$","$1,$2,$3,$4");

//result: "John,Smith,100,000.00,M"
