Using Regular Expressions to replace patterns in C#

Question

I'm fairly new to regular expressions, so this is mostly a request for help with a specific pattern and, to a lesser extent, with how to implement it in C#.

I have a large Excel file full of, among other things, repeated addresses that are written in different styles. Most are abbreviations of words like Avenue, etc.

For the simple ones I looked up the string.Replace() method:

address = address.Replace("Av ", "Av. "); // strings are immutable, so the result must be assigned

That does the trick there and for some others. But if I want to replace the word "Ave", I run into the possibility of it being part of another word (some addresses are in Spanish, so this is likely to happen). I thought about including whitespace before and after (" ave "), but would that work if it's the first word in the string? Or should I use a pattern like this (it might be wrong too)?

^[0-9a-zA-Z_#' ](Ave)\w //the word is **not** preceded by any character other than a whitespace and is followed by a whitespace

For expressions like that, I should use something along these lines, right?

string replacement = "Av.";
Regex rgx = new Regex(@"^[0-9a-zA-Z_#' ](Ave)\w");
string result = rgx.Replace(input, replacement);

Thanks

Bart Enkelaar · Accepted Answer · 2013-12-30 18:01:54Z

3

Regular expressions have a nifty tool for this: the \b word-boundary anchor. It matches at word boundaries without consuming any characters, so Ave\b only matches Ave when it is followed by a space, a dot, the end of the string, or anything else that is not a word character.
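As a minimal sketch of that idea in C# (the class and method names are illustrative, not from the original question), Regex.Replace with a trailing \b might look like this:

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordBoundaryDemo
{
    // Hypothetical helper: replaces "Ave" only when a word boundary follows it,
    // so "Ave" at the start of a longer word such as "Avellaneda" is left alone.
    public static string ReplaceAve(string input)
    {
        return Regex.Replace(input, @"Ave\b", "Av.");
    }
}
```

With this sketch, WordBoundaryDemo.ReplaceAve("123 Ave Juarez") gives "123 Av. Juarez", while "Avellaneda 45" comes back unchanged.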

Read all about word boundaries here: http://www.regular-expressions.info/wordboundaries.html

BTW, that site is THE place to go to learn about regular expressions.

Also, if you were to do it the way you tried, it could be something like this: [^\w]Ave[^\w]

That literally is: not a word character (a-z, A-Z, 0-9 or _; in .NET, \w also covers other Unicode letters and digits by default), then Ave, then again not a word character. Because each class has to consume an actual character, this cannot match when Ave is the first or last thing in the string.

You could also use the shorthand for [^\w], which is \W, so it would become \WAve\W

But the \b way is better.
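One reason to prefer \b can be sketched in C#: a pattern that consumes a character on each side, such as \WAve\W, needs a character to actually be there, whereas \b is zero-width. The class and method names below are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryComparison
{
    // Consumes one non-word character on each side of "Ave",
    // so it fails when "Ave" sits at the very start or end of the input.
    public static bool MatchesWithNegatedClasses(string input)
    {
        return Regex.IsMatch(input, @"\WAve\W");
    }

    // \b is zero-width: it also succeeds at the start and end of the input.
    public static bool MatchesWithWordBoundaries(string input)
    {
        return Regex.IsMatch(input, @"\bAve\b");
    }
}
```

For the input "Ave Juarez 12", the first method returns false and the second returns true; for " Ave " both return true.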

edited Dec 30, 2013 at 18:01

answered Dec 30, 2013 at 17:50


3 Comments

ConnorU Over a year ago

I found that, but it seems to check for ending boundaries. Could I use it for boundaries on the beginning of the word? That is to say to have it check if the word starts with "Ave" and has no preceding characters?

ConnorU Over a year ago

One more question: I'm getting some trailing periods, and I try to get rid of them using rgx = new Regex(@"\s\.\b|\b\."); but it doesn't work. What should I do? (e.g. trying to get rid of the extra . in "Av. . ")

Bart Enkelaar Over a year ago

OK, so if I understand correctly, you want to replace all instances of Av, Ave, Avenue, Av., Ave. and Avenue. with Av.? In that case \bAv(e|enue)?\b\.? should cover all the cases (the second \b keeps it from matching the Av at the start of a longer word). If you want to go more general to cover spelling errors etc., you could do \bAv\w+\.? (however, that would also match Aviation, and it needs at least one more letter after Av). If that doesn't work exactly and you simply want to get rid of the erroneous dots later, why not replace \s*\.\s*\. with a single "."? This will get rid of all "..", " . ." and ". ." occurrences. (Or, if you only have that problem specifically, you can replace \.\s\. with a single dot.)
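A rough C# sketch of that two-step cleanup follows. It assumes a trailing \b after the optional group so that "Av" inside a longer word is skipped, and it avoids the possessive ?+ quantifier, which .NET does not support. AddressNormalizer and Normalize are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AddressNormalizer
{
    // Av, Ave or Avenue as a whole word, optionally followed by a dot.
    private static readonly Regex AvenueVariants =
        new Regex(@"\bAv(e|enue)?\b\.?");

    // Two dots separated (and optionally preceded) by whitespace.
    private static readonly Regex RepeatedDots =
        new Regex(@"\s*\.\s*\.");

    public static string Normalize(string address)
    {
        // Step 1: rewrite every variant to "Av.".
        string result = AvenueVariants.Replace(address, "Av.");
        // Step 2: collapse any doubled dots left over from earlier edits.
        return RepeatedDots.Replace(result, ".");
    }
}
```

Under these assumptions, "Avenue. Juarez", "Ave Juarez" and "Av. . Juarez" all normalize to "Av. Juarez", while "Aviation Road" is left untouched. The patterns are case-sensitive as written; RegexOptions.IgnoreCase could be passed if lowercase variants also occur.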

addy2601 · Accepted Answer · 2013-12-30 17:57:03Z

1

Add the word-boundary anchor \b on both sides of the word in your regex:

Regex.Match(content, @"\b(Ave)\b");
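Regex.Match only finds the word; for the replacement the question asks about, the same pattern can be passed to Regex.Replace. The sketch below adds RegexOptions.IgnoreCase as an assumption (the question mentions " ave " in lowercase) and uses a made-up class name. The leading \b is what keeps Spanish words that merely end in "ave" from being touched.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AveReplacer
{
    // "Ave" as a whole word, in any letter case (the case option is an assumption).
    private static readonly Regex AveWord =
        new Regex(@"\bAve\b", RegexOptions.IgnoreCase);

    public static string ReplaceAveWord(string input)
    {
        return AveWord.Replace(input, "Av.");
    }
}
```

Here "ave Juarez 5" becomes "Av. Juarez 5", while "Calle Suave 12" is returned unchanged because there is no word boundary between the "u" and the "a".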

answered Dec 30, 2013 at 17:57
