Remove specific words from a string

Question

I am trying to parse a file of street names for a project, and need to remove modifiers (Upper / Lower / Old / New / North / East / South / West ...) and endings (street / road / way / lane ...), but I am having no luck with a regular expression.

The way it is set up at the moment, the program parses the file one line (i.e. one street) at a time and checks it.

I think the problem is word boundaries. What I need, for example, are the following transformations:
Old Harrow Way -> Harrow (i.e. remove the 'Old' prefix and the 'Way' ending)
Chittock Mead -> Chittock (remove the ending 'Mead')
- But these should be left alone when they occur inside a word:
Gold Lane -> Gold (just remove the ending)
Eastley Avenue -> Eastley (just remove the ending)
Upper Western Avenue -> Western (remove the prefix and the ending)

Obviously, something like "South Street" would have both words removed. This is OK, because I can discard an empty string.

Can anyone give me an idea of how to do this - I've been reading up on regular expressions and trying things for hours!

What kind of format is the file? Is it CSV? Tab delimited, or simply no such format at all? Do you have reliable delimiters for the different fields? Is the file fixed space?
– Oded, Commented Feb 22, 2011 at 21:39
Ah, reminds me of the old adage: You have a problem that you decide to solve with regular expressions. Now you have two problems. :) I'm sorry I don't have a solution for you and I can only add smart-aleck comments. Good luck.
– David Hoerster, Commented Feb 22, 2011 at 21:40
@Oded, thanks! I never knew that. And he's a Pittsburgh guy, like me.
– David Hoerster, Commented Feb 22, 2011 at 21:43
Regular expressions are a swine, and I haven't done them for a year, so I do agree: one problem has become two. :) Although there are a lot of good regex people here who I bet can do this in no time.
– JonWillis, Commented Feb 22, 2011 at 21:43

The Muffin Man · Accepted Answer · 2011-02-22 21:39:49Z

2

I would use a List<string> or an array to store those values, and then possibly a foreach loop to check the address against the list or array. You would then use .Remove to remove each instance of the list or array item. There is more to this, but that is the general idea.
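A minimal sketch of that loop, written in Python purely for illustration (the thread does not name a language, and the word lists and the function name `remove_listed_words` are assumptions built from the examples in the question). It compares whole words rather than substrings, so the "old" inside "Gold" is not touched:

```python
# Hypothetical word lists, taken from the examples in the question.
MODIFIERS = ["upper", "lower", "old", "new", "north", "east", "south", "west"]
ENDINGS = ["street", "road", "way", "lane", "avenue", "mead", "close"]


def remove_listed_words(street, unwanted):
    """Drop every whole word of `street` that appears in `unwanted`."""
    kept = street.split()
    for item in unwanted:
        # Compare whole words, case-insensitively, so "Old" is removed
        # but the "old" inside "Gold" is left alone.
        kept = [word for word in kept if word.lower() != item]
    return " ".join(kept)


print(remove_listed_words("Old Harrow Way", MODIFIERS + ENDINGS))  # Harrow
print(remove_listed_words("Gold Lane", MODIFIERS + ENDINGS))       # Gold
```

Note that this removes a listed word wherever it appears in the name, not only at the start or end, which may or may not be what is wanted.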


1 Comment

Richard Over a year ago

@Oded - The file is just one per line:
abigail close
abingdon road
acorn close
etc

Marc Bernier · Accepted Answer · 2011-02-22 21:51:21Z

2

I'd use string.split(" ") to split the address into an array of words. Then take the first word and see if it exists on a list of prefixes (i.e. a List or Array). Do the same for the last word and the endings.
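The split-and-check idea might look roughly like this. It is a sketch in Python under stated assumptions: the `PREFIXES` and `ENDINGS` sets and the name `strip_street_name` are made up for illustration, with the words taken from the question's examples.

```python
# Hypothetical lookup sets built from the words mentioned in the question.
PREFIXES = {"upper", "lower", "old", "new", "north", "east", "south", "west"}
ENDINGS = {"street", "road", "way", "lane", "avenue", "mead", "close"}


def strip_street_name(street):
    """Remove a known prefix from the front and a known ending from the back."""
    words = street.split()
    # Drop the first word only if it is a known prefix.
    if words and words[0].lower() in PREFIXES:
        words = words[1:]
    # Drop the last word only if it is a known ending.
    if words and words[-1].lower() in ENDINGS:
        words = words[:-1]
    return " ".join(words)


print(strip_street_name("Upper Western Avenue"))  # Western
print(strip_street_name("Eastley Avenue"))        # Eastley
```

Because only whole first and last words are compared, "Gold" and "Eastley" survive intact, and "South Street" comes back as an empty string that can be discarded.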

Running through two lists of regular expressions for each input address will be time consuming. Using my logic should be a good deal faster, especially if the lists are sorted and binary-searched.
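For the sorted-list lookup, one possible shape is a binary search over a sorted list, sketched here in Python with the standard `bisect` module. The list contents and the helper name `in_sorted_list` are assumptions for illustration; a hash-based set would be another reasonable choice.

```python
from bisect import bisect_left

# Hypothetical list of endings, sorted once up front.
SORTED_ENDINGS = sorted(["street", "road", "way", "lane", "avenue", "mead", "close"])


def in_sorted_list(sorted_words, word):
    """Binary search: True if `word` occurs in the already-sorted list."""
    index = bisect_left(sorted_words, word)
    return index < len(sorted_words) and sorted_words[index] == word


print(in_sorted_list(SORTED_ENDINGS, "lane"))  # True
print(in_sorted_list(SORTED_ENDINGS, "gold"))  # False
```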

If the address data is a bit dirty (i.e. punctuation, double spaces, etc.), you may want to do some cleanup, as an input string like "  Main  St" will have more 'words' than are really there (hint: Trim() and RegEx.Replace("  ", " ")).
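To see why the cleanup matters, here is a small Python sketch (the helper name `normalise_spaces` is hypothetical). Splitting on a single space, as in the answer, yields empty 'words' for every extra space unless the string is trimmed and runs of whitespace are collapsed first:

```python
import re


def normalise_spaces(raw):
    """Trim both ends and collapse each run of whitespace to one space."""
    return re.sub(r"\s+", " ", raw.strip())


dirty = "  Main   St "
print(dirty.split(" "))                    # includes empty strings
print(normalise_spaces(dirty).split(" "))  # ['Main', 'St']
```

In Python specifically, calling `split()` with no argument already ignores repeated whitespace; the explicit version above mirrors the Trim-then-replace hint.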


3 Comments

Richard Over a year ago

Ok, using the list method you suggested - It works like a dream! One more quickie - How would I match the 'St' at a start of a name (ie. "St. Mary's"), where it could be in the format "St. Marys's", "St Marys", and may or may not have a space after the "St[.]"? Thanks very much for your help.

Richard Over a year ago

Ok, got all the info I needed. Thanks again for all of your help!

Marc Bernier Over a year ago

I usually replace all punctuation with a space before splitting the address.
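A sketch of that step in Python, applied to the "St." variants from the comment above. One assumption is labelled in the code: apostrophes are kept so that "Mary's" stays a single word, which is a deviation from replacing all punctuation. The name `split_words` is hypothetical.

```python
import re


def split_words(raw):
    """Turn punctuation into spaces, then split on whitespace."""
    # Assumption: apostrophes are kept so "Mary's" stays one word;
    # every other punctuation mark becomes a space.
    spaced = re.sub(r"[^\w\s']", " ", raw)
    return spaced.split()


for name in ["St. Mary's", "St.Mary's", "St Marys"]:
    print(split_words(name))
```

In all three cases the first word comes out as "St", so it can be matched against a prefix list like any other prefix.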

Community · Accepted Answer · 2017-05-23 12:11:51Z

1

This question or this question will help you. Ensure that you use the Regex.Replace() method to do the pattern matching and replacement.
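For comparison, a regular-expression version could look something like the following Python sketch, where `re.sub` plays the role of a regex replace call. The word lists, pattern names and `strip_with_regex` are assumptions for illustration. The `\b` word boundary is what keeps "Old" from matching inside "Gold" and "East" from matching inside "Eastley".

```python
import re

# Hypothetical word lists, taken from the examples in the question.
MODIFIERS = ["upper", "lower", "old", "new", "north", "east", "south", "west"]
ENDINGS = ["street", "road", "way", "lane", "avenue", "mead", "close"]

# ^ anchors the prefix at the start and $ anchors the ending at the end;
# \b requires a word boundary so partial-word matches are rejected.
PREFIX_RE = re.compile(r"^(?:%s)\b\s*" % "|".join(MODIFIERS), re.IGNORECASE)
ENDING_RE = re.compile(r"\s*\b(?:%s)$" % "|".join(ENDINGS), re.IGNORECASE)


def strip_with_regex(street):
    """Remove one leading modifier and one trailing ending, if present."""
    street = PREFIX_RE.sub("", street.strip())
    return ENDING_RE.sub("", street)


print(strip_with_regex("Old Harrow Way"))  # Harrow
print(strip_with_regex("Gold Lane"))       # Gold
```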

edited May 23, 2017 at 12:11


answered Feb 22, 2011 at 21:43

Bernard

