Say I have the following string:

BlahBlah........1.000
Whatevah....2.000
Something......6.500

...that is, some text, followed by four or more dots, followed by a number (that may have a dot as a delimiter) followed by a newline (Linux or Windows, I don't know if that's important). It's a part of a larger string.

How do I extract the text and numbers into variables? More precisely an array of value pairs (array of arrays). I just can't get my head around regular expressions yet... :(

  • In fact the number doesn't have to be a number, it can be anything followed by a newline. Commented Dec 11, 2011 at 8:39

1 Answer

Use this regex:

(?<word>\w+)\.+(?<number>\d+(\.\d+)?)

with preg_match_all():

preg_match_all("/(?<word>\w+)\.+(?<number>\d+(\.\d+)?)/", $yourString, $theArrayYouWantToStoreMatchesInIt);
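
To get the array of pairs the question asks for, one option is to pass the PREG_SET_ORDER flag so that each match comes back as its own sub-array. The following is only a sketch: the sample input is copied from the question, and the names $matches and $pairs are illustrative.

```php
<?php
// Sample input copied from the question; variable names are illustrative.
$yourString = "BlahBlah........1.000\nWhatevah....2.000\nSomething......6.500\n";

// PREG_SET_ORDER makes $matches[i] hold all groups of the i-th match.
preg_match_all(
    '/(?<word>\w+)\.+(?<number>\d+(\.\d+)?)/',
    $yourString,
    $matches,
    PREG_SET_ORDER
);

// Reduce each match to a [text, number] pair.
$pairs = [];
foreach ($matches as $m) {
    $pairs[] = [$m['word'], $m['number']];
}

print_r($pairs);
```

Without PREG_SET_ORDER, preg_match_all() groups results by capture group instead (all words in one array, all numbers in another), which is less convenient for building pairs.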

To capture anything after four or more dots, you can use this:

(?<word>\w+)\.{4,}(?<anything>.*)

The following will also capture strings that have spaces in their first part:

(?<beforeDots>[^\.]+)\.{4,}(?<afterDots>.*)
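
One thing to watch when this pattern runs over a multi-line string: [^\.]+ also matches line breaks, so a match can begin on the previous line's trailing newline. A possible variation, shown here as a sketch and not as the only way to do it, is to exclude \r and \n from the first part and trim the results. The sample lines come from the comments on this answer, plus a made-up "Short...no" line with only three dots.

```php
<?php
// Sample lines with Windows line endings; "Short...no" is a made-up
// example that has only three dots and should be skipped.
$input = "Blah blah blah......1.000 EUR\r\nShort...no\r\nBlahblah.....ok..thistoo\r\n";

// [^.\r\n]+ : one or more characters that are neither a dot nor a line break.
// \.{4,}    : four or more literal dots.
// .*        : the rest of the line (may itself contain dots).
$pattern = '/(?<beforeDots>[^.\r\n]+)\.{4,}(?<afterDots>.*)/';

preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

$pairs = [];
foreach ($matches as $m) {
    // trim() drops the "\r" that .* picks up on Windows line endings.
    $pairs[] = [trim($m['beforeDots']), trim($m['afterDots'])];
}

print_r($pairs);
```

With this input, the three-dot line produces no match, and the last line is split into "Blahblah" and "ok..thistoo".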

It's also a good idea to limit the matched text to a certain range of characters to make the regex more accurate:

(?<beforeDots>[a-zA-Z0-9 ]+)\.{4,}(?<afterDots>[a-zA-Z0-9\. ]+)
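
As a rough sketch of how the restricted pattern could be applied to a larger string, the version below anchors it to whole lines with ^, $ and the m modifier, and allows an optional \r for Windows line endings. The anchoring and the helper name parseDottedLines() are my own assumptions for illustration; they are not part of PHP or of the question.

```php
<?php
// parseDottedLines() is a hypothetical helper name, not part of PHP.
function parseDottedLines(string $text): array
{
    // ^ and $ match at line boundaries because of the "m" modifier;
    // \r? tolerates Windows line endings.
    $pattern = '/^(?<beforeDots>[a-zA-Z0-9 ]+)\.{4,}(?<afterDots>[a-zA-Z0-9. ]+)\r?$/m';

    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

    // Reduce each match to a [beforeDots, afterDots] pair.
    $pairs = [];
    foreach ($matches as $m) {
        $pairs[] = [$m['beforeDots'], $m['afterDots']];
    }
    return $pairs;
}

$pairs = parseDottedLines("Blah blah blah......1.000 EUR\r\nSomething......6.500\n");
print_r($pairs);
```

Lines containing characters outside the two classes (punctuation, for example) are skipped entirely, so widen the classes if your data needs it.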

7 Comments

Hi, here is an update that does not capture the 3rd group (via ?:) and also matches non-decimal numbers: (?<word>\w+)\.+(?<number>\d+(?:\.\d+)?)
Can you please modify it to match anything after the 4+ dots including more dots? For example Blahblah.....ok..thistoo so it returns Blahblah and ok..thistoo ? And also, not to match it if there's less than four dots?
Replace \.+ with \.{4,} if you want to enforce "followed by four or more dots".
Please help me with just this: what if there is white space in the first or second part of the line, for example Blah blah blah......1.000 EUR? It matches only the last blah, and I'd like it to match everything from the preceding newline or the beginning of the string. I tried replacing \w+ with .* but that matches the dots as well, so I need to express it as any number of any characters except a series of four or more dots.
Never mind, got it. Is there anything regular expressions can't do??