0

The string I want to parse looks like "{average:12.1km/ltr}", and I want to extract 12.1 from it. The only way I know is to chain split(":") and split("km/ltr"), which seems clumsy. I would like to use a scanf-like method, but the Python documentation says that regular expressions are preferable to a scanf-like function. I thought regular expressions could not be used for extraction. How should I extract this using re?

1
  • "regular express cannot be used in extraction": why not? Commented Feb 5, 2013 at 9:53

5 Answers

1

I think you could simply use the following to extract the numeric portion of the string.

  • The trick is that there is one and only one number, possibly with a period in it.
  • The period may be optional, as your number may be a whole integer.
  • You may also encounter fractional numbers with no leading digit, such as .5.

Here is a sample:

>>> import re
>>> st = "{average:12.1km/ltr}"
>>> re.findall(r"\d+\.?\d*|\.\d+", st)
['12.1']
>>> st = "{average:12km/ltr}"
>>> re.findall(r"\d+\.?\d*|\.\d+", st)
['12']
>>> st = "{average:.5km/ltr}"
>>> re.findall(r"\d+\.?\d*|\.\d+", st)
['.5']
>>> st = "{average:12.km/ltr}"
>>> re.findall(r"\d+\.?\d*|\.\d+", st)
['12.']
>>> st = " {max:26.9kgm@6100rpm}"
>>> re.findall(r"\d+\.?\d*|\.\d+", st)
['26.9', '6100']
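
When the surrounding units are known, capture groups can pull several values out in one call instead of relying on their position in the findall list. A minimal sketch, assuming the kgm@...rpm layout from the last example; the function name parse_max and the group names are hypothetical, chosen only for illustration.

```python
import re

# Assumed layout: "{max:<torque>kgm@<rpm>rpm}". Group names are illustrative.
PATTERN = re.compile(r"(?P<torque>\d+\.?\d*|\.\d+)kgm@(?P<rpm>\d+)rpm")

def parse_max(text):
    """Return (torque, rpm) as numbers, or None if the text does not match."""
    m = PATTERN.search(text)
    if m is None:
        return None
    return float(m.group("torque")), int(m.group("rpm"))

print(parse_max(" {max:26.9kgm@6100rpm}"))  # (26.9, 6100)
```

re.search returns a match object (or None), and m.group(...) gives each captured substring by name or index.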

5 Comments

How about this case: {max:26.9kgm@6100rpm}? How should I extract 26.9 and 6100 with one command?
@joshkugler: I have updated my answer with your example, and it seems this scenario is handled as well.
@joshkugler you haven't defined what format those strings will have. If we modify our regex to match that, you can come back later with another input that won't work. What I mean is: give some input examples in your question, along with the numbers you expect to extract from each one.
I have to apologize for that; it is not easy to describe clearly. Actually, the format is not my concern, because there are several formats. What I really want to know is how to extract several substrings from a string. I know how to compose a regular expression pattern (grammar), but I don't know how to do this task with the Python re API.
re.findall(r"(\d+\.?\d*)kgm@(\d+\.?\d*)rpm", st) could be more robust

1

Just strip all characters you don't want - no need for regular expressions (though I like them...)

>>> import string
>>> s = "{average:12.1km/ltr}"
>>> s2 = s.strip(string.ascii_letters + "{}:/")
>>> print(s2)
12.1
>>> number = float(s2)
>>> print(number)
12.1
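
One caveat: str.strip only removes characters from the two ends of the string, so this approach depends on the number being the only thing left in the middle. A small sketch of that behaviour; the helper name strip_to_number is hypothetical.

```python
import string

def strip_to_number(text):
    """Strip letters and the punctuation "{}:/" from both ends of text."""
    unwanted = string.ascii_letters + "{}:/"
    return text.strip(unwanted)

print(strip_to_number("{average:12.1km/ltr}"))   # 12.1
print(strip_to_number("{max:26.9kgm@6100rpm}"))  # 26.9kgm@6100
```

The second result still contains "kgm@" in the middle, so float() would raise ValueError on it.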

1

Try this, assuming the number may have no decimal point. The group is non-capturing so that findall returns the whole number.

import re
re.findall(r'[0-9]+(?:\.[0-9]+)?', s)
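
A detail of re.findall that matters for this pattern: if the pattern contains a capturing group, findall returns the text of the group rather than the whole match. A short comparison on the string from the question:

```python
import re

text = "{average:12.1km/ltr}"

# With a capturing group, findall returns only what the group matched.
captured = re.findall(r"[0-9]+(\.[0-9]+)?", text)

# With a non-capturing group (?:...), findall returns the whole match.
whole = re.findall(r"[0-9]+(?:\.[0-9]+)?", text)

print(captured)  # ['.1']
print(whole)     # ['12.1']
```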

0

How about quick and dirty:

re.findall(r'[\d.]+', s)

This works for your example.
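
Note that the character class also matches periods that are not part of a number, so the result may need filtering on less tidy input. A small illustration; the second input string is made up for the example.

```python
import re

clean = re.findall(r"[\d.]+", "{average:12.1km/ltr}")
noisy = re.findall(r"[\d.]+", "approx. 12.1km")  # hypothetical input

print(clean)  # ['12.1']
print(noisy)  # ['.', '12.1']
```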

0

You said you tried to split(":") and split("km/ltr"), so I'll suppose that the format of the string is always like :__X__km/ltr, where __X__ is a number.

The following regex will work:

:(\d+\.?\d*)km

Example:

>>> import re
>>> re.findall(r':(\d+\.?\d*)km', '{average:12.1km/ltr}')
['12.1']
>>>

Then you can just parse as float using the float() function.
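
A minimal sketch of that last step, assuming the :__X__km layout described above; the function name km_values is hypothetical.

```python
import re

def km_values(text):
    """Return every number found between ':' and 'km' as a float."""
    return [float(x) for x in re.findall(r":(\d+\.?\d*)km", text)]

print(km_values("{average:12.1km/ltr}"))  # [12.1]
```

An input with no match yields an empty list, so float() is never called on a missing value.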
