The string I want to parse looks like "{average:12.1km/ltr}", and I want to extract 12.1 from it. The only way I know is to use split(":") and split("km/ltr") or something similar, but that doesn't seem very practical. I would like to use a scanf-like method to extract 12.1, but the Python documentation says that regular expressions are preferable to a scanf-like function. I thought regular expressions could not be used for extraction. How should I extract this using re?
5 Answers
I think you could simply use the following to extract the numeric portion of the string.
- The trick is that the number is a run of digits with at most one period in it.
- The period is optional, as your number may be a whole integer.
- You may also encounter fractional numbers with no leading digit, such as .5.
Here is a sample:
>>> import re
>>> st = "{average:12.1km/ltr}"
>>> re.findall(r"\d+\.?\d*|\.\d+", st)
['12.1']
>>> st = "{average:12km/ltr}"
>>> re.findall(r"\d+\.?\d*|\.\d+", st)
['12']
>>> st = "{average:.5km/ltr}"
>>> re.findall(r"\d+\.?\d*|\.\d+", st)
['.5']
>>> st = "{average:12.km/ltr}"
>>> re.findall(r"\d+\.?\d*|\.\d+", st)
['12.']
>>> st = " {max:26.9kgm@6100rpm}"
>>> re.findall(r"\d+\.?\d*|\.\d+", st)
['26.9', '6100']
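If it matters which number is which, capturing groups are one way to pull several substrings out in a single call. The following is only a sketch: it assumes the input always has the "<torque>kgm@<rpm>rpm" shape seen in the last example, and the names TORQUE_RE and parse_torque are made up for illustration.

```python
import re

# Hypothetical pattern for the "<torque>kgm@<rpm>rpm" shape only;
# other formats would need their own pattern.
TORQUE_RE = re.compile(r"(\d+(?:\.\d+)?)kgm@(\d+)rpm")

def parse_torque(text):
    """Return (torque, rpm) as numbers, or None if the text does not match."""
    match = TORQUE_RE.search(text)
    if match is None:
        return None
    # groups() returns the captured substrings in pattern order.
    torque, rpm = match.groups()
    return float(torque), int(rpm)

print(parse_torque("{max:26.9kgm@6100rpm}"))
```

re.search returns a match object (or None), and match.groups() gives the captured pieces as a tuple, so each value can be converted to the type you need.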
5 Comments
josh kugler
How about this case: {max:26.9kgm@6100rpm}? How should I extract 26.9 and 6100 using a command?
Abhijit
@joshkugler: I have updated my answer with your example and it seems this scenario will also be handled.
Oscar Mederos
@joshkugler you haven't defined what format those strings will have. If we modify our regex to match that, you can come back later with another input that won't work. What I mean is: give some input examples in your question and say what numbers you expect to extract from each one.
josh kugler
I have to apologize for that; it is not easy to describe clearly. Actually, the format is not my concern, because there are several formats. What I really want to know is how to extract several substrings from a string. I know how to compose a regular expression pattern (grammar), but I don't know how to do this task with the Python re API.
josh kugler
re.findall(r"(\d+\.?\d)kgm@(\d+\.?\d)rpm", str) could be more robust
You said you tried to split(":") and split("km/ltr"), so I'll suppose that the format of the string is always like :__X__km/ltr, where __X__ is a number.
The following regex will work:
:(\d.+)km
Example:
>>> import re
>>> re.findall(r':(\d.+)km', '{average:12.1km/ltr}')
['12.1']
Then you can parse the extracted string as a number using the float() function.
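Note that in :(\d.+)km the dot matches any character and needs at least one character after the first digit, so an input such as {average:5km/ltr} would not match. A stricter variant, combined with the float() conversion, might look like the sketch below. It still assumes the :__X__km format, and the names NUMBER_BEFORE_KM and extract_km_value are made up for illustration.

```python
import re

# Matches ":<number>km", where <number> is digits with an optional
# fractional part, or a bare fraction such as ".5".
NUMBER_BEFORE_KM = re.compile(r":(\d+(?:\.\d+)?|\.\d+)km")

def extract_km_value(text):
    """Return the number before 'km' as a float, or None if nothing matches."""
    match = NUMBER_BEFORE_KM.search(text)
    if match is None:
        return None
    # group(1) is the text captured by the parentheses in the pattern.
    return float(match.group(1))

print(extract_km_value("{average:12.1km/ltr}"))
```

Returning None for non-matching input is just one choice; raising an exception may suit you better if a missing number should be treated as an error.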