How to parse a formatted string using Python(re)

Question

The string I want to parse looks like "{average:12.1km/ltr}". I want to extract 12.1 from it. The only way I know is to use split(":") and split("km/ltr") or similar, but that does not seem very useful. I would like a scanf-like method to extract 12.1, but the Python documentation says that regular expressions are better than a scanf-like function. I thought regular expressions could not be used for extraction. How should I extract this using re?

"regular express cannot be used in extraction" why not ?

– njzk2, Commented Feb 5, 2013 at 9:53

Abhijit · Accepted Answer · 2013-02-05 10:17:05Z

1

I think you can simply use the following to extract the numeric portion of the string.

The trick is that the string contains exactly one number, possibly with a period in it.
The period may be optional, as your number may be a whole integer.
You may also encounter fractional numbers with no leading digit, such as .5.

Here is the sample

>>> import re
>>> st = "{average:12.1km/ltr}"
>>> re.findall(r"\d+\.?\d*|\.\d+", st)
['12.1']
>>> st = "{average:12km/ltr}"
>>> re.findall(r"\d+\.?\d*|\.\d+", st)
['12']
>>> st = "{average:.5km/ltr}"
>>> re.findall(r"\d+\.?\d*|\.\d+", st)
['.5']
>>> st = "{average:12.km/ltr}"
>>> re.findall(r"\d+\.?\d*|\.\d+", st)
['12.']
>>> st = " {max:26.9kgm@6100rpm}"
>>> re.findall(r"\d+\.?\d*|\.\d+", st)
['26.9', '6100']

edited Feb 5, 2013 at 10:17

answered Feb 5, 2013 at 9:47


5 Comments

josh kugler Over a year ago

How about this case: {max:26.9kgm@6100rpm}? How should I extract 26.9 and 6100 with one command?

Abhijit Over a year ago

@joshkugler: I have updated my answer with your example and it seems this scenario will also be handled.

Oscar Mederos Over a year ago

@joshkugler you haven't defined what format those strings will have. If we modify our regex to match that case, you can come back later with another input that won't work. What I mean is: give some input examples in your question, along with the numbers you expect to extract from each one.

josh kugler Over a year ago

I have to apologize for that; it is not easy to describe clearly. Actually, the format is not my concern, because there are several formats. What I really want to know is how to extract several substrings from a string. I know how to compose a regular expression pattern (grammar), but I don't know how to do this task with the Python re API.

josh kugler Over a year ago

re.findall(r"(\d+\.?\d*)kgm@(\d+\.?\d*)rpm", st) could be more robust
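
On the question of pulling several substrings out of one string with the re API, here is a minimal sketch using capture groups with re.search. The pattern and the group names torque and rpm are assumptions made up for this illustration, based only on the {max:26.9kgm@6100rpm} example in this thread.

```python
import re

# Hypothetical pattern for strings like "{max:26.9kgm@6100rpm}".
# The group names "torque" and "rpm" are illustrative, not from the question.
PATTERN = re.compile(r"(?P<torque>\d+\.?\d*)kgm@(?P<rpm>\d+\.?\d*)rpm")

def extract_fields(text):
    """Return a dict of the captured substrings, or None if nothing matches."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return match.groupdict()

fields = extract_fields("{max:26.9kgm@6100rpm}")
print(fields)  # {'torque': '26.9', 'rpm': '6100'}
```

Each group is still a string at this point; converting with float() or int() is a separate step.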

Thorsten Kranz · Accepted Answer · 2013-02-05 09:51:59Z

1

Just strip all characters you don't want - no need for regular expressions (though I like them...)

>>> import string
>>> s = "{average:12.1km/ltr}"
>>> s2 = s.strip(string.ascii_letters + "{}:/")
>>> print(s2)
12.1
>>> number = float(s2)
>>> print(number)
12.1
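
Note that str.strip only removes characters from the two ends of the string, so this approach assumes there is a single number with nothing unwanted in the middle. A small sketch of that limitation (the helper name strip_to_number is made up for illustration):

```python
import string

def strip_to_number(text):
    """Strip ASCII letters and the characters "{}:/" from both ends of text."""
    return text.strip(string.ascii_letters + "{}:/")

print(strip_to_number("{average:12.1km/ltr}"))   # 12.1
print(strip_to_number("{max:26.9kgm@6100rpm}"))  # 26.9kgm@6100
```

The second result cannot be passed to float(), which is where a regular expression becomes the simpler tool.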

answered Feb 5, 2013 at 9:51


arjunaskykok · Accepted Answer · 2013-02-05 09:52:15Z

1

Try this, assuming the number may have no decimal point.

import re
# (?:...) is non-capturing, so findall returns the whole match, not just the fraction
re.findall(r'[0-9]+(?:\.[0-9]+)?', s)
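
A caveat worth checking with this kind of pattern: when a pattern contains a capturing group, re.findall returns the text of the group rather than the whole match. A short sketch comparing a capturing and a non-capturing group (the variable names are illustrative):

```python
import re

s = "{average:12.1km/ltr}"

# Capturing group: findall returns only what the group matched.
with_capture = re.findall(r"[0-9]+(\.[0-9]+)?", s)

# Non-capturing group (?:...): findall returns the whole match.
without_capture = re.findall(r"[0-9]+(?:\.[0-9]+)?", s)

print(with_capture)     # ['.1']
print(without_capture)  # ['12.1']
```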

answered Feb 5, 2013 at 9:52


Kent · Accepted Answer · 2013-02-05 09:48:46Z

0

How about quick and dirty:

re.findall(r'[\d.]+', s)

This works for your example.

answered Feb 5, 2013 at 9:48


Oscar Mederos · Accepted Answer · 2013-02-05 10:05:59Z

0

You said you tried to split(":") and split("km/ltr"), so I'll suppose that the format of the string is always like :__X__km/ltr, where __X__ is a number.

The following regex will work:

:([\d.]+)km

Example:

>>> import re
>>> re.findall(r':([\d.]+)km', '{average:12.1km/ltr}')
['12.1']

Then you can just parse as float using the float() function.
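
As a rough sketch of that last step, assuming the :__X__km format described above (the function name parse_average and the stricter number pattern are my own choices for illustration):

```python
import re

def parse_average(text):
    """Return the number between ':' and 'km' as a float, or None if absent."""
    # Accepts forms such as 12, 12.1, 12. and .5 (an assumed set of formats).
    match = re.search(r":(\d+(?:\.\d*)?|\.\d+)km", text)
    if match is None:
        return None
    return float(match.group(1))

print(parse_average("{average:12.1km/ltr}"))  # 12.1
print(parse_average("{average:12km/ltr}"))    # 12.0
print(parse_average("no number here"))        # None
```

Returning None on a failed match keeps the caller from hitting an exception on unexpected input.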

answered Feb 5, 2013 at 10:05
