Python regex extract number from string

Question

I would like to extract a number from a large HTML file with Python. My idea was to use a regex like this:

import re
text = 'gfgfdAAA1234ZZZuijjk'
try:
    found = re.search('AAA(.+?)ZZZ', text).group(1)
except AttributeError:
    found = ''

found

Unfortunately, I'm not used to regex, and I have failed to adapt this example to extract 0,54125 from:

(...)<div class="vk_ans vk_bk">0,54125 count id</div>(...)

Is there another way to extract the number, or could someone help me with the regex?

Extract the contents of the tag you need with BeautifulSoup, then split the string and take item #0.
– Wiktor Stribiżew, Commented Apr 27, 2018 at 9:15
Do not use regex for HTML parsing; there are tools better suited to this purpose, e.g. BeautifulSoup, lxml.html...
– Andersson, Commented Apr 27, 2018 at 9:23
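
The parser-based approach suggested in these comments could look roughly like the sketch below. It uses the standard library's html.parser as a stand-in for BeautifulSoup so that it runs without extra packages; the class name DivTextExtractor and the sample markup around the div are assumptions made up for illustration.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collect the text of each <div> whose class list contains all target classes.

    Hypothetical helper for illustration, not part of any library.
    """

    def __init__(self, target_classes):
        super().__init__()
        self.target_classes = set(target_classes)
        self.depth = 0        # > 0 while inside a matching <div>
        self.results = []     # one text entry per matching <div>

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1   # nested <div> inside a match
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self.target_classes <= classes:
            self.depth = 1
            self.results.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data


# Assumed sample input; the real page surrounds the div with other markup.
html = '<p>intro</p><div class="vk_ans vk_bk">0,54125 count id</div><p>outro</p>'

parser = DivTextExtractor(["vk_ans", "vk_bk"])
parser.feed(html)
parser.close()

# Split the tag's text on whitespace and take item #0, as the comment suggests.
number_text = parser.results[0].split()[0]
print(number_text)
```

Matching on the class names rather than on the raw text means the extraction keeps working if the attribute order or the whitespace in the tag changes.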

Thm Lee · Accepted Answer · 2018-04-27 14:44:30Z

If you want the output to be 0,54125 (that is, a match for \d+,\d+), you need to constrain the pattern by the context in which the number appears.

From the following input,

 (...)<div class="vk_ans vk_bk">0,54125 count id</div>(...)

To extract 0,54125, you can try a regex with a lookbehind, such as:

(?<=\>)\d+,\d+

or, anchored to the specific opening tag:

(?<=\<div class=\"vk_ans vk_bk\"\>)\d+,\d+

Other variations along the same lines are possible.
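
As a minimal sketch of how these two patterns might be used from Python, the snippet below applies each one with re.search to the sample line from the question. The variable names loose and strict are illustrative only, and the backslashes before the angle brackets and quotes are dropped because Python's re module does not require them.

```python
import re

html = '<div class="vk_ans vk_bk">0,54125 count id</div>'

# Lookbehind on any closing angle bracket: short, but it can match after any tag.
loose = re.search(r'(?<=>)\d+,\d+', html)

# Lookbehind on the full opening tag: stricter, but it breaks if the attributes change.
strict = re.search(r'(?<=<div class="vk_ans vk_bk">)\d+,\d+', html)

for match in (loose, strict):
    # re.search returns None when nothing matches, so guard before calling group().
    found = match.group(0) if match else ''
    print(found)
```

Checking the match object for None avoids the try/except AttributeError used in the question.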

edited Apr 27, 2018 at 14:44

answered Apr 27, 2018 at 14:31


Chen A. · Accepted Answer · 2018-04-27 09:32:10Z

You can replace some characters in your text before searching it. For example, to capture numbers like 12,34 you can do this:

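import re

text = 'gfgfdAAA12,34ZZZuijjk'
try:
    # Drop the comma so the digits form one contiguous run
    text = text.replace(',', '')
    found = re.search(r'AAA(\d+)ZZZ', text).group(1)
except AttributeError:
    # re.search returned None, i.e. there was no match
    found = ''

print(found)
# 1234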

If you need to capture the digits inside a line, you can make your pattern more general, like this:

text = '<div class="vk_ans vk_bk">0,54125 count id</div>'
text = text.replace(',', '')
# Match the first run of digits anywhere in the text
found = re.search(r'(\d+)', text).group(1)

print(found)
# 054125
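
Note that removing the comma discards the position of the separator. If the comma in 0,54125 is a decimal separator (an assumption, since the question does not say so), a variation that keeps it in the match and converts afterwards might look like this sketch:

```python
import re

text = '<div class="vk_ans vk_bk">0,54125 count id</div>'

# Match digits, a comma, and more digits, keeping the comma in the result.
match = re.search(r'\d+,\d+', text)
found = match.group(0) if match else ''

# Assumption: the comma is a decimal separator, so swap it for a dot before float().
value = float(found.replace(',', '.')) if found else None

print(found)
print(value)
```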
