I have a list of URLs in one column of a CSV file. I would like to use Python to go through all the URLs, download a specific part of each page's HTML, and save it to the next column.
For example, from this URL I would like to extract the following div and write it to the next column:
<div class="info-holder" id="product_bullets_section">
  <p>
    VM−2N ist ein Hochleistungs−Verteilverstärker für Composite− oder SDI−Videosignale und unsymmetrisches Stereo−Audio. Das Eingangssignal wird entkoppelt und isoliert, anschließend wird das Signal an zwei identische Ausgänge verteilt.
    <span id="decora_msg_container" class="visible-sm-block visible-md-block visible-xs-block visible-lg-block"></span>
  </p>
  <ul>
    <li>
      <span>Hohe Bandbreite — 400 MHz (–3 dB).</span>
    </li>
    <li>
      <span>Desktop–Grösse — Kompakte Bauform, zwei Geräte können mithilfe des optionalen Rackadapters RK–1 in einem 19 Zoll Rack auf 1 HE nebeneinander montiert werden.</span>
    </li>
  </ul>
</div>
This is the code I have so far. The downloaded HTML ends up in the variable html:
import csv
import urllib.request

# Read the URLs from the first column, skipping the header row.
with open("urls.csv", "r", newline="", encoding="cp1252") as f_input:
    csv_reader = csv.reader(f_input, delimiter=";", quotechar="|")
    header = next(csv_reader)
    items = [row[0] for row in csv_reader]

with open("results.csv", "w", newline="") as f_output:
    csv_writer = csv.writer(f_output, delimiter=";")
    for item in items:
        # html holds the complete page source as bytes.
        html = urllib.request.urlopen(item).read()
At the moment html contains the entire page source, which is pretty messy. How can I discard everything in the variable html except the div I want to extract?
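One direction I have been considering is the standard library's html.parser module. The following is only a rough sketch under my own assumptions, not a confirmed solution: DivExtractor and extract_div are hypothetical names I made up, the target is found by its id (product_bullets_section), and HTML comments inside the div are dropped.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DivExtractor(HTMLParser):
    """Hypothetical helper: collects the markup of the first element with a given id."""

    def __init__(self, target_id):
        # Keep entities such as &amp; untouched so they can be re-emitted as-is.
        super().__init__(convert_charrefs=False)
        self.target_id = target_id
        self.depth = 0      # 0 means "currently outside the target element"
        self.parts = []     # collected pieces of markup
        self.done = False   # True once the target element has been closed

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0:
            # Still searching: only react to the element with the wanted id.
            if dict(attrs).get("id") == self.target_id:
                self.depth = 1
                self.parts.append(self.get_starttag_text())
            return
        self.parts.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied without changing the depth.
        if self.depth > 0:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth == 0 or tag in VOID_TAGS:
            return
        self.parts.append(f"</{tag}>")
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth > 0:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth > 0:
            self.parts.append(f"&#{name};")


def extract_div(html_text, target_id="product_bullets_section"):
    """Return the markup of the element with target_id, or "" if it is absent."""
    parser = DivExtractor(target_id)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)
```

Inside my loop this could presumably be called as extract_div(html.decode("utf-8", errors="replace")), with the result written via csv_writer.writerow([item, result]). The UTF-8 decoding is a guess about the pages' encoding, and a third-party parser may well be a more robust choice for badly formed HTML.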