I'm still learning Python, so apologies if this is an extremely obvious mistake. I've been trying to figure it out for hours, though, and figured I'd see if anyone can help.

I've scraped a hockey website for ice skate names and prices and written them to a CSV file. The only problem is that the rows of the name column (listed as Gear) and the Price column are not aligned in the CSV. It goes:

  • Gear Name 1
  • Row Space
  • Price
  • Row Space
  • Gear Name 2

I'd like each gear name and its price to sit next to each other on the same row. I've also attached a link to a picture of the CSV in case that helps.

import requests
from bs4 import BeautifulSoup as Soup

webpage_response = requests.get('https://www.purehockey.com/c/ice-hockey-skates-senior?')

webpage = (webpage_response.content)
parser = Soup(webpage, 'html.parser')


filename = "gear.csv"
f = open(filename, "w")

headers = "Gear, Price"
f.write(headers)

for gear in parser.find_all("div", {"class": "details"}):
    
    gearname = gear.find_all("div", {"class": "name"}, "a")
    gearnametext = gearname[0].text
    
    gearprice = gear.find_all("div", {"class": "price"}, "a")
    gearpricetext = gearprice[0].text

    print (gearnametext)
    print (gearpricetext)

    f.write(gearnametext + "," + gearpricetext)

What the uneven rows look like: https://i.sstatic.net/EG2f2.png

2 Answers

With Python 3, I would recommend using with open(filename, 'w') as f: and calling strip() on your texts before you write() them to the file.

write() does not add a line break for you, whichever mode the file is opened in ('w' or 'a'), so you have to add "\n" to each line you write.

Example
import requests
from bs4 import BeautifulSoup as Soup

webpage_response = requests.get('https://www.purehockey.com/c/ice-hockey-skates-senior?')

webpage = (webpage_response.content)
parser = Soup(webpage, 'html.parser')


filename = "gear1.csv"
headers = "Gear,Price\n"


with open(filename, 'w') as f:
    f.write(headers)

    for gear in parser.find_all("div", {"class": "details"}):
        gearnametext = gear.find("div", {"class": "name"}).text.strip()
        gearpricetext = gear.find("div", {"class": "price"}).text.strip()
        f.write(gearnametext + "," + gearpricetext+"\n")
Output
Gear,Price
Bauer Vapor X3.7 Ice Hockey Skates - Senior,$249.99
Bauer X-LP Ice Hockey Skates - Senior,$119.99
Bauer Vapor Hyperlite Ice Hockey Skates - Senior,$999.98 - $1149.98
CCM Jetspeed FT475 Ice Hockey Skates - Senior,$249.99
Bauer X-LP Ice Hockey Skates - Intermediate,$109.99

...
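
As an alternative to joining the fields by hand, the standard library's csv module can write each row. It terminates the rows itself and quotes any field that contains a comma, which a plain string join would split into extra columns. A minimal sketch, assuming the name and price texts have already been scraped; the rows list and the write_gear_csv helper are hypothetical stand-ins, not part of the original code:

```python
import csv
import io


def write_gear_csv(rows, fileobj):
    """Write (name, price) pairs as CSV rows after stripping whitespace."""
    writer = csv.writer(fileobj, lineterminator="\n")
    writer.writerow(["Gear", "Price"])
    for name, price in rows:
        writer.writerow([name.strip(), price.strip()])


# Hypothetical sample data imitating scraped text with stray newlines.
rows = [
    ("\nBauer X-LP Ice Hockey Skates - Senior\n", "\n$119.99\n"),
    ("\nExample Skates, Senior\n", "\n$1,149.98\n"),
]

# An in-memory buffer stands in for the real file here.
buffer = io.StringIO()
write_gear_csv(rows, buffer)
print(buffer.getvalue())
```

When writing to a real file, the csv documentation recommends opening it with newline="", for example with open(filename, "w", newline="") as f:, and passing f in place of the buffer.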


I've noticed that gearnametext contains two \n characters. They are what cause the jump to the next line, so try removing them with str.replace():

import requests
from bs4 import BeautifulSoup as Soup

webpage_response = requests.get('https://www.purehockey.com/c/ice-hockey-skates-senior?')

webpage = (webpage_response.content)
parser = Soup(webpage, 'html.parser')


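filename = "gear.csv"
f = open(filename, "w")

# End the header with a line break so the first record starts on its own row
headers = "Gear,Price\n"
f.write(headers)

for gear in parser.find_all("div", {"class": "details"}):

    gearname = gear.find_all("div", {"class": "name"}, "a")
    # Remove the newlines embedded in the name text
    gearnametext = gearname[0].text.replace('\n','')

    gearprice = gear.find_all("div", {"class": "price"}, "a")
    # The price text is cleaned the same way
    gearpricetext = gearprice[0].text.replace('\n','')

    print (gearnametext)
    print (gearpricetext)

    # write() adds no line break itself, so terminate each record explicitly
    f.write(gearnametext + "," + gearpricetext + "\n")

f.close()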

Inside the loop, I changed the line that sets the gear name to: gearnametext = gearname[0].text.replace('\n','').
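
To see why the embedded newlines misalign the rows, it may help to compare the raw and cleaned strings without any scraping. A small sketch; raw_name and raw_price are hypothetical stand-ins for the .text of the name and price elements, assumed here to carry one newline on each side:

```python
# Hypothetical stand-ins for the scraped .text values.
raw_name = "\nBauer X-LP Ice Hockey Skates - Senior\n"
raw_price = "\n$119.99\n"

# Writing the raw strings spreads one record over several lines.
broken = raw_name + "," + raw_price

# Removing the embedded newlines and adding exactly one at the end
# yields a single CSV line per record.
fixed = raw_name.replace("\n", "") + "," + raw_price.replace("\n", "") + "\n"

print(repr(broken))
print(repr(fixed))
print(len(broken.splitlines()), "lines vs", len(fixed.splitlines()), "line")
```

Note that replace('\n', '') only removes line breaks, while strip() also trims leading and trailing spaces and tabs; which one fits depends on what the page's text actually contains.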
