I am writing a script in Python 2.7 that works fine at first, but after running for a few seconds it fails with this error:

Enter Shopify website URL (without HTTP):  store.highsnobiety.com
Scraping! Check log file @ z:\shopify_output.txt to see output.
!!! Also make sure to clear file every hour or so !!!
Copper Bracelet - 3mm - Polished ['3723603267']
Traceback (most recent call last):
  File "shopify_sitemap_scraper.py", line 38, in <module>
    print(prod, variants).encode('utf-8')
AttributeError: 'NoneType' object has no attribute 'encode'

The script fetches product data from a Shopify website and prints it to the console. Here is the code:

# -*- coding: utf-8 -*-
from __future__ import print_function
from lxml.html import fromstring
import requests
import time
import sys

reload(sys)
sys.setdefaultencoding('utf-8')

# Log file location, change "z://shopify_output.txt" to your location.
logFileLocation = "z:\shopify_output.txt"

log = open(logFileLocation, "w")

# URL of Shopify website from user input (for testing, just use store.highsnobiety.com during input)
url = 'http://' + raw_input("Enter Shopify website URL (without HTTP):  ") + '/sitemap_products_1.xml'

print ('Scraping! Check log file @ ' + logFileLocation + ' to see output.')
print ("!!! Also make sure to clear file every hour or so !!!")
while True :

    page = requests.get(url)
    tree = fromstring(page.content)

    # skip first url tag with no image:title
    url_tags =  tree.xpath("//url[position() > 1]")

    data = [(e.xpath("./image/title//text()")[0],e.xpath("./loc/text()")[0]) for e in  url_tags]

    for prod, url in data:
    # add xml extension to url
        page = requests.get(url + ".xml")
        tree = fromstring(page.content)
        variants = tree.xpath("//variants[@type='array']//id[@type='integer']//text()")
        print(prod, variants).encode('utf-8')

The strangest part is that when I remove the .encode('utf-8'), I get a UnicodeEncodeError instead, as seen here:

Enter Shopify website URL (without HTTP):  store.highsnobiety.com
Scraping! Check log file @ z:\shopify_output.txt to see output.
!!! Also make sure to clear file every hour or so !!!
Copper Bracelet - 3mm - Polished ['3723603267']
Copper Bracelet - 5mm - Brushed ['3726247811']
Copper Bracelet - 7mm - Polished ['3726253635']
Highsnobiety x EARLY - Leather Pouch ['14541472963', '14541473027', '14541473091']
Traceback (most recent call last):
  File "shopify_sitemap_scraper.py", line 38, in <module>
    print(prod, variants)
  File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xae' in position 13: character maps to <undefined>

Any ideas? I have no idea what else to try after hours of googling.

2 Answers

snakecharmerb almost got it, but missed the cause of your first error. Your code

print(prod, variants).encode('utf-8')

means you print the values of the prod and variants variables, then try to call encode() on the return value of print(). However, print() (as a function in Python 2 via from __future__ import print_function, and always in Python 3) returns None, which is what raises the AttributeError. To fix it, encode the string before passing it to print():

print(prod.encode("utf-8"), variants)
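To make the point about print() returning None concrete, here is a minimal sketch that runs on both Python 2.7 and Python 3. The product title is a made-up value containing the ® character, not data from the actual shop, and the variant ID is copied from the question's output.

```python
from __future__ import print_function  # makes print a function on Python 2 as well

# Hypothetical product title containing the registered-trademark sign (u'\xae').
prod = u"Levi's\xae 501"
# A list of variant IDs, shaped like the ones in the question's output.
variants = ['3723603267']

# print() writes to stdout and returns None, so there is nothing to call .encode() on.
result = print("demo")
print(result is None)

# Encode the text first, then hand the finished value to print().
encoded = prod.encode('utf-8')
print(encoded, variants)
```

Note that variants is a list, so only prod is encoded here; calling .encode() on the list itself would raise the AttributeError mentioned in the comments below.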

2 Comments

Still getting the "AttributeError: 'list' object has no attribute 'encode'" with the new code
@DanielYveson sorry, I didn't realize that variants was a list. See my edited answer above.

Your console has a default encoding of cp437, and cp437 is unable to represent the character u'\xae'.

>>> print (u'\xae')
®
>>> print (u'\xae'.encode('utf-8'))
b'\xc2\xae'
>>> print (u'\xae'.encode('cp437'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/encodings/cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character '\xae' in position 0: character maps to <undefined>

You can see that it's trying to convert to cp437 in the traceback: File "C:\Python27\lib\encodings\cp437.py", line 12, in encode

(I reproduced the problem in Python 3.5, but it's the same issue in both versions of Python.)
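One possible workaround, if you only need readable console output, is to replace the characters the console encoding cannot represent before printing. This is a sketch rather than the only fix: console_safe is a hypothetical helper name, the sample title is made up, and the code assumes that losing the unsupported character (it becomes '?') is acceptable.

```python
import sys

def console_safe(text, encoding=None):
    """Return text with characters the target encoding cannot represent replaced by '?'."""
    # Fall back to ASCII when stdout reports no encoding (for example, when output is piped).
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    # 'replace' substitutes '?' for unencodable characters instead of raising UnicodeEncodeError.
    return text.encode(encoding, 'replace').decode(encoding)

# Hypothetical title containing u'\xae', the character cp437 cannot encode.
title = u'Brand\xae Bracelet'
safe = console_safe(title, 'cp437')
print(safe)
```

Passing 'cp437' explicitly mirrors the console from the traceback; leaving the argument out uses whatever encoding sys.stdout reports.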

1 Comment

See @MattDMo's answer.
