
Why am I getting this error, and how do I resolve it?

UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 24: unexpected code byte

Thank you

  • Can you explain what you are trying to do? Commented Jul 6, 2010 at 18:00
  • Please give more information and post some code. Commented Jul 6, 2010 at 18:00
  • See: groups.google.com/group/pylons-discuss/browse_thread/thread/… Commented Jul 6, 2010 at 18:01
  • More information indeed, especially if it's Python 3.x or 2.x. Commented Jul 6, 2010 at 18:03

3 Answers


Somewhere, perhaps subtly, you are asking Python to turn a stream of bytes into a "string" of characters.

Don't think of a string as "bytes". A string is a list of numbers, each number having an agreed meaning in Unicode (#65 = Latin Capital A; #19968 = the Chinese character "One"/"First").
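In Python 3 terms (where `str` is the Unicode string type; Python 2 calls it `unicode`), you can see these numbers directly with the built-ins `ord()` and `chr()`. A minimal sketch:

```python
# A Python 3 str is a sequence of Unicode code points, i.e. numbers.
text = "A\u4e00"  # "A" followed by the Chinese character U+4E00 ("one")

code_points = [ord(ch) for ch in text]
print(code_points)          # [65, 19968]
print(hex(code_points[1]))  # 0x4e00

# chr() goes the other way: number -> one-character string.
assert chr(65) == "A"
assert chr(19968) == text[1]
```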

There are many methods of encoding a list of Unicode code points into a stream of bytes. Python is assuming your stream of bytes is the result of one particular method, called "UTF-8".
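To see that the same characters can become different bytes, here is one short string encoded three ways. The sample text and the choice of encodings are only for illustration; `cp1252` is included because it is the Windows encoding that stores U+2019 as the single byte 0x92.

```python
# "It’s" written with U+2019 RIGHT SINGLE QUOTATION MARK, not an ASCII apostrophe.
text = "It\u2019s"

encoded = {enc: text.encode(enc) for enc in ("utf-8", "utf-16-le", "cp1252")}
for enc, data in encoded.items():
    print(enc, data)

# utf-8     -> b'It\xe2\x80\x99s'  (U+2019 takes three bytes)
# utf-16-le -> 8 bytes, two per character
# cp1252    -> b'It\x92s'          (U+2019 is the single byte 0x92)
```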

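However, your stream of bytes contains data that does not follow that method's rules: in UTF-8, a byte such as 0x92 (binary 10010010) may only appear as a continuation of a multi-byte sequence, never as the first byte of a character. Thus the error is raised.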
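You can reproduce the error in isolation. This sketch assumes Python 3 (its message wording differs from the Python 2 message in the question) and uses the cp1252 bytes for "It’s" as a stand-in for your data; `find_decode_error` is a hypothetical helper name.

```python
def find_decode_error(data, encoding="utf-8"):
    """Return (position, offending byte, reason) if data doesn't decode, else None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start], exc.reason
    return None

sample = b"It\x92s"  # cp1252 bytes, not valid UTF-8
pos, byte, reason = find_decode_error(sample)
print(pos, hex(byte), reason)  # 2 0x92 invalid start byte

# 0x92 = 0b10010010; in UTF-8 the leading "10" marks a continuation byte,
# which is not allowed to start a character.
print(format(byte, "08b"))  # 10010010
```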

You need to figure out the encoding of the stream of bytes, and tell Python that encoding.
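Once you know the encoding, pass it to `decode()`. Here it is assumed to be cp1252, the encoding another answer in this thread points to; substitute whatever your data really uses. The result can then be re-encoded as UTF-8 if that is what the rest of your program expects.

```python
raw = b"It\x92s"               # bytes as they arrived (assumed to be cp1252)
text = raw.decode("cp1252")    # bytes -> str, using the real encoding
print(ascii(text))             # 'It\u2019s'

utf8_bytes = text.encode("utf-8")  # str -> bytes, now as UTF-8
print(utf8_bytes)                  # b'It\xe2\x80\x99s'
```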

It's important to know whether you're using Python 2 or 3, and to see the code leading up to this exception, so you can tell where your bytes came from and what the appropriate way to deal with them is.

If it's from reading a file, you can explicitly deal with the bytes read. But you must be sure of the file's encoding.
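In Python 3 that can look like the sketch below: read the file in binary mode, then decode with the encoding you have confirmed. `read_text_explicitly` is a hypothetical helper, and the temporary cp1252 file exists only to make the example self-contained. `open(path, encoding=...)` in text mode does the same decoding in one step (`io.open` offers the same in Python 2.6+).

```python
import os
import tempfile

def read_text_explicitly(path, encoding):
    """Read raw bytes, then decode them with an encoding you are sure of."""
    with open(path, "rb") as f:
        raw = f.read()           # no decoding happens here
    return raw.decode(encoding)  # raises UnicodeDecodeError if the guess is wrong

# Create a small cp1252-encoded file for the demonstration.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"Don\x92t panic")

try:
    text = read_text_explicitly(path, "cp1252")
    print(ascii(text))  # 'Don\u2019t panic'
    try:
        read_text_explicitly(path, "utf-8")
    except UnicodeDecodeError as exc:
        print("utf-8 failed:", exc.reason)  # utf-8 failed: invalid start byte
finally:
    os.remove(path)
```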

If it's from a string that is part of your source code, then Python is assuming the "wrong thing" about your source files; perhaps $LC_ALL or $LANG needs to be set. This is a good time to firmly understand the concept of encoding: how text editors choose an encoding when saving files, and what is standard for your language and operating system.
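To see which encodings Python has derived from your environment, you can print them. This is only a diagnostic sketch: on POSIX systems the locale encoding is influenced by variables such as `LC_ALL`, `LC_CTYPE` and `LANG`, and it is what text-mode `open()` uses when you omit `encoding=`. `report_encodings` is a hypothetical helper name.

```python
import locale
import sys

def report_encodings():
    """Collect the encodings Python is currently assuming (diagnostic only)."""
    return {
        # What text-mode open() uses when encoding= is omitted.
        "locale preferred": locale.getpreferredencoding(False),
        # Used to encode/decode file names.
        "filesystem": sys.getfilesystemencoding(),
        # Encoding of standard output; may be None if stdout has been replaced.
        "stdout": getattr(sys.stdout, "encoding", None),
    }

for name, enc in report_encodings().items():
    print(f"{name}: {enc}")
```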


In addition to what Joe said, chardet is a useful tool for detecting the encoding of the source data.
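A minimal sketch of using it: `chardet.detect()` takes bytes and returns a dict with keys such as `encoding` and `confidence`. Detection on short inputs is only a guess, so this sketch checks that the guess actually decodes the data and, as an assumption of its own, falls back to trying a short list of candidate encodings when chardet isn't installed or returns nothing. `guess_encoding` and its candidate list are hypothetical.

```python
def guess_encoding(data, candidates=("utf-8", "cp1252")):
    """Best-effort guess at the encoding of `data` (bytes); None if nothing fits."""
    try:
        import chardet  # third-party package: pip install chardet
    except ImportError:
        chardet = None

    if chardet is not None:
        guess = chardet.detect(data).get("encoding")
        if guess:
            try:
                data.decode(guess)  # confirm the guess really decodes
                return guess
            except (UnicodeDecodeError, LookupError):
                pass                # fall through to the candidates

    # Fallback: the first candidate that decodes without error.
    for enc in candidates:
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None

sample = b"Don\x92t panic, it\x92s only an encoding problem."
enc = guess_encoding(sample)
print(enc)
```

Whatever it returns is still a guess; confirm it against what you know about where the data came from.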


Somewhere you have a plain string encoded as "Windows-1252" (or "cp1252") containing a "RIGHT SINGLE QUOTATION MARK" (’), which cp1252 stores as the byte 0x92, instead of an APOSTROPHE ('). This could come from a file you read, or even from one of your own Python source files: you could be running Python 2.x with a # -*- coding: utf8 -*- line near the top of the script, or you could be running Python 3.x.
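The source-file case can be demonstrated in Python 3 with `compile()`, which honors a PEP 263 coding declaration when given bytes. The two snippets below are hypothetical: both contain the same cp1252 byte 0x92 inside a string literal and differ only in the encoding they declare. The exact exception type for the failing case may vary between Python versions, so the sketch catches both `SyntaxError` and `UnicodeDecodeError`; `run_source` is a hypothetical helper.

```python
def run_source(source_bytes):
    """Compile and run source given as bytes; return (namespace, error)."""
    try:
        code = compile(source_bytes, "<example>", "exec")
    except (SyntaxError, UnicodeDecodeError) as exc:
        return None, exc
    namespace = {}
    exec(code, namespace)
    return namespace, None

# Same byte 0x92 in both "files"; only the declared encoding differs.
wrong_decl = b"# -*- coding: utf-8 -*-\ns = 'It\x92s'\n"
right_decl = b"# -*- coding: cp1252 -*-\ns = 'It\x92s'\n"

_, wrong_err = run_source(wrong_decl)
print("utf-8 declaration failed:", wrong_err is not None)  # ... True

ns, err = run_source(right_decl)
print(err, ascii(ns["s"]))  # None 'It\u2019s'
```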

You don't give enough data; however, somewhere you have a cp1252-encoded string, which you try (explicitly or implicitly) to decode to unicode as utf-8. This won't work.

Give us more info, and we'll try again to help you.

Joe Koberg's answer reminded me of an older answer of mine, which some people have found helpful: Python UnicodeDecodeError - Am I misunderstanding encode?
