Printing to stdout with encoding in Python 3 [duplicate]

Question

I have a Python 3 program that reads some strings from a Windows-1252 encoded file:

with open(file, 'r', encoding="cp1252") as file_with_strings:
    # save some strings

Which I later want to write to stdout. I've tried to do:

print(some_string)
# => UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 180: ordinal not in range(128)

print(some_string.decode("utf-8"))
# => AttributeError: 'str' object has no attribute 'decode'

sys.stdout.buffer.write(some_string)
# => TypeError: 'str' does not support the buffer interface

print(some_string.encode("cp1252").decode("utf-8"))
# => UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 180: invalid continuation byte

print(some_string.encode("cp1252"))
# => has the unfortunate result of printing b'<my string>' instead of just the string

I'm scratching my head here. I'd like to print the string I got from the file just as it appears there, in cp1252. (In my terminal, when I do more $file, these characters appear as question marks, so my terminal is probably ascii.)

Would love some clarification! Thanks!

What does string_to_print = some_string.decode('utf-8'); print(string_to_print) do?
– hd1, Commented Mar 3, 2016 at 3:30
It's just a str, so I get AttributeError: 'str' object has no attribute 'decode'
– mostsquares, Commented Mar 3, 2016 at 3:34
"(In my terminal, when I do more $file, these characters appear as question marks, so my terminal is probably ascii.)" <- No. Since your answer writes cp1252, your terminal encoding probably doesn't match your locale.
– Alastair McCormack, Commented Mar 7, 2016 at 19:02
I'm voting to close this question as off-topic because the actual problem is too localised: it's caused by an incorrectly configured environment and/or by usage, but is not properly described.
– Alastair McCormack, Commented Mar 8, 2016 at 17:34

Craig McQueen · Accepted Answer · 2023-12-31 03:17:01Z

10

Since Python 3.7, you can change the encoding of all text written to sys.stdout with the reconfigure method:

import sys

sys.stdout.reconfigure(encoding="cp1252")

That could be helpful if you need to change the encoding for all output from your program.
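As a rough illustration of what reconfigure does, the sketch below applies the same call to a stand-in stream rather than the real sys.stdout, so the resulting bytes can be inspected. The names raw, stream and written are made up for this example, and errors="replace" is an optional extra that the answer itself does not use.

```python
import io

# Stand-in for sys.stdout: a text layer over an in-memory byte buffer.
# Assumption: the real stream starts out as ASCII, as in the question.
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="ascii", newline="\n")

# Same call as on sys.stdout (Python 3.7+). errors="replace" is optional:
# it substitutes "?" for characters cp1252 cannot represent instead of raising.
stream.reconfigure(encoding="cp1252", errors="replace")

stream.write("café\n")
stream.flush()

# The bytes that would have reached the terminal or the redirected file.
written = raw.getvalue()
```

With the real sys.stdout, only the reconfigure call is needed; ordinary print() calls afterwards are encoded as cp1252.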

edited Dec 31, 2023 at 3:17

Craig McQueen


answered Oct 4, 2019 at 0:27

Trey Hunner


mostsquares · Accepted Answer · 2016-03-03 04:04:40Z

2

To anybody out there with the same problem, I ended up doing:

import sys

to_print = (some_string + "\n").encode("cp1252")
sys.stdout.buffer.write(to_print)
sys.stdout.flush()  # I write a ton of these strings, and segfaulted without flushing
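The same approach could be wrapped in a small helper so that every line goes through one place. This is only a sketch: write_encoded, its parameters and the demo buffer are hypothetical names, and the in-memory BytesIO stands in for sys.stdout.buffer so the output can be checked.

```python
import io
import sys


def write_encoded(text, stream=None, encoding="cp1252", errors="strict"):
    """Encode text plus a newline and write the raw bytes to a binary stream."""
    if stream is None:
        # Flush anything print() has queued so the output order is preserved.
        sys.stdout.flush()
        stream = sys.stdout.buffer
    data = (text + "\n").encode(encoding, errors)
    stream.write(data)
    stream.flush()
    return len(data)


# Demonstration against an in-memory buffer instead of the real stdout.
demo = io.BytesIO()
count = write_encoded("café", demo)
```

Calling write_encoded(some_string) with no stream argument writes to sys.stdout.buffer, which bypasses whatever encoding the text layer of stdout was given.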

edited Mar 3, 2016 at 4:04

answered Mar 3, 2016 at 3:50

mostsquares


Ani Menon · Accepted Answer · 2016-06-05 15:10:22Z

1

When you encode with cp1252, you have to decode with the same.

E.g.:

import sys
txt = ("hi hello\n").encode("cp1252")
#print((txt).decode("cp1252"))
sys.stdout.buffer.write(txt)
sys.stdout.flush()

This writes the bytes of "hi hello\n", encoded in cp1252, straight to stdout; nothing is decoded again on the Python side.
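To make the "same codec both ways" point concrete with a non-ASCII character, here is a small sketch. The sample string and variable names are chosen for illustration only.

```python
text = "café"
data = text.encode("cp1252")  # the é becomes the single byte 0xe9

# Decoding with the same codec gives the original text back.
round_trip = data.decode("cp1252")

# Decoding the same bytes as UTF-8 fails, because a lone 0xe9 is not valid UTF-8.
try:
    data.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# The reverse mismatch does not raise, but silently produces mojibake.
mojibake = text.encode("utf-8").decode("cp1252")
```

The second case is the one to watch for: decoding UTF-8 bytes as cp1252 usually succeeds and yields wrong characters rather than an error.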

edited Jun 5, 2016 at 15:10

answered Mar 3, 2016 at 4:17

Ani Menon


4 Comments

Mark Ransom Over a year ago

Printing after decode just tries to print a Unicode string, which leads you right back where you started. Your example only works because it only contains ASCII characters.

Ani Menon Over a year ago

Yeah, agreed. Buffer writer has to be used.

Ivan Kvolik Over a year ago

This helped me a lot. I was reading from a STDIN, and writing to a file worked, as you can set the encoding in open(), but printing was a nightmare.

resurrected user Over a year ago

If different codecs are used for encoding and decoding (e.g. print(txt.encode("utf-8").decode("cp1252")) ) the result is not identical and may be printable. The translation errors can actually be helpful to find the offending characters.

Alastair McCormack · Accepted Answer · 2016-03-07 09:17:15Z

0

You're either piping to your script or your locale is broken. You should fix your environment, rather than fixing your script to your environment, as this will make your script very brittle.

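If you're piping, Python may assume the output should be "ASCII" and set the encoding of stdout to "ASCII". (Python 2 did this whenever stdout was not a terminal; Python 3 takes the encoding from the locale for pipes as well.)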

Under normal conditions, Python uses the locale to work out what encoding to apply to stdout. If your locale is broken (Not installed or corrupt), Python will default to "ASCII". A locale of "C", will also give you an encoding of "ASCII".

Check your locale by typing locale and ensure no errors are returned. E.g.

$ locale
LANG="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_CTYPE="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_ALL=
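The same check can be made from inside Python, which shows what the interpreter actually derived from the locale. The helper below is a sketch; describe_encodings and its dictionary keys are invented for this example.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings Python has chosen for this process."""
    return {
        # Encoding and error handler used by print(); getattr guards against
        # a replaced stdout object that lacks these attributes.
        "stdout": getattr(sys.stdout, "encoding", None),
        "stdout_errors": getattr(sys.stdout, "errors", None),
        # Encoding derived from the locale settings (LANG, LC_ALL, LC_CTYPE).
        "preferred": locale.getpreferredencoding(False),
        # Encoding used for file names.
        "filesystem": sys.getfilesystemencoding(),
    }


info = describe_encodings()
for name, value in info.items():
    print(f"{name}: {value}")
```

Running this once in a terminal and once with the output redirected to a file shows whether the two cases are treated differently on a given machine.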

If all else fails or you're piping, you can override Python's locale detection by setting the PYTHONIOENCODING environment variable. E.g.

$ PYTHONIOENCODING=utf-8 ./my_python.sh

Remember that your shell has a locale and your terminal has an encoding; they both need to be set correctly.
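One way to see the effect of PYTHONIOENCODING without touching the shell configuration is to start a child interpreter with the variable set and ask it what it chose. This is a sketch; child_stdout_encoding is a hypothetical helper, and cp1252 is used only because it is the encoding from the question.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Start a child Python with PYTHONIOENCODING set and report its stdout encoding."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        capture_output=True,  # the child's stdout is a pipe, not a terminal
        text=True,
        check=True,
    )
    return result.stdout.strip()


reported = child_stdout_encoding("cp1252")
# Normalise the spelling so aliases of the same codec compare equal.
canonical = codecs.lookup(reported).name
```

Because the child's stdout is a pipe here, this also covers the piping case mentioned above.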

answered Mar 7, 2016 at 9:17

Alastair McCormack


5 Comments

mostsquares Over a year ago

Not piping, but it's also not my environment - it's a program that I have to run on school servers, which have ascii terminals. I could change my personal environment or use a different terminal, but I can't guarantee that the graders will.

mostsquares Over a year ago

It's Debian, I'm handing in a .py file that will be run with python3 by someone on a different computer, but reading from the same files, and always trying to write to ascii stdout

Alastair McCormack Over a year ago

If your terminals really are ASCII (they probably aren't), why is your answer encoding to "cp1252"?

mostsquares Over a year ago

I have to encode to cp1252 to maintain the accent marks that were in the original data. This script's output will be redirected to a file, and I want that file to have those accent marks. My locale has nothing set for LANG/LANGUAGE or ALL, and everything else is "POSIX", fwiw

Alastair McCormack Over a year ago

1) Your terminals are not ASCII if they're displaying cp1252. 2) Your environment is not set up correctly if you don't have a LANG defined. That is why "more" is failing. You may find your students have correctly configured environments or a different encoding set up, meaning your brittle code will break.

Gino Mempin · Accepted Answer · 2024-07-03 23:10:13Z

0

This is not working

plt.savefig(sys.stdout.buffer)

Use this instead of buffer

plt.savefig(sys.stdout.encoding)

edited Jul 3, 2024 at 23:10

Gino Mempin


answered Jul 3, 2024 at 7:31

Samet Sarıyıldız

