Writing raw bytes to a file in Python3 results in unexpected output

Question

So, I have this piece of code:

f = open("crash.txt", "w")
junk = ("\xCC" * 1028)
f.write(junk)
f.close()

When I run this on Windows (Python 3.5.1), I get a file of repeated "CC" bytes when viewed as hex. That is as expected.

However, when I run it on Linux (Python 3.4.2), I get repeated "c38c" byte pairs instead.

I do not understand the output on Linux. Why does this happen, and how do I fix it?

@Reti43 Yes, when I look at the contents of the file in a hex editor.
– user1720897, Commented Apr 4, 2016 at 6:35

Mark Tolonen · Accepted Answer · 2016-04-04 07:00:32Z


You aren't writing raw bytes. In Python 3, str objects are Unicode strings, and they must be encoded before they can be written to a file. By default, open() uses text mode, and the encoding used for text is locale.getpreferredencoding(). On US Windows that is cp1252, but on Linux it is usually UTF-8.

b'\xc3\x8c' is '\xcc' encoded in UTF-8.

b'\xcc' is '\xcc' encoded in cp1252.
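This is easy to check in an interpreter. The following is a small sketch; the variable names are only illustrative, and the value of the default encoding depends on the machine it runs on, so no output is shown for it.

```python
import locale

# '\xcc' is a single Unicode code point, U+00CC.
ch = "\xcc"

utf8_bytes = ch.encode("utf-8")     # two bytes in UTF-8
cp1252_bytes = ch.encode("cp1252")  # one byte in cp1252

print(utf8_bytes)    # b'\xc3\x8c'
print(cp1252_bytes)  # b'\xcc'

# Encoding that a text-mode open() falls back to when none is given.
# The result is platform and locale dependent.
default_encoding = locale.getpreferredencoding()
print(default_encoding)
```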

To write "raw" bytes, open the file in binary mode and write a bytes object instead of a Unicode string.

with open("crash.txt", "wb") as f:
    junk = b"\xCC" * 1028
    f.write(junk)
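To confirm the result without a hex editor, the file can be read back in binary mode. The sketch below writes to a temporary directory instead of the working directory, so the paths are illustrative. It also shows an alternative that is not part of the code above: staying in text mode but passing an explicit encoding such as latin-1, which maps each code point from U+0000 to U+00FF to the single byte with the same value.

```python
import os
import tempfile

junk = b"\xCC" * 1028

with tempfile.TemporaryDirectory() as tmpdir:
    # Binary mode: the bytes are written unchanged, with no encoding step.
    bin_path = os.path.join(tmpdir, "crash.txt")
    with open(bin_path, "wb") as f:
        f.write(junk)
    with open(bin_path, "rb") as f:
        data = f.read()

    # Alternative (an assumption, not from the answer): text mode with an
    # explicit encoding, so the result no longer depends on the locale.
    text_path = os.path.join(tmpdir, "crash_text.txt")
    with open(text_path, "w", encoding="latin-1") as f:
        f.write("\xCC" * 1028)
    with open(text_path, "rb") as f:
        text_data = f.read()

print(len(data))          # 1028
print(data == junk)       # True
print(text_data == junk)  # True
```

Binary mode is the more direct choice when the data is not text at all, since it avoids both the encoding step and newline translation.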

edited Apr 4, 2016 at 7:00

answered Apr 4, 2016 at 6:54



1 Comment

user6429297 Over a year ago

@Mark Tolonen I am trying to write a bytes-type variable to a file. Any suggestions?
