
I wanted to create a text file containing a number of "pages" and log the byte offset of each page in a separate file. To do that, I printed strings to the main output file and counted bytes using bytes_written += file.write(str). However, the resulting byte offsets were often wrong.

I switched to bytes_written += os.write(fd, bytes(str, 'UTF-8')) and it works now. What is the difference between write() and os.write()? Or is the difference in the return value simply due to my manual conversion of the string to UTF-8?

  • If you're on Windows, and the file is being written in text mode, then two bytes (cr+lf) will be written for every line ending where the original string only has one. If file.write() doesn't count the bytes properly in this case, I would consider that a bug, but in any case the problem could be rectified by making sure the file is opened in "wb" mode. Commented Jun 28, 2016 at 19:30
  • Another similar point about text files: the value returned by tell() is neither the byte index nor the character index in the file. It's just a number that seek() can use to return to that position; you aren't supposed to do much else with it. Commented Jun 28, 2016 at 19:53

1 Answer


What is the difference between write() and os.write()?

It's analogous to the difference between the C functions fwrite(3) and write(2).

The latter is a thin wrapper around an OS-level system call, whereas the former is part of the standard C library, which does some additional buffering, and ultimately calls the latter when it actually needs to write its buffered data to a file descriptor.
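To see the buffering difference from Python, here is a minimal sketch. The function name buffered_vs_raw and the file name demo.bin are made up for illustration, and the sizes assume a small write that fits in the file object's default buffer.

```python
import os
import tempfile


def buffered_vs_raw():
    """Return the file size before flush, after flush, and after os.write()."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.bin")  # hypothetical file name

        with open(path, "wb") as f:
            f.write(b"abc")                  # lands in Python's buffer first
            before = os.path.getsize(path)   # nothing has reached the OS yet
            f.flush()                        # now the buffer is handed to the OS
            after = os.path.getsize(path)

        fd = os.open(path, os.O_WRONLY | os.O_APPEND)
        try:
            os.write(fd, b"def")             # goes straight to the system call
            raw = os.path.getsize(path)
        finally:
            os.close(fd)

    return before, after, raw
```

The buffered write() only becomes visible in the file size once the buffer is flushed, while os.write() shows up immediately.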

In Python 3.x, the write() method of a text-mode file object additionally encodes Python str objects into bytes automatically, whereas Python 2.x does no such conversion.

Or is the difference in the return value simply due to my manual conversion of the string to UTF-8?

In Python 3.x, the difference is more related to the way in which you opened the file.

If you opened the file in binary mode, e.g. f = open(filename, 'wb') then f.write() expects a bytes object, and will return the number of bytes written.

If, instead, you opened the file in text mode, e.g. f = open(filename, 'w') then f.write() expects a str object, and will return the number of characters written, which for multi-byte encodings such as UTF-8 may not match the number of bytes written.
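The mismatch is easy to reproduce. The following is a small sketch; the helper name write_counts and the file names are invented for the example, and newline="" is passed so that line-ending translation does not also change the byte count.

```python
import os
import tempfile


def write_counts(text):
    """Return (chars reported in text mode, bytes reported in binary mode,
    actual size of the text-mode file on disk)."""
    with tempfile.TemporaryDirectory() as tmp:
        text_path = os.path.join(tmp, "text.txt")    # hypothetical names
        bin_path = os.path.join(tmp, "binary.txt")

        # Text mode: write() takes str and returns a character count.
        with open(text_path, "w", encoding="utf-8", newline="") as f:
            chars = f.write(text)

        # Binary mode: write() takes bytes and returns a byte count.
        with open(bin_path, "wb") as f:
            nbytes = f.write(text.encode("utf-8"))

        size_on_disk = os.path.getsize(text_path)

    return chars, nbytes, size_on_disk


print(write_counts("héllo €\n"))
```

For the string above, text mode reports 8 characters, but the file holds 11 bytes, because "é" takes two bytes and "€" takes three in UTF-8. Summing the text-mode return values therefore drifts away from the true byte offset as soon as a non-ASCII character appears.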

Note that the os.write() function always expects a bytes object and returns the number of bytes actually written, regardless of whether or not the O_BINARY flag was used when calling os.open().
