36

Is there any way to write binary output to sys.stdout in Python 2.x? In Python 3.x, you can just use sys.stdout.buffer (or detach stdout, etc...), but I haven't been able to find any solutions for Python 2.5/2.6.

EDIT: I'm trying to push a PDF file (in binary form) to stdout so that a web server can serve it. When I write the file using sys.stdout.write, extra carriage returns are inserted into the binary stream, and the resulting PDF is corrupt.

EDIT 2: For this project, I need to run on a Windows Server, unfortunately, so Linux solutions are out.

Simple dummy example (reading from a file on disk instead of generating on the fly, just so we know that the generation code isn't the issue):

import sys

# Read the whole PDF as raw bytes and write it to stdout unchanged.
pdf_file = open('C:\\test.pdf', 'rb')
pdf_data = pdf_file.read()
sys.stdout.write(pdf_data)
7 Comments
  • When you did sys.stdout.write() what didn't work? Commented Mar 3, 2010 at 19:48
  • See above for explanation, but the issue is basically that Python adds carriage returns when it tries to convert the binary stream to a string for writing. Commented Mar 3, 2010 at 19:54
  • Does sys.stdout = os.fdopen(1, "wb") work for you to eliminate text-mode conversions? (You'll still need to use sys.stdout.write if you don't want the NLs from print statements.) (docs.python.org/library/os.html#os.fdopen) Commented Mar 3, 2010 at 20:15
  • Thanks for the great question. I learned something new today. Commented Mar 3, 2010 at 20:31
  • @Roger, surprisingly os.fdopen doesn't solve it, although running Python with -u works. -u does bring extra overhead, though. Commented Mar 3, 2010 at 20:54

5 Answers

29

Which platform are you on?

You could try this recipe if you're on Windows (the link suggests it's Windows specific anyway).

import sys

if sys.platform == "win32":
    import os, msvcrt
    # Switch stdout from text mode to binary mode so "\n" is not expanded to "\r\n".
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)

There are some references on the web suggesting that Python 3.1 would/should have a function to reopen sys.stdout in binary mode, but I don't really know if there's a better alternative than the above for Python 2.x.


3 Comments

I did a test just reading the PDF in from a file and writing it straight back out; the carriage returns are still added.
The Windows solution link you give is the perfect solution. I can't thank you enough; this was driving me absolutely up the wall.
Great! The same works for stdin as well, and both are required to make e.g. a functional cat clone that can handle binary files.
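To illustrate the last comment, here is a minimal sketch of a binary-safe cat clone. The helper names set_binary_mode, copy_stream and main are made up for this example, and the getattr() fallback is an assumption added so the same code runs on Python 2 (plain file objects) and Python 3 (byte streams under .buffer). It is not taken from the linked recipe.

```python
import os
import sys


def set_binary_mode(stream):
    """Switch a standard stream to binary mode on Windows; no-op elsewhere."""
    if sys.platform == "win32":
        import msvcrt
        msvcrt.setmode(stream.fileno(), os.O_BINARY)


def copy_stream(src, dst, chunk_size=65536):
    """Copy bytes from src to dst in chunks and return the number copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    dst.flush()
    return total


def main():
    # Both ends need binary mode, otherwise one side still translates newlines.
    set_binary_mode(sys.stdin)
    set_binary_mode(sys.stdout)
    # Python 3 keeps the byte-oriented streams under .buffer; Python 2 does not.
    src = getattr(sys.stdin, "buffer", sys.stdin)
    dst = getattr(sys.stdout, "buffer", sys.stdout)
    return copy_stream(src, dst)
```

Calling main() at the end of a script and running it as `python cat.py < in.pdf > out.pdf` (cat.py being a hypothetical file name) should produce a byte-for-byte copy.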
10

You can use unbuffered mode: python -u script.py.

-u     Force  stdin,  stdout  and stderr to be totally unbuffered.
       On systems where it matters, also put stdin, stdout and stderr
       in binary mode.
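A quick way to check the effect of -u is to launch a child interpreter with that flag and count the bytes that come back. The sketch below is only an illustration: CHILD_CODE and run_unbuffered_child are names invented here, and the child is written so that it runs on both Python 2 and Python 3.

```python
import subprocess
import sys

# Child program: emit every byte value 0..255 exactly once. getattr() picks
# sys.stdout.buffer on Python 3 and plain sys.stdout on Python 2.
CHILD_CODE = (
    "import sys\n"
    "out = getattr(sys.stdout, 'buffer', sys.stdout)\n"
    "out.write(bytes(bytearray(range(256))))\n"
    "out.flush()\n"
)


def run_unbuffered_child():
    """Run CHILD_CODE under `python -u` and return the bytes it wrote."""
    proc = subprocess.Popen(
        [sys.executable, "-u", "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
    )
    # communicate() reads the pipe to EOF and then waits for the child.
    output, _ = proc.communicate()
    return output


received = run_unbuffered_child()
print(len(received))
```

If the count is 256, no newline translation took place. Dropping "-u" from the argument list on an affected Windows setup would be the comparison case.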


8

You can use argopen.argopen(): it treats a dash as stdin/stdout and fixes binary mode on Windows.

import argopen
stdout = argopen.argopen('-', 'wb')
stdout.write(some_binary_data)

2 Comments

This is much neater than the ActiveState recipe. How did you figure it out? The module is barely documented.
Didn't work for me -- my distro doesn't have argopen. Didn't want to install it since "msvcrt.setmode()" mentioned above worked for me.
7

In Python 2.x, all strings are byte strings by default, so I believe you should be able to just write the data directly:

>>> sys.stdout.write(data)

EDIT: I've confirmed your experience.

I created one file, gen_bytes.py:

import sys
for char in range(256):
    sys.stdout.write(chr(char))

And another, read_bytes.py:

import subprocess
import sys

proc = subprocess.Popen([sys.executable, 'gen_bytes.py'], stdout=subprocess.PIPE)
# communicate() reads stdout to EOF before waiting, so a full pipe cannot deadlock.
bytes, _ = proc.communicate()
if not len(bytes) == 256:
    print 'Received incorrect number of bytes: {0}'.format(len(bytes))
    raise SystemExit(1)
if not map(ord, bytes) == range(256):
    print 'Received incorrect bytes: {0}'.format(map(ord, bytes))
    raise SystemExit(2)
print "Everything checks out"

Put them in the same directory and run read_bytes.py. Sure enough, it appears as if Python is in fact converting newlines on output. I suspect this only happens on a Windows OS.

> .\read_bytes.py
Received incorrect number of bytes: 257
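The extra byte is consistent with text-mode newline translation: of the 256 byte values, only 0x0A ("\n") is rewritten, and it becomes the two bytes "\r\n". The sketch below merely simulates that translation in pure Python (simulate_text_mode is a made-up name); it does not call into the Windows C runtime.

```python
def simulate_text_mode(data):
    """Mimic Windows text-mode output: each LF becomes CR LF."""
    return data.replace(b"\n", b"\r\n")


data = bytes(bytearray(range(256)))
translated = simulate_text_mode(data)
print("%d -> %d" % (len(data), len(translated)))  # prints "256 -> 257"
```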

Following ChristopheD's lead and changing gen_bytes.py to the following corrects the issue.

import sys

if sys.platform == "win32":
    import os, msvcrt
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)

for char in range(256):
    sys.stdout.write(chr(char))

I include this for completeness. ChristopheD deserves the credit.

6 Comments

This works if you're only trying to add string data, but Python tries to stringify binary data when just calling write, corrupting the data.
I ran your gen_bytes.py and read_bytes.py on Mac OS X (Python 2.5, with minor modifications for the missing "format" method) and it printed "Everything checks out".
It looks like it's a Windows-only issue.
On Windows, I found that after just running gen_bytes.py > bytes.bin I could see that the file was 257 bytes simply by doing a dir.
Unless you're using PowerShell, in which case gen_bytes.py > bytes.bin generates a Unicode-encoded file of 522 bytes.
0

I solved this using a wrapper for a file-descriptor. (Tested in Python 3.2.5 on Cygwin)

import io
import os


class BinaryFile(object):
    ''' Wraps a file-descriptor to binary read/write. The wrapped
    file can not be closed by an instance of this class, it must
    happen through the original file.

    :param fd: A file-descriptor (integer) or file-object that
        supports the ``fileno()`` method. '''

    def __init__(self, fd):
        super(BinaryFile, self).__init__()
        fp = None
        if not isinstance(fd, int):
            fp = fd
            fd = fp.fileno()
        self.fd = fd
        self.fp = fp

    def fileno(self):
        return self.fd

    def tell(self):
        if self.fp and hasattr(self.fp, 'tell'):
            return self.fp.tell()
        else:
            raise io.UnsupportedOperation(
                'can not tell position from file-descriptor')

    def seek(self, pos, how=os.SEEK_SET):
        try:
            return os.lseek(self.fd, pos, how)
        except OSError:
            raise io.UnsupportedOperation('file-descriptor is not seekable')

    def write(self, data):
        if not isinstance(data, bytes):
            raise TypeError('must be bytes, got %s' % type(data).__name__)
        return os.write(self.fd, data)

    def read(self, length=None):
        if length is not None:
            return os.read(self.fd, length)
        else:
            result = b''
            while True:
                data = self.read(1024)
                if not data:
                    break
                result += data
            return result

1 Comment

The code in this answer doesn't solve the problem in Python 2.7: the \r bytes still appear on standard output on Windows. By adding msvcrt.setmode(self.fd, os.O_BINARY) (as indicated in other answers), the \r bytes disappear.
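A minimal sketch of what that comment describes, written as a standalone function rather than as a change to the class above. The name write_binary is hypothetical, and the loop around os.write is an addition of this sketch to cope with partial writes.

```python
import os
import sys


def write_binary(fd, data):
    """Write all of `data` to descriptor `fd` without newline translation."""
    if sys.platform == "win32":
        import msvcrt
        # Put the descriptor itself into binary mode before writing to it.
        msvcrt.setmode(fd, os.O_BINARY)
    offset = 0
    while offset < len(data):
        # os.write may accept fewer bytes than offered, so keep going.
        offset += os.write(fd, data[offset:])
    return offset
```

For standard output this would be called as write_binary(sys.stdout.fileno(), data), after flushing anything already buffered in sys.stdout.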
