
I'd like to get the exact sequence of bits from a file into a string using Python 3. There are several questions on this topic which come close, but don't quite answer it. So far, I have this:

>>> data = open('file.bin', 'rb').read()
>>> data
'\xa1\xa7\xda4\x86G\xa0!e\xab7M\xce\xd4\xf9\x0e\x99\xce\xe94Y3\x1d\xb7\xa3d\xf9\x92\xd9\xa8\xca\x05\x0f$\xb3\xcd*\xbfT\xbb\x8d\x801\xfanX\x1e\xb4^\xa7l\xe3=\xaf\x89\x86\xaf\x0e8\xeeL\xcd|*5\xf16\xe4\xf6a\xf5\xc4\xf5\xb0\xfc;\xf3\xb5\xb3/\x9a5\xee+\xc5^\xf5\xfe\xaf]\xf7.X\x81\xf3\x14\xe9\x9fK\xf6d\xefK\x8e\xff\x00\x9a>\xe7\xea\xc8\x1b\xc1\x8c\xff\x00D>\xb8\xff\x00\x9c9...'

>>> bin(data[:][0])
'0b11111111'

OK, I can get a base-2 number, but I don't understand why I need data[:][x], and I still have the leading 0b. It also seems that I would have to loop through the whole string and do some casting and parsing to get the correct output. Is there a simpler way to get just the sequence of 0s and 1s without looping, parsing, and concatenating strings?

Thanks in advance!

  • Reading a file opened in binary mode produces a bytes object, not a string object. Are you sure you're using py3k? Commented Jan 23, 2011 at 18:03
  • Yes, I'm sure I'm using py3k. They probably are bytes objects, but the terminal is displaying them with single quotes. Commented Jan 23, 2011 at 18:09
  • Single or double quotes are not relevant, but the representation of a bytes object starts with a b, like so: b'\xa1\xa7\xda4\x86G...', which you missed above. Commented Jan 23, 2011 at 19:34
  • Ah, I see. I must've copy/pasted wrong. Oops. Commented Jan 25, 2011 at 14:37
  • Related: Convert Binary to ASCII and vice versa (Python) Commented Nov 16, 2013 at 5:51

4 Answers


I would first precompute the 8-character binary string for every byte value 0..255:

bytetable = [("00000000"+bin(x)[2:])[-8:] for x in range(256)]

or, if you prefer bits in LSB to MSB order

bytetable = [("00000000"+bin(x)[2:])[-1:-9:-1] for x in range(256)]

then the whole file in binary can be obtained with

with open("file", "rb") as f:
    binrep = "".join(bytetable[x] for x in f.read())
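A minimal sketch of the same lookup-table idea, assuming Python 3. It swaps the string slicing for format(x, "08b"), which zero-pads to 8 digits and omits the "0b" prefix; the names MSB_FIRST, LSB_FIRST, bytes_to_bits and file_to_bits are hypothetical, not part of any library.

```python
# Precompute the 8-character binary string for every possible byte value.
# format(x, "08b") zero-pads to 8 digits and omits the "0b" prefix.
MSB_FIRST = [format(x, "08b") for x in range(256)]
# Same table with each string reversed, so bit 0 (the LSB) comes first.
LSB_FIRST = [s[::-1] for s in MSB_FIRST]


def bytes_to_bits(data, table=MSB_FIRST):
    """Return a str of '0'/'1' characters for a bytes-like object."""
    # Iterating over bytes yields ints 0..255, which index the table directly.
    return "".join(table[b] for b in data)


def file_to_bits(path, table=MSB_FIRST):
    """Hypothetical helper: read a whole file and return its bits as a str."""
    with open(path, "rb") as f:
        return bytes_to_bits(f.read(), table)


if __name__ == "__main__":
    print(bytes_to_bits(b"\xa1\xa7"))             # 1010000110100111
    print(bytes_to_bits(b"\xa1\xa7", LSB_FIRST))  # 1000010111100101
```

The tables produce exactly the same strings as the two bytetable expressions above; the only difference is that format() does the padding instead of concatenating and slicing.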

5 Comments

  • Nice solution, but some remarks: 1. Python 3 does not have xrange() (and this is a Python 3 question). 2. You arrange the bits in some kind of "big-endian" order, which is very unnatural to me. At least it should be pointed out. 3. It is generally considered an error to have a variable with the same name as a built-in class (bytes).
  • Now I like it even more, +1 :)
  • Thanks for the reply, but... I'm getting the error: TypeError: ord() expected string of length 1, but int found
  • Another point: ord(x) won't work with a bytes object (which is what reading a file in binary mode returns). Iterating over bytes produces a series of integers, so replace [ord(x)] with [x].
  • @Thomas K: Thanks, fixed, and I also took the time to actually test it.

If you are OK using an external module, this uses bitstring:

>>> import bitstring
>>> bitstring.BitArray(filename='file.bin').bin
'110000101010000111000010101001111100...'

That's it: this returns the binary string representation of the whole file.

1 Comment

Beware: with Python 3.6.4 the output didn't contain a "0b", so you cut off the first two bits.

It is not quite clear what the sequence of bits is meant to be. I think it would be most natural to start at byte 0 with bit 0, but it actually depends on what you want.
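To make the two possible orders concrete, here is a small sketch for a single byte, assuming 0xA1 (the first byte shown in the question) as the example value. MSB-first is what bin() and format() print; LSB-first starts at bit 0 and is simply the same digits reversed.

```python
value = 0xA1  # first byte of the data shown in the question

# MSB first: bit 7 down to bit 0, the order bin()/format() print.
msb_first = format(value, "08b")
# LSB first: bit 0 up to bit 7, extracted by shifting and masking.
lsb_first = "".join(str((value >> i) & 1) for i in range(8))

print(msb_first)  # 10100001
print(lsb_first)  # 10000101
```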

So here is some code to access the sequence of bits starting with bit 0 in byte 0:

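def bits_from_char(c):
    # In Python 3, iterating over bytes yields ints, so c is already 0..255
    # (calling ord() on it would raise TypeError).
    i = c
    for _ in range(8):
        yield i & 1  # least significant bit first
        i >>= 1

def bits_from_data(data):
    for c in data:
        yield from bits_from_char(c)

for bit in bits_from_data(data):
    pass  # process bit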
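To check the order these generators produce, here is a self-contained sketch, assuming Python 3 (where iterating over bytes yields ints, so no ord() is needed). It collects the bits into a string; bits_from_byte and bits_to_string are hypothetical names used only for illustration.

```python
def bits_from_byte(value):
    """Yield the 8 bits of an int 0..255, least significant bit first."""
    for _ in range(8):
        yield value & 1
        value >>= 1


def bits_from_data(data):
    """Yield every bit of a bytes object, starting with bit 0 of byte 0."""
    for value in data:  # ints in Python 3
        yield from bits_from_byte(value)


def bits_to_string(data):
    """Hypothetical helper: join the generated bits into a '0'/'1' string."""
    return "".join(str(bit) for bit in bits_from_data(data))


print(bits_to_string(b"\xa1\xa7"))  # 1000010111100101
```

Because the generators are lazy, the loop over bits_from_data() never builds the full bit string unless you ask for it, as bits_to_string() does here.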

(Another note: you don't need data[:][0] in your code. Plain data[0] gives the same value without first copying the whole bytes object.)
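A short sketch of what the indexing in the question actually does, assuming the first four bytes shown in the question as sample data. Indexing a bytes object returns an int, data[:] is only a full copy, and slicing returns bytes.

```python
data = b"\xa1\xa7\xda4"  # first bytes of the question's file

# Indexing bytes returns an int, so bin() works on it directly.
first = data[0]
print(first, bin(first))      # 161 0b10100001
# data[:] is just a full copy, so data[:][0] gives the same int.
print(data[:][0] == data[0])  # True
# Slicing (rather than indexing) returns a bytes object of length 1.
print(data[0:1])              # b'\xa1'
# format() drops the "0b" prefix and keeps leading zeros.
print(format(first, "08b"))   # 10100001
```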

Comments


To convert raw binary data such as b'\xa1\xa7\xda4\x86' into a bit string that represents the data as a number in the binary (base-2) system in Python 3:

>>> data = open('file.bin', 'rb').read()
>>> bin(int.from_bytes(data, 'big'))[2:]
'1010000110100111110110100011010010000110...'

See Convert binary to ASCII and vice versa.
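One caveat with bin(int.from_bytes(...))[2:]: if the first byte is below 0x80, its leading zero bits disappear, because an integer has no leading zeros. A minimal sketch that pads back to the full width, assuming Python 3.6+ for the f-string; bytes_to_bits is a hypothetical name.

```python
def bytes_to_bits(data):
    """Return all 8 * len(data) bits of data, most significant bit first."""
    if not data:
        return ""  # format(0, "00b") would give "0", so handle empty input
    number = int.from_bytes(data, "big")
    # Pad to the full bit width so leading zero bits are not dropped.
    return format(number, f"0{len(data) * 8}b")


print(bin(int.from_bytes(b"\x0f", "big"))[2:])  # 1111 (leading zeros lost)
print(bytes_to_bits(b"\x0f"))                   # 00001111
```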

Comments
