10

I want to read the raw binary of a file and put it into a string. Currently I am opening the file with the "rb" flag and printing each byte, but it comes out as ASCII characters (for text, that is; for video and audio files it gives symbols and gibberish). I'd like to get the raw 0's and 1's if possible. This needs to work for audio and video files as well, so simply converting the ASCII to binary isn't an option.

with open(filePath, "rb") as file:
    byte = file.read(1)
    print(byte)
  • possible duplicate of stackoverflow.com/questions/1035340/… Commented Nov 15, 2013 at 15:44
  • not really. he's asking more here than the other post can answer. even though it may seem weird what he's asking... Commented Nov 15, 2013 at 16:03
  • stackoverflow.com/questions/4775146/… Commented Nov 15, 2013 at 16:05
  • You are reading the binary 0's and 1's from the file into a one-character string. Try print bin(ord(byte)). The ord() function returns the integer value of the byte when the argument is a one-character 8-bit string. The bin() function then converts an integer to a string of 0 and 1 characters with a 0b prefix, so you'll see something like 0b1100001 printed. Commented Nov 15, 2013 at 16:59

2 Answers

10

What you are reading IS really the "raw binary" content of your "binary" file. Strange as it might seem, binary data are not "0's and 1's" but binary words (aka bytes, cf http://en.wikipedia.org/wiki/Byte), each of which has an integer value (usually written in base 10) and can be interpreted as an ASCII char. Or as an integer (which is how one usually does binary operations). Or as hexadecimal. For what it's worth, "text" is actually "raw binary data" too.
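As a small illustration of the point above, here is a sketch that displays one and the same byte in several ways. The variable names are made up for the example, and the literal b"a" stands in for a byte read from a file; nothing about the data changes between the lines, only how it is shown.

```python
# One byte, several ways of displaying it.
raw = b"a"                     # stand-in for file.read(1)
value = bytearray(raw)[0]      # integer value; bytearray works on Python 2 and 3

as_decimal = str(value)            # base-10 digits
as_hex = format(value, "02x")      # two hexadecimal digits
as_bits = format(value, "08b")     # eight binary digits, zero-padded
as_char = raw.decode("ascii")      # only meaningful if the byte is ASCII text

print(as_decimal, as_hex, as_bits, as_char)
```

The last line only works because this particular byte happens to be valid ASCII; for audio or video data the first three forms still work, while the text interpretation generally does not.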

To get a "binary" representation you can have a look at "Convert binary to ASCII and vice versa", but that's not going to give you more "raw binary data" than what you already have...

Now the question: why exactly do you want this data as "0's and 1's"?

1 Comment

to be crystal clear: raw_binary_data = open(filename, "rb").read(). It is unrelated to "01"-strings that contain ASCII characters '0', '1' representing the data in binary numeral system (base-2 system is a positional notation with a radix of 2): b'\x0d'[0] == 0x0d == 13 == 0b1101 == int('1101', 2) (b'\x0d'[0] is Python 3 expression, use ord('\x0d') on Python 2) but b'\x0d' != b'1101' (len(b'\x0d') == 1 and len(b'1101') == 4), b'1101' == b'\x31\x31\x30\x31'
9

To get the binary representation, I think you will need to import binascii, then:

byte = f.read(1)
binary_string = bin(int(binascii.hexlify(byte), 16))[2:].zfill(8)

or, broken down:

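import binascii

filePath = "mysong.mp3"
# Open in binary mode; the with block closes the file when done.
with open(filePath, "rb") as file:
    byte = file.read(1)  # first byte of the file
    # hexlify returns bytes on Python 3, so decode it for clean printing.
    hexadecimal = binascii.hexlify(byte).decode("ascii")
    decimal = int(hexadecimal, 16)
    # bin() drops leading zeros; strip the "0b" prefix and pad to 8 digits.
    binary = bin(decimal)[2:].zfill(8)
    print("hex: %s, decimal: %s, binary: %s" % (hexadecimal, decimal, binary))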

For a file whose first byte is 0x64, this will output:

hex: 64, decimal: 100, binary: 01100100

2 Comments

Note to the OP: please understand the difference between "raw data" and "binary representation".
binascii is not needed here. When working with 1 byte, we can use ord() to get an integer ordinal and then convert it with hex() or bin(). For multibyte values, though, binascii.hexlify() can be handy, as it converts the whole byte string at once.
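To make the comparison in the comment above concrete, here is a rough sketch of both routes applied to a multibyte value. The function names and the sample bytes are invented for illustration; in practice the chunk would come from something like file.read(4) on a file opened with "rb".

```python
import binascii


def bits_per_byte(data):
    """Format each byte separately as 8 binary digits."""
    # bytearray() makes iteration yield ints on Python 2 as well as 3.
    return "".join(format(b, "08b") for b in bytearray(data))


def bits_via_hexlify(data):
    """Convert the whole byte string at once via its hex form."""
    if not data:
        return ""  # int("", 16) would raise, so handle empty input here
    # zfill restores the leading zeros that bin() drops.
    return bin(int(binascii.hexlify(data), 16))[2:].zfill(8 * len(data))


chunk = b"\x00\xffd\x0d"  # arbitrary sample bytes, not valid text
print(bits_per_byte(chunk))
print(bits_via_hexlify(chunk))
```

Both calls should print the same 32-character string. Neither depends on the bytes being printable, so the same approach applies to audio and video data.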
