I am trying to read a binary file using the following code:

with open("binaryfile.bin" , 'rb') as f1:  
    for line in f1.readlines():  
        print(line) 

It prints gibberish like this:

@ ç─+@@@d*d)      

⌡Å2

  
q_ Ç        

I have verified that the data in the file is correct, and I can read it using the od command on the command line:

od -w8 -Ad -x binaryfile.bin

Output:

0000000 0011 0022 0066 0066
0000008 0066 0066 0066 0066
*  
0000032 1234 0000 0000 0000
0000040 0000 0000 0000 0000
* 
0000080 0000 0000 0000 0056
0000088 0011 

The problem with the 'od' command is that when two or more consecutive lines are identical, it replaces the repeats with a single "*" line. This happens more often if I read only two bytes per line, because more lines repeat.

od -w2 -Ad -x binaryfile.bin

Output:

0000000 0011
0000002 0022
0000004 0066
*
0000032 1234
0000034 0000
*
0000086 0056
0000088 0011

I want to read each and every line.

Q1: Can anyone suggest why reading with the regular 'rb' mode is not working?
Q2: Is there an option to read the complete file using the 'od' command without collapsing the repeated lines?

  • Are you using Python 2? I would expect Python 3 to throw an error with that code. Commented Oct 2, 2020 at 1:34
  • od has a -v flag to prevent removal of duplicate lines. Commented Oct 2, 2020 at 1:56
  • Yes, I am using Python 2. Commented Oct 2, 2020 at 3:12
  • @Jasonharper. Thank you. -v did the trick. Commented Oct 2, 2020 at 3:13

1 Answer

open("binaryfile.bin", 'rb') works correctly: it reads the data as bytes. When you then print these bytes to the console, it tries to convert them to 'utf-8' text and produces weird characters, because the file is not a text file.

You could use the binascii.hexlify function to convert a bytes string to the hex representation you want:

import binascii

with open("binaryfile.bin", 'rb') as f1:
    for line in f1.readlines():
        # NOTE: the 'sep' and 'bytes_per_sep' arguments are only available since Python 3.8
        print(binascii.hexlify(line, sep=' ', bytes_per_sep=2))
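
Since readlines() splits binary data wherever a newline byte (0x0a) happens to occur, the rows it yields have arbitrary lengths. A rough sketch of an alternative is to slice the data into fixed-width rows and print every row, similar to od -v -w8 -Ad -x. The names dump_words and dump_file are hypothetical helpers, not part of any library, and the little-endian default is an assumption about the machine where od was run (od -x shows 16-bit words in host byte order):

```python
import struct


def dump_words(data, width=8, byteorder="<"):
    """Return od-style lines: decimal offset, then 16-bit words in hex.

    byteorder "<" (little-endian) is an assumption; pass ">" for
    big-endian. width is the number of bytes per output line.
    """
    lines = []
    for offset in range(0, len(data), width):
        row = data[offset:offset + width]
        if len(row) % 2:
            # Pad a trailing odd byte with zero so it fills a 16-bit word.
            row += b"\x00"
        words = struct.unpack("%s%dH" % (byteorder, len(row) // 2), row)
        lines.append("%07d %s" % (offset, " ".join("%04x" % w for w in words)))
    return lines


def dump_file(path, width=8):
    """Print every row of the file; repeated rows are not collapsed."""
    with open(path, "rb") as f1:
        for line in dump_words(f1.read(), width):
            print(line)
```

Calling dump_file("binaryfile.bin") would print one line per 8 bytes, and dump_file("binaryfile.bin", width=2) one line per 16-bit word. The sketch uses only struct and %-formatting, so it should run on both Python 2 and Python 3.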

4 Comments

Thanks @GProst. I can print the whole file in a single string with print(binascii.hexlify(line)). The print line that you specified gives this error: "hexlify() takes no keyword arguments". I googled but couldn't find examples where hexlify takes additional arguments.
Oh, it only accepts those args starting with Python 3.8. (You can see it in the hexlify docs.)
Hmm. I am using Python 2.
Python 2 isn't converting those bytes to UTF-8, it's just passing them to the console unmodified. The console may be interpreting them as UTF-8 or it may be using some other encoding.
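
For interpreters where hexlify() takes no keyword arguments, one possible workaround is to call hexlify with the data only and insert the separators by hand. This is a minimal sketch; hex_groups is a hypothetical helper name, not a library function:

```python
import binascii


def hex_groups(chunk, bytes_per_group=2):
    """Hex-encode chunk and separate groups of bytes with spaces.

    Only binascii.hexlify(data) is used, without 'sep' or
    'bytes_per_sep', so this does not depend on Python 3.8+.
    """
    digits = binascii.hexlify(chunk).decode("ascii")
    step = 2 * bytes_per_group  # two hex digits per byte
    return " ".join(digits[i:i + step] for i in range(0, len(digits), step))
```

This groups from the left and shows the bytes in file order, so on a little-endian machine each pair appears byte-swapped compared with the words printed by od -x.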
