Python: Detect all strings in binary file?

Question

strings is a GNU/Linux app that prints the strings of printable characters in files.

Is there any way to do what strings does but in Python?

Calling strings and grabbing the output is not an option in my case.

Possible duplicate of Python equivalent of unix "strings" utility
– mproffitt, Commented Aug 11, 2015 at 17:14

Jason Hu · Accepted Answer · 2015-08-11 17:16:08Z

2

If you don't care about the exact content of the output, this is easy to achieve by simply ignoring all decoding errors:

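In Python 2:

with open('file', 'rb') as fd:
    # Open in binary mode; bytes that are not valid ASCII are dropped.
    print fd.read().decode('ascii', errors='ignore')

In Python 3:

import codecs
with open('file', 'rb') as fd:
    # Binary mode is required: read() must return bytes for decoding.
    print(codecs.decode(fd.read(), 'ascii', errors='ignore'))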

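Either way, errors='ignore' simply skips every byte that cannot be decoded, so bytes outside the ASCII range (0x80 and above) are dropped. Control characters below 0x20 are valid ASCII and are still kept in the output.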

Further reference (Python 2): https://docs.python.org/2/library/codecs.html

Python 3: https://docs.python.org/3.5/library/codecs.html
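
To make the effect of errors='ignore' concrete, here is a small Python 3 sketch that works on an in-memory byte string instead of a file. The sample bytes are made up for illustration.

```python
# Hypothetical sample: text fragments mixed with non-ASCII and control bytes.
sample = b"\x00\x01hello\xff\xfe world\x7f\x02"

# Bytes >= 0x80 are not valid ASCII and are silently dropped by 'ignore'.
decoded = sample.decode("ascii", errors="ignore")

# Control characters such as \x00, \x01 and \x7f are valid ASCII, so they stay.
print(repr(decoded))
```

Because control characters survive and there is no minimum run length, the result is not the same as what strings prints. It only strips the non-ASCII bytes.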

answered Aug 11, 2015 at 17:16


Martin Evans · Accepted Answer · 2015-08-11 18:52:01Z

2

The following would print a list of all words of length 4 or more:

import re

# Open in binary mode and use a bytes pattern so this runs on Python 2 and 3.
with open(r"my_binary_file", "rb") as f_binary:
    print(re.findall(b"([a-zA-Z]{4,})", f_binary.read()))

Restricting matches to letters cuts down on non-text matches, but may of course miss something you were looking for, such as strings containing digits, spaces, or punctuation. strings also uses a default minimum length of 4.
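
If you would rather match every printable ASCII character, as strings does, one possible variation is to widen the character class to the range 0x20 to 0x7E. This is a Python 3 sketch. The name find_strings and the sample bytes are assumptions for illustration, not part of the answer above.

```python
import re

# Printable ASCII is 0x20 (space) through 0x7E (~); require runs of 4 or more.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def find_strings(data):
    """Return printable-ASCII runs of length >= 4 found in a bytes object."""
    return [match.decode("ascii") for match in PRINTABLE_RUN.findall(data)]


# Hypothetical sample standing in for the contents of a binary file.
sample = b"\x00\x01GNU C 4.8\x00ab\x00/lib/ld.so\xff"
print(find_strings(sample))
```

For a real file you would pass in the result of open(path, "rb").read(). Very large files may be better handled in chunks or through mmap.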

edited Aug 11, 2015 at 18:52

answered Aug 11, 2015 at 17:30


drum · Accepted Answer · 2015-08-11 17:10:23Z

1

Check byte by byte to see whether each value falls between 0x20 and 0x7E inclusive (0x7F is DEL, which is a control character). If it does, the byte is a printable ASCII character and can be printed.
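
A minimal Python 3 sketch of this byte-by-byte idea might look like the following. The function name iter_strings, the min_len parameter, and the sample bytes are hypothetical. The upper bound is 0x7E because 0x7F is DEL and not printable.

```python
def iter_strings(data, min_len=4):
    """Yield runs of printable ASCII bytes that are at least min_len long."""
    run = bytearray()
    for byte in data:  # iterating over bytes yields ints in Python 3
        if 0x20 <= byte <= 0x7E:
            run.append(byte)
        else:
            # A non-printable byte ends the current run.
            if len(run) >= min_len:
                yield run.decode("ascii")
            run.clear()
    # Flush a run that extends to the end of the data.
    if len(run) >= min_len:
        yield run.decode("ascii")


# Hypothetical sample; "ELF" is only 3 characters, so it is skipped.
sample = b"\x7fELF\x02\x01\x00libc.so.6\x00puts"
print(list(iter_strings(sample)))
```

Collecting runs and applying a minimum length is what keeps the output close to strings. Printing each printable byte on its own would also emit isolated characters that happen to fall in the range.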

answered Aug 11, 2015 at 17:10


Cthulhu · Accepted Answer · 2015-08-11 17:41:00Z

0

The following should find all strings of length 4 or more (which is what strings does by default) in a byte string. As written it targets Python 2, where iterating over the data yields one-character strings:

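def strings(data):
    # Replace every byte outside 0x20..0x7F with NUL so that NUL marks the gaps.
    cleansed = "".join(map(lambda byte: byte if byte >= chr(0x20) and byte <= chr(0x7F) else chr(0), data))
    # Split on NUL and keep only the runs of 4 or more characters.
    return filter(lambda string: len(string) >= 4, cleansed.split(chr(0)))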

answered Aug 11, 2015 at 17:41


1 Comment

kimstik Over a year ago

Perhaps it should be "... and byte < chr(0x7F) else ...". I believe that 0x7F(DEL) is not printable.
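
For Python 3, where iterating over bytes yields integers, the same replace-and-split approach could be sketched as below, using the stricter upper bound suggested in the comment. The min_len parameter and the sample bytes are assumptions added for illustration.

```python
def strings(data, min_len=4):
    """Python 3 port: data is a bytes object, the result is a list of str."""
    # Replace every byte outside 0x20..0x7E with NUL (0x7F, DEL, is excluded).
    cleansed = bytes(b if 0x20 <= b < 0x7F else 0 for b in data)
    # Split on NUL and keep only the runs that are long enough.
    return [
        run.decode("ascii")
        for run in cleansed.split(b"\x00")
        if len(run) >= min_len
    ]


# Hypothetical sample; DEL (0x7f) now separates "abcd" from "efgh".
sample = b"abcd\x7fefgh\x01xyz\x00hello world"
print(strings(sample))
```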
