
I need to read a simple but large (500MB) binary file in Python 3.6. The file was created by a C program, and it contains 64-bit double precision data. I tried using struct.unpack but that's very slow for a large file.

Here is my simple file read:

def ReadBinary():

    fileName = 'C:\\File_Data\\LargeDataFile.bin'

    with open(fileName, mode='rb') as file:
        fileContent = file.read()
    return fileContent

Now I have fileContent. What is the fastest way to decode it into 64-bit double-precision floating point, or read it without the need to do a format conversion?

I want to avoid, if possible, reading the file in chunks. I would like to read it decoded, all at once, like C does.
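For reference, a struct-based read (a sketch of the slow approach described above; the function name and format string are illustrative, not from the original post) looks something like this. The cost comes from `struct.unpack` building one enormous Python tuple of float objects:

```python
import struct

def read_doubles_struct(path):
    # Slow baseline: read the whole file, then unpack every
    # 8-byte double into a Python float via one giant tuple.
    with open(path, mode='rb') as f:
        data = f.read()
    count = len(data) // 8  # 8 bytes per 64-bit double
    return struct.unpack('%dd' % count, data)
```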


1 Answer


You can use array.array('d')'s fromfile method:

import array
import os

def ReadBinary():
    fileName = r'C:\File_Data\LargeDataFile.bin'

    fileContent = array.array('d')
    with open(fileName, mode='rb') as file:
        # fromfile requires an item count; derive it from the file size
        fileContent.fromfile(file, os.path.getsize(fileName) // fileContent.itemsize)
    return fileContent

That's a C-level read straight into raw machine values, with no per-element conversion. mmap.mmap could also work: create a memoryview of the mmap object and cast it to 'd'.
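A minimal sketch of the mmap route (the function name is illustrative; it assumes the file length is an exact multiple of 8 bytes). Nothing is copied up front: the OS pages data in on demand, and the cast view exposes it directly as doubles:

```python
import mmap

def read_doubles_mmap(path):
    # Map the file into memory and view its bytes as 64-bit doubles
    # (native byte order), without copying or converting anything.
    with open(path, mode='rb') as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return memoryview(mm).cast('d')
```

Note that the mapping stays alive as long as the returned memoryview does; call `release()` on the view when you're done with it.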


5 Comments

I'll try that out now.
I get this message: 'array.array' has no attribute 'array'
That was because I had "from array import array" in my imports; when I changed to "import array" the problem was solved.
@RTC222: Yeah, I'm not a fan of the "module and only class in it share the same name" thing. In modern Python, they probably would have named the class Array (matching PEP8 for non-built-ins, like collections.OrderedDict), but we're stuck with legacy names forever, whee!
I don't like that either because it's confusing. I also prefer to import the whole module rather than just a class (i.e. avoiding from xxx import yyy).
