
I need to read a simple but large (500MB) binary file in Python 3.6. The file was created by a C program, and it contains 64-bit double precision data. I tried using struct.unpack but that's very slow for a large file.

Here is my simple file read:

def ReadBinary():

    fileName = 'C:\\File_Data\\LargeDataFile.bin'

    with open(fileName, mode='rb') as file:
        fileContent = file.read()
    return fileContent

Now I have fileContent. What is the fastest way to decode it into 64-bit double-precision floating point, or read it without the need to do a format conversion?

I want to avoid, if possible, reading the file in chunks. I would like to read it decoded, all at once, like C does.
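For reference, a struct-based read (a sketch of the slow approach described above; the function name and format string are illustrative, not from the original post) looks something like this. The cost comes from `struct.unpack` building one enormous Python tuple of float objects:

```python
import struct

def read_doubles_struct(path):
    # Slow baseline: read the whole file, then unpack every
    # 8-byte double into a Python float via one giant tuple.
    with open(path, mode='rb') as f:
        data = f.read()
    count = len(data) // 8  # 8 bytes per 64-bit double
    return struct.unpack('%dd' % count, data)
```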


1 Answer


You can use array.array('d')'s fromfile method:

import array
import os

def ReadBinary():
    fileName = r'C:\File_Data\LargeDataFile.bin'

    fileContent = array.array('d')
    with open(fileName, mode='rb') as file:
        # fromfile requires an item count; derive it from the file size
        fileContent.fromfile(file, os.path.getsize(fileName) // fileContent.itemsize)
    return fileContent

That's a C-level read straight into raw machine values, with no per-element conversion. mmap.mmap could also work: create a memoryview of the mmap object and cast it to 'd'.
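A minimal sketch of the mmap route (the function name is illustrative; it assumes the file length is an exact multiple of 8 bytes). Nothing is copied up front: the OS pages data in on demand, and the cast view exposes it directly as doubles:

```python
import mmap

def read_doubles_mmap(path):
    # Map the file into memory and view its bytes as 64-bit doubles
    # (native byte order), without copying or converting anything.
    with open(path, mode='rb') as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return memoryview(mm).cast('d')
```

Note that the mapping stays alive as long as the returned memoryview does; call `release()` on the view when you're done with it.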


5 Comments

I'll try that out now.
I get this message: 'array.array' has no attribute 'array'
That was because I had "from array import array" in my imports; when I changed to "import array" the problem was solved.
@RTC222: Yeah, I'm not a fan of the "module and only class in it share the same name" thing. In modern Python, they probably would have named the class Array (matching PEP8 for non-built-ins, like collections.OrderedDict), but we're stuck with legacy names forever, whee!
I don't like that either because it's confusing. I also prefer to import the whole module rather than just a class (i.e. avoiding from xxx import yyy).
