I have a very large big-endian binary file, and I know how many numbers it contains. I found a way to read a big-endian file using struct, and it works perfectly if the file is small:
import struct

data = []
file = open('some_file.dat', 'rb')
for i in range(0, numcount):  # numcount: known number of floats in the file
    data.append(struct.unpack('>f', file.read(4))[0])
But this code is very slow if the file is larger than ~100 MB. My current file is 1.5 GB and contains 399,513,600 floats. The above code takes about 8 minutes on this file.
I found another solution that works faster:
datafile = open('some_file.dat', 'rb').read()
f_len = ">" + "f" * numcount #numcount = 399513600
numbers = struct.unpack(f_len, datafile)
This code runs in about 1.5 minutes, but that is still too slow for me. Earlier I wrote functionally equivalent code in Fortran and it ran in about 10 seconds.
In Fortran I open the file with the "big-endian" flag and can simply read the file into a REAL array without any conversion, but in Python I have to read the file as a string and convert every 4 bytes into a float using struct. Is it possible to make the program run faster?
struct; reading a file of ~1GB at once (your second example) totally maxes out the memory on my laptop (8GB), which then of course makes everything very slow. Reading it in chunks was the solution in my case.
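For illustration, reading in chunks might look something like the sketch below. It is not the exact code I used: the function name read_big_endian_floats and the chunk_items parameter are made up for this example, and it swaps struct for the standard-library array module, which stores each float in 4 bytes instead of building a Python object per number up front. It assumes the file holds plain 32-bit IEEE floats with no header or record markers.

```python
import sys
from array import array


def read_big_endian_floats(path, count, chunk_items=1_000_000):
    """Read `count` big-endian 32-bit floats from `path`, one chunk at a time.

    `chunk_items` (hypothetical parameter) is how many floats to read per chunk.
    """
    values = array('f')  # compact storage, one C float per element
    if values.itemsize != 4:
        raise RuntimeError("this sketch assumes a 4-byte C float")

    with open(path, 'rb') as f:
        remaining = count
        while remaining > 0:
            n = min(chunk_items, remaining)
            chunk = array('f')
            chunk.fromfile(f, n)  # raises EOFError if the file is too short
            if sys.byteorder == 'little':
                # The file is big-endian, so swap bytes on little-endian machines.
                chunk.byteswap()
            values.extend(chunk)
            remaining -= n
    return values
```

Called as read_big_endian_floats('some_file.dat', numcount), this returns an array of floats that can be indexed like a list. I have not timed it against the file in the question, so treat any speedup as something to measure. If NumPy is an option, numpy.fromfile('some_file.dat', dtype='>f4') is another commonly used way to load such a file.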