I am making a program that should be able to encode any type of file using the Huffman algorithm. It all works, but using it on large files is too slow (at least I think it is). When I tried to open a 120 MB mp4 file to unpack it, it took about 210 s just to read the file, not to mention that it used a large chunk of memory to do so. I thought unpacking with struct would be efficient, but it isn't. Isn't there a more efficient way to do this in Python? I need to read any file byte by byte and then pass it to the huffman method as a string.
import struct
import time
import numpy as np

if __name__ == "__main__":
    start = time.time()
    with open('D:\mov.mp4', 'rb') as f:
        dataL = f.read()
    # unpack the file one byte at a time into a uint8 array
    data = np.zeros(len(dataL), 'uint8')
    for i in range(0, len(dataL)):
        data[i] = struct.unpack('B', dataL[i])[0]
    data.tostring()
    end = time.time()
    print("Original file read: ")
    print end - start
    encoded, table = huffman_encode(data)
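For what it's worth, a common way to avoid the per-byte struct.unpack loop is to let NumPy interpret the raw bytes directly. A minimal sketch of that idea is below; the path and the commented-out huffman_encode call are just placeholders taken from the question, and np.frombuffer is the alternative if the bytes are already in memory:

    import time
    import numpy as np

    if __name__ == "__main__":
        start = time.time()
        # np.fromfile reads the raw bytes straight into a uint8 array,
        # skipping the intermediate Python loop and struct.unpack entirely
        data = np.fromfile('D:\mov.mp4', dtype=np.uint8)
        # equivalent if the bytes are already read into memory:
        # data = np.frombuffer(open('D:\mov.mp4', 'rb').read(), dtype=np.uint8)
        print("Original file read: ")
        print(time.time() - start)
        # data.tostring() yields the same byte string the original loop produced,
        # in case the huffman method really needs a string rather than an array
        # encoded, table = huffman_encode(data)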
Timing just f.read(): my system is a little more performant. I see the bottleneck being the HDD (for your read op); there isn't/shouldn't be much CPU work involved. I also had 4+ GB of RAM free, so when loading the file its contents were kept in memory instead of being cached in the swap file, which would have made it slower.
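To see whether the disk read or the byte conversion dominates, you could time the two stages separately. A rough sketch (the path is a placeholder from the question):

    import time
    import numpy as np

    path = 'D:\mov.mp4'  # placeholder path from the question

    t0 = time.time()
    with open(path, 'rb') as f:
        raw = f.read()                            # pure disk read (plus OS cache)
    t1 = time.time()

    arr = np.frombuffer(raw, dtype=np.uint8)      # reinterpret the bytes without copying per byte
    t2 = time.time()

    print("f.read():        %.2fs" % (t1 - t0))
    print("byte conversion: %.2fs" % (t2 - t1))

If the first number is close to your 210 s, the HDD really is the limit; if it is small, the per-byte unpacking loop is what's costing you the time.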