I have a reasonably large file (~4 GB on disk) that I want to access with Python's mmap module to gain some familiarity with memory maps. I have a 64-bit system and am running something similar to the example below. When I run it, I notice that the process's memory consumption increases continually. I've profiled it with pympler and nothing stands out. Can someone point me to some resources that describe what's going on under the hood and how to correct this, so I can scan through the file without this "memory leak" consuming all my memory? Thanks!
import mmap

with open("/path/to/large.file", "rb") as j:  # binary mode; mmap works on the raw fd
    mm = mmap.mmap(j.fileno(), 0, access=mmap.ACCESS_READ)
    pos = 0
    while True:
        new_pos = mm.find(b"10", pos)
        if new_pos == -1:  # find() returns -1 once there are no more matches
            break
        print(new_pos)
        pos = new_pos + 1
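From what I've read so far, growth like this may not be a leak at all: as the scan touches each page of the mapping, the kernel faults it in and counts it against the process's resident set, but those pages are clean and the kernel reclaims them on its own under memory pressure. If that's the case, pages could apparently be released eagerly with madvise. A minimal sketch, assuming Linux and Python 3.8+ (where mmap.madvise and MADV_DONTNEED exist); the 64 MiB window size is an arbitrary choice of mine:

import mmap

WINDOW = 64 * 1024 * 1024  # release scanned pages in 64 MiB windows

with open("/path/to/large.file", "rb") as j:
    mm = mmap.mmap(j.fileno(), 0, access=mmap.ACCESS_READ)
    pos = 0
    released = 0  # start of the oldest window not yet released
    while True:
        new_pos = mm.find(b"10", pos)
        if new_pos == -1:
            break
        pos = new_pos + 1
        # Pages well behind the scan position are no longer needed;
        # MADV_DONTNEED lets the kernel drop them from the resident
        # set immediately instead of waiting for memory pressure.
        while pos - released >= 2 * WINDOW:
            mm.madvise(mmap.MADV_DONTNEED, released, WINDOW)
            released += WINDOW
    mm.close()

The start offset passed to madvise has to be page-aligned, which holds here because WINDOW is a multiple of the page size.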
EDIT: The file looks something like this:
0000001, data
0000002, more data
...
...
With sequential values like this in the first field, there will be a lot of hits for find(b"10").
EDIT 2: My real code doesn't scan with find(); I instead use a separate catalog of offsets to look up and retrieve data. As I write new entries, the memory usage increases without bound. I'm going to run an experiment where I substitute plain binary file objects and simple write() + seek() + read() operations. That will likely kill the performance of my library, but it will let me ascertain whether or not the memory growth is caused by mmap. I'll report back here later.
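For reference, a minimal sketch of what that experiment will look like, with a hypothetical catalog dict mapping each key to an (offset, length) pair; the names here are placeholders, not my actual API:

def append(path, catalog, key, payload):
    # Append one record and remember where it landed.
    with open(path, "ab") as f:
        offset = f.tell()  # "a" mode positions the handle at end of file
        f.write(payload)
    catalog[key] = (offset, len(payload))

def lookup(path, catalog, key):
    # Retrieve a record through a plain file object instead of mmap.
    offset, length = catalog[key]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

(A real version would keep one file handle open rather than reopening per call, but this is enough for the memory comparison.) If memory stays flat with this version, that will point at the mapped pages rather than my own bookkeeping.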