
I'm using this little module to dump objects to disk and I'm observing monotonically increasing memory usage. It happens with unicode strings but not with integers; is there something I'm doing wrong?

When I do:

>>> from utils.diskfifo import DiskFifo
>>> df=DiskFifo()
>>> for i in xrange(1000000000):
...     df.append(i)

Memory consumption is stable

but when I do:

>>> while True:
...     a={'key': u'value', 'key2': u'value2'}
...     df.append(a)

Memory consumption goes through the roof. Any hints? The module is below...


import tempfile
import cPickle

class DiskFifo:
    def __init__(self):
        self.fd = tempfile.TemporaryFile()
        self.wpos = 0
        self.rpos = 0
        self.pickler = cPickle.Pickler(self.fd)
        self.unpickler = cPickle.Unpickler(self.fd)
        self.size = 0

    def __len__(self):
        return self.size

    def extend(self, sequence):
        map(self.append, sequence)

    def append(self, x):
        self.fd.seek(self.wpos)
        self.pickler.dump(x)
        self.wpos = self.fd.tell()
        self.size = self.size + 1

    def next(self):
        try:
            self.fd.seek(self.rpos)
            x = self.unpickler.load()
            self.rpos = self.fd.tell()
            return x

        except EOFError:
            raise StopIteration

    def __iter__(self):
        self.rpos = 0
        return self
  • Why not use shelve? docs.python.org/library/shelve.html Commented Jul 28, 2011 at 9:55
  • How are you measuring memory consumption? Are you aware that Python rarely (almost never) returns memory to the OS? Commented Jul 28, 2011 at 10:02
  • @S.Lott sort of, but then it should stabilize at some point right? one thing is not returning and the other is leaking... Commented Jul 28, 2011 at 10:05
  • @piotr: a 'leak' is when the memory is still claimed but is inaccessible to the application. If Python can still use the memory but hasn't decided to free it, say it's lying stale in a cache somewhere, then it isn't a leak. Commented Jul 28, 2011 at 10:08
  • When you do for i in xrange('1000000000') you'll get a TypeError. Commented Jul 28, 2011 at 10:26

2 Answers


The Pickler object stores every object it has seen in its memo, so it doesn't have to pickle the same thing twice. You want to skip this (so that references to your objects aren't kept alive inside the pickler) and clear the memo before dumping:

def append(self, x):
    self.fd.seek(self.wpos)
    self.pickler.clear_memo()
    self.pickler.dump(x)
    self.wpos = self.fd.tell()
    self.size = self.size + 1

Source: http://docs.python.org/library/pickle.html#pickle.Pickler.clear_memo

Edit: You can actually watch the size of the memo go up as you pickle your objects by using the following append function:

def append(self, x):
    self.fd.seek(self.wpos)
    print len(self.pickler.memo)
    self.pickler.dump(x)
    self.wpos = self.fd.tell()
    self.size = self.size + 1

5 Comments

That doesn't explain the increase in memory, since the object being pickled again is the same.
Yes, it does. When you call self.pickler.dump(x), the pickler object does something like self.memo.append(x). As you go through your while True: loop in your example code, you are creating thousands of objects which your pickler object is keeping references to, meaning they are kept in memory and not gotten rid of by the GC. Calling self.pickler.clear_memo() essentially causes the pickler to do self.memo = [], getting rid of any references to the objects and allowing the GC to get rid of them. (See the sketch after these comments for a quick way to watch this happen.)
@poitr - I've edited my answer with some code which will allow you to watch the size of the memo increase as you pickle things.
This made a huge difference in memory consumption, from 1.5G to 11M
Just bear in mind that this might cause your pickle to be larger - nothing will be pickled by reference.
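
For anyone who wants to see this interactively, here is a quick sketch (mine, not from the answer) of the memo growing and then being cleared, using cPickle as in the question:

import cPickle
import cStringIO

buf = cStringIO.StringIO()
pickler = cPickle.Pickler(buf)

for i in range(3):
    # every dumped object (and its contents) is remembered in the memo,
    # so references to it stay alive after dump() returns
    pickler.dump({'key': u'value-%d' % i})
    print len(pickler.memo)   # grows with each dump()

pickler.clear_memo()
print len(pickler.memo)       # back to 0; the dumped objects can now be collected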

To add to @combatdave's answer:

I just bypassed the memo caching in pickle entirely, since clearing the memo on the reader side seems impossible and the growth looked like an unavoidable memory leak. Pickle streaming seems to be designed for reading and writing moderately sized files, not for unbounded streams of data.

Instead I just used the following simple utility functions:

import pickle
import struct


def framed_pickle_write(obj, stream):
    # pickle.dumps uses a fresh pickler (and memo) on every call; the
    # payload is written with a 4-byte big-endian length prefix
    serial_obj = pickle.dumps(obj)
    length = struct.pack('>I', len(serial_obj))
    stream.write(length)
    stream.write(serial_obj)


def framed_pickle_read(stream):
    # read the 4-byte length header, then exactly that many payload bytes
    data = stream.read(4)
    length, = struct.unpack('>I', data)
    serial_obj = stream.read(length)
    return pickle.loads(serial_obj)
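
As a rough usage sketch (not part of the original answer - the FramedDiskFifo name and the EOF handling are my own), these helpers can replace the shared Pickler/Unpickler pair in the question's DiskFifo:

import struct
import tempfile

class FramedDiskFifo(object):
    # assumes framed_pickle_write / framed_pickle_read from above are in scope
    def __init__(self):
        self.fd = tempfile.TemporaryFile()
        self.wpos = 0
        self.rpos = 0
        self.size = 0

    def __len__(self):
        return self.size

    def append(self, x):
        self.fd.seek(self.wpos)
        framed_pickle_write(x, self.fd)   # fresh pickle every call, no memo growth
        self.wpos = self.fd.tell()
        self.size += 1

    def __iter__(self):
        self.rpos = 0
        return self

    def next(self):  # __next__ in Python 3
        self.fd.seek(self.rpos)
        try:
            x = framed_pickle_read(self.fd)
        except struct.error:              # fewer than 4 header bytes left: end of data
            raise StopIteration
        self.rpos = self.fd.tell()
        return x

Because each append pickles independently, there is no shared memo to grow, at the cost of re-pickling shared sub-objects on every write (as the comment on the accepted answer notes).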
