I have a number of large (~100 MB) files which I'm regularly processing. Even though I try to delete unneeded data structures during processing, memory consumption is too high, so I was wondering if there is a way to manipulate large data efficiently, e.g.:

def read(self, filename):
    fc = read_100_mb_file(filename)
    self.process(fc)

def process(self, content):
    # do some processing of the file content
    pass

Is there a duplication of data structures here? Isn't it more memory efficient to store the content in an attribute like self.fc instead of passing it around?
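To make the question concrete, here is a minimal check I could run (with read_100_mb_file stubbed out for illustration), comparing object identities to see whether passing the content into process() creates a second copy:

def read_100_mb_file(filename):
    # stand-in for the real loader: read the whole file as one string
    with open(filename) as f:
        return f.read()

class Processor:
    def read(self, filename):
        fc = read_100_mb_file(filename)
        print(id(fc))           # identity of the loaded content
        self.process(fc)

    def process(self, content):
        print(id(content))      # prints the same id: the parameter is a
                                # reference to the same object, not a copy

Processor().read('data.txt')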

When should I use garbage collection? I know about the gc module, but should I call gc.collect() after I del fc, for example? Is the garbage collector invoked after a del statement at all?
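For example, this is roughly the pattern I have in mind (do_processing stands in for the real work):

import gc

def process_all(filenames):
    for filename in filenames:
        fc = read_100_mb_file(filename)  # loader as sketched above
        do_processing(fc)                # placeholder for the real work
        del fc        # drop the reference to the big string
        gc.collect()  # is an explicit collection ever needed here?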

Update

P.S. 100 MB is not a problem in itself, but the float conversion and further processing add significantly more to both the working set and the virtual size (I'm on Windows).
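To give a rough idea of the blow-up (a sketch; exact sizes are CPython-specific):

import sys

line = '0.1 0.2 0.3 0.4 0.5'
floats = [float(x) for x in line.split()]
text_size = sys.getsizeof(line)
list_size = sys.getsizeof(floats) + sum(sys.getsizeof(f) for f in floats)
# every float becomes a full Python object (about 24 bytes in CPython),
# so the converted data ends up several times larger than the raw text
print(text_size, list_size)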
