I am trying to create a plot for every CSV file in a directory. When I run the script below, RAM consumption goes up monotonically. The code is simple, albeit a bit long:

import multiprocessing
import os
from glob import glob
import pandas as pd
from matplotlib import pyplot as plt

from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

root_data_dir = '/home/user1/data/20191121'
root_img_dir = os.path.join(root_data_dir, 'figures')

if not os.path.exists(root_img_dir):
    os.mkdir(root_img_dir)

def plot_file(file):
    print("Processing {}".format(file))
    df = pd.read_csv(file, parse_dates=['date'], index_col='date', compression='xz')
    plt.plot(df)

    base_file = os.path.splitext(os.path.basename(file))[0]
    img_file = os.path.join(root_img_dir, base_file + '.png')

    plt.title(base_file)
    plt.savefig(img_file, dpi=300)
    print("Saved {}".format(img_file))
    plt.close()

multiprocessing.Pool(16).map(plot_file, sorted(glob(os.path.join(root_data_dir, '*.csv.xz'))))
  • try adding import gc then gc.collect() Commented Nov 28, 2019 at 21:00
  • @lssilva OK this did it Commented Nov 28, 2019 at 21:01
  • Probably the DataFrame was staying around and your system was not yet short enough on memory to reclaim it. Aggressive garbage collection may hurt performance, so I would let the collector decide on its own when to free memory. Commented Nov 28, 2019 at 21:05
  • @lssilva I tried adding del df after plt.close(), but that didn't work. The system has 32 GB of RAM, and the script consumed all of it within a short time, after which the machine became unresponsive (I am running a recent version of Linux, for reference). Commented Nov 29, 2019 at 11:26
  • Do you have the issue if you do it sequentially instead of with 16 threads? Commented Nov 29, 2019 at 13:12

1 Answer

Add the following import at the top of the script:

import gc

Then call gc.collect() at the end of plot_file:

def plot_file(file):
    print("Processing {}".format(file))
    df = pd.read_csv(file, parse_dates=['date'], index_col='date', compression='xz')
    plt.plot(df)

    base_file = os.path.splitext(os.path.basename(file))[0]
    img_file = os.path.join(root_img_dir, base_file + '.png')

    plt.title(base_file)
    plt.savefig(img_file, dpi=300)
    print("Saved {}".format(img_file))
    plt.close()
    gc.collect()
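
For context on why an explicit gc.collect() can make a difference when del df does not: in CPython, objects that reference each other in a cycle are not freed by reference counting alone and stay alive until the cyclic collector runs. The sketch below shows only that general mechanism using the standard library. The Node class and the cycle_survives_until_collect function are hypothetical names made up for the illustration, and whether cycles are actually what held the memory in the script above is an assumption, not something the thread confirms.

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for any object that ends up in a reference cycle."""

    def __init__(self):
        self.partner = None


def cycle_survives_until_collect():
    """Return (alive_before, alive_after) around an explicit gc.collect()."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic collector out of the way for the demo
    try:
        a, b = Node(), Node()
        a.partner, b.partner = b, a  # a and b now reference each other
        probe = weakref.ref(a)       # observe a without keeping it alive
        del a, b                     # names are gone, but the cycle keeps both objects
        alive_before = probe() is not None
        gc.collect()                 # the cyclic collector frees the pair
        alive_after = probe() is not None
        return alive_before, alive_after
    finally:
        if was_enabled:
            gc.enable()              # restore the collector's previous state


print(cycle_survives_until_collect())
```

On CPython this should print (True, False): deleting the names is not enough to free the pair, while the explicit collection is. The automatic collector would normally get to such objects eventually, so calling gc.collect() after every file trades some speed for a lower peak memory footprint.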