
I have a large CSV file with 10 columns that I need to sort and write to another CSV file. Here is my code for sorting.

# Read the file, skip lines starting with 'I', and split each line into fields
data = [x.strip().split(',') for x in open(filename + '.csv', 'r').readlines() if x[0] != 'I']

# Sort by columns 7, 8, 9 (as strings) and column 3 (as an integer)
data = sorted(data, key=lambda x: (x[6], x[7], x[8], int(x[2])))

# Write the sorted rows to a new CSV file
with open(filename + '_sorted.csv', 'w') as fout:
    for x in data:
        print(','.join(x), file=fout)

It works fine for files below 500 MB, but it cannot process files larger than 1 GB. Is there any way to make this process more memory efficient? I am running this code on Google Colab.

  • Why not use the pandas.read_csv() function? It should perform the same as your loop for importing the data. Commented Apr 2, 2019 at 6:07
  • Can you provide a code example? Let's say I've loaded the CSV file with pandas.read_csv. How can I then sort it? Commented Apr 2, 2019 at 6:12

1 Answer


Here is a link to a blog about using pandas for large datasets. The examples in that post analyze datasets around 1 GB in size.

Use the following to import your CSV data into Python:

import pandas as pd
gl = pd.read_csv('game_logs.csv', sep = ',')
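
To address the follow-up question in the comments, here is a minimal sketch of how the sort and write could look with pandas, assuming the file has no header row (so columns are referenced by integer position) and that the sort key matches the original code (columns 7, 8, 9, then column 3 as an integer). The filename variable is taken from the question; everything else is illustrative.

import pandas as pd

# Read the CSV without a header; columns get integer labels 0..9.
df = pd.read_csv(filename + '.csv', header=None)

# Drop rows whose first field starts with 'I', mirroring the original filter.
df = df[~df[0].astype(str).str.startswith('I')]

# Sort on the same key as the original lambda: columns 6, 7, 8, then 2.
# pandas usually infers a numeric dtype for column 2, so no explicit cast is shown.
df = df.sort_values(by=[6, 7, 8, 2])

# Write the sorted data without the index or a header row.
df.to_csv(filename + '_sorted.csv', index=False, header=False)

Note that pandas still loads the whole file into memory. The linked blog discusses ways to shrink the footprint (for example, specifying smaller dtypes), and pd.read_csv also accepts a chunksize parameter for reading the file in pieces if it still does not fit.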

