
I have a large CSV file with 10 columns that I need to sort and write to another CSV file. Here is my code for sorting.

# Read the file, skip lines starting with 'I', and split each line into fields
data = [x.strip().split(',') for x in open(filename + '.csv', 'r').readlines() if x[0] != 'I']

# Sort by columns 7, 8, 9 (as strings) and column 3 (as an integer)
data = sorted(data, key=lambda x: (x[6], x[7], x[8], int(x[2])))

# Write the sorted rows to a new CSV file
with open(filename + '_sorted.csv', 'w') as fout:
    for x in data:
        print(','.join(x), file=fout)

It works fine for files below 500 MB, but it cannot process files larger than 1 GB. Is there any way to make this process more memory efficient? I am running this code on Google Colab.

  • Why not use the pandas.read_csv() function? It should perform the same as your loop for importing the data. Commented Apr 2, 2019 at 6:07
  • Can you provide a code example? Let's say I've loaded the CSV file with pandas.read_csv. How can I then sort it? Commented Apr 2, 2019 at 6:12

1 Answer


Here is a link to a blog about using pandas for large datasets. The examples in that post analyze datasets around 1 GB in size.

Use the following to import your CSV data into Python:

import pandas as pd
gl = pd.read_csv('game_logs.csv', sep = ',')
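
To address the follow-up question in the comments, here is a minimal sketch of how the sort and write could look with pandas, assuming the file has no header row (so columns are referenced by integer position) and that the sort key matches the original code (columns 7, 8, 9, then column 3 as an integer). The filename variable is taken from the question; everything else is illustrative.

import pandas as pd

# Read the CSV without a header; columns get integer labels 0..9.
df = pd.read_csv(filename + '.csv', header=None)

# Drop rows whose first field starts with 'I', mirroring the original filter.
df = df[~df[0].astype(str).str.startswith('I')]

# Sort on the same key as the original lambda: columns 6, 7, 8, then 2.
# pandas usually infers a numeric dtype for column 2, so no explicit cast is shown.
df = df.sort_values(by=[6, 7, 8, 2])

# Write the sorted data without the index or a header row.
df.to_csv(filename + '_sorted.csv', index=False, header=False)

Note that pandas still loads the whole file into memory. The linked blog discusses ways to shrink the footprint (for example, specifying smaller dtypes), and pd.read_csv also accepts a chunksize parameter for reading the file in pieces if it still does not fit.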

