I'm having issues with pandas using far too much RAM. I have a 5.5 GB file with two columns, and I simply want to save all the unique values in the first column, like so:
Main File
    Follower    Friend
0   12          260009730
1   12          17568791
2   12          22512883
3   12          15808761
4   12          10135072
5   12          988
6   12          22424855
7   13          9163182
8   14          22990962
9   15          7681662
10  15          17289517
to
Result File
     User
0    12
1    13
2    14
3    15
Because of RAM limitations, I'm importing the main file in 30 pieces, trying to purge the DataFrames from memory and only appending to the result file each time. After two iterations (out of thirty) the result file is 13.5 MB. But it consistently crashes after the 6th iteration, and I can see in my process manager that Python is taking up 4.5 GB of RAM. I'm trying to call the garbage collector, but apparently it's not working. Can you help me out? My code is as follows:
import gc
import math

import pandas as pd

i = 0
userRelation = pd.DataFrame(columns=['User'])
Location = 'file.txt'
while i < 30:
    userRelationHelp = pd.DataFrame(columns=['User'])
    print(str(i))
    network = pd.read_csv(Location, sep="\t", header=None, encoding='Latin', low_memory=False, skiprows=(i * math.ceil(284884514/30)), nrows=(((i+1) * math.ceil(284884514/30))), names=['Follower', 'Friend'])
    userRelationHelp['User'] = network['Follower'].unique()
    userRelation = userRelation.append(userRelationHelp)
    lst = [userRelationHelp, network]
    del lst
    gc.collect()
    i += 1
From what I've read, the three lines before i += 1 (building lst, del lst and gc.collect()) should purge the larger objects from memory. Instead, the RAM in use at the start of each cycle keeps increasing by ~200 MB per iteration, and within each cycle it grows by more on every run.
Base Python RAM usage before running the above code: 76 MB
Approximate Python RAM usage at start of cycle (MB)
0: 300
1: 800
2: 1000
3: 1300
Approximate Python RAM usage at end of cycle (MB)
0: 633
1: 2000
2: 2900
3: 3700
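To check the assumption about the cleanup lines in isolation, here is a minimal sketch, assuming CPython's reference counting. The class `Big` and both function names are hypothetical stand-ins, not part of the code above. It suggests that putting objects in a list and deleting the list only removes the list itself, while the original names (here `big`, in the loop `network` and `userRelationHelp`) still keep the objects alive.

```python
import gc
import weakref


class Big:
    """Hypothetical stand-in for a large object such as a DataFrame."""


def alive_after_del_list():
    big = Big()
    ref = weakref.ref(big)   # lets us observe whether the object still exists
    lst = [big]
    del lst                  # removes the list; the name `big` still refers to the object
    gc.collect()
    return ref() is not None


def alive_after_del_name():
    big = Big()
    ref = weakref.ref(big)
    del big                  # drops the last reference, so the object can be reclaimed
    gc.collect()
    return ref() is not None


print(alive_after_del_list())   # True
print(alive_after_del_name())   # False
```

In the loop above the names are rebound on every iteration anyway, so this by itself may not account for the steady growth in memory.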
Can anyone point out what I'm doing or assuming incorrectly?
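One thing worth checking is the nrows argument: in read_csv it is the number of rows to read, not the index of the last row, so (i+1) * math.ceil(284884514/30) asks for a larger piece on every iteration, which may explain the growing per-cycle usage. For comparison, here is a minimal sketch of the same task using the chunksize parameter of read_csv, assuming the set of unique IDs fits in memory. The function name `unique_followers`, the chunk size, and the small in-memory sample are illustrative only and have not been tried on the full file.

```python
import io

import pandas as pd


def unique_followers(source, chunksize=1_000_000):
    """Collect the unique values of the first column, one chunk at a time."""
    seen = set()
    reader = pd.read_csv(
        source,
        sep="\t",
        header=None,
        names=["Follower", "Friend"],
        usecols=["Follower"],   # do not load the second column at all
        chunksize=chunksize,    # yields DataFrames of at most this many rows
    )
    for chunk in reader:
        # Only the small set of unique IDs survives from one chunk to the next.
        seen.update(chunk["Follower"].unique().tolist())
    return pd.DataFrame({"User": sorted(seen)})


# Tiny in-memory stand-in for the real file, using rows from the example above.
sample = io.StringIO(
    "12\t260009730\n12\t17568791\n13\t9163182\n"
    "14\t22990962\n15\t7681662\n15\t17289517\n"
)
result = unique_followers(sample, chunksize=2)
print(result["User"].tolist())   # [12, 13, 14, 15]
```

Because each chunk has a fixed size and only the set of IDs is kept, peak memory should depend on the chunk size and the number of distinct users rather than on how far into the file the loop has progressed.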