Appending rows to dataframe using an .iterrows() for loop

Question

Suppose I have the following dataframe:

     xx      yy      tt
0   2.8     1.0     1.0
1   85.0    4.48    6.5
2   2.1     8.0     1.0
3   8.0     1.0     0.0
4   9.0     2.54    1.64
5   5.55    7.25    3.15
6   1.66    0.0     4.0
7   3.0     7.11    1.98
8   1.0     0.0     4.65
9   1.87    2.33    0.0

What I want to do is create a for loop that iterates over all points in the df and calculates the Euclidean distance to all the other points. For instance, the loop would iterate over point a and get the distances from point a to points b, c, d...n. Then it would go to point b and get the distances to points a, c, d...n, and so on.

Once I have the distances, I want a value_counts() of the distance values. To save memory, though, I can't just call value_counts() on all the results from this for loop at once: my real df is too big, and I would run out of memory.

So my idea is to apply value_counts() to the distance vector for each point. This gives a 2-column dataframe with the values and their respective counts. Then, when the loop iterates over point b and gets all its distances, I want to compare the new values with the value_counts() df from the first iteration and check whether any values are repeated. If so, I want to += the counter for the repeated values; any values that are not repeated should be append()ed as new rows to the distance df.

This is what i've got so far:

import pandas as pd

counts = pd.DataFrame()

for index, row in df.iterrows():

    dist = pd.Series(np.sqrt((row.xx - df.xx)**2 + (row.yy - df.yy)**2 + (row.tt - df.tt)**2)) # Create a vector containing all the distances from each point to the others

    counter = pd.Series(dist.value_counts(sort = True)).reset_index().rename(columns = {'index': 'values', 0:'counts'}) # Get a counter for every value in the distances vector

    if index in counter['values']:
        counter['counts'][index] += 1 # Check if the new values are in the counter df, if so, add +1 to each repeated value

    else:

        counts = counts.append((index,row)) # If no repeated values, then append new rows to the counter df

The expected result would be something like:

# These are the value counts for point a and its distances:

    values  counts
0   0.000000    644589
1   0.005395    1
2   0.005752    1
3   0.016710    1
4   0.023043    1
5   0.012942    1
6   0.020562    1

Now in the iteration over point b:

       values   counts
0   0.000000    644595  # Value repeated 6 times, so add +6 to the counter
1   0.005395    1
2   0.005752    1
3   0.016710    3  # Value repeated twice, so add +2 to the counter
4   0.023043    1
5   0.012942    1
6   0.020562    1
7   0.025080    1  # New value, so append a new row with value and counter
8   0.022467    1  # New value, so append a new row with value and counter

However, if you add print(counts) at the end of the loop to check what the loop is doing, you'll see an empty dataframe, and that's why I'm asking this question. Why is this code giving an empty df, and how can I get it to work the way I want?

If you need further explanation, if something is not clear, or if you need more information, please do not hesitate to ask.

Thanks in advance

It's because your loop never reaches the else branch; that's why your dataframe is empty.
– Nihal, Commented Mar 18, 2019 at 10:35
No, it's the df. Give me a second and I will edit the question so it is clearer.
– Miguel 2488, Commented Mar 18, 2019 at 11:46
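One likely reason the else branch is never reached: the `in` operator on a pandas Series tests the index labels, not the stored values. Since `counter` is re-indexed from 0 after `reset_index()`, the row number `index` is found among those labels whenever `counter` has enough rows. A minimal sketch with a made-up Series `s` (not the question's data) illustrates the difference:

```python
import pandas as pd

# Hypothetical Series standing in for counter['values']
s = pd.Series([0.0, 5.5, 7.25])

label_hit = 1 in s                       # 1 is an index label (0, 1, 2)
value_as_label = 5.5 in s                # 5.5 is a value, not a label
value_hit = bool(s.isin([5.5]).any())    # explicit test against the values

print(label_hit, value_as_label, value_hit)
```

Note also that `DataFrame.append` returned a new object rather than modifying in place, and it was removed in pandas 2.0, so newer code would need `pd.concat` or a plain Python container instead.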

Frenchy · Accepted Answer · 2019-03-18 14:36:33Z

If I understand you correctly, you want the number of occurrences of each distance value.

So I suggest creating a dict: the keys are the distance values and the dict values are their counts:

data = """
   xx      yy      tt
2.8     1.0     1.0
85.0    4.48    6.5
2.1     8.0     1.0
8.0     1.0     0.0
9.0     2.54    1.64
5.55    7.25    3.15
1.66    0.0     4.0
3.0     7.11    1.98
1.0     0.0     4.65
1.87    2.33    0.0
"""

from io import StringIO

import numpy as np
import pandas as pd

df = pd.read_csv(StringIO(data), sep=r'\s+')

dico = {}                           # dict mapping each distance to its count
for index, row in df.iterrows():
    # Vector of distances from the current point to every point (itself included)
    dist = np.sqrt((row.xx - df.xx) ** 2 + (row.yy - df.yy) ** 2 +
                   (row.tt - df.tt) ** 2)

    for f in dist:                  # iterate through the distances
        if f in dico:               # is this distance already a key?
            dico[f] += 1            # yes: increment its count
        else:
            dico[f] = 1             # no: create the key with a count of 1

print(dico)

output:

{0.0: 10,
82.45726408267497: 2, 
7.034912934784623: 2, 
5.295280917949491: 2, 
6.4203738208923635: 2, 
7.158735921934822: 2, 
3.361487765856065: 2, 
6.191324575565393: 2, 
4.190763653560053: 2, 
1.9062528688503002: 2, 
83.15678204452118: 2, 
77.35218419669867: 2, 
76.17993961667337: 2, 
79.56882492534372: 2, 
    :
    :
7.511863949779708: 2,
0.9263368717696604: 2, 
4.633896848226123: 2, 
7.853725230742415: 2, 
5.295819105671946: 2, 
5.273357564208974: 2}

Each value has a count of at least 2 because the distances form a symmetric matrix: distance(point0, point1) equals distance(point1, point0), and so on.
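Because of that symmetry, one possible variation is to visit each unordered pair only once and let `collections.Counter` do the bookkeeping. The sketch below is not part of the original answer: it uses only the first four sample points, and the rounding to 6 decimals is an assumption added so that nearly equal floats land on the same key.

```python
from collections import Counter

import numpy as np
import pandas as pd

# First four points from the question's sample data
df = pd.DataFrame({
    "xx": [2.8, 85.0, 2.1, 8.0],
    "yy": [1.0, 4.48, 8.0, 1.0],
    "tt": [1.0, 6.5, 1.0, 0.0],
})

points = df[["xx", "yy", "tt"]].to_numpy()
counts = Counter()

for i in range(len(points) - 1):
    # Distances from point i to the points after it only,
    # so each pair (i, j) is counted once and self-distances are skipped
    diffs = points[i + 1:] - points[i]
    dist = np.sqrt((diffs ** 2).sum(axis=1))
    # Rounding is an assumption: exact float equality is fragile
    counts.update(np.round(dist, 6).tolist())

# Two-column layout similar to the one sketched in the question
result = pd.Series(counts, name="counts").rename_axis("values").reset_index()
print(result)
```

Only the running counter is kept between iterations, which is the memory-saving behaviour the question asks for; doubling every count would recover the symmetric totals if they are needed.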

Hi again Frenchy. This is a bit closer to what I wanted, but does this compare the new count values to the previous ones and add them to the dict if they are not already in it? Also, remember that if a new value is already in the dict, you just have to add +1 to that value's counter. Are these 2 conditions fulfilled? Thank you very much.
I have added comments to the program; is that OK? I wrote it based on what I understood (sorry for my English). With 600000 rows the execution time will be long...
OK, that's great. I understand everything now. Thank you very much for your answer. It helped a lot!! And no worries about the English :)
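On the execution time for 600000 rows: one way to reduce the Python-level looping, offered here only as a rough sketch, is to process the points in blocks with NumPy broadcasting and merge each block's counts into a running `Counter`. The function name `count_distances` and its `block` and `decimals` parameters are hypothetical, and the rounding is an assumption, not something from the thread.

```python
from collections import Counter

import numpy as np


def count_distances(points, block=256, decimals=6):
    """Count rounded pairwise distances, one block of rows at a time."""
    counts = Counter()
    for start in range(0, len(points), block):
        chunk = points[start:start + block]              # shape (b, 3)
        diffs = chunk[:, None, :] - points[None, :, :]   # shape (b, n, 3)
        dist = np.sqrt((diffs ** 2).sum(axis=2))         # shape (b, n)
        values, freq = np.unique(np.round(dist, decimals), return_counts=True)
        # Counter.update with a mapping adds the new counts to existing keys
        counts.update(dict(zip(values.tolist(), freq.tolist())))
    return counts


# First four points from the question's sample data
points = np.array([
    [2.8, 1.0, 1.0],
    [85.0, 4.48, 6.5],
    [2.1, 8.0, 1.0],
    [8.0, 1.0, 0.0],
])
counts = count_distances(points, block=3)
print(counts[0.0])  # self-distances, one per point
```

The temporary `diffs` array holds `block * n * 3` floats, so `block` has to be chosen to fit in memory. If almost every distance is distinct, the counter itself can still grow very large, in which case binning the distances may be a better fit than exact counts.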
