Get rid of iterrows in pandas loop

Question

I'm trying to avoid using iterrows() in pandas and achieve a more performant solution. This is the code I have, where I loop through a DataFrame and for each record I need to add three more:

import pandas as pd

fruit_data = pd.DataFrame({
    'fruit':  ['apple','orange','pear','orange'],
    'color':  ['red','orange','green','green'],
    'weight': [5,6,3,4]
})

array = []

for index, row in fruit_data.iterrows():

    row2 = { 'fruit_2': row['fruit'], 'sequence': 0}
    array.append(row2)
    
    for i in range(2):
        row2 = { 'fruit_2': row['fruit'], 'sequence': i + 1}
        array.append(row2)

print(array)

My real DataFrame has millions of records. Is there a way to optimize this code and NOT use iterrows() or for loops?

What are you trying to achieve by adding values from a DataFrame to a plain list? What for? Pretty sure whatever you want done can be done differently...
– Patrick Artner, Commented Mar 14, 2022 at 17:30
@PatrickArtner this is a simplification of a more complex problem
– ps0604, Commented Mar 14, 2022 at 17:31
@ps0604 You will want to post something that's more representative of the actual problem, then. Especially regarding pandas performance, simplified problems will lead to bad solutions.
– AKX, Commented Mar 14, 2022 at 17:34
You could start with for i in range(3): array.append({'fruit_2': row['fruit'], 'sequence': i}). It seems you dumbed the example down too much. That won't get rid of iterrows, but at least your code gets more concise. Then try to describe the "what", not the "how" you tried to accomplish it. I am pretty sure that approach is already flawed.
– Patrick Artner, Commented Mar 14, 2022 at 17:35
This seems like an XY problem. What is the original problem that you are trying to solve? What are you trying to accomplish by building a list of dictionaries from the DataFrame? Why don't you use the DataFrame directly to solve this problem? If you don't know how to answer the last question, we can probably give suggestions once you answer the other questions.
– Code-Apprentice, Commented Mar 14, 2022 at 17:36

user7864386 · Accepted Answer · 2022-03-14 17:33:01Z

You could use repeat to repeat each fruit 3 times; then groupby + cumcount to assign sequence numbers; finally to_dict for the final output:

tmp = fruit_data['fruit'].repeat(3).reset_index(name='fruit_2')
tmp['sequence'] = tmp.groupby('index').cumcount()
out = tmp.drop(columns='index').to_dict('records')

Output:

[{'fruit_2': 'apple', 'sequence': 0},
 {'fruit_2': 'apple', 'sequence': 1},
 {'fruit_2': 'apple', 'sequence': 2},
 {'fruit_2': 'orange', 'sequence': 0},
 {'fruit_2': 'orange', 'sequence': 1},
 {'fruit_2': 'orange', 'sequence': 2},
 {'fruit_2': 'pear', 'sequence': 0},
 {'fruit_2': 'pear', 'sequence': 1},
 {'fruit_2': 'pear', 'sequence': 2},
 {'fruit_2': 'orange', 'sequence': 0},
 {'fruit_2': 'orange', 'sequence': 1},
 {'fruit_2': 'orange', 'sequence': 2}]
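
As a quick sanity check, the sketch below wraps the three steps above (repeat, groupby + cumcount, to_dict) in a function and compares its result with the loop from the question. The function name expand_with_sequence and its parameter n are hypothetical, introduced only for illustration. It also assumes the DataFrame index is unique, because the sequence numbers are computed per index value.

```python
import pandas as pd

fruit_data = pd.DataFrame({
    'fruit':  ['apple', 'orange', 'pear', 'orange'],
    'color':  ['red', 'orange', 'green', 'green'],
    'weight': [5, 6, 3, 4],
})

def expand_with_sequence(df, n=3):
    """Hypothetical helper: repeat each fruit n times and number the copies."""
    # Repeat every value n times; the original index is repeated along with it.
    tmp = df['fruit'].repeat(n).reset_index(name='fruit_2')
    # Number the copies 0..n-1 within each original row (assumes a unique index).
    tmp['sequence'] = tmp.groupby('index').cumcount()
    return tmp.drop(columns='index').to_dict('records')

# Reference result built the slow way, as in the question.
expected = [
    {'fruit_2': row['fruit'], 'sequence': i}
    for _, row in fruit_data.iterrows()
    for i in range(3)
]

result = expand_with_sequence(fruit_data)
print(result == expected)
```

If the real DataFrame has duplicate index values, calling reset_index(drop=True) on it first should restore the per-row numbering.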

Nice! One-liner: fruit_data['fruit'].repeat(3).reset_index(name='fruit_2').pipe(lambda x: x.assign(sequence=x.groupby('index').cumcount())).drop(columns='index').to_dict('records')

user17242583 · Accepted Answer · 2022-03-14 17:40:35Z

Try this out:

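import numpy as np

array = (
    fruit_data['fruit']
    .repeat(3)                     # each fruit three times
    .to_frame(name='fruit_2')
    # use 0, 1, 2, 0, 1, 2, ... as the index, one cycle per original row
    .set_index(np.tile(np.arange(3), len(fruit_data)))
    .reset_index()                 # the unnamed index becomes a column called 'index'
    .rename(columns={'index': 'sequence'})
    [['fruit_2', 'sequence']]
    .to_dict('records')
)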

Output:

>>> array
[{'fruit_2': 'apple', 'sequence': 0},
 {'fruit_2': 'apple', 'sequence': 1},
 {'fruit_2': 'apple', 'sequence': 2},
 {'fruit_2': 'orange', 'sequence': 0},
 {'fruit_2': 'orange', 'sequence': 1},
 {'fruit_2': 'orange', 'sequence': 2},
 {'fruit_2': 'pear', 'sequence': 0},
 {'fruit_2': 'pear', 'sequence': 1},
 {'fruit_2': 'pear', 'sequence': 2},
 {'fruit_2': 'orange', 'sequence': 0},
 {'fruit_2': 'orange', 'sequence': 1},
 {'fruit_2': 'orange', 'sequence': 2}]
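
Both answers end with to_dict('records'), which creates one Python dict per output row. With millions of rows that conversion may itself become a large part of the run time, so if the list of dicts is not strictly required, it could be worth keeping the result as a DataFrame. A minimal sketch under that assumption, building the two columns directly with np.repeat and np.tile (the names n and expanded are illustrative only):

```python
import numpy as np
import pandas as pd

fruit_data = pd.DataFrame({
    'fruit':  ['apple', 'orange', 'pear', 'orange'],
    'color':  ['red', 'orange', 'green', 'green'],
    'weight': [5, 6, 3, 4],
})

n = 3  # number of output rows per input row (assumed constant)

expanded = pd.DataFrame({
    # apple, apple, apple, orange, orange, orange, ...
    'fruit_2': np.repeat(fruit_data['fruit'].to_numpy(), n),
    # 0, 1, 2, 0, 1, 2, ... one cycle per input row
    'sequence': np.tile(np.arange(n), len(fruit_data)),
})

# Only convert to a list of dicts if a later step really needs it.
records = expanded.to_dict('records')
print(expanded.head(6))
```

This variant does not depend on the DataFrame index, so it behaves the same whether or not the index values are unique.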
