I'm trying to avoid using iterrows() in pandas and find a more performant solution. This is the code I have: I loop through a DataFrame and, for each record, I need to add three rows to the output (sequence 0, 1 and 2):
import pandas as pd

fruit_data = pd.DataFrame({
    'fruit': ['apple', 'orange', 'pear', 'orange'],
    'color': ['red', 'orange', 'green', 'green'],
    'weight': [5, 6, 3, 4]
})

array = []
for index, row in fruit_data.iterrows():
    row2 = {'fruit_2': row['fruit'], 'sequence': 0}
    array.append(row2)
    for i in range(2):
        row2 = {'fruit_2': row['fruit'], 'sequence': i + 1}
        array.append(row2)

print(array)
My real DataFrame has millions of records. Is there a way to optimize this code and NOT use iterrows() or for loops?
Start with for i in range(3): array.append({'fruit_2': row['fruit'], 'sequence': i}) - it seems you dumbed the example down too much. That won't get rid of iterrows(), but at least your code gets more concise. Then try to describe the "what", not the "how" of what you tried to accomplish - I am pretty sure that approach is already flawed.
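
One possible way to drop both iterrows() and the Python loops, sketched under the assumption that the goal is simply one output row per (record, sequence) pair with sequence running 0, 1, 2: repeat the fruit column with np.repeat and build the counter with np.tile. The names n_copies and result are illustrative and not from the question.

```python
import numpy as np
import pandas as pd

fruit_data = pd.DataFrame({
    'fruit': ['apple', 'orange', 'pear', 'orange'],
    'color': ['red', 'orange', 'green', 'green'],
    'weight': [5, 6, 3, 4],
})

# Hypothetical parameter: number of rows emitted per input record.
n_copies = 3

# Repeat each fruit n_copies times in place, preserving the input row order.
fruit_2 = np.repeat(fruit_data['fruit'].to_numpy(), n_copies)

# Cycle through 0..n_copies-1 once for every input row.
sequence = np.tile(np.arange(n_copies), len(fruit_data))

result = pd.DataFrame({'fruit_2': fruit_2, 'sequence': sequence})

# Only needed if the list-of-dicts shape from the question is really required.
array = result.to_dict('records')
print(array)
```

If the next step is more pandas work, it is probably better to keep result as a DataFrame and skip the to_dict call, since building millions of Python dicts is likely to become the slow part.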