I’d assume the root cause is my loops in the matrix parts.
Yes, looping is an anti-pattern in pandas, so iterrows should almost always be avoided.
On top of that, after doing all those expensive dataframe loops, you compute the final prices z at the numpy layer, so you discard the dataframe labels and just return a numpy array.
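As a rough illustration of the difference, the sketch below contrasts a row-by-row iterrows loop with the equivalent column-wise operation. The frame and its Price and Weight columns are made up for this example and are not taken from your code.

```python
import pandas as pd

# Hypothetical data; the column names are assumptions for illustration.
df = pd.DataFrame({"Price": [100.0, 200.0, 300.0], "Weight": [0.5, 1.0, 1.5]})

# Row-by-row loop: iterrows builds a new Series for every row,
# and the result ends up as a plain list with no labels.
looped = []
for _, row in df.iterrows():
    looped.append(row["Price"] * row["Weight"])

# Vectorized equivalent: a single column-wise multiplication that
# returns a Series carrying the original index.
vectorized = df["Price"] * df["Weight"]
```

Both produce the same numbers, but only the vectorized version keeps the index attached to the result.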
Suggested pandas approach:
Compute in long form and then reshape into your desired wide form:
- Merge into a long cross table
- Compute the weights and weighted prices
- Pivot into a wide cross table
- Zero out the diagonal
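The four steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the input columns (Product, Year, Price, Weight), the sample values, and the weighting formula are all hypothetical placeholders, so substitute your own data and formula. It assumes pandas 1.2 or newer for merge(how="cross").

```python
import numpy as np
import pandas as pd

# Hypothetical input: one row per (Product, Year). The column names and
# values are assumptions for illustration only.
df = pd.DataFrame({
    "Product": ["Product 1", "Product 1", "Product 2", "Product 2"],
    "Year": ["Year 1", "Year 2", "Year 1", "Year 2"],
    "Price": [100.0, 200.0, 300.0, 400.0],
    "Weight": [0.5, 1.0, 1.5, 2.0],
})

def suggested(df: pd.DataFrame) -> pd.DataFrame:
    # 1. Merge into a long cross table: every row paired with every row.
    #    Overlapping column names get the default _x / _y suffixes.
    long = df.merge(df, how="cross")

    # 2. Compute the weights and weighted prices.
    #    Placeholder formula, not the one from the original code.
    long["Weight"] = long["Weight_x"] * long["Weight_y"]
    long["Weighted"] = long["Weight"] * (long["Price_x"] + long["Price_y"]) / 2

    # 3. Pivot into a wide cross table with labeled rows and columns.
    wide = long.pivot(
        index=["Product_x", "Year_x"],
        columns=["Product_y", "Year_y"],
        values="Weighted",
    )

    # 4. Zero out the diagonal on a copy of the values, then reattach
    #    the labels. Rows and columns are sorted identically by pivot,
    #    so the diagonal pairs each (Product, Year) with itself.
    values = wide.to_numpy(copy=True)
    np.fill_diagonal(values, 0.0)
    return pd.DataFrame(values, index=wide.index, columns=wide.columns)

result = suggested(df)
print(result)
```

The result is still a dataframe, so a single cell can be looked up by label, for example result.loc[("Product 1", "Year 1"), ("Product 2", "Year 2")].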
This not only reduces your 100+ lines of code to ~15 lines but also retains the dataframe labels:
Product_y        Product 1                                         Product 2
Year_y              Year 1    Year 2    Year 3    Year 4    Year 5    Year 1    Year 2    Year 3    Year 4    Year 5
Product_x Year_x
Product 1 Year 1       0.0     748.0    1020.0    1700.0    1496.0    2720.0    1400.8    2040.0    4202.4    2720.0
          Year 2     748.0       0.0     660.0    1100.0     968.0    1760.0     906.4    1320.0    2719.2    1760.0
          Year 3    1020.0     660.0       0.0    1500.0    1320.0    2400.0    1236.0    1800.0    3708.0    2400.0
          Year 4    1700.0    1100.0    1500.0       0.0    2200.0    4000.0    2060.0    3000.0    6180.0    4000.0
          Year 5    1496.0     968.0    1320.0    2200.0       0.0    3520.0    1812.8    2640.0    5438.4    3520.0
Product 2 Year 1    2720.0    1760.0    2400.0    4000.0    3520.0       0.0   20600.0   30000.0   61800.0   40000.0
          Year 2    1400.8     906.4    1236.0    2060.0    1812.8   20600.0       0.0   15450.0   31827.0   20600.0
          Year 3    2040.0    1320.0    1800.0    3000.0    2640.0   30000.0   15450.0       0.0   46350.0   30000.0
          Year 4    4202.4    2719.2    3708.0    6180.0    5438.4   61800.0   31827.0   46350.0       0.0   61800.0
          Year 5    2720.0    1760.0    2400.0    4000.0    3520.0   40000.0   20600.0   30000.0   61800.0       0.0
And is significantly faster:
>>> %timeit original()
15.3 ms ± 409 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit suggested()
1.82 ms ± 50.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)