edited Sep 9, 2020 at 4:43

Multiple Pandas Ranking Operations within a Loop - Better Optimization and Performance

I have implemented the following code, which works as intended. However, I would like to improve its performance and efficiency.

import pandas as pd
from scipy.stats import norm

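# df: data frame with about 40,000 rows and 25 columns
for indx in df.index:
    # ordinal ranks (ties broken by order of appearance), NaNs ranked last
    m_ord = df.loc[indx].rank(method='first', na_option='bottom')
    # average ranks, so that tied zeros can share a single rank
    m_ord_avg = df.loc[indx].rank(method='average', na_option='bottom')
    # use the average rank wherever the original value is zero
    m_ord.loc[df.loc[indx] == 0] = m_ord_avg
    # map the scaled ranks to standard normal quantiles
    matrx = norm.ppf(m_ord / (len(df.columns) + 1))
    df.loc[indx] = matrx.T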

The block above is part of a long Python script, and it runs slower than the rest of the program. It iterates over the data frame row by row. For each row, it performs a chain of pandas rank operations, followed by a statistical step that I understand to be equivalent to a one-tailed test, and finally transposes the result, which is written back as that row of the data frame. How can I make this block faster and more efficient?
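For reference, one possible direction is to apply the same steps to the whole data frame at once with `axis=1`, instead of looping over rows. The following is only a sketch under assumptions: the function name `rank_to_normal_scores` is hypothetical, all columns are assumed to be numeric, and it has not been timed against the real 40,000-row data.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm


def rank_to_normal_scores(df: pd.DataFrame) -> pd.DataFrame:
    """Row-wise rank transform followed by norm.ppf, without a Python loop.

    Hypothetical helper: assumes every column of df is numeric.
    """
    # Rank each row across its columns; NaNs are ranked last
    first = df.rank(axis=1, method="first", na_option="bottom")
    average = df.rank(axis=1, method="average", na_option="bottom")

    # Where the original value is zero, use the average rank instead
    ranks = first.mask(df == 0, average)

    # Scale ranks into (0, 1) and map them to standard normal quantiles
    scores = norm.ppf(ranks.to_numpy() / (df.shape[1] + 1))
    return pd.DataFrame(scores, index=df.index, columns=df.columns)
```

Calling `df = rank_to_normal_scores(df)` would then take the place of the loop. The sketch returns a new data frame rather than modifying `df` in place, so the original values remain available for comparison.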

Thank you so much in advance,

asked Sep 9, 2020 at 4:37

aBiologist