I have a DataFrame called ES_15M_Summary, with coefficients/betas in one column, ES_15M_Summary['Rolling_OLS_Coefficient'].

If the value in the 'Rolling_OLS_Coefficient' column is greater than .08, I want a new column titled 'Long' to hold a binary 'Y'. If the value is less than .08, I want 'Long' to be 'NaN' or just 'N' (either works).

So I'm writing a for loop to run down the columns. First, I created a new column titled 'Long' and set it to NaN:

ES_15M_Summary['Long'] = np.nan

Then I wrote the following for loop:

for index, row in ES_15M_Summary.iterrows():
    if ES_15M_Summary['Rolling_OLS_Coefficient'] > .08:
        ES_15M_Summary['Long'] = 'Y'
    else:
        ES_15M_Summary['Long'] = 'NaN'

I get the error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). 

...referring to the if statement line shown above (if...>.08:). I'm not sure why I'm getting this error or what's wrong with the for loop. Any help is appreciated.

1 Answer

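The error occurs because ES_15M_Summary['Rolling_OLS_Coefficient'] > .08 compares the whole column at once and returns a boolean Series, which an if statement cannot reduce to a single True or False. I think it is better to use numpy.where: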

mask = ES_15M_Summary['Rolling_OLS_Coefficient'] > .08
ES_15M_Summary['Long'] = np.where(mask, 'Y', 'N')
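
The ambiguity can be reproduced directly. This is only a minimal sketch on a hypothetical three-row frame (the name df and its values are made up for illustration); it shows that the comparison yields one boolean per row, and that any() or all() is what reduces it to a single value:

```python
import pandas as pd

# Hypothetical three-row frame, for illustration only
df = pd.DataFrame({'Rolling_OLS_Coefficient': [0.07, 0.01, 0.09]})

# Comparing a whole column returns a boolean Series, one value per row
comparison = df['Rolling_OLS_Coefficient'] > .08
print(type(comparison).__name__)
print(comparison.tolist())

# Using that Series directly in an if statement raises the ValueError
error_message = None
try:
    if comparison:
        pass
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

# any() / all() collapse the Series to a single boolean
any_above = bool(comparison.any())
all_above = bool(comparison.all())
print(any_above, all_above)
```

Neither any() nor all() is what the question needs, though, since the goal is one label per row rather than one answer for the whole column.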

Sample:

import numpy as np
import pandas as pd

ES_15M_Summary = pd.DataFrame({'Rolling_OLS_Coefficient':[0.07,0.01,0.09]})
print (ES_15M_Summary)
   Rolling_OLS_Coefficient
0                     0.07
1                     0.01
2                     0.09

mask = ES_15M_Summary['Rolling_OLS_Coefficient'] > .08
ES_15M_Summary['Long'] = np.where(mask, 'Y', 'N')
print (ES_15M_Summary)
   Rolling_OLS_Coefficient Long
0                     0.07    N
1                     0.01    N
2                     0.09    Y
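
The question also allowed NaN instead of 'N'. One possible way to get that, sketched here with an illustrative frame named df, is to map the boolean mask through a dict. Passing np.nan next to 'Y' in np.where is probably not what you want, because NumPy may coerce it to the string 'nan' rather than a real missing value.

```python
import numpy as np
import pandas as pd

# Illustrative frame with the same values as the sample
df = pd.DataFrame({'Rolling_OLS_Coefficient': [0.07, 0.01, 0.09]})

mask = df['Rolling_OLS_Coefficient'] > .08

# True becomes 'Y', False becomes a real missing value (NaN)
df['Long'] = mask.map({True: 'Y', False: np.nan})
print(df)
```

With real NaN values, methods such as isna() and dropna() can then be used on the 'Long' column.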

A looping solution also works, but it is very slow:

for index, row in ES_15M_Summary.iterrows():
    if ES_15M_Summary.loc[index, 'Rolling_OLS_Coefficient'] > .08:
        ES_15M_Summary.loc[index,'Long'] = 'Y'
    else:
        ES_15M_Summary.loc[index,'Long'] = 'N'
print (ES_15M_Summary)
   Rolling_OLS_Coefficient Long
0                     0.07    N
1                     0.01    N
2                     0.09    Y
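
If an explicit loop is preferred for readability, a list comprehension over the column values is one possible middle ground, since it avoids the per-row .loc lookups. This is only a sketch on an illustrative frame named df, not a replacement for np.where:

```python
import pandas as pd

# Illustrative frame with the same values as the sample
df = pd.DataFrame({'Rolling_OLS_Coefficient': [0.07, 0.01, 0.09]})

# Build the whole column in one pass over the values, then assign it once
df['Long'] = ['Y' if value > .08 else 'N'
              for value in df['Rolling_OLS_Coefficient']]
print(df)
```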

Timings:

#3000 rows
ES_15M_Summary = pd.DataFrame({'Rolling_OLS_Coefficient':[0.07,0.01,0.09] * 1000})
#print (ES_15M_Summary)


def loop(df):
    # Operate on the DataFrame passed in rather than on the global one
    for index, row in df.iterrows():
        if df.loc[index, 'Rolling_OLS_Coefficient'] > .08:
            df.loc[index,'Long'] = 'Y'
        else:
            df.loc[index,'Long'] = 'N'
    return df

print (loop(ES_15M_Summary))


In [51]: %timeit (loop(ES_15M_Summary))
1 loop, best of 3: 2.38 s per loop

In [52]: %timeit ES_15M_Summary['Long'] = np.where(ES_15M_Summary['Rolling_OLS_Coefficient'] > .08, 'Y', 'N')
1000 loops, best of 3: 555 µs per loop
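
%timeit is only available in IPython/Jupyter. Outside of it, a comparison along the same lines could be sketched with the standard timeit module. The helper names make_frame and vectorized and the smaller frame size are assumptions made for this example, and the numbers will differ by machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd


def make_frame(repeats=100):
    # Hypothetical helper: builds a fresh frame of 3 * repeats rows
    return pd.DataFrame({'Rolling_OLS_Coefficient': [0.07, 0.01, 0.09] * repeats})


def loop(df):
    # Row-by-row assignment through .loc
    for index, row in df.iterrows():
        if df.loc[index, 'Rolling_OLS_Coefficient'] > .08:
            df.loc[index, 'Long'] = 'Y'
        else:
            df.loc[index, 'Long'] = 'N'
    return df


def vectorized(df):
    # Single vectorized assignment with np.where
    df['Long'] = np.where(df['Rolling_OLS_Coefficient'] > .08, 'Y', 'N')
    return df


# Check that both approaches produce the same labels before timing them
looped = loop(make_frame())
vect = vectorized(make_frame())
same_result = looped['Long'].tolist() == vect['Long'].tolist()
print('same result:', same_result)

# Total time for three calls each, on a fresh frame every call
loop_seconds = timeit.timeit(lambda: loop(make_frame()), number=3)
vectorized_seconds = timeit.timeit(lambda: vectorized(make_frame()), number=3)
print('loop:', loop_seconds, 'vectorized:', vectorized_seconds)
```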

1 Comment

Thank you, I'm using the for loop you provided. Much appreciated.
