I'm trying to understand variable scope across a function call.
Here is the code in question:
import numpy as np
import pandas as pd
# Function to add a column with random stuff to a dataframe
def Add_a_column(df):
    df['Col2'] = np.sign(np.random.randn(len(df)))
    return df
# Create a dataframe with random stuff
df_full = pd.DataFrame(data=np.sign(np.random.randn(5)), columns=['Col1'])
df_another = Add_a_column(df_full)
- df_full is global. Correct?
- df_another is global. Correct?
- df is local to Add_a_column. Correct?
When I execute the code, the column gets added to df_full:
In[8]: df_full
Out[8]: 
   Col1  Col2
0  -1.0  -1.0
1   1.0  -1.0
2  -1.0   1.0
3   1.0   1.0
4   1.0   1.0
How do I avoid df_full being modified by the function?
df is local to the function, but df and df_full refer to the same object.
[...] df_full before passing it to the Add_a_column function? (see pandas.pydata.org/pandas-docs/stable/generated/… ) and read why this happens here: stackoverflow.com/q/2612802/289011
[...] inplace=True to actually take effect in such a way, so I can see where their confusion comes from :)
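To illustrate the point that the local name df and the global name df_full refer to the same object, here is a minimal sketch. The helper add_a_column_copy is a hypothetical name, not part of the original code; it assumes that copying the frame with DataFrame.copy() before modifying it is acceptable for your use case.

```python
import numpy as np
import pandas as pd

def Add_a_column(df):
    # Original behaviour: `df` is a local name, but it is bound to the
    # caller's object, so the new column shows up on that object too.
    df['Col2'] = np.sign(np.random.randn(len(df)))
    return df

def add_a_column_copy(df):
    # Hypothetical variant: copy first, then modify only the copy.
    result = df.copy()
    result['Col2'] = np.sign(np.random.randn(len(result)))
    return result

df_full = pd.DataFrame(data=np.sign(np.random.randn(5)), columns=['Col1'])

# The copying variant leaves df_full untouched.
df_another = add_a_column_copy(df_full)
print(df_another is df_full)       # False
print(list(df_full.columns))       # ['Col1']
print(list(df_another.columns))    # ['Col1', 'Col2']

# The original function returns the very same object it was given.
df_same = Add_a_column(df_full)
print(df_same is df_full)          # True
print(list(df_full.columns))       # ['Col1', 'Col2']
```

Alternatively, you could leave the function as it is and call Add_a_column(df_full.copy()), which keeps the decision about copying with the caller.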