2

Trying to understand variable scope with a function call.

Code to discuss.

import numpy as np
import pandas as pd

# Function to add a column with random stuff to a dataframe
def Add_a_column(df):
    df['Col2'] = np.sign(np.random.randn(len(df)))
    return df

# Create a dataframe with random stuff
df_full = pd.DataFrame(data=np.sign(np.random.randn(5)), columns=['Col1'])

df_another = Add_a_column(df_full)
  • df_full is global. Correct?
  • df_another is global. Correct?
  • df is local to Add_a_column. Correct?

When I execute the code, the column gets added to df_full:

In[8]: df_full
Out[8]: 
   Col1  Col2
0  -1.0  -1.0
1   1.0  -1.0
2  -1.0   1.0
3   1.0   1.0
4   1.0   1.0

How do I avoid df_full being modified by the function?

  • The name df is local to the function, but df and df_full refer to the same object. Commented Dec 29, 2017 at 19:19
  • sounds like you want to clone df_full in the function, manipulate the new object, and then send that back. Commented Dec 29, 2017 at 19:23
  • Expanding a bit what @DanielRoseman said, and without knowing anything about Pandas, I imagine you need to copy the df_full before passing it to the Add_a_column function? (see pandas.pydata.org/pandas-docs/stable/generated/… ) and read why this happens here: stackoverflow.com/q/2612802/289011 Commented Dec 29, 2017 at 19:23
  • @BorrajaX or clone in the function. I'm not sure what his end goal is. Commented Dec 29, 2017 at 19:24
  • @BorrajaX You are correct but in pandas this might actually be a bit of a shock for the OP since a lot of operations require inplace=True to actually take effect in such a way, so I can see where their confusion comes from :) Commented Dec 29, 2017 at 19:36

2 Answers

1

A reference to df_full is passed into the function, so df and df_full are two names for the same object. Any change made through one name is visible through the other.

You need to change your function to:

def Add_a_column(df):
    # Work on a copy so the caller's DataFrame is not modified
    df = df.copy()
    df['Col2'] = np.sign(np.random.randn(len(df)))
    return df

Alternatively, you could leave the function as it was and pass it a copy of the DataFrame, like Add_a_column(df_full.copy())
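
To check that the original is left alone, you can compare the columns before and after the call. This is only a minimal sketch reusing the names from the question; the function name add_a_column_copy is made up here for illustration.

```python
import numpy as np
import pandas as pd

def add_a_column_copy(df):
    # Hypothetical name: copies first, so the caller's DataFrame stays as it was
    df = df.copy()
    df['Col2'] = np.sign(np.random.randn(len(df)))
    return df

df_full = pd.DataFrame(data=np.sign(np.random.randn(5)), columns=['Col1'])
df_another = add_a_column_copy(df_full)

print(list(df_full.columns))     # ['Col1']
print(list(df_another.columns))  # ['Col1', 'Col2']
print(df_another is df_full)     # False
```

Copying at the call site with Add_a_column(df_full.copy()) should give the same result; which one to pick depends on whether every caller wants the original protected.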


0
  • df_full is global. Correct?
  • df_another is global. Correct?
  • df is local to Add_a_column. Correct?

It sounds like you understand scope just fine. Each variable has the scope you describe.

The piece you are missing is that df_full and df refer to the same object. When you make changes to that object with one variable, the changes are visible when you access that object with the other variable.
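
You can see this for yourself with the `is` operator, which tests whether two names point at the same object. A small sketch using the code from the question:

```python
import numpy as np
import pandas as pd

def Add_a_column(df):
    # df is a local name, but it is bound to the caller's object
    df['Col2'] = np.sign(np.random.randn(len(df)))
    return df

df_full = pd.DataFrame(data=np.sign(np.random.randn(5)), columns=['Col1'])
df_another = Add_a_column(df_full)

print(df_another is df_full)      # True: one object, two global names
print('Col2' in df_full.columns)  # True: the change shows through df_full
```

So the scopes are separate, but all three names (df, df_full, df_another) end up bound to a single DataFrame.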
