I've built two Pandas dataframes like this:

import panda as pd
d = {'FIPS' : pd.Series(['01001', '01002']), 'count' : pd.Series([3, 4])}
df1  = pd.DataFrame(d)
df2 = df1

I want to change one of the values in df2. This is what I've tried:

df2.loc[df2['FIPS'] == '01001','FIPS'] = '01003' 

This line appears to update both df1 and df2, but I don't understand why.

Reid, if Jan's reply below answered your question, please accept it as the answer. Commented Aug 24, 2017 at 16:43

2 Answers

Because df2 is only a reference to the same object as df1. Both names point to the same object in memory. If you write df2 = df1.copy() instead, pandas creates a new object for df2, and the assignment updates only df2. You also have a typo in your import: it should be import pandas as pd, not import panda :)

You can check which object each name refers to with id(df1) and id(df2). After df2 = df1 the two values are the same; after df2 = df1.copy() they differ.
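To make the difference concrete, here is a minimal sketch that builds the same data as in the question and compares plain assignment with .copy(). The names alias and clone are only illustrative, not from the question.

```python
import pandas as pd

df1 = pd.DataFrame({'FIPS': ['01001', '01002'], 'count': [3, 4]})

alias = df1         # plain assignment: a second name for the same object
clone = df1.copy()  # .copy(): a new, independent DataFrame

same_object = alias is df1      # True, so id(alias) == id(df1)
separate_object = clone is df1  # False, the copy is a different object

# Change a value in the copy only
clone.loc[clone['FIPS'] == '01001', 'FIPS'] = '01003'

original_fips = df1['FIPS'].tolist()  # unchanged: ['01001', '01002']
clone_fips = clone['FIPS'].tolist()   # updated:   ['01003', '01002']

print(same_object, separate_object)
print(original_fips)
print(clone_fips)
```

Running the same .loc assignment on alias instead of clone would change df1 as well, which is the behaviour described in the question.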

Welcome to SO!

2 Comments

Thanks so much! This answers my question. I had no idea that assigning a dataframe to a variable name was so different from assigning an integer.
Great! :) You can accept my answer by clicking the tick button next to it. Enjoy the SO community.

Instead of df2 = df1, say df2 = df1.copy().

The issue is that variables in Python act like "pointers": they store references to objects rather than the objects themselves. This holds for every type, but you only notice it with mutable objects such as DataFrames, lists, and dicts, because those can be changed in place. So in your code above, df2 becomes another name, or alias, for the object that df1 refers to. Hence the unexpected change.
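The same aliasing can be seen without pandas. The sketch below uses a plain list and an integer (variable names chosen just for illustration) to show why integers seem to behave differently: assignment works the same way in both cases, but a list can be mutated in place, while y += 1 on an int simply rebinds y to a new object.

```python
# Mutable object: both names refer to one list
a = [1, 2]
b = a          # b is an alias for the same list
b.append(3)    # in-place change, visible through both names
print(a)       # [1, 2, 3]

# Immutable object: ints cannot be changed in place
x = 5
y = x          # y refers to the same int object as x
y += 1         # rebinds y to a new int; x is untouched
print(x, y)    # 5 6
```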
