Why is my pandas dataframe not updating its values as I change them?

Question

I am trying to make changes to each string in my Series object 'tweet_text', but for some reason when I print the series object after making changes to the tweets in my for loop, I get the same strings as I had before the for loop. How can I fix this?

import pandas as pd
import re
import string

df = pd.read_csv('sample-tweets.csv',
                 names=['Tweet_Date', 'User_ID', 'Tweet_Text', 'Favorites', 'Retweets', 'Tweet_ID'])

sum_df = df[['User_ID', 'Tweet_ID', 'Tweet_Text']].copy()
sum_df.set_index(['User_ID'])
# print sum_df

tweet_text = df.ix[:, 2]
print type(tweet_text)

# efficiency could be improved by using translate method
# regex = re.compile('[%s]' % re.escape(string.punctuation))

for tweet in tweet_text:
    tweet = re.sub('https://t.co/[a-zA-Z0-9]*', "", tweet)
    tweet = re.sub('@[a-zA-Z0-9]*', '', tweet)
    tweet = re.sub('#[a-zA-Z0-9]*', '', tweet)
    tweet = re.sub('$[a-zA-Z0-9]*', '', tweet)
    tweet = ''.join(i for i in tweet if not i.isdigit())
    tweet = tweet.replace('"', '')
    tweet = re.sub(r'[\(\[].*?[\)\]]', '', tweet)  # takes out everything between parentheses also, fix this

    # gets rid of all punctuation and emojis
    tweet = "".join(l for l in tweet if l not in string.punctuation)
    tweet = re.sub(r'[^\x00-\x7F]+',' ', tweet)

    # gets rid of all extra spacing
    tweet = tweet.lower()
    tweet = tweet.strip()
    tweet = " ".join(tweet.split())

    count = count + 1
    # print tweet

print tweet_text

Because you are taking each tweet into a variable, making some changes to it, and then the next iteration starts. You are not assigning the changed data back to the series.
– TrigonaMinima, Commented Jul 6, 2017 at 19:10

mkos · Accepted Answer · 2017-07-06 19:15:54Z

4

This happens because, for starters, the loop works on tweet_text, a Series pulled out of df with df.ix[:, 2], and never writes the cleaned strings back. Secondly, this is not the pandas way to iterate over a Series: you should use apply().

To update your code, move everything inside the loop into a function:

def parse_tweet(tweet):
    ## everything from loop goes here
    return tweet

Then, instead of:

tweet_text = df.ix[:, 2]

do:

df.iloc[:, 2] = df.iloc[:, 2].apply(parse_tweet)
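
As a minimal sketch of how the pieces fit together, assuming Python 3 and a tiny made-up DataFrame in place of sample-tweets.csv (the sample rows are hypothetical, and only a few of the cleaning steps from the question are reproduced). It assigns by column name, which here targets the same column as position 2:

```python
import re

import pandas as pd


def parse_tweet(tweet):
    """Return a cleaned copy of one tweet (a subset of the question's steps)."""
    tweet = re.sub(r'https://t\.co/[a-zA-Z0-9]*', '', tweet)  # drop t.co links
    tweet = re.sub(r'@[a-zA-Z0-9]*', '', tweet)               # drop @mentions
    tweet = re.sub(r'#[a-zA-Z0-9]*', '', tweet)               # drop #hashtags
    tweet = tweet.lower()
    return ' '.join(tweet.split())                            # collapse whitespace


# Hypothetical stand-in for the CSV used in the question.
df = pd.DataFrame({
    'User_ID': [1, 2],
    'Tweet_ID': [10, 20],
    'Tweet_Text': ['Hello @bob  check https://t.co/abc123', 'Great DAY #sunny'],
})

# apply() returns a new Series; assigning it back is what updates df.
df['Tweet_Text'] = df['Tweet_Text'].apply(parse_tweet)
print(df['Tweet_Text'].tolist())
```

The key step is the assignment on the left-hand side: without it, apply() would compute the cleaned strings and then discard them, just as the original loop did.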

BTW, do not use the ix indexer, as it is deprecated and going to be removed in future versions of pandas.
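
For reference, a small sketch of the two indexers that replace ix, using a hypothetical three-column frame (the data is made up): iloc selects by integer position and loc selects by label.

```python
import pandas as pd

# Hypothetical frame with the same column order as sum_df in the question.
df = pd.DataFrame({
    'User_ID': [1, 2],
    'Tweet_ID': [10, 20],
    'Tweet_Text': ['first tweet', 'second tweet'],
})

by_position = df.iloc[:, 2]          # third column, by integer position
by_label = df.loc[:, 'Tweet_Text']   # same column, by label

print(by_position.equals(by_label))
```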


1 Comment

piRSquared Over a year ago

In regards to your most recent pandas answer. People can't up-vote without 15 rep. People asking the questions are your most certain up-vote. If you answer a question of someone without the required rep to up-vote you... do them a favor and up-vote their question to help get them over the line.

grovina · Accepted Answer · 2017-07-06 19:15:35Z

1

Python strings are immutable. You are only rebinding the variable tweet to a new string on each step; the dataframe itself is never updated.
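
A small illustration of that point, using a plain list instead of a Series (the sample strings are made up): reassigning the loop variable only rebinds the name, so the container is unchanged until a value is written back by position.

```python
tweets = ['Hello WORLD', 'Good Morning']

# Rebinding the loop variable creates a new string; the list is untouched.
for tweet in tweets:
    tweet = tweet.lower()
unchanged = list(tweets)

# Writing the new string back by position is what changes the container.
for i, tweet in enumerate(tweets):
    tweets[i] = tweet.lower()

print(unchanged)
print(tweets)
```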

You just have to write the updated value back into your dataframe. Example of a simple fix:

for i, tweet in enumerate(tweet_text):
    tweet = re.sub('https://t.co/[a-zA-Z0-9]*', "", tweet)
    tweet = re.sub('@[a-zA-Z0-9]*', '', tweet)

    # ...

    # update dataframe (iloc is positional, matching the index from enumerate)
    df.iloc[i, 2] = tweet


1 Comment

praneeth98 Over a year ago

Thank you! I kept trying to see if dataframes were immutable, but forgot to check if strings are immutable (I would've expected otherwise in python haha)
