(Python) pandas.DataFrame doesn't update value each for-loop cycle, why?

Question

I am new to Pandas DataFrame and was curious why my basic thinking of adding new values to a new line doesn't work here.

I also tried different approaches with .loc[] and .append(), but obviously used them incorrectly (still plenty to learn).

Instructions: Add a column to data named length, defined as the length of each word. Add another column named frequency, which is defined as follows for each word in data:

If count > 10, frequency is "frequent".
If 1 < count <= 10, frequency is "infrequent".
If count == 1, frequency is "unique".

My if statements fill the whole DataFrame with only the last value from the dictionary-like object (a Counter from pandas/numpy?). The word and count values are all returned correctly inside the for loop, so I don't understand why the DataFrame doesn't receive a new value on each iteration.

data['length'] = ''
data['frequency'] = ''

for word, count in counted_text.items():
    if count > 10:
        data.length = len(word)
        data.frequency = 'frequent'
    if 1 < count <=10:
        data.length = len(word)
        data.frequency = 'infrequent'
    if count == 1:
        data.length = len(word)
        data.frequency = 'unique'
print(word, len(word), '\n')

"""
This is working code that I googled
-----------------------------------
data = pd.DataFrame({
    "word": list(counted_text.keys()),
    "count": list(counted_text.values())
})

data["length"] = data["word"].apply(len)

data.loc[data["count"] > 10,  "frequency"] = "frequent"
data.loc[data["count"] <= 10, "frequency"] = "infrequent"
data.loc[data["count"] == 1,  "frequency"] = "unique"

"""

print(data.head(), '\n')
print(data.tail())

Output:

finis 5 

       word  count  length frequency
1       the    935       5    unique
2  tragedie      3       5    unique
3        of    576       5    unique
4    hamlet     97       5    unique
5            45513       5    unique 

              word count  length frequency
5109  shooteexeunt     1       5    unique
5110      marching     1       5    unique
5111         peale     1       5    unique
5112           ord     1       5    unique
5113         finis     1       5    unique

The working code that you Googled is the right way to update rows in pandas based on specific conditions. You should never attempt to update rows in a data-frame by using loops. If you scrutinize the loop you've written: when you say data.length = len(word), what row_number are you talking about?
– Cavin Dsouza, Commented Apr 21, 2020 at 18:03
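To make the comment's question concrete, here is a minimal sketch of what a loop would have to look like if it named the row it writes to. The three-word `counted_text` is a made-up stand-in for the asker's data, and the loop is shown only for illustration; the vectorized code is still the better choice.

```python
import pandas as pd

# Hypothetical stand-in for the asker's counted_text and data
counted_text = {"the": 935, "tragedie": 3, "finis": 1}
data = pd.DataFrame({
    "word": list(counted_text.keys()),
    "count": list(counted_text.values()),
})

data["length"] = 0
data["frequency"] = ""

for idx, row in data.iterrows():
    count = row["count"]
    if count > 10:
        label = "frequent"
    elif count > 1:
        label = "infrequent"
    else:
        label = "unique"
    # data.loc[idx, column] targets one row, unlike data.length = ...
    data.loc[idx, "length"] = len(row["word"])
    data.loc[idx, "frequency"] = label
```

Here `idx` answers "which row?": each assignment touches a single cell instead of the whole column.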

Sajan · Accepted Answer · 2020-04-21 18:06:40Z

1

Assuming the data dataframe has only the word and count columns, and that count is never 0, you could try the following:

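import numpy as np

# Vectorized string length for every word at once
data['length'] = data['word'].str.len()

# Nested np.where: first test "frequent", then "infrequent", else "unique"
data['frequency'] = np.where(
    data['count'] > 10, 'frequent',
    np.where((data['count'] > 1) & (data['count'] <= 10), 'infrequent', 'unique'))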
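As a possible alternative to nesting np.where, the same three-way labelling can be sketched with np.select, which takes a list of conditions and picks the first one that matches. This is not part of the answer above; the small DataFrame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with the same columns as the asker's data
data = pd.DataFrame({"word": ["the", "tragedie", "finis"],
                     "count": [935, 3, 1]})

# Conditions are checked in order; the first match wins,
# so "count > 1" only applies to rows that failed "count > 10".
conditions = [data["count"] > 10, data["count"] > 1]
labels = ["frequent", "infrequent"]

data["frequency"] = np.select(conditions, labels, default="unique")
```

Like the nested version, this relies on count never being 0, since everything that fails both conditions is labelled "unique".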

2 Comments

cabavyras Over a year ago

Works. But I was also looking for an answer to why "if count > 10: data.length = len(word)" took the last value of the for-loop?

Sajan Over a year ago

I think in each iteration of the for-loop, you are re-assigning the value to data.length and data.frequency. So, the last value obtained at the end of the for-loop is being assigned.
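A small sketch of that behaviour, using a made-up three-row DataFrame: assigning a scalar to an existing column (whether as `df.length = ...` or `df["length"] = ...`) replaces the value in every row, so each pass through the loop overwrites the whole column.

```python
import pandas as pd

# Hypothetical miniature of the asker's data
df = pd.DataFrame({"word": ["the", "hamlet", "finis"]})
df["length"] = ""

snapshots = []
for word in df["word"]:
    # A scalar on the right-hand side is broadcast to all rows
    df.length = len(word)
    snapshots.append(df["length"].tolist())

# snapshots == [[3, 3, 3], [6, 6, 6], [5, 5, 5]]
```

The final column holds 5 everywhere because "finis", the last word, has five letters, which matches the output shown in the question.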

cabavyras · Accepted Answer · 2020-04-21 19:24:22Z

0

After @Sajan posted working code, I came to the conclusion that a DataFrame doesn't need a for-loop at all.
