
I am new to pandas DataFrames and am curious why my basic approach of adding new values row by row doesn't work here.

I also tried different approaches with .loc[] and .append(), but obviously used them incorrectly (still plenty to learn).

Instructions: Add a column to data named length, defined as the length of each word. Add another column named frequency, defined as follows for each word in data:

  • If count > 10, frequency is "frequent".
  • If 1 < count <= 10, frequency is "infrequent".
  • If count == 1, frequency is "unique".

    My if statements fill the whole DataFrame with the values from only the last entry of the dictionary-like object (a Counter, I believe). The loop yields every word and its count, so I don't understand why the DataFrame doesn't pick up a new value on each iteration.

    data['length'] = ''
    data['frequency'] = ''
    
    for word, count in counted_text.items():
        if count > 10:
            data.length = len(word)
            data.frequency = 'frequent'
        if 1 < count <=10:
            data.length = len(word)
            data.frequency = 'infrequent'
        if count == 1:
            data.length = len(word)
            data.frequency = 'unique'
    print(word, len(word), '\n')
    
    """
    This is working code that I googled
    -----------------------------------
    data = pd.DataFrame({
        "word": list(counted_text.keys()),
        "count": list(counted_text.values())
    })
    
    data["length"] = data["word"].apply(len)
    
    data.loc[data["count"] > 10,  "frequency"] = "frequent"
    data.loc[data["count"] <= 10, "frequency"] = "infrequent"
    data.loc[data["count"] == 1,  "frequency"] = "unique"
    
    """
    
    print(data.head(), '\n')
    print(data.tail())
    

Output:

finis 5 

       word  count  length frequency
1       the    935       5    unique
2  tragedie      3       5    unique
3        of    576       5    unique
4    hamlet     97       5    unique
5            45513       5    unique 

              word count  length frequency
5109  shooteexeunt     1       5    unique
5110      marching     1       5    unique
5111         peale     1       5    unique
5112           ord     1       5    unique
5113         finis     1       5    unique
  • The working code that you Googled is the right way to update rows in pandas based on specific conditions. You should never attempt to update rows in a data-frame by using loops. If you scrutinize the loop you've written when you say data.length = len(word), what row_number are you talking about? Commented Apr 21, 2020 at 18:03
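To make the comment's question concrete, here is a minimal sketch of a loop that does work, because every assignment names a row label through .loc[idx, column]. The sample counted_text is hypothetical, and the sketch assumes every count is at least 1 (the final else treats anything else as "unique"). It is only for illustration: the vectorized .loc version in the question is the idiomatic choice and is much faster on large frames.

```python
import pandas as pd
from collections import Counter

# Hypothetical stand-in for the question's counted_text.
counted_text = Counter({"the": 935, "tragedie": 3, "finis": 1})
data = pd.DataFrame({"word": list(counted_text.keys()),
                     "count": list(counted_text.values())})
data["length"] = 0        # integer placeholder column
data["frequency"] = ""    # string placeholder column

# Each .loc[idx, col] assignment names one row label, so it changes one cell only
# instead of the whole column.
for idx, word, count in zip(data.index, data["word"], data["count"]):
    data.loc[idx, "length"] = len(word)
    if count > 10:
        data.loc[idx, "frequency"] = "frequent"
    elif count > 1:
        data.loc[idx, "frequency"] = "infrequent"
    else:  # assumes count == 1 here
        data.loc[idx, "frequency"] = "unique"

print(data)
```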

2 Answers


Assuming you have only word and count in the data DataFrame, and that count never has a value of 0, you could try the following:

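import numpy as np

# Length of each word (assumes every entry in 'word' is a string).
data['length'] = data['word'].str.len()
# Nested np.where: count > 10 -> 'frequent', 1 < count <= 10 -> 'infrequent', otherwise 'unique'.
data['frequency'] = np.where(data['count'] > 10, 'frequent',
                             np.where((data['count'] > 1) & (data['count'] <= 10),
                                      'infrequent', 'unique'))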
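A self-contained run of the same nested np.where idea may help; the small data frame below is hypothetical sample data, not the asker's Hamlet counts. As a minor variation on the answer, the inner condition is reduced to count > 1, which is equivalent here because rows with count > 10 are already caught by the outer np.where.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame with the two columns the answer assumes.
data = pd.DataFrame({"word": ["the", "tragedie", "hamlet", "finis"],
                     "count": [935, 3, 97, 1]})

data["length"] = data["word"].str.len()
# The outer np.where handles count > 10, so the inner one only needs count > 1.
data["frequency"] = np.where(
    data["count"] > 10, "frequent",
    np.where(data["count"] > 1, "infrequent", "unique"),
)
print(data)
```

Each np.where evaluates its condition for all rows at once, so no Python-level loop is involved.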

2 Comments

Works. But I was also looking for an answer to why 'if count > 10: data.length = len(word)' kept only the value from the last iteration of the for loop.
I think in each iteration of the for loop you are reassigning the entire data.length and data.frequency columns: assigning a single value to a column sets that value in every row. So whatever the last iteration assigns is what remains.
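A minimal sketch of that overwrite, using a hypothetical three-word counted_text: assigning a scalar to a column broadcasts it to every row, so after the loop the column holds only the last word's length (bracket assignment is used here; data.length = ... on an already existing column behaves the same way).

```python
import pandas as pd
from collections import Counter

# Hypothetical stand-in for the question's counted_text.
counted_text = Counter({"the": 935, "of": 576, "finis": 1})
data = pd.DataFrame({"word": list(counted_text.keys()),
                     "count": list(counted_text.values())})

for word, count in counted_text.items():
    # A scalar assigned to a column is written into EVERY row,
    # so each iteration overwrites what the previous one wrote.
    data["length"] = len(word)

print(data["length"].tolist())  # [5, 5, 5]: len("finis"), the last word seen
```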

After @Sajan gave working code, I concluded that a DataFrame doesn't need a for loop at all for this.
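Another loop-free option, not mentioned in the thread and swapped in here only as an illustration, is pd.cut, which bins the counts directly. This sketch assumes every count is a positive integer, so the default right-closed bins (0, 1], (1, 10] and (10, inf] line up with the unique / infrequent / frequent rules. The sample data is hypothetical; note that the result is a categorical column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
data = pd.DataFrame({"word": ["the", "of", "tragedie", "finis"],
                     "count": [935, 576, 3, 1]})

# Right-closed bins by default: (0, 1] -> unique, (1, 10] -> infrequent,
# (10, inf] -> frequent. Assumes every count is a positive integer.
data["frequency"] = pd.cut(
    data["count"],
    bins=[0, 1, 10, np.inf],
    labels=["unique", "infrequent", "frequent"],
)
data["length"] = data["word"].str.len()
print(data)
```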
