0

I have the following pandas DataFrame.

   Id  UserId            Name                     Date  Class  TagBased
0   2      23  Autobiographer  2016-01-12T18:44:49.267      3     False
1   3      22  Autobiographer  2016-01-12T18:44:49.267      3     False
2   4      21  Autobiographer  2016-01-12T18:44:49.267      3     False
3   5      20  Autobiographer  2016-01-12T18:44:49.267      3     False
4   6      19  Autobiographer  2016-01-12T18:44:49.267      3     False

I want to iterate through the "TagBased" column and collect the UserId values of the rows where TagBased is True. I used the following code, but it prints an empty list, which is wrong because there are 18 True values in TagBased.

user_tagBased = []

for i in range(len(df)):
    if (df['TagBased'] is True):
        user_tagBased.append(df['UserId'])
print(user_tagBased)

Output: []
  • Try df.loc[df['TagBased'], 'UserId'].tolist(); you don't need loops most of the time in pandas. Commented Jun 24, 2020 at 15:15
  • I am getting the following error by trying this method: KeyError: "None of [Index(['False', 'False', 'False', 'False', 'False', 'False', 'False', 'False',\n 'False', 'False',\n ...\n 'False', 'False', 'False', 'False', 'False', 'False', 'False', 'False',\n 'False', 'False'],\n dtype='object', length=18087)] are in the [index]" Commented Jun 24, 2020 at 15:17
  • Better is df.loc[df['TagBased'].eq("True"), 'UserId'].tolist(), since the values are strings. Commented Jun 24, 2020 at 15:35
  • This one works, thanks a lot! Commented Jun 24, 2020 at 15:48

3 Answers

1

As others have suggested, pandas conditional filtering, without loops, is the best choice here. Still, to explain why your code did not work as expected:

Inside the for-loop you append df['UserId'], which is the whole column, not the value at the current row. The same goes for the check: df['TagBased'] is also the whole column, and `df['TagBased'] is True` tests whether that Series object is the constant True, which is never the case, so nothing is ever appended.

I assume you want to append the UserId of the current row in the for-loop.

You can do that by iterating over the rows of df:

user_tagBased = []

for index, row in df.iterrows():
    if row['TagBased'] == 'True':  # compare with the string 'True': the column holds strings, not booleans
        user_tagBased.append(row['UserId'])
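
To make the difference concrete, here is a minimal sketch on a made-up three-row frame. The sample data is an assumption for illustration only; TagBased is stored as strings with object dtype, as the KeyError in the question's comments suggests.

```python
import pandas as pd

# Hypothetical sample data, not the asker's real DataFrame.
df = pd.DataFrame({
    "UserId": [23, 22, 21],
    "TagBased": pd.Series(["False", "True", "True"], dtype=object),
})

# The original check: an identity test of the whole Series against True.
# A Series object is never the constant True, so this is always False.
column_is_true = df["TagBased"] is True

# Row-wise version: compare the value held in each row instead.
user_tagBased = []
for index, row in df.iterrows():
    if row["TagBased"] == "True":
        user_tagBased.append(row["UserId"])

print(column_is_true)
print(user_tagBased)
```

The loop collects the UserId of the second and third rows only, while the identity check stays False no matter what the column contains.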

1

Try this; you don't need a loop:

user_list = df[df['TagBased']==True]['UserId'].tolist()
print(user_list)

[19, 19]
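
Note that this filter relies on TagBased being a real boolean column. The sketch below uses made-up data (an assumption, not the asker's frame) to show that the same expression selects nothing when the column holds the strings "True"/"False", which is what the comments on the question report.

```python
import pandas as pd

# Hypothetical data for illustration only.
bool_df = pd.DataFrame({
    "UserId": [19, 20, 19],
    "TagBased": [True, False, True],
})
str_df = pd.DataFrame({
    "UserId": [19, 20, 19],
    "TagBased": pd.Series(["True", "False", "True"], dtype=object),
})

# Works as intended on a boolean column.
from_bool = bool_df[bool_df["TagBased"] == True]["UserId"].tolist()

# On a string column, no value equals the boolean True, so nothing matches.
from_str_bool_cmp = str_df[str_df["TagBased"] == True]["UserId"].tolist()

# Comparing against the string "True" restores the expected selection.
from_str_str_cmp = str_df[str_df["TagBased"] == "True"]["UserId"].tolist()

print(from_bool, from_str_bool_cmp, from_str_str_cmp)
```

Checking df['TagBased'].dtype first tells you which comparison applies.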


0

There is no need to use any loop.

Note that:

  • df.TagBased - yields the TagBased column as a Series (I assume this column is of bool type).
  • df[df.TagBased] - is an example of boolean indexing: it retrieves the rows where TagBased is True.
  • df[df.TagBased].UserId - limits the above result to just UserId. This is almost what you want, but it is a Series, whereas you want a list.

So the code that produces your expected result and saves it in the destination variable is:

user_tagBased = df[df.TagBased].UserId.to_list()
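
If the column turns out to hold strings rather than booleans (as the comments on the question indicate), one option is to convert it before using boolean indexing. This is only a sketch on made-up data, and the mapping assumes the column contains nothing but the two labels "True" and "False".

```python
import pandas as pd

# Hypothetical sample data; TagBased is stored as strings here.
df = pd.DataFrame({
    "UserId": [23, 22, 21, 20],
    "TagBased": pd.Series(["False", "True", "False", "True"], dtype=object),
})

# Map the two string labels to real booleans.
# Any other value would become NaN and should be handled separately.
df["TagBased"] = df["TagBased"].map({"True": True, "False": False})

# With a boolean column, plain boolean indexing works.
user_tagBased = df[df.TagBased].UserId.to_list()
print(user_tagBased)
```

Converting once keeps every later filter on the column simple, at the cost of modifying the DataFrame; comparing with .eq("True") each time avoids that.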
