I have a data frame and want to select some of its rows by the value in a column. In this case, I want the rows where 'reports' is between 10 and 31 (inclusive).

import pandas as pd

data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy', 'Daisy', 'River', 'Kate', 'David', 'Jack', 'Nancy'], 
    'month of entry': ["20171002", "20171206", "20171208", "20171018", "20090506", "20171128", "20101216", "20171230", "20171115", "20171030", "20171216"],
    'reports': [14, 24, 31, 22, 34, 6, 47, 2, 14, 10, 8]}
df = pd.DataFrame(data)

df_4 = df[(df.reports >= 10) | (df.reports <= 31)]
df_5 = df.query('reports >= 10 | reports <= 31')

print(df_4)
print(df_5)

Both statements produce the same wrong result: every row is returned (47 is still there!):

   month of entry   name  reports
0        20171002  Jason       14
1        20171206  Molly       24
2        20171208   Tina       31
3        20171018   Jake       22
4        20090506    Amy       34
5        20171128  Daisy        6
6        20101216  River       47
7        20171230   Kate        2
8        20171115  David       14
9        20171030   Jack       10
10       20171216  Nancy        8

What went wrong? Thank you.

Replace df_4 = df[(df.reports >= 10) | (df.reports <= 31)] with df_4 = df[(df.reports >= 10) & (df.reports <= 31)]. You want both conditions to be true, so use and (&), not or (|). Commented Mar 16, 2018 at 7:35

2 Answers

You need & for an element-wise AND of the two conditions, but it is better to use between:

df1 = df[(df.reports >= 10) & (df.reports <= 31)]

Or:

df1 = df[df.reports.between(10,31)] 
print (df1)
  month of entry   name  reports
0       20171002  Jason       14
1       20171206  Molly       24
2       20171208   Tina       31
3       20171018   Jake       22
8       20171115  David       14
9       20171030   Jack       10
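As a quick check that the two forms above select the same rows, here is a minimal sketch. It uses only the 'reports' column from the question, and the names mask_and and mask_between are introduced here for illustration. It relies on between including both endpoints by default.

```python
import pandas as pd

df = pd.DataFrame({'reports': [14, 24, 31, 22, 34, 6, 47, 2, 14, 10, 8]})

# Explicit element-wise AND of the two comparisons.
mask_and = (df.reports >= 10) & (df.reports <= 31)

# between() includes both endpoints by default, so 10 and 31 are kept.
mask_between = df.reports.between(10, 31)

same = mask_and.equals(mask_between)
kept = df.loc[mask_between, 'reports'].tolist()
print(same, kept)
```
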

Detail:

print ((df.reports >= 10) & (df.reports <= 31))
0      True
1      True
2      True
3      True
4     False
5     False
6     False
7     False
8      True
9      True
10    False
Name: reports, dtype: bool
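For comparison, a small sketch of why the original | version kept every row: any number is either at least 10 or at most 31, so the OR mask is True everywhere. The variable names below are illustrative only.

```python
import pandas as pd

reports = pd.Series([14, 24, 31, 22, 34, 6, 47, 2, 14, 10, 8], name='reports')

# OR: a row passes if either condition holds. 47 passes because 47 >= 10,
# and 2 passes because 2 <= 31, so nothing is filtered out.
mask_or = (reports >= 10) | (reports <= 31)

# AND: a row passes only if both conditions hold.
mask_and = (reports >= 10) & (reports <= 31)

all_match_or = bool(mask_or.all())
n_and = int(mask_and.sum())
print(all_match_or, n_and)
```
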

2
import pandas as pd

data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy', 'Daisy', 'River', 'Kate', 'David', 'Jack', 'Nancy'], 
    'month of entry': ["20171002", "20171206", "20171208", "20171018", "20090506", "20171128", "20101216", "20171230", "20171115", "20171030", "20171216"],
    'reports': [14, 24, 31, 22, 34, 6, 47, 2, 14, 10, 8]}
df = pd.DataFrame(data)
df_4 = df[(df.reports >= 10) & (df.reports <= 31)]   #Use '&' instead of '|'
print(df_4)

Output:

  month of entry   name  reports
0       20171002  Jason       14
1       20171206  Molly       24
2       20171208   Tina       31
3       20171018   Jake       22
8       20171115  David       14
9       20171030   Jack       10
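The question also used df.query, and the same fix should apply there. The following is a sketch under the assumption that the query string accepts both the `and` keyword and the `&` operator, as the pandas documentation describes; df_5b is a name introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy', 'Daisy',
             'River', 'Kate', 'David', 'Jack', 'Nancy'],
    'reports': [14, 24, 31, 22, 34, 6, 47, 2, 14, 10, 8],
})

# Both conditions must hold, so combine them with 'and' rather than '|'.
df_5 = df.query('reports >= 10 and reports <= 31')

# The '&' spelling is expected to give the same rows inside query().
df_5b = df.query('reports >= 10 & reports <= 31')

print(df_5)
```
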

2 Comments

Thank you! Would you mind if I accept jezrael's answer, since he provided two methods?
Sure np :). I just like to practice code snippets :)
