I want to calculate the average number of successful Rattata catches per hour across this whole dataset. I'm looking for an efficient way to do this with pandas; I'm new to both Python and pandas.
- If your code already works, you'd be better off asking this question on Code Review. – Nander Speerstra, Jan 5, 2017 at 10:54
- It doesn't work :( – Ravmcgav, Jan 5, 2017 at 11:02
- Can you upload your code (a minimal reproducible example)? – Nander Speerstra, Jan 5, 2017 at 12:19
- Do you hack Pokemon Go? :) – Blaszard, Jan 5, 2017 at 13:12
2 Answers
You don't need any loops. Try this; I think the logic is fairly clear.
import pandas as pd
# Read the CSV; the first row holds the column names
df = pd.read_csv('pkmn.csv', header=0)
# Parse the timestamps in one vectorized call, then extract the calendar date
df['time'] = pd.to_datetime(df['time'].astype(str))
df['date'] = df['time'].dt.date
# Keep successful Rattata catches and group them by hour
# (this relies on the data already having an 'hour' column)
grouped = df.query("Pokemon == 'rattata' and caught == True").groupby('hour')
# Per hour: total catches, number of distinct days, and catches per day
result = pd.DataFrame()
result['caught total'] = grouped['hour'].count()
result['days'] = grouped['date'].nunique()
result['caught average'] = result['caught total'] / result['days']
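To check the logic without the original file, here is a minimal sketch of the same idea on a small made-up DataFrame. The sample rows are invented for illustration, the column names simply mirror the ones used above, and the `hour` column is derived from the timestamp rather than assumed to exist in the data.

```python
import pandas as pd

# Hypothetical sample data; column names mirror those used in the answer above.
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2017-01-01 09:05", "2017-01-01 09:40", "2017-01-01 10:15",
        "2017-01-02 09:20", "2017-01-02 10:30", "2017-01-02 10:45",
    ]),
    "Pokemon": ["rattata", "rattata", "pidgey", "rattata", "rattata", "rattata"],
    "caught": [True, False, True, True, True, True],
})

# Derive the grouping keys from the timestamp.
df["date"] = df["time"].dt.date
df["hour"] = df["time"].dt.hour

# Keep only successful Rattata catches.
caught = df[(df["Pokemon"] == "rattata") & df["caught"]]

# Per hour of day: number of catches and number of distinct days with a catch.
result = caught.groupby("hour").agg(
    caught_total=("Pokemon", "size"),
    days=("date", "nunique"),
)
result["caught_average"] = result["caught_total"] / result["days"]
print(result)
```

Note that `days` only counts days on which at least one Rattata was caught in that hour, so days with zero catches in a given hour do not lower that hour's average. If they should, divide by the total number of distinct dates in the dataset instead.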
If you have your pandas dataframe saved as df, this should work:
rats = df.loc[df.Pokemon == "rattata"]  # subset of rows relating to Rattata
total = rats.Caught.sum()  # total number caught (assumes Caught is boolean or 0/1)
diff = rats.time.iloc[-1] - rats.time.iloc[0]  # span between first and last row (assumes datetime values sorted by time)
hours = diff.total_seconds() / 3600  # length of that span in hours
average = total / hours  # number caught per hour
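As a rough, self-contained sketch of this overall-rate approach, the example below uses invented rows and assumes `time` holds datetimes and `Caught` holds booleans. It sorts by time first so that the first and last rows really are the earliest and latest encounters.

```python
import pandas as pd

# Hypothetical sample data; column names follow the snippet above.
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2017-01-01 08:00", "2017-01-01 09:00",
        "2017-01-01 10:00", "2017-01-01 12:00",
    ]),
    "Pokemon": ["rattata", "pidgey", "rattata", "rattata"],
    "Caught": [True, True, False, True],
})

# Rattata rows only, ordered by time.
rats = df.loc[df["Pokemon"] == "rattata"].sort_values("time")

# Total successful catches (True counts as 1).
total = rats["Caught"].sum()

# Elapsed time between the first and last Rattata encounter, in hours.
span = rats["time"].iloc[-1] - rats["time"].iloc[0]
hours = span / pd.Timedelta(hours=1)

# Catches per hour over the whole observed span.
average = total / hours
print(average)
```

This gives a single rate over the whole observed span, which is a different quantity from a per-hour-of-day average.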
