parsing CSV in pandas

Question

I want to calculate the average number of successful Rattata catches per hour for this whole dataset. I am looking for an efficient way to do this using pandas; I'm new to Python and pandas.

If your code already works, you'd better ask this question on Code Review.
– Nander Speerstra, commented Jan 5, 2017 at 10:54

v.grabovets · Accepted Answer · 2017-01-05 11:50:21Z

1

You don't need any loops. Try this; I think the logic is fairly clear.

import pandas as pd

# read the CSV; header=0 takes the column names from the first row
df = pd.read_csv('pkmn.csv', header=0)

# parse the timestamps (vectorized, no apply needed) so the calendar date can be extracted
df['time'] = pd.to_datetime(df['time'].astype(str))
df['date'] = df['time'].dt.date

# keep successful Rattata catches and group them by the 'hour' column (assumed to be in the CSV)
grouped = df.query("Pokemon == 'rattata' and caught == True").groupby('hour')
result = pd.DataFrame()
result['caught total'] = grouped.size()                # catches in each hour
result['days'] = grouped['date'].nunique()            # distinct days with a catch in that hour
result['caught average'] = result['caught total'] / result['days']
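A self-contained sketch of the same approach, run on a few invented rows instead of pkmn.csv (the real file is not shown in the question). It assumes columns named `time`, `Pokemon` and `caught`, and derives `hour` from the timestamp in case the file has no such column. Note that `days` in the answer counts only the days on which at least one Rattata was caught in that hour; the sketch also shows dividing by every day present in the data. Which denominator is right depends on what "average per hour" should measure, and that choice is an assumption here.

```python
import io

import pandas as pd

# Invented sample rows; the real pkmn.csv is not shown in the question.
SAMPLE_CSV = """time,Pokemon,caught
2017-01-01 09:15:00,rattata,True
2017-01-01 09:45:00,rattata,True
2017-01-01 10:05:00,rattata,False
2017-01-02 09:20:00,rattata,True
2017-01-02 14:10:00,pidgey,True
2017-01-03 10:30:00,rattata,True
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))
df["time"] = pd.to_datetime(df["time"])
df["date"] = df["time"].dt.date
df["hour"] = df["time"].dt.hour  # assumption: hour of day taken from the timestamp

# Successful Rattata catches only, grouped by hour of day.
hits = df.query("Pokemon == 'rattata' and caught == True")
grouped = hits.groupby("hour")

result = pd.DataFrame({
    "caught total": grouped.size(),
    # days with at least one catch in that hour (the answer's denominator)
    "days with catches": grouped["date"].nunique(),
})
result["caught average"] = result["caught total"] / result["days with catches"]

# Alternative denominator: every day that appears anywhere in the dataset.
all_days = df["date"].nunique()
result["average over all days"] = result["caught total"] / all_days
print(result)
```

With these rows, hour 9 has 3 catches on 2 days (1.5 per catch-day, but 1.0 per day over the 3 days in the data), so the two denominators can give noticeably different answers.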

answered Jan 5, 2017 at 11:50

v.grabovets



S. Kelly · Accepted Answer · 2017-01-05 13:04:23Z

0

If your data is loaded into a pandas DataFrame named df, this should work:

        import pandas as pd

        rats = df.loc[df.Pokemon == "rattata"]  # subset of rows relating to Rattata

        total = rats.Caught.sum()  # number caught in total (each True counts as 1)

        times = pd.to_datetime(rats.time)
        hours = (times.max() - times.min()).total_seconds() / 3600  # time between first and last Rattata row, in hours

        average = total / hours  # number caught per hour
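The corrected snippet can be exercised end to end on a handful of invented rows. This sketch assumes a boolean `Caught` column as in the answer, and wraps the steps in a hypothetical helper, `catches_per_hour`. It measures the overall rate (successful catches per hour of elapsed time between the first and last Rattata row), which is a different quantity from the per-hour-of-day averages computed in the other answer.

```python
import io

import pandas as pd

# Invented rows; the real file's column capitalization is not shown in the question.
SAMPLE_CSV = """time,Pokemon,Caught
2017-01-05 08:00:00,rattata,True
2017-01-05 09:30:00,pidgey,True
2017-01-05 10:00:00,rattata,False
2017-01-05 12:00:00,rattata,True
"""

def catches_per_hour(df: pd.DataFrame, pokemon: str = "rattata") -> float:
    """Hypothetical helper: successful catches divided by the hours spanned by the rows."""
    rows = df.loc[df["Pokemon"] == pokemon]
    total = rows["Caught"].sum()  # True counts as 1
    times = pd.to_datetime(rows["time"])
    hours = (times.max() - times.min()).total_seconds() / 3600
    if hours == 0:
        raise ValueError("need rows at two different times to compute a rate")
    return total / hours

df = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(catches_per_hour(df))  # 2 catches over 4 hours -> 0.5
```

Using `min()`/`max()` rather than the first and last positions avoids relying on the rows being sorted by time, and avoids the label-based indexing that breaks once the DataFrame has been filtered.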

answered Jan 5, 2017 at 13:04

S. Kelly

