Create a subset of a dataframe without using a for loop

Question

I would like to speed up this operation:

df = pd.DataFrame(columns = ['eventId','total'])
for event in df_events:
    df1 = data[data['eventId'] == event]
    df = pd.concat([df,df1])

df_events is an object containing elements that look like this: '2015-11-23#54#'. This works for my purpose, but I wondered if there is a quicker way of doing this without using a for loop.

Carlos Alvarenga · Accepted Answer · 2016-10-24 16:55:17Z

3

Try this:

df = data[data["eventId"].isin(df_events)]
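
To make the one-liner concrete, here is a minimal self-contained sketch. The sample rows are made up for illustration (they only mimic the '2015-11-23#54#' format from the question); `mask` is just an illustrative name. `isin` builds a boolean Series that is True wherever `eventId` matches any element of `df_events`, and indexing with that Series keeps the matching rows.

```python
import pandas as pd

# Hypothetical sample data in the same shape as the question's frame.
data = pd.DataFrame({
    "eventId": ["2015-11-23#54#", "2015-11-23#55#",
                "2015-11-23#56#", "2015-11-23#54#"],
    "total": [2, 8, 9, 4],
})
df_events = ["2015-11-23#54#", "2015-11-23#56#"]

# Boolean mask: True for rows whose eventId is one of df_events.
mask = data["eventId"].isin(df_events)

# Boolean indexing keeps the matching rows in a single vectorized step.
df = data[mask]
print(df)
```

One difference from the loop worth knowing: the rows come back in their original order in `data`, not grouped by event the way the repeated `pd.concat` produces them.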


1 Comment

Schmuddi Over a year ago

I completely forgot about the isin() method, but this is probably even faster, and certainly more readable, than my answer.

Schmuddi · Accepted Answer · 2016-10-24 22:31:05Z

A one-liner without a loop can do what you want:

df = data[data["eventId"].apply(lambda x: x in df_events)]

This is, indeed, notably faster than your current solution (I tried it with a very small data set):

data = pd.DataFrame({'eventId': {0: '2015-11-23#54#',
    1: '2015-11-23#55#',
    2: '2015-11-23#56#',
    3: '2015-11-23#54#',
    4: '2015-11-23#55#',
    5: '2015-11-23#56#'},
    'total': {0: 2, 1: 8, 2: 9, 3: 4, 4: 3, 5: 5}})

df_events = ['2015-11-23#54#', '2015-11-23#56#']

In [14]: %timeit df = data[data["eventId"].apply(lambda x: x in df_events)]
1000 loops, best of 3: 737 µs per loop

In [15]: %%timeit df = pd.DataFrame(columns = ['eventId','total'])
   ....: for event in df_events:
   ....:     df1 = data[data['eventId'] == event]
   ....:     df = pd.concat([df,df1])
   ....: 
100 loops, best of 3: 8.18 ms per loop
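
As a sanity check that these approaches select the same rows, here is a rough sketch using the sample data above. Two things in it are my own assumptions rather than part of the original code: `df_events` is converted to a set (`event_set`), which is a common way to make the `x in ...` test cheaper when the list is long, and the loop is written as a single `pd.concat` over a list of pieces. It makes no timing claim; it only compares results.

```python
import pandas as pd

data = pd.DataFrame({
    "eventId": ["2015-11-23#54#", "2015-11-23#55#", "2015-11-23#56#",
                "2015-11-23#54#", "2015-11-23#55#", "2015-11-23#56#"],
    "total": [2, 8, 9, 4, 3, 5],
})
df_events = ["2015-11-23#54#", "2015-11-23#56#"]

# Assumption: a set gives faster membership tests than a list.
event_set = set(df_events)

# Row-wise membership test via apply.
by_apply = data[data["eventId"].apply(lambda x: x in event_set)]

# Vectorized membership test via isin.
by_isin = data[data["eventId"].isin(event_set)]

# Loop-style selection, one filtered piece per event, concatenated once.
by_loop = pd.concat([data[data["eventId"] == e] for e in df_events])

# apply and isin preserve the original row order, so they match directly.
assert by_apply.equals(by_isin)

# The loop groups rows by event, so sort by index before comparing.
assert by_loop.sort_index().equals(by_isin)
```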
