I have a dataframe (let's call it goodsdf) which consists of 240K rows and a bunch of columns:

|   id | name   | description | ... | availability | price |
|-----:|:-------|:------------|:----|-------------:|------:|
| 1001 | Item A | Frying pan  | ... |            3 |  20.1 |
| 2031 | Item B | Firewood    | ... |            0 |   5   |
| 3412 | Item C | Olive oil   | ... |           10 |  12.5 |

Now, in the next step, I'm continuously reading a stream of item updates. These updates include, among other things, new prices for items, and they are pulled every 90 seconds. The stream I'm receiving also contains some 100K additional items which are not of interest for my store.

What I'm looking to do is update the dataframe with the new prices. To do so I use the following (partially pseudo) code:

for entity in feed.entity:
    if entity.HasField('product_update'):
        if entity.product_update.id == goodsdf['id']:                  # pseudo: find the row with this id
            if goodsdf['availability'] != 0:                           # pseudo: only if that item is in stock
                goodsdf['price'] = entity.product_update.price         # pseudo: overwrite that row's price

From what I have read, there are several different ways of accessing values in dataframes, e.g. isin(), str.contains() and a couple of others. However, many of them only return True/False masks. Another way I tried was reading the new prices and item IDs into separate dataframes and merging those into my original goodsdf later, but that turned out to be costly in both time and computer resources.
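For illustration, this is a rough sketch of the boolean-mask route (with a hypothetical new_prices dict of id → price built from the feed); I'm not sure how to combine this properly with the availability check:

# hypothetical: new_prices maps item id -> updated price, built from the feed
new_prices = {1001: 999, 2031: 999}

# isin() only gives a True/False mask, but combined with the availability check
# it can drive a .loc update; the prices still need to be aligned per id via map()
mask = goodsdf['id'].isin(new_prices.keys()) & (goodsdf['availability'] != 0)
goodsdf.loc[mask, 'price'] = goodsdf.loc[mask, 'id'].map(new_prices)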

I'm not quite sure I fully understand the concept of using nested if statements in combination with updating values in a dataframe.

  • I'm also considering ditching the idea of using dataframes in favor of setting up a SQL database.

1 Answer


One approach could be to extract the id and price first to create a flattened list of dicts, load that as a new dataframe, merge it with the original on id, and then replace the prices in the original df where they meet the conditions using where(). I'm not sure if it's efficient enough for your use case, but at least it avoids looping through the data row by row:

import pandas as pd

# Example feed: two relevant product updates plus one irrelevant entity
feed = {'entity1': {'product_update': {'id': 1001, 'price': 999}},
        'entity2': {'product_update': {'id': 2031, 'price': 999}},
        'entity3': {'superfluous': 'test'}}

# Flatten the feed into a list of {'id': ..., 'price': ...} dicts, skipping entities without a product_update
extracted_feed_data = [v for val in feed.values() if (v := val.get('product_update'))]

# The original goodsdf
data = [{"id": 1001, "name": "Item A", "description": "Frying pan", "availability": 3, "price": 20.1},
        {"id": 2031, "name": "Item B", "description": "Firewood", "availability": 0, "price": 5},
        {"id": 3412, "name": "Item C", "description": "Olive oil", "availability": 10, "price": 12.5}]

df_update = pd.DataFrame(extracted_feed_data)
df = pd.DataFrame(data)

# The left merge keeps every row of df; the feed price comes in as 'price_y' (NaN for items not in the feed)
merged = df.merge(df_update, on='id', how='left')
# Take the new price only where the item is in stock and a new price exists, otherwise keep the old one
df['price'] = merged['price_y'].where((df['availability'] != 0) & (merged['price_y'].notnull()), df['price'])

Output df:

|    |   id | name   | description   |   availability |   price |
|---:|-----:|:-------|:--------------|---------------:|--------:|
|  0 | 1001 | Item A | Frying pan    |              3 |   999   |
|  1 | 2031 | Item B | Firewood      |              0 |     5   |
|  2 | 3412 | Item C | Olive oil     |             10 |    12.5 |
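If the merge turns out to be too heavy for 240K rows every 90 seconds, another variation you could benchmark (just a sketch, and it assumes the ids are unique) is to index both frames by id and let DataFrame.update do the aligned overwrite in place; update() only writes non-NA values, so items missing from the feed keep their old price:

# Sketch: align on id and update the price in place (assumes unique ids)
df_idx = df.set_index('id')
upd = df_update.set_index('id')[['price']]
# Drop updates for items that are out of stock (or not present in df at all)
upd = upd.loc[upd.index.intersection(df_idx.index[df_idx['availability'] != 0])]
df_idx.update(upd)
df = df_idx.reset_index()

Either way, it's probably worth timing both variants against the real 240K-row frame before deciding whether to move to a SQL database.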

1 Comment

Thanks for the reply! I agree with you, I don't think this is a very efficient way of handling such an amount of data. I'll give it a try and see what results I get; otherwise I think a SQL database is a much better idea.
