I have a dataframe (lets call it goodsdf) which consists of 240K rows and a bunch of columns:
| id | name | description | ... | availability | price |
|---|---|---|---|---|---|
| 1001 | Item A | Frying pan | ... | 3 | 20.1 |
| 2031 | Item B | Firewood | ... | 0 | 5 |
| 3412 | Item C | Olive oil | ... | 10 | 12.5 |
Now in the next step, I'm constantly reading a stream of updated items. Those updates include among others new prices for items, which are pulled every 90 seconds. The stream I'm receiving includes also some additional 100K items which are not of interest for my store.
What I'm looking into doing is to update the dataframe with new prices. To do so I use the following (partially pseudo) code:
for entity in feed.entity:
if entity.HasField('product_update'):
if entity.product_update.id == goodsdf['id']: #pseudo
if goodsdf['availability'] != 0: #pseudo
set goodsdf['price'] == entity.product_update.price #pseudo
From what I have read, there are several different ways for accessing values in dataframes, e.g. by using isin(), str.contains() and a couple of others. However, many of them return True and False values only. Another way I tried to solve this is by reading new prices and specific item IDs into separate dataframes, which are later merged into my original goodsdf. This in turn showed to create penalties for time and computer resources.
I'm not quite sure I fully understand the concept using nested if statements in combination with updating values in dataframes.