Pandas for loop with iterrows() and naming of dataframes

Question

I have a big dataframe; a sample of it looks as follows:

etf_list = pd.DataFrame({'ISIN':['LU1737652583', 'IE00B44T3H88', 'IE0005042456', 'IE00B1FZS574', 'IE00BYMS5W68'],
                     'ETF_Vendor':['Amundi', 'HSBC', 'iShares', 'iShares', 'Invesco']})

In my local folder 'ETF/Input/', among many other files, the files IE00B1FZS574.csv and IE0005042456.csv are stored.

I would like to create a dataframe from each csv file, but only for the rows where ETF_Vendor in etf_list equals 'iShares'. So I wrote the following for loop:

iShares = [] 
for i, row in etf_list.iterrows():
    if row['ETF_Vendor'] == 'iShares':
        ISIN = row['ISIN']
        iShares.append(ISIN)  # At each iteration, the list is filled with the ISINs for the relevant dataframes
        # Assign downloaded file the name of the relevant ISIN
        df[row['ISIN']] = 'ETF/Input/' + row['ISIN'] + '.csv'
        # Define file as DataFrame, again specifying the ISIN as the name for the DataFrame.
        df[row['ISIN']] = pd.read_csv(df[row['ISIN']], sep=',', skiprows=2, thousands='.', decimal=',')
    else:
        pass

The problem with this loop is that the dataframes are named like df['IE00B1FZS574']. But I want each dataframe to be named after its ISIN, e.g. IE00B1FZS574.

How do I have to change my code in order to name the dataframes as e.g. IE00B1FZS574 instead of df['IE00B1FZS574']?

Thanks in advance.

It's not clear (to a programmer) what you mean by naming them properly. Can you show the intended result for one example?
– ramslök, Commented Jun 11, 2022 at 12:52
@creanion Yes, instead of df['IE00B1FZS574'] I would like to have IE00B1FZS574
– Economist Learning Python, Commented Jun 11, 2022 at 12:56
It sounds like you want to create variables with those names dynamically. That's the kind of wording a programmer would understand, I think. ("Name" can mean many things; in pandas, columns, indexes, etc. all have a name attribute.) I'd ask: in which scope do you want to create them, and why not use a SimpleNamespace or dict to hold them?
– ramslök, Commented Jun 11, 2022 at 13:26
@creanion Thanks, I am not sure I understand your point correctly. My point is that I would like to call the dataframes using the ISIN. Instead of calling the created dataframes by typing, for example, df['IE00B1FZS574'], I would like to call them by their ISIN, IE00B1FZS574. Is that clearer now?
– Economist Learning Python, Commented Jun 11, 2022 at 13:35

ramslök · Accepted Answer · 2022-06-11 14:10:57Z

There are a couple of ways to go about it.

Let's say you read the data as in your question. Here I'm storing each dataframe in a dict called dataframes. Orderly and Pythonic, so far so good.

import pandas as pd

dataframes = {}
for i, row in something_you_have:  # Your details
    name = row['ISIN']
    dataframes[name] = pd.read_csv(....)

Now we can access the dataframes using dataframes['IE00B1FZS574'] and so on.
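As a minimal sketch of that loop, assuming the etf_list frame, the 'ETF/Input/' folder and the read_csv options from the question: the helper name load_vendor_frames and its input_dir parameter are made up for illustration, and skipping ISINs that have no file on disk is an assumption about what you want.

```python
from pathlib import Path

import pandas as pd


def load_vendor_frames(etf_list, vendor, input_dir="ETF/Input"):
    """Return {ISIN: DataFrame} for every row of etf_list matching vendor."""
    dataframes = {}
    # Filter once with a boolean mask instead of testing each row in iterrows()
    for isin in etf_list.loc[etf_list["ETF_Vendor"] == vendor, "ISIN"]:
        path = Path(input_dir) / f"{isin}.csv"
        if not path.exists():  # assumption: skip ISINs without a downloaded file
            continue
        # Same parsing options as in the question
        dataframes[isin] = pd.read_csv(
            path, sep=",", skiprows=2, thousands=".", decimal=","
        )
    return dataframes
```

Calling dataframes = load_vendor_frames(etf_list, "iShares") would then give a dict keyed by ISIN, so the list of loaded ISINs is simply list(dataframes).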

How to make this a bit more fluent?

A. Keep the dataframes in the dict. This is a perfectly valid option as it is.

B. We can use a namespace.

import types

datans = types.SimpleNamespace(**dataframes)

datans.IE00B1FZS574

With the namespace we can access the items from the dict as plain attributes on the namespace. Of course, the keys in the dict need to be valid Python identifiers for this to work. ISINs start with letters and contain only letters and digits, so datans.IE00B1FZS574 works here.
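A short sketch of that caveat, using str.isidentifier() to check the keys before building the namespace. The two small frames are placeholder data standing in for the csv files, and the "Price" column is an assumed name.

```python
import types

import pandas as pd

# Placeholder frames standing in for the csv data
dataframes = {
    "IE00B1FZS574": pd.DataFrame({"Price": [10, 11]}),
    "IE0005042456": pd.DataFrame({"Price": [20, 21]}),
}

# Dot access only works for keys that are valid Python identifiers
bad_keys = [key for key in dataframes if not key.isidentifier()]
if bad_keys:
    raise ValueError(f"not usable as attribute names: {bad_keys}")

datans = types.SimpleNamespace(**dataframes)

# Attribute access returns the very same object that is stored in the dict
same_object = datans.IE00B1FZS574 is dataframes["IE00B1FZS574"]

# vars() exposes the namespace contents as a dict when you need to loop over it
names = sorted(vars(datans))
```

Nothing is copied here: the namespace and the dict both point at the same dataframes, so you can keep the dict for loops and use the namespace for interactive typing.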

C. We can add items from the dataframes dict directly into the current module-global namespace.

When is this appropriate? In a notebook maybe. Some would say this is bad style.

# update the "globals" (current module namespace) with the dict
globals().update(dataframes)

IE00B1FZS574

Now we can access the dataframes using just IE00B1FZS574 etc in the current module.

In my analyses I usually go with option A, and I consider option B good too. I normally avoid C. The reason is that the analysis should be maintainable and somewhat agile: data is data, so the analysis should be data-driven and easy to update when the dataset changes slightly.
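To illustrate the maintainability point: with the frames in a dict, one loop covers however many ISINs were loaded, whereas separate variables would each have to be typed out by hand. This is only a sketch with placeholder data; "Price" is an assumed column name, not something taken from the real files.

```python
import pandas as pd

# Placeholder frames; "Price" is an assumed column name
dataframes = {
    "IE00B1FZS574": pd.DataFrame({"Price": [10.0, 12.0]}),
    "IE0005042456": pd.DataFrame({"Price": [20.0, 30.0]}),
}

# The same computation for every ETF, with no ISIN hard-coded
mean_price = {isin: df["Price"].mean() for isin, df in dataframes.items()}

# Stack everything into one frame, with the ISIN as the outer index level
combined = pd.concat(dataframes, names=["ISIN", "row"])
```

If a new ISIN shows up in the input folder, neither line needs to change, which is harder to achieve once each dataframe lives in its own global variable.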

Thanks so much creanion, but I don't understand how A or B is an alternative solution; A is almost the same as what I did. As far as I can tell, only C seems to be an appropriate solution. Unfortunately I didn't quite understand why to avoid C, but C is the solution to my problem, thanks!
Another solution I found:

for ISIN in iShares:
    ETF_Vendor_df_rename = f"{ISIN}=df[ISIN]"
    exec(ETF_Vendor_df_rename)
Whatever works. exec is usually not so popular because it hurts maintainability, but depending on the setting it can do the job. I prefer to use other constructs if I can.
