Creating new columns of pandas DataFrame based on unique values?

Question

I have a pandas DataFrame that holds a date, a stock symbol (e.g. 'MSFT'), and the Open, Close, and other data points for that stock on that day. As a result, each date appears once per stock symbol in my dataset.

I want to convert my DataFrame:


              Open     High      Low    Close  Adj Close    Volume  Name
Date
2006-12-04  0.06508  0.06508  0.06508  0.06508  -0.098360  193352.0  AAIT
2006-12-05  0.06464  0.06464  0.06464  0.06464  -0.097695   81542.0  AAIT
2006-12-06  0.06596  0.06596  0.06552  0.06596  -0.099690  158115.0  AAIT
2006-12-07  0.06596  0.06596  0.06596  0.06596  -0.099690   65731.0  AAIT
2006-12-11  0.06596  0.06596  0.06596  0.06596  -0.099690  542561.0  AAIT

into something like:


            ADBE_Adj Close  ADBE_Close   ADBE_High   ADBE_Low   ADBE_Open  ADBE_Volume  ADXS_Adj Close  ADXS_Close  ADXS_High  ADXS_Low  ...
2019-12-19      327.630005  327.630005  327.959991  324.26001  324.380005    2561400.0           0.581       0.581       0.59     0.550  ...
2020-11-17      467.950012  467.950012  469.910004  460.00000  461.660004    2407600.0           0.393       0.393       0.40     0.383  ...

I'm doing it manually with the code that I wrote:

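import pandas as pd
from tqdm import tqdm

# stocks_df (the long-format frame above) and filtered_stock_list are defined elsewhere
rows = []  # one Series per date; DataFrame.append was removed in pandas 2.0
dates_set = set(stocks_df.index)
print('Going through {} days of data.'.format(len(dates_set)))
for _date in tqdm(dates_set):
    row = {}
    for symbol in filtered_stock_list:
        # Boolean mask over the whole frame for every (date, symbol) pair: this is the slow part
        stock_at_date = stocks_df.loc[(stocks_df['Name']==symbol) &
                                     (stocks_df.index==_date)]
        for attribute in ['Open','High','Low','Close','Adj Close','Volume']:
            try:
                row[symbol + '_' + attribute] = float(stock_at_date[attribute])
            except Exception:
                # No single value for this symbol on this date
                row[symbol + '_' + attribute] = None
    rows.append(pd.Series(data=row, name=_date))
df = pd.DataFrame(rows)  # one row per date, one column per symbol_attribute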

Unfortunately, this code is very slow and would take hours to run. I've looked at many different pandas operations, but I can't figure out how to do this efficiently.

Please have the sample output use the same data as the sample input, so that what you are trying to accomplish is clear. In your example, there is no relationship between the two.
– RufusVS, Commented Dec 31, 2020 at 17:15

ansev · Accepted Answer · 2020-12-31 17:16:49Z

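Move 'Name' into the index next to the date, then unstack it into the columns. Here df is the long-format frame from the question (stocks_df there):

new_df = (df.set_index('Name', append=True)
            .loc[:, ['Open','High','Low','Close','Adj Close','Volume']]
            .unstack('Name'))
# Columns are now (attribute, Name) pairs; put the symbol first, e.g. 'AAIT_Open'
new_df.columns = [f'{name}_{attr}' for attr, name in new_df.columns]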
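To see the reshape on something small, here is a minimal sketch. The symbols 'AAA' and 'BBB' and all values are made up for illustration and are not from the question's data. It also assumes you want the columns grouped by symbol, as in the desired output, so it swaps and sorts the column levels before flattening them.

```python
import pandas as pd

# Hypothetical toy data: two symbols on two dates (values are invented).
stocks_df = pd.DataFrame(
    {
        "Open": [1.0, 2.0, 10.0, 20.0],
        "Close": [1.5, 2.5, 10.5, 20.5],
        "Name": ["AAA", "AAA", "BBB", "BBB"],
    },
    index=pd.to_datetime(
        ["2020-01-01", "2020-01-02", "2020-01-01", "2020-01-02"]
    ).rename("Date"),
)

wide = (
    stocks_df.set_index("Name", append=True)  # index becomes (Date, Name)
    .loc[:, ["Open", "Close"]]                # keep only the value columns
    .unstack("Name")                          # columns become (attribute, Name)
)

# Put the symbol first, keep each symbol's columns together, then flatten.
wide = wide.swaplevel(axis=1).sort_index(axis=1)
wide.columns = [f"{name}_{attr}" for name, attr in wide.columns]

print(wide)
```

Note that unstack needs each (Date, Name) pair to be unique. If a symbol appears twice on the same date, it raises a ValueError, and the duplicates have to be dropped or aggregated first.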

edited Dec 31, 2020 at 17:16

answered Dec 31, 2020 at 17:11
