
I have a pandas DataFrame with a date, a stock symbol (e.g. 'MSFT'), and the Open, Close, and other data points for that stock on that day. As a result, each date is repeated once for every stock symbol in my dataset.

I want to convert my DataFrame:


            Open     High     Low      Close    Adj Close  Volume    Name
Date
2006-12-04  0.06508  0.06508  0.06508  0.06508  -0.098360  193352.0  AAIT
2006-12-05  0.06464  0.06464  0.06464  0.06464  -0.097695   81542.0  AAIT
2006-12-06  0.06596  0.06596  0.06552  0.06596  -0.099690  158115.0  AAIT
2006-12-07  0.06596  0.06596  0.06596  0.06596  -0.099690   65731.0  AAIT
2006-12-11  0.06596  0.06596  0.06596  0.06596  -0.099690  542561.0  AAIT

into something like:


            ADBE_Adj Close  ADBE_Close  ADBE_High   ADBE_Low   ADBE_Open   ADBE_Volume  ADXS_Adj Close  ADXS_Close  ADXS_High  ADXS_Low  ...
2019-12-19  327.630005      327.630005  327.959991  324.26001  324.380005  2561400.0    0.581           0.581       0.59       0.550     ...
2020-11-17  467.950012      467.950012  469.910004  460.00000  461.660004  2407600.0    0.393           0.393       0.40       0.383     ...

Currently I'm doing it manually with this code:

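import pandas as pd
from tqdm import tqdm

# stocks_df: the long-format frame shown above (Date index, 'Name' column)
# filtered_stock_list: the symbols I want to keep
rows = []  # collect one Series per date; DataFrame.append was removed in pandas 2.0
dates_set = set(stocks_df.index)
print('Going through {} days of data.'.format(len(dates_set)))
for _date in tqdm(dates_set):
    row = {}
    for symbol in filtered_stock_list:
        # this boolean mask scans the whole frame for every (date, symbol) pair
        stock_at_date = stocks_df.loc[(stocks_df['Name'] == symbol) &
                                      (stocks_df.index == _date)]
        for attribute in ['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']:
            try:
                row[symbol + '_' + attribute] = float(stock_at_date[attribute].iloc[0])
            except IndexError:  # no row for this symbol on this date
                row[symbol + '_' + attribute] = None
    rows.append(pd.Series(data=row, name=_date))
df = pd.DataFrame(rows)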

Unfortunately, this code is very slow and takes hours to run. I've looked through many different pandas operations, but I can't figure out how to do this reshape efficiently.
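To make the question reproducible, here is a minimal sketch that rebuilds the sample input table above as a DataFrame. The values are copied from that table (only the five AAIT rows shown), and the variable name stocks_df is borrowed from the loop above.

```python
import pandas as pd

# Sample input copied from the table above: one row per (Date, Name) pair,
# with the date as the index and the ticker symbol in the 'Name' column.
stocks_df = pd.DataFrame(
    {
        'Open':      [0.06508, 0.06464, 0.06596, 0.06596, 0.06596],
        'High':      [0.06508, 0.06464, 0.06596, 0.06596, 0.06596],
        'Low':       [0.06508, 0.06464, 0.06552, 0.06596, 0.06596],
        'Close':     [0.06508, 0.06464, 0.06596, 0.06596, 0.06596],
        'Adj Close': [-0.098360, -0.097695, -0.099690, -0.099690, -0.099690],
        'Volume':    [193352.0, 81542.0, 158115.0, 65731.0, 542561.0],
        'Name':      ['AAIT'] * 5,
    },
    index=pd.DatetimeIndex(
        ['2006-12-04', '2006-12-05', '2006-12-06', '2006-12-07', '2006-12-11'],
        name='Date',
    ),
)
print(stocks_df)
```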

Please make the sample output use the same data as the sample input so that what you are trying to accomplish is clear. In your example, the output has no relationship to the input. (Commented Dec 31, 2020 at 17:15)

1 Answer


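Move 'Name' into the index next to 'Date', then unstack it into the columns:

import pandas as pd

# df: the long-format frame from the question (Date index, 'Name' column)
new_df = (df.set_index('Name', append=True)        # index -> (Date, Name)
            .loc[:, ['Open','High','Low','Close','Adj Close','Volume']]
            .unstack('Name'))                       # columns -> (attribute, Name)
# name columns 'SYMBOL_attribute' (e.g. 'ADBE_Open') and order them by symbol
new_df.columns = [f'{name}_{attr}' for attr, name in new_df.columns]
new_df = new_df.sort_index(axis=1)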
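A self-contained check of the set_index/unstack reshape, as a sketch. It reuses three AAIT rows from the question and adds a second, made-up symbol XYZ (hypothetical data, for illustration only) that has no row on 2006-12-05, so the result shows both the 'SYMBOL_attribute' column names and the NaN that unstack fills in where a symbol is missing on a date. It also compares the result against DataFrame.pivot, which performs the same reshape when every (Date, Name) pair is unique.

```python
import pandas as pd

cols = ['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']

# Long format: one row per (Date, Name). 'XYZ' is hypothetical and has no
# row on 2006-12-05, to show how missing combinations are handled.
df = pd.DataFrame(
    {
        'Open':      [0.06508, 0.06464, 0.06596, 10.0, 11.0],
        'High':      [0.06508, 0.06464, 0.06596, 10.5, 11.5],
        'Low':       [0.06508, 0.06464, 0.06552, 9.5, 10.5],
        'Close':     [0.06508, 0.06464, 0.06596, 10.2, 11.2],
        'Adj Close': [-0.098360, -0.097695, -0.099690, 10.2, 11.2],
        'Volume':    [193352.0, 81542.0, 158115.0, 1000.0, 2000.0],
        'Name':      ['AAIT', 'AAIT', 'AAIT', 'XYZ', 'XYZ'],
    },
    index=pd.DatetimeIndex(
        ['2006-12-04', '2006-12-05', '2006-12-06', '2006-12-04', '2006-12-06'],
        name='Date',
    ),
)

# Reshape via a (Date, Name) index, then move Name into the columns.
new_df = df.set_index('Name', append=True).loc[:, cols].unstack('Name')
new_df.columns = [f'{name}_{attr}' for attr, name in new_df.columns]
new_df = new_df.sort_index(axis=1)

# Same reshape with DataFrame.pivot (no index given, so the Date index is kept).
alt = df.pivot(columns='Name', values=cols)
alt.columns = [f'{name}_{attr}' for attr, name in alt.columns]
alt = alt.sort_index(axis=1)

print(new_df)
```

Both versions raise a ValueError if the same (Date, Name) pair appears more than once; in that case the duplicates need to be dropped or aggregated first (for example with pivot_table and an explicit aggfunc).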