Goal: Take raw data pulled from Eurostat via pandas-datareader and reshape it so that it has a pandas DatetimeIndex as the index and one column per country.

Code:

import datetime

import pandas as pd
import pandas_datareader as web

start = datetime.datetime(1900, 1, 1)
end = datetime.date.today()
df2 = web.DataReader('tipsii20', 'eurostat', start=start, end=end)
df2.columns

Looking at the columns, we can see that we are working with a MultiIndex:

MultiIndex(levels=[[u'Rest of the world'], [u'Net liabilities (liabilities minus assets)'], [u'Net external debt'], [u'Percentage of gross domestic product (GDP)'], [u'Unadjusted data (i.e. neither seasonally adjusted nor calendar adjusted data)'], [u'Austria', u'Belgium', u'Bulgaria', u'Croatia', u'Cyprus', u'Czech Republic', u'Denmark', u'Estonia', u'Finland', u'France', u'Germany (until 1990 former territory of the FRG)', u'Greece', u'Hungary', u'Ireland', u'Italy', u'Latvia', u'Lithuania', u'Luxembourg', u'Malta', u'Netherlands', u'Poland', u'Portugal', u'Romania', u'Slovakia', u'Slovenia', u'Spain', u'Sweden', u'United Kingdom'], [u'Annual']], labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 2, 4, 5, 10, 6, 7, 11, 25, 8, 9, 3, 12, 13, 14, 16, 17, 15, 18, 19, 20, 21, 22, 26, 24, 23, 27], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], names=[u'PARTNER', u'STK_FLOW', u'BOP_ITEM', u'UNIT', u'S_ADJ', u'GEO', u'FREQ'])

I would like to transform this dataset so that it keeps its DateTime index but uses the values of the 'GEO' level as the column labels. Should I use df2.xs for this?

  • What are start and end? Commented Oct 27, 2017 at 12:04
  • Thanks, I just added the start and end objects. Commented Oct 27, 2017 at 12:06

2 Answers

You can use droplevel:

df2.columns = df2.columns.droplevel([0,1,2,3,4,6])

Another solution, if you know the level name, similar to Bharath shetty's solution:

df2.columns = df2.columns.get_level_values('GEO')
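
To try both approaches without hitting Eurostat, here is a minimal sketch on a small synthetic frame. The three-level column index and the sample values are illustrative assumptions, not the real dataset; only `droplevel` and `get_level_values` come from the answer above. It also shows that `droplevel` takes level names as well as integer positions.

```python
import pandas as pd

# Synthetic stand-in for the Eurostat frame (assumed shape, illustrative values):
# three column levels, of which only GEO actually varies.
columns = pd.MultiIndex.from_tuples(
    [
        ("Net external debt", "Austria", "Annual"),
        ("Net external debt", "Belgium", "Annual"),
    ],
    names=["BOP_ITEM", "GEO", "FREQ"],
)
index = pd.to_datetime(["2010-01-01", "2011-01-01"])
df = pd.DataFrame([[28.0, -121.2], [24.0, -118.8]], index=index, columns=columns)

# droplevel accepts level names as well as integer positions.
by_name = df.columns.droplevel(["BOP_ITEM", "FREQ"])
by_position = df.columns.droplevel([0, 2])

# get_level_values keeps only the one level you ask for.
geo = df.columns.get_level_values("GEO")

# Work on a copy so the original MultiIndex frame stays available.
flat = df.copy()
flat.columns = geo
print(flat)
```

Using names rather than positions is arguably more robust if the number or order of levels ever changes.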

4 Comments

This is so simple
I hate the documentation now. It says the argument should be an int, but it can also take strings?
Yes, it can be a string or an int; both are valid values.
It said int: pandas.pydata.org/pandas-docs/stable/generated/…. Can you edit the documentation?

If you want to preserve the original DataFrame for future reference, use pd.DataFrame with get_level_values(5), since GEO is at position 5 (counting from zero) among the column levels:

ndf = pd.DataFrame(df2.values,df2.index,df2.columns.get_level_values(5))

Or assign the columns in place by getting the level values:

df2.columns = df2.columns.get_level_values(5)

Output:

print(ndf.head().iloc[:,:4])

GEO          Austria  Belgium  Bulgaria  Cyprus
TIME_PERIOD                                    
2010-01-01      28.0   -121.2      37.1    70.9
2011-01-01      24.0   -118.8      29.6   127.1
2012-01-01      25.8   -102.7      25.4   137.2
2013-01-01      20.1    -88.4      21.6   140.0
2014-01-01      20.0    -71.1      18.3   136.1
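
For comparison, here is a hedged sketch of three ways to get a flattened copy while leaving the original frame untouched. The synthetic frame and its values are assumptions for illustration. `DataFrame.droplevel` and `DataFrame.xs` are alternatives not used in the answers above (`DataFrame.droplevel` needs pandas 0.24 or later), and the `xs` variant only makes sense because every dropped level holds a single value.

```python
import pandas as pd

# Synthetic stand-in for the Eurostat frame (assumed shape, illustrative values).
columns = pd.MultiIndex.from_tuples(
    [
        ("Net external debt", "Austria", "Annual"),
        ("Net external debt", "Belgium", "Annual"),
    ],
    names=["BOP_ITEM", "GEO", "FREQ"],
)
index = pd.to_datetime(["2010-01-01", "2011-01-01"])
df = pd.DataFrame([[28.0, -121.2], [24.0, -118.8]], index=index, columns=columns)

# 1. Rebuild the frame from its values, as in the answer above.
ndf = pd.DataFrame(df.values, df.index, df.columns.get_level_values("GEO"))

# 2. DataFrame.droplevel returns a new frame with the named column levels removed.
alt = df.droplevel(["BOP_ITEM", "FREQ"], axis=1)

# 3. xs selects the single value of each constant level and drops those levels.
via_xs = df.xs(("Net external debt", "Annual"), axis=1, level=["BOP_ITEM", "FREQ"])

print(ndf)
```

All three leave `df` with its full MultiIndex, so the extra metadata (unit, adjustment, and so on) is still there if it is needed later.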
