Pandas Dataframe merging columns

Question

I have a pandas DataFrame like the following:

Year  Month  Day Securtiy Trade  Value  NewDate
2011      1   10     AAPL   Buy   1500        0

My question is: how can I merge the Year, Month and Day columns into the NewDate column so that NewDate looks like the following?

2011-1-10

Andy Hayden · Accepted Answer · 2013-09-22 15:46:40Z

1

The best way is to parse the date while reading the CSV:

In [1]: df = pd.read_csv('foo.csv', sep=r'\s+', parse_dates=[['Year', 'Month', 'Day']])

In [2]: df
Out[2]:
       Year_Month_Day Securtiy Trade  Value  NewDate
0 2011-01-10 00:00:00     AAPL   Buy   1500        0
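Recent pandas releases deprecate combining several columns through a nested list in parse_dates. A minimal alternative sketch, assuming the same whitespace-separated layout as foo.csv (simulated here with a hypothetical in-memory string), reads the file as-is and then assembles the date with pd.to_datetime, which accepts a DataFrame whose columns are named year, month and day:

```python
import io

import pandas as pd

# Hypothetical stand-in for foo.csv: whitespace-separated, with a header row
csv_text = """Year Month Day Securtiy Trade Value NewDate
2011 1 10 AAPL Buy 1500 0
"""

df = pd.read_csv(io.StringIO(csv_text), sep=r"\s+")

# pd.to_datetime can assemble dates from year/month/day columns;
# lower-casing the labels keeps this independent of header capitalisation.
parts = df[["Year", "Month", "Day"]].rename(columns=str.lower)
df["NewDate"] = pd.to_datetime(parts)
print(df)
```

This keeps the original Year, Month and Day columns, unlike the parse_dates approach, which replaces them with a single Year_Month_Day column.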

If the file has no header row, you can supply the column names while reading:

pd.read_csv(input_file, header=None, names=['Year', 'Month', 'Day', 'Security', 'Trade', 'Value'], parse_dates=[['Year', 'Month', 'Day']])
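In read_csv, header= takes row numbers, while column labels for a headerless file go in names=. A minimal sketch, assuming the six-column, whitespace-separated layout described in the comments below (the input is a hypothetical in-memory string):

```python
import io

import pandas as pd

# Hypothetical headerless input with the six columns described in the comments
csv_text = "2011 1 10 AAPL Buy 1500\n"

cols = ["Year", "Month", "Day", "Security", "Trade", "Value"]

# header=None: no row of the file is a header; names=: labels to use instead
df = pd.read_csv(io.StringIO(csv_text), sep=r"\s+", header=None, names=cols)

# Assemble the date from the labelled integer columns
df["NewDate"] = pd.to_datetime(df[["Year", "Month", "Day"]].rename(columns=str.lower))
```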

If the data is already in a DataFrame, you can use apply:

In [11]: df['Date'] = df.apply(lambda s: pd.Timestamp('%s-%s-%s' % (s['Year'], s['Month'], s['Day'])), axis=1)

In [12]: df
Out[12]:
   Year  Month  Day Securtiy Trade  Value  NewDate                Date
0  2011      1   10     AAPL   Buy   1500        0 2011-01-10 00:00:00
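The apply above builds one Timestamp per row in Python. As a rough sketch (the two-row frame is hypothetical), the row-wise version can pass the components to pd.Timestamp as keywords instead of formatting a string, and a single vectorised pd.to_datetime call over the three columns yields the same dates:

```python
import pandas as pd

# Hypothetical frame mirroring the question's columns
df = pd.DataFrame({"Year": [2011, 2012], "Month": [1, 2], "Day": [10, 29],
                   "Securtiy": ["AAPL", "MSFT"], "Trade": ["Buy", "Sell"],
                   "Value": [1500, 200]})

# Row-wise: one Timestamp per row, built from keyword arguments
row_wise = df.apply(lambda r: pd.Timestamp(year=int(r["Year"]),
                                           month=int(r["Month"]),
                                           day=int(r["Day"])), axis=1)

# Column-wise: one call over the three integer columns
vectorised = pd.to_datetime(df[["Year", "Month", "Day"]].rename(columns=str.lower))

df["Date"] = vectorised
```

On large frames the vectorised form is generally preferable, since it avoids calling a Python lambda once per row.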

edited Sep 22, 2013 at 15:46

answered Sep 22, 2013 at 15:40

Andy Hayden



8 Comments

trinity Over a year ago

Unfortunately, my input CSV files do not have headers, so I had to add them with df.columns = ['Year', 'Month', 'Day', 'Security', 'Trade', 'Value']. That means I have to reformat the DataFrame to concatenate YYYY-MM-DD into the NewDate column.

Andy Hayden Over a year ago

@trinity you can also refer to the columns by position, i.e. [[0, 1, 2]], or by name when using the header argument of read_csv

trinity Over a year ago

You mean df = pd.read_csv(input_file, header=None, parse_dates=[[0, 1, 2]])?

Andy Hayden Over a year ago

or pd.read_csv(input_file, header=['Year', 'Month', ...], parse_dates=[['Year', 'Month', 'Day']])

trinity Over a year ago

I tried that, and it seems I am violating some syntax. The following error is thrown: TypeError: cannot concatenate 'str' and 'int' objects


Raviteja Chirala · Accepted Answer · 2015-02-04 21:53:39Z

1

df['NewDate'] = df['Year'].astype(str) + '-' + df['Month'].astype(str) + '-' + df['Day'].astype(str)
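Year, Month and Day hold integers, so adding '-' to them directly fails; each column has to be cast to str first. A small self-contained sketch (the one-row frame is hypothetical) producing exactly the requested 2011-1-10, plus a zero-padded variant built from real dates in case that form is preferred:

```python
import pandas as pd

# Hypothetical one-row frame with the question's date columns
df = pd.DataFrame({"Year": [2011], "Month": [1], "Day": [10]})

# Cast each integer column to str so '+' concatenates instead of adding
df["NewDate"] = (df["Year"].astype(str) + "-" + df["Month"].astype(str)
                 + "-" + df["Day"].astype(str))
print(df["NewDate"].iloc[0])  # 2011-1-10

# Zero-padded ISO form: assemble real dates, then format them as strings
iso = pd.to_datetime(df[["Year", "Month", "Day"]].rename(columns=str.lower)).dt.strftime("%Y-%m-%d")
print(iso.iloc[0])  # 2011-01-10
```

The concatenated column is plain text (object dtype); if date arithmetic or sorting by date is needed later, the datetime route is the safer choice.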

answered Feb 4, 2015 at 21:53

Raviteja Chirala



Alexander · Accepted Answer · 2015-05-22 20:27:29Z

0

You can create a new Timestamp as follows:

df['newDate'] = df.apply(lambda x: pd.Timestamp('{0}-{1}-{2}'
                                                .format(x.Year, x.Month, x.Day)),
                         axis=1)

>>> df
   Year  Month  Day Securtiy Trade  Value  NewDate    newDate
0  2011      1   10     AAPL   Buy   1500        0 2011-01-10
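One caveat with building a Timestamp per row: a single impossible date (say, a hypothetical 2011-02-30 row) makes pd.Timestamp raise, which aborts the whole apply. A sketch, assuming bad rows should become missing values rather than errors, uses pd.to_datetime with errors="coerce" so they come out as NaT:

```python
import pandas as pd

# Hypothetical frame: the second row is not a valid calendar date
df = pd.DataFrame({"Year": [2011, 2011], "Month": [1, 2], "Day": [10, 30]})

# Row-wise construction stops at the first impossible date
try:
    df.apply(lambda x: pd.Timestamp("{0}-{1}-{2}".format(x.Year, x.Month, x.Day)), axis=1)
    failed = False
except ValueError:
    failed = True

# Vectorised assembly: invalid rows become NaT instead of raising
df["newDate"] = pd.to_datetime(df[["Year", "Month", "Day"]].rename(columns=str.lower),
                               errors="coerce")
```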

answered May 22, 2015 at 20:27

Alexander

