1

I have a pandas DataFrame like the following:

Year  Month  Day Securtiy Trade  Value  NewDate
2011      1   10     AAPL   Buy   1500        0

My question is: how can I merge the columns Year, Month, and Day into the NewDate column so that it looks like the following?

2011-1-10

3 Answers

1

The best way is to parse the date while reading the CSV:

In [1]: df = pd.read_csv('foo.csv', sep=r'\s+', parse_dates=[['Year', 'Month', 'Day']])

In [2]: df
Out[2]:
       Year_Month_Day Securtiy Trade  Value  NewDate
0 2011-01-10 00:00:00     AAPL   Buy   1500        0

You can do this without the header, by defining column names while reading:

pd.read_csv(input_file, header=None, names=['Year', 'Month', 'Day', 'Security', 'Trade', 'Value'], parse_dates=[['Year', 'Month', 'Day']])
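
For a headerless file, a minimal sketch of the same idea is shown below. It assumes the file is whitespace-separated like the sample in the question, and it uses an in-memory `io.StringIO` buffer as a hypothetical stand-in for the real file. Because recent pandas releases deprecate the nested-list form of `parse_dates`, this sketch names the columns at read time and combines them afterwards with `pd.to_datetime`:

```python
import io
import pandas as pd

# Hypothetical stand-in for a headerless, whitespace-separated input file.
raw = io.StringIO("2011 1 10 AAPL Buy 1500\n")

cols = ['Year', 'Month', 'Day', 'Security', 'Trade', 'Value']

# header=None says the file has no header row; names= supplies the labels.
df = pd.read_csv(raw, sep=r'\s+', header=None, names=cols)

# pd.to_datetime can assemble dates from a frame whose columns are
# named year/month/day, so lowercase the three column names first.
parts = df[['Year', 'Month', 'Day']].rename(columns=str.lower)
df['NewDate'] = pd.to_datetime(parts)

print(df)
```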

If it's already in your DataFrame, you could use an apply:

In [11]: df['Date'] = df.apply(lambda s: pd.Timestamp('%s-%s-%s' % (s['Year'], s['Month'], s['Day'])), axis=1)

In [12]: df
Out[12]:
   Year  Month  Day Securtiy Trade  Value  NewDate                Date
0  2011      1   10     AAPL   Buy   1500        0 2011-01-10 00:00:00
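
A row-wise `apply` can be slow on large frames. As a hedged alternative, assuming the Year, Month, and Day columns hold valid integers, `pd.to_datetime` can build the whole column in one call from a mapping with `year`, `month`, and `day` keys. The small frame below is reconstructed from the question's sample row for illustration only:

```python
import pandas as pd

# Sample row taken from the question (column spelling kept as posted).
df = pd.DataFrame({'Year': [2011], 'Month': [1], 'Day': [10],
                   'Securtiy': ['AAPL'], 'Trade': ['Buy'], 'Value': [1500]})

# Assemble datetimes column-wise instead of row by row.
df['NewDate'] = pd.to_datetime({'year': df['Year'],
                                'month': df['Month'],
                                'day': df['Day']})

# If a plain string is wanted instead, format the datetime column.
# Note that strftime zero-pads the month and day.
date_text = df['NewDate'].dt.strftime('%Y-%m-%d')

print(df)
print(date_text)
```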

8 Comments

Unfortunately, my input CSV files do not have headers, so I had to add them with df.columns = ['Year', 'Month', 'Day', 'Security', 'Trade', 'Value']. So I have to reformat the DataFrame to concatenate YYYY-MM-DD into the NewDate column.
@trinity you can also do it from the position so [[0, 1, 2]], or reference by name when using the header argument of read_csv
You mean df = pd.read_csv(input_file, header=None, parse_dates=[[0, 1, 2]])?
or pd.read_csv(input_file, header=['Year', 'Month', ...], parse_dates=[['Year', 'Month', 'Day']])
I tried that, and it seems I am violating some syntax. The following error was thrown: TypeError: cannot concatenate 'str' and 'int' objects
1

df['NewDate'] = df['Year'].astype(str) + '-' + df['Month'].astype(str) + '-' + df['Day'].astype(str)

0

You can create a new Timestamp as follows:

df['newDate'] = df.apply(lambda x: pd.Timestamp('{0}-{1}-{2}'
                                                .format(x.Year, x.Month, x.Day)),
                         axis=1)

>>> df
   Year  Month  Day Securtiy Trade  Value  NewDate    newDate
0  2011      1   10     AAPL   Buy   1500        0 2011-01-10
