2

I have a dataframe

     year month     v_0     v_1   ...      v_27    v_28    v_29    v_30
0    1919    03     0.0     0.0   ...       0.0    13.0     0.0     0.0
1    1919    04     5.0     0.0   ...       0.0     0.0     0.0 -9999.0
2    1919    05     0.0     0.0   ...       0.0     0.0     0.0     0.0

The headers are year,month and every day in the month. I want to convert the seperate headers for every day into a single column which would be the day number for every month followed by the data. It should look something like this.

Year         Month  Day  Value

1919           3     1    0

1919           3     2    0
2
  • Please post your frame as text only. Commented Oct 15, 2018 at 17:38
  • Sorry, just posted it now. Pretty new to Stack Overflow Commented Oct 15, 2018 at 17:43

2 Answers 2

2

You may need to check wide_to_long

pd.wide_to_long(df,'v',i=['year','month'],j='day',sep='_').reset_index()
Out[108]: 
    year  month day       v
0   1919      3   0     0.0
1   1919      3   1     0.0
2   1919      3  27     0.0
3   1919      3  28    13.0
4   1919      3  29     0.0
5   1919      3  30     0.0
6   1919      4   0     5.0
7   1919      4   1     0.0
8   1919      4  27     0.0
9   1919      4  28     0.0
10  1919      4  29     0.0
11  1919      4  30 -9999.0
12  1919      5   0     0.0
13  1919      5   1     0.0
14  1919      5  27     0.0
15  1919      5  28     0.0
16  1919      5  29     0.0
17  1919      5  30     0.0
Sign up to request clarification or add additional context in comments.

3 Comments

This worked as well. Though need to figure how to split to get the day number
@Chinmay by using wide to long , you do not need split it , it already split for you within the function
Yes, didn't require to split by using wide to long. Thanks!
1

You want to use DataFrame.melt():

Considering the following dataframe df:

        Year    Month   v_1     v_2
  0     1901    2       4       8
  1     1902    3       5       9
  2     1903    4       6       10
  3     1904    5       7       11

Calling df.melt(id_vars=['Year', 'Month'], var_name='Day') yields this result:

       Year     Month   Day         value
  0     1901    2       v_1             4
  1     1902    3       v_1             5
  2     1903    4       v_1             6
  3     1904    5       v_1             7
  4     1901    2       v_2             8
  5     1902    3       v_2             9
  6     1903    4       v_2             10
  7     1904    5       v_2             11

What happens here is that melt moves all data from the columns other than Year and Month (the id_vars), into a new column "value", and the column names (v_0, v_1 etc) will become a new column called "variable". We can set the name of this new column with the var_name argument, which I have set to 'Day' above.

Actually, because the names of the day columns begin with "v_", I would start by renaming these columns:

df.rename(axis='columns', mapper=lambda s: s.split('_')[-1], inplace=True)
df.melt(id_vars=['Year', 'Month'], var_name='Day')

6 Comments

Base on his data , your day become v_1, v_2..., you need to split after melt
@Wen I don't understand what you mean. My "Day" is a new column based on all the other columns (v_1, v_2 in the question). The name "Day" is taken from the var_name argument.
The data frame you are using have different columns name
This worked great. Thanks. Yes, using the above method, I would have to split to get the day number
@Chinmay: I added a different solution that starts by renaming the columns; this is faster than going over all the rows after the fact.
|

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.