
I want to iterate through the rows in pandas and create a new column based on the values. Here is my data set:

  Political Entity  Recipient ID           Recipient Recipient last name  \
0       Candidates          4350       Whelan, Susan              Whelan   
1       Candidates          4350       Whelan, Susan              Whelan   
2       Candidates          4350       Whelan, Susan              Whelan   
3       Candidates          4350       Whelan, Susan              Whelan   
4       Candidates         15453  Mastroianni, Steve         Mastroianni   

  Recipient first name Recipient middle initial Political Party of Recipient  \
0                Susan                      NaN      Liberal Party of Canada   
1                Susan                      NaN      Liberal Party of Canada   
2                Susan                      NaN      Liberal Party of Canada   
3                Susan                      NaN      Liberal Party of Canada   
4                Steve                      NaN      Liberal Party of Canada   

  Electoral District        Electoral event Fiscal/Election date  \
0              Essex  38th general election           2004-06-28   
1              Essex  38th general election           2004-06-28   
2              Essex  38th general election           2004-06-28   
3              Essex  38th general election           2004-06-28   
4  Windsor--Tecumseh  40th general election           2008-10-14   

        ...       Monetary amount Non-Monetary amount  \
0       ...                 800.0                 0.0   
1       ...                1280.0                 0.0   
2       ...                 250.0                 0.0   
3       ...                1000.0                 0.0   
4       ...                 800.0                 0.0   

I want to create new columns, one per combination of election year and political party, each holding the sum of the Monetary amount values for that combination. For example:

+------------------------------+----------------------------+
| 2004 Liberal Party of Canada | 2004 Green Party of Canada |
+------------------------------+----------------------------+
| 8000                         | 0                          |
+------------------------------+----------------------------+

I have created a couple of functions to help get started:

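def year_political_column(row):
    # e.g. '2004 Liberal Party of Canada'; assumes the date is still a 'YYYY-MM-DD' string
    return row['Fiscal/Election date'][:4] + ' ' + row['Political Party of Recipient']


def monetary(row):
    # the amount to be added to the year/party column
    return row['Monetary amount']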

Every solution I find seems to assume the columns already exist. Can anyone point me in the right direction?
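A minimal sketch of how the first helper above could be used: assigning the result of DataFrame.apply(..., axis=1) to a new key creates the column, so it does not have to exist beforehand. The sample frame reproduces only the three relevant columns of the five rows shown, and assumes the dates are still plain 'YYYY-MM-DD' strings.

```python
import pandas as pd

def year_political_column(row):
    # Assumes the date is still a 'YYYY-MM-DD' string, so [:4] is the year
    return row['Fiscal/Election date'][:4] + ' ' + row['Political Party of Recipient']

# The five rows shown above, reduced to the columns used here
df = pd.DataFrame({
    'Political Party of Recipient': ['Liberal Party of Canada'] * 5,
    'Fiscal/Election date': ['2004-06-28'] * 4 + ['2008-10-14'],
    'Monetary amount': [800.0, 1280.0, 250.0, 1000.0, 800.0],
})

# axis=1 calls the function once per row; assigning to a new key creates the column
df['year_political'] = df.apply(year_political_column, axis=1)
print(df['year_political'].tolist())
```

The new column can then be fed to pivot, pivot_table or groupby, as the answers below do.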

Sample output should be:

  Political Entity  Recipient ID           Recipient Recipient last name  \
0       Candidates          4350       Whelan, Susan              Whelan   
1       Candidates          4350       Whelan, Susan              Whelan   
2       Candidates          4350       Whelan, Susan              Whelan   
3       Candidates          4350       Whelan, Susan              Whelan   
4       Candidates         15453  Mastroianni, Steve         Mastroianni   

  Recipient first name Recipient middle initial Political Party of Recipient  \
0                Susan                      NaN      Liberal Party of Canada   
1                Susan                      NaN      Liberal Party of Canada   
2                Susan                      NaN      Liberal Party of Canada   
3                Susan                      NaN      Liberal Party of Canada   
4                Steve                      NaN      Liberal Party of Canada   

  Electoral District        Electoral event Fiscal/Election date  \
0              Essex  38th general election           2004-06-28   
1              Essex  38th general election           2004-06-28   
2              Essex  38th general election           2004-06-28   
3              Essex  38th general election           2004-06-28   
4  Windsor--Tecumseh  40th general election           2008-10-14   

        ...       Monetary amount Non-Monetary amount  \
0       ...                 800.0                 0.0   
1       ...                1280.0                 0.0   
2       ...                 250.0                 0.0   
3       ...                1000.0                 0.0   
4       ...                 800.0                 0.0   

  Contribution given through Ontario first name Ontario last name  \
0                        NaN                J M            
1                        NaN                  J             
2                        NaN                  B            
3                        NaN                  H            
4                        NaN                  H            

   Ontario Address Ontario city Ontario Province Ontario Postal Code  \
0                

  Ontario Phone #  
0      
1      
2      
3      
4      

with all the year/party columns I am looking for appended on the right.

  • It sounds like you want to create your year-party column and then do something like crosstab or pivot. Commented Jul 12, 2018 at 15:49
  • How can I dynamically create these columns? @user3483203 Commented Jul 12, 2018 at 16:18
  • I think you can do a groupby: df.groupby('Political Party of Recipient')['Monetary amount'].sum(), and then use transpose: pandas.pydata.org/pandas-docs/stable/generated/… Commented Jul 12, 2018 at 16:20
  • How can I do this by year as well, i.e. have every year and political party from 2004 up to 2018? @xyzjayne Commented Jul 12, 2018 at 16:25
  • See my answer -- I created a 'year_political' column like yours (except it doesn't have to be done with a function, you can just add columns), and then I did a groupby based on this column. Commented Jul 12, 2018 at 16:32
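Following the crosstab suggestion in the first comment, here is a sketch that puts one year per row and one party per column, which also covers the "every year from 2004 up to 2018" follow-up. It uses only the five sample rows, so only Liberal Party of Canada appears; year/party combinations without donations would come back as NaN, hence the fillna(0).

```python
import pandas as pd

# The five rows shown in the question (only the columns used here)
df = pd.DataFrame({
    'Political Party of Recipient': ['Liberal Party of Canada'] * 5,
    'Fiscal/Election date': ['2004-06-28'] * 4 + ['2008-10-14'],
    'Monetary amount': [800.0, 1280.0, 250.0, 1000.0, 800.0],
})

# Year taken from the first four characters of the date string
year = df['Fiscal/Election date'].str[:4].rename('Year')

# One row per year, one column per party, summing the monetary amounts;
# combinations with no donations come back as NaN, so fill them with 0
table = pd.crosstab(index=year,
                    columns=df['Political Party of Recipient'],
                    values=df['Monetary amount'],
                    aggfunc='sum').fillna(0)
print(table)
```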

2 Answers


This can be accomplished in a variety of ways:

  • pivot
  • pivot_table
  • groupby

However, most of them need some tidying up to produce the exact format you want. Only pivot (number 2 below) works if you are not looking for an aggregate function and would like to keep the individual entries.

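import pandas as pd

def column_name(row):
    # Build labels such as '2004 Liberal Party of Canada'
    return '{} {}'.format(row['Fiscal/Election date'].year, row['Political Party of Recipient'])

# Parse the dates first so that .year is available inside column_name
df['Fiscal/Election date'] = pd.to_datetime(df['Fiscal/Election date'])

df['Column Name'] = df.apply(column_name, axis=1)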
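Row-wise apply can be slow on a large contributions table, so here is a vectorized sketch that builds the same 'Column Name' labels. It parses the dates with pd.to_datetime as above and assumes the same five sample rows; slicing the first four characters of the date string, as suggested in the comments, would avoid the parsing step.

```python
import pandas as pd

# The five rows shown in the question (only the columns used here)
df = pd.DataFrame({
    'Political Party of Recipient': ['Liberal Party of Canada'] * 5,
    'Fiscal/Election date': ['2004-06-28'] * 4 + ['2008-10-14'],
    'Monetary amount': [800.0, 1280.0, 250.0, 1000.0, 800.0],
})

# Parse once, as in the answer above
df['Fiscal/Election date'] = pd.to_datetime(df['Fiscal/Election date'])

# .dt.year returns integers for the whole column at once; cast to str before concatenating
df['Column Name'] = (df['Fiscal/Election date'].dt.year.astype(str)
                     + ' ' + df['Political Party of Recipient'])
print(df['Column Name'].tolist())
```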

1) pivot_table

In [4]: df[['Column Name', 'Monetary amount']].pivot_table(columns='Column Name',
   ...:                                                    values='Monetary amount',
   ...:                                                    aggfunc='sum')
Out[4]:
Column Name      2004 Liberal Party of Canada  2008 Liberal Party of Canada
Monetary amount                          3330                           800

2) pivot

In [5]: (df[['Column Name', 'Monetary amount']]
   ...: .pivot(columns='Column Name', values='Monetary amount'))
Out[5]: 
Column Name  2004 Liberal Party of Canada  2008 Liberal Party of Canada
0                                   800.0                           NaN
1                                  1280.0                           NaN
2                                   250.0                           NaN
3                                  1000.0                           NaN
4                                     NaN                         800.0

3) groupby

In [6]: pd.DataFrame(df.groupby('Column Name')['Monetary amount'].sum()).transpose()
Out[6]: 
Column Name      2004 Liberal Party of Canada  2008 Liberal Party of Canada
Monetary amount                          3330                           800

Comments

Nice summary. pd.to_datetime could be time-consuming for larger datasets though. Using a 4-char slice is probably good enough for extracting year.
Perhaps you are right; it's more of a habit to deal with dates as datetime objects rather than strings. I can't help it.
Wow... absolutely amazing, detailed answers from both you and @xyzjayne. I ended up using the pivot method. That being said, how do you keep the remaining columns that I had previously?
What do you mean by remaining column?
I fixed this issue by just doing a merge on the index of both dataframes. This worked. Thanks!
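The merge-on-index fix works because pivot keeps the original row index. A minimal sketch of that idea on the five sample rows, using a 4-character slice for the year and DataFrame.join (equivalent here to df.merge(wide, left_index=True, right_index=True)):

```python
import pandas as pd

# The five rows shown in the question (only the columns used here)
df = pd.DataFrame({
    'Political Party of Recipient': ['Liberal Party of Canada'] * 5,
    'Fiscal/Election date': ['2004-06-28'] * 4 + ['2008-10-14'],
    'Monetary amount': [800.0, 1280.0, 250.0, 1000.0, 800.0],
})
df['Column Name'] = df['Fiscal/Election date'].str[:4] + ' ' + df['Political Party of Recipient']

# One column per year/party; each row keeps its own amount and is NaN elsewhere
wide = df.pivot(columns='Column Name', values='Monetary amount')

# pivot reuses the original row index, so the new columns line up when joined back
result = df.join(wide)
print(result)
```

If zeros are preferred over NaN in the new columns, wide.fillna(0) can be joined instead.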

Create a column using election year and party name, then do a groupby and transpose:

df['year_political'] = df['Fiscal/Election date'].astype(str).str.slice(0, 4) + ' ' + df['Political Party of Recipient']
df.groupby('year_political')['Monetary amount'].sum().reset_index().transpose()
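Note that reset_index().transpose() turns the year/party labels into a data row, with integer column labels 0, 1, and so on. If the labels should become the column header instead, to_frame().T is one option. A sketch comparing the two on the five sample rows:

```python
import pandas as pd

# The five rows shown in the question (only the columns used here)
df = pd.DataFrame({
    'Political Party of Recipient': ['Liberal Party of Canada'] * 5,
    'Fiscal/Election date': ['2004-06-28'] * 4 + ['2008-10-14'],
    'Monetary amount': [800.0, 1280.0, 250.0, 1000.0, 800.0],
})
df['year_political'] = (df['Fiscal/Election date'].astype(str).str.slice(0, 4)
                        + ' ' + df['Political Party of Recipient'])

sums = df.groupby('year_political')['Monetary amount'].sum()

# As in the answer: labels end up in a row, columns are 0, 1, ...
via_reset = sums.reset_index().transpose()

# Alternative: labels become the column header, one row of totals
via_frame = sums.to_frame().T
print(via_frame)
```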
