Sum certain variables and delete duplicates afterwards

Question

I have a data.frame with 33 variables and 2.54 million observations. I created a simple data.frame to illustrate my problem.

testdf
     borrower amount income
1        a   4000  30000
2        b   5000  20000
3        a   3000  30000

str(testdf)
'data.frame':   3 obs. of  3 variables:
 $ borrower: Factor w/ 2 levels "a","b": 1 2 1
 $ amount  : num  4000 5000 3000
 $ income  : num  30000 20000 30000

I want to sum the variable amount over all rows that have the same borrower, but I do not want to sum the variable income. Afterwards, the duplicate rows must be deleted, so in this case row 3 must be deleted. The new data.frame should look like this:

testdf
     borrower  amount  income
     a          7000    30000
     b          5000    20000
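For reference, the operation described above can be sketched in base R without plyr. This is a minimal sketch, assuming income is constant within a borrower so that taking it from the first row of each borrower is acceptable; `sums`, `first_income` and `result` are illustrative names.

```r
# Example data from the question
testdf <- data.frame(borrower = c("a", "b", "a"),
                     amount   = c(4000, 5000, 3000),
                     income   = c(30000, 20000, 30000))

# Sum amount per borrower (one output row per borrower)
sums <- aggregate(amount ~ borrower, data = testdf, FUN = sum)

# Take income from the first row of each borrower instead of summing it
first_income <- testdf[!duplicated(testdf$borrower), c("borrower", "income")]

# Join the two pieces back together on borrower
result <- merge(sums, first_income, by = "borrower")
result
#   borrower amount income
# 1        a   7000  30000
# 2        b   5000  20000
```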

A value of borrower (a, for instance) may also occur 8 times. In that case, I want to sum the 8 amounts, again without summing income, and delete 7 of the rows with a.
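The "sum first, then delete the duplicates" order described here maps directly onto base R's `ave()` and `duplicated()`, and it works the same however many times a borrower occurs. A minimal sketch; the data frame `df8` and its amounts are hypothetical illustration data in which borrower a occurs 8 times. Note that it overwrites `amount` in place.

```r
# Hypothetical example: borrower "a" appears 8 times, "b" once
df8 <- data.frame(borrower = c(rep("a", 8), "b"),
                  amount   = c(1000, 2000, 500, 1500, 3000, 250, 750, 1000, 5000),
                  income   = c(rep(30000, 8), 20000))

# Replace every amount by the total of its borrower group (income is untouched)
df8$amount <- ave(df8$amount, df8$borrower, FUN = sum)

# All rows of a borrower now carry the same total, so keeping only the first
# occurrence deletes the other 7 "a" rows
df8 <- df8[!duplicated(df8$borrower), ]

# One row per borrower: a = 10000, b = 5000
df8
```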

Does income remain the same for the borrower across the 8 rows? If not, how are you selecting the value of income in your resulting dataset? – TheComeOnMan, Commented Nov 18, 2013 at 14:12
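The comment's question can be checked directly before choosing how to pick income. A minimal sketch, assuming the question's column names; `n_income` is an illustrative name. Any borrower listed at the end has more than one distinct income, and keeping only the first value would silently discard the others for that borrower.

```r
testdf <- data.frame(borrower = c("a", "b", "a"),
                     amount   = c(4000, 5000, 3000),
                     income   = c(30000, 20000, 30000))

# Number of distinct income values per borrower
n_income <- tapply(testdf$income, testdf$borrower,
                   function(x) length(unique(x)))
n_income

# Borrowers whose income differs between rows (empty here)
names(n_income)[n_income > 1]
```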

Sven Hohenstein · Accepted Answer · 2013-11-18 14:22:59Z


Here's a solution with plyr:

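# Reproducible version of the example data
testdf <- data.frame(borrower = c("a", "b", "a"),
                     amount = c(4000, 5000, 3000),
                     income = c(30000, 20000, 30000))

library(plyr)
# One row per borrower: sum amount, keep the first income value
ddply(testdf, .(borrower), summarise, amount = sum(amount), income = income[1])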

#   borrower amount income
# 1        a   7000  30000
# 2        b   5000  20000
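With 2.54 million rows, calling a summary function once per group (as `ddply` does) can be slow. One base-R alternative is `rowsum()`, which computes grouped sums in compiled code and may be faster on data of that size; it is worth timing on the real data. A minimal sketch, again assuming income is taken from each borrower's first row; `sums` and `result` are illustrative names.

```r
testdf <- data.frame(borrower = c("a", "b", "a"),
                     amount   = c(4000, 5000, 3000),
                     income   = c(30000, 20000, 30000))

# Grouped sum of amount: a one-column matrix whose row names are the borrowers
sums <- rowsum(testdf$amount, testdf$borrower)

# One row per borrower, income taken from its first occurrence
result <- testdf[!duplicated(testdf$borrower), c("borrower", "income")]

# Look up each borrower's total by name, so row order does not matter
result$amount <- unname(sums[as.character(result$borrower), 1])
result <- result[, c("borrower", "amount", "income")]
result
```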

answered Nov 18, 2013 at 14:17 by Sven Hohenstein, edited Nov 18, 2013 at 14:22
