
I have a data.frame with 33 variables and 2.54 million observations. I created a small data.frame to illustrate my problem.

testdf
     borrower amount income
1        a   4000  30000
2        b   5000  20000
3        a   3000  30000

str(testdf)
'data.frame':   3 obs. of  3 variables:
 $ borrower: Factor w/ 2 levels "a","b": 1 2 1
 $ amount  : num  4000 5000 3000
 $ income  : num  30000 20000 30000

I want to sum the variable amount across rows that share the same borrower, but income must not be summed. Afterwards the duplicate rows should be removed, so in this case row 3 is dropped. The new data.frame should look like this:

testdf
     borrower  amount  income
     a          7000    30000
     b          5000    20000

A value of borrower (a, for instance) may also occur 8 times. In that case, I want to sum the 8 amounts but, again, not the income, and delete the 7 duplicate rows for a.

  • Does income remain the same for the borrower across the 8 rows? If not, how are you selecting the value for income in your resulting dataset? Commented Nov 18, 2013 at 14:12

1 Answer


Here's a solution with plyr:

testdf <- data.frame(borrower = c("a", "b", "a"),
                     amount = c(4000, 5000, 3000),
                     income = c(30000, 20000, 30000))


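library(plyr)
# Group by borrower, sum amount, and keep the first income value in each group
# (this assumes income is the same on every row of a given borrower)
ddply(testdf, .(borrower), summarise, amount = sum(amount), income = income[1])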

#   borrower amount income
# 1        a   7000  30000
# 2        b   5000  20000
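
If you would rather not depend on plyr, the same result can be sketched in base R. This is only one possible approach, not the method from the answer above: it first checks the assumption raised in the comment (that income is constant within each borrower), then sums amount with aggregate and attaches the first income per borrower with merge. The names n_income, totals, incomes and result are made up for this illustration.

```r
testdf <- data.frame(borrower = c("a", "b", "a"),
                     amount = c(4000, 5000, 3000),
                     income = c(30000, 20000, 30000))

# Assumption check: count distinct income values per borrower.
# A count of 1 everywhere means income is constant within each borrower.
n_income <- tapply(testdf$income, testdf$borrower,
                   function(x) length(unique(x)))
stopifnot(all(n_income == 1))

# Sum amount within each borrower
totals <- aggregate(amount ~ borrower, data = testdf, FUN = sum)

# Keep the first row seen for each borrower to pick up its income
incomes <- testdf[!duplicated(testdf$borrower), c("borrower", "income")]

# Combine the summed amounts with the per-borrower income
result <- merge(totals, incomes, by = "borrower")
result

#   borrower amount income
# 1        a   7000  30000
# 2        b   5000  20000
```

If the stopifnot check fails on the real data, income differs between rows of the same borrower and you would need to decide which value to keep before aggregating.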