I have a data.frame with 33 variables and 2.54 million observations. I created a small example data.frame to explain my problem.
testdf
  borrower amount income
1        a   4000  30000
2        b   5000  20000
3        a   3000  30000
str(testdf)
'data.frame': 3 obs. of 3 variables:
$ borrower: Factor w/ 2 levels "a","b": 1 2 1
$ amount : num 4000 5000 3000
$ income : num 30000 20000 30000
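For reproducibility, the example data above can be rebuilt with something like the following. This is only a sketch of the toy data shown here, not of the real 33-column data set; the call to factor() is an assumption made so that borrower is a factor regardless of the R version's stringsAsFactors default.

```r
# Rebuild the three-row example shown above.
# factor() is used explicitly so that borrower is a factor, as in str(testdf).
testdf <- data.frame(
  borrower = factor(c("a", "b", "a")),
  amount   = c(4000, 5000, 3000),
  income   = c(30000, 20000, 30000)
)

str(testdf)
```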
I want to sum the variable amount over all rows that share the same borrower, but income must not be summed. After that, the duplicate rows must be deleted, so in this case row 3 must be deleted. The new df has to look like this:
testdf
borrower amount income
       a   7000  30000
       b   5000  20000
It is also possible that a value of borrower (a, for instance) occurs 8 times. In that case, I want to sum the 8 amounts but, again, not the income, and delete 7 of the rows with a.
Does income remain the same for the borrower across the 8 rows? If not, how are you selecting the value for income in your resultant dataset?
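If income is in fact constant within each borrower (an assumption, since the question does not confirm it), one possible base R sketch is to treat income as a second grouping variable and sum amount within each group using aggregate(). The name result is made up for this illustration, and the example data is rebuilt here so the snippet runs on its own.

```r
# Toy data from the question (rebuilt so this snippet is self-contained).
testdf <- data.frame(
  borrower = factor(c("a", "b", "a")),
  amount   = c(4000, 5000, 3000),
  income   = c(30000, 20000, 30000)
)

# Assumption: income is identical on every row of a given borrower.
# Grouping by borrower AND income therefore yields one row per borrower,
# with amount summed and income carried through unchanged.
result <- aggregate(amount ~ borrower + income, data = testdf, FUN = sum)

# Restore the original column order and sort by borrower.
result <- result[order(result$borrower), c("borrower", "amount", "income")]
rownames(result) <- NULL

print(result)
```

If income can differ between rows of the same borrower, this grouping would return more than one row for that borrower, so a rule for choosing a single income value (for example the first one) would have to be decided first.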