How to remove duplicated rows using two columns

Question

I have a data set like the one below:

> set.seed(1)
> tmp <- data.frame(household.id = rep(1000, 6), individual.id = rep(1:3, each = 2),
                 age = rep(c(55, 52, 27), each = 2),
                 income = runif(6)*100)

> tmp
  household.id individual.id age   income
1         1000             1  55 26.55087
2         1000             1  55 37.21239
3         1000             2  52 57.28534
4         1000             2  52 90.82078
5         1000             3  27 20.16819
6         1000             3  27 89.83897

That is, individual "1" is the father of household "1000", "2" is the mother, and "3" is the daughter. In this case, I want to keep only rows 1, 3, and 5.

(i.e. I want to remove one of the duplicated rows using household.id and individual.id)

After that, I also want to create separate variables for the father's age, the mother's age, and the daughter's age. How can I do this?

Ronak Shah · Accepted Answer · 2021-05-25 05:56:10Z


Do you need something like this?

library(dplyr)
library(tidyr)

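tmp %>%
  # Label each individual.id with its role in the household
  mutate(relation = recode(individual.id, `1` = 'father',
                           `2` = 'mother', `3` = 'daughter')) %>%
  # One row per household, one age column per role; values_fn = first
  # keeps the first age when an individual has duplicated rows
  pivot_wider(names_from = relation, values_from = age,
              id_cols = household.id, values_fn = first)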


#  household.id father mother daughter
#         <dbl>  <dbl>  <dbl>    <dbl>
#1         1000     55     52       27
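
If you also want the deduplicated rows themselves (the first part of the question), here is a minimal sketch. It assumes that "duplicate" means the same household.id and individual.id pair and that keeping the first occurrence is acceptable; the names dedup and dedup_base are only illustrative.

```r
library(dplyr)

# Same data as in the question
set.seed(1)
tmp <- data.frame(household.id = rep(1000, 6),
                  individual.id = rep(1:3, each = 2),
                  age = rep(c(55, 52, 27), each = 2),
                  income = runif(6) * 100)

# dplyr: keep the first row for each household.id / individual.id pair;
# .keep_all = TRUE retains the remaining columns (age, income)
dedup <- tmp %>%
  distinct(household.id, individual.id, .keep_all = TRUE)

# Base R alternative: drop rows whose key pair has already been seen
dedup_base <- tmp[!duplicated(tmp[c("household.id", "individual.id")]), ]
```

Both versions keep rows 1, 3, and 5 of tmp. If a different row should survive for each individual (for example, the one with the highest income), sort the data accordingly before deduplicating.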
