I have a data set like the one below:

> set.seed(1)
> tmp <- data.frame(household.id = rep(1000, 6), individual.id = rep(1:3, each = 2),
                 age = rep(c(55, 52, 27), each = 2),
                 income = runif(6)*100)

> tmp
  household.id individual.id age   income
1         1000             1  55 26.55087
2         1000             1  55 37.21239
3         1000             2  52 57.28534
4         1000             2  52 90.82078
5         1000             3  27 20.16819
6         1000             3  27 89.83897

That is, individual "1" is the father of household "1000", "2" is the mother, and "3" is the daughter. In this case, I want to keep only rows 1, 3, and 5.

(i.e., I want to keep only one row for each combination of household.id and individual.id, dropping the duplicates.)

After that, I also want to create separate variables for the father's age, the mother's age, and the daughter's age. How can I do this?

1 Answer

Do you need something like this?

library(dplyr)
library(tidyr)

tmp %>%
  mutate(relation = recode(individual.id,  `1` = 'father', 
                           `2` = 'mother', `3` = 'daughter' )) %>%
  pivot_wider(names_from = relation, values_from = age, 
              id_cols =  household.id, values_fn = first)


#  household.id father mother daughter
#         <dbl>  <dbl>  <dbl>    <dbl>
#1         1000     55     52       27
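
The pipeline above goes straight to the wide format and relies on values_fn = first to collapse the duplicates. If the deduplicated long data is also needed (rows 1, 3, and 5 of the question), one possible sketch, assuming that keeping the first row per household.id / individual.id pair is what is wanted, uses dplyr's distinct() with .keep_all = TRUE. The name dedup is only illustrative.

```r
library(dplyr)

# Rebuild the example data from the question
set.seed(1)
tmp <- data.frame(household.id = rep(1000, 6), individual.id = rep(1:3, each = 2),
                  age = rep(c(55, 52, 27), each = 2),
                  income = runif(6) * 100)

# Keep the first row for each household.id / individual.id pair;
# .keep_all = TRUE retains the other columns (age, income) of that row.
dedup <- tmp %>%
  distinct(household.id, individual.id, .keep_all = TRUE)

dedup
```

In base R, tmp[!duplicated(tmp[c("household.id", "individual.id")]), ] should select the same rows without loading dplyr.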