I have a data frame with an individual id, a period, and a work place code. I would like to identify the individuals who worked alone for the entire time span of the dataset.

Consider the very simple example below. Individual A worked alone at two work places (x, y) in period 1. Individuals B and C worked together at work place z in period 1. Individual B worked alone at work place w in period 2. Individual D worked alone at work place k in period 2.

mydf <- data.frame(id=c('A','A','B','C','B','D'),
                   period=c(1,1,1,1,2,2),
                   work_place=c('x','y','z','z','w','k'))

I would like to identify the rows for those who worked alone for the entire time span, which in this case are the rows referring to individuals A and D.

ids_alone <- data.frame(id=c('A','A','D'),
                        period=c(1,1,2),
                        work_place=c('x','y','k'))

2 Answers

Group by 'period' and 'work_place' and create a column 'n' with the number of distinct 'id's in each group. Then group by 'id' and keep only those 'id's for which every value of 'n' is 1.

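library(dplyr)
mydf %>%
  # n = number of distinct ids at each work place in each period
  group_by(period, work_place) %>%
  mutate(n = n_distinct(id)) %>%
  # keep an id only if it was alone (n == 1) in all of its rows
  group_by(id) %>%
  filter(all(n == 1)) %>%
  ungroup() %>%
  select(-n)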

-output

# A tibble: 3 x 3
#  id    period work_place
#  <chr>  <dbl> <chr>     
#1 A          1 x         
#2 A          1 y         
#3 D          2 k         
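
To see why B and C are dropped, it can help to look at the intermediate count directly. The following is a minimal base R sketch of the same idea, not part of the answer above; the names `cell`, `n`, `shared_ids` and `alone_rows` are made up for illustration, and `mydf` is rebuilt from the question so the block runs on its own.

```r
# Example data from the question
mydf <- data.frame(id = c('A', 'A', 'B', 'C', 'B', 'D'),
                   period = c(1, 1, 1, 1, 2, 2),
                   work_place = c('x', 'y', 'z', 'z', 'w', 'k'))

# One "cell" per (period, work_place) combination that actually occurs
cell <- interaction(mydf$period, mydf$work_place, drop = TRUE)

# Number of distinct ids in each row's cell
# (ids are mapped to integer codes so that ave() returns an integer vector)
n <- ave(as.integer(factor(mydf$id)), cell,
         FUN = function(x) length(unique(x)))

# Any id that ever appears in a shared cell is excluded entirely
shared_ids <- unique(mydf$id[n > 1])
alone_rows <- mydf[!mydf$id %in% shared_ids, ]

print(cbind(mydf, n))
print(alone_rows)
```

Here `n` is 2 only for the two rows at work place z in period 1, so B and C are excluded in full, including B's solo row in period 2.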

A data.table option (following the same idea from @akrun)

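library(data.table)

# note: setDT() converts mydf by reference, and the first step adds column n to it
setDT(mydf)[
  ,
  n := uniqueN(id),        # distinct ids per (period, work_place)
  .(period, work_place)
][
  ,
  .SD[mean(n) == 1], id    # n >= 1, so mean(n) == 1 only if every n is 1
][
  ,
  n := NULL
][]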

which gives

   id period work_place
1:  A      1          x
2:  A      1          y
3:  D      2          k
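
The filter `mean(n) == 1` relies on every count being at least 1, so the mean can equal 1 only when all counts are 1. A small sketch of that equivalence follows, using two made-up count vectors (`n_alone` and `n_shared` are hypothetical, not taken from the answer):

```r
# Hypothetical per-row counts for two ids
n_alone  <- c(1L, 1L)   # alone in every row
n_shared <- c(2L, 1L)   # shared a work place in one row

# Both tests agree as long as no count is below 1
alone_by_mean  <- mean(n_alone) == 1
alone_by_all   <- all(n_alone == 1)
shared_by_mean <- mean(n_shared) == 1
shared_by_all  <- all(n_shared == 1)

print(c(alone_by_mean, alone_by_all, shared_by_mean, shared_by_all))
```

Under that assumption the two conditions are interchangeable, and `all(n == 1)` states the intent more directly.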
