1

I have the following data set:

 id pnum t1 t2 t3 w1 w2 w3 
  1    1  w  r  r  1  1  1  
  1    2  o  o  w  0  0  1  
  1    3  o  w  w  1  1  1    
  2    1  o  w  t  1  0  1   
  2    2  s  s  s  1  0  1    
  2    3  s  s  s  1  0  1        

       

Id defines the group membership.

Based on id and pnum I would like to identify the common measurements reported with pnum 3 at time t with respect to w.

In other words id and pnum define different individuals who took some measurements. In some cases the measurement was taken together in other cases the measurement was taken alone. If taken together than at 'w' we have value 1.

For example: 

Common activities at time t:

Id 1 pnum 1 at t1 reported (eg. 1) that a measurement was taken with from the group, more specifically with id1/pnum3. If the measurement data taken together is common in both groups I would like to save it.

Uncommon  activities at time t:

Id 2 pnum 1 at t1 reported (eg. 1) that a measurement was taken with from the group, more specifically with id2/pnum2 and pnum 3. In this case the measurement data taken together is uncommon between id2/pnum1 and pnum 2 as well between id2/pnum1 and pnum 3. I don't want to save these measurements. But, I would like to save the common one reported between id2/pnum2 and pnum 3.

Generic example id 1:

In the group with id 1 pnum1 and pnum3 at t1 took together a measurement. Pnum 1 reported w, pnum 2 and pnum 3 reported o. This means that pnum 2 and pnum 3 reported the same measurement. However, when I look at w1 I could observe that they were not together when they did as at w1 pnum 2 is 0 and pnum 3 is 1. In other words, even though the measurement are common for pnum 2 and pnum 3 as they were not taken together I don't want to save the case. I would need to report if they reported the same measurements or not. In this case pnum1 reported w while pnum 3 reported o, so the measurements don't match. Therefore I coded 0. I don't want to save the case.

I would like to identify the common measurements that were taken together at time t.

Output:

 id pnum t1   t2   t3  
  1    1  0   0    0                       
  1    2  0   0    w                    
  1    3  0   s    w                       
  2    1  0   0    0                       
  2    2  s   0    s                       
  2    3  s   0    s       

Sample data:

 df<-structure(list(id=c(1,1,1,2,2,2),pnum=c(1,2,3,1,2,3), t1=c("w","o","o","o","s","s"), t2=c("r","o","w","w","s","s"),t3 = c("r","w","w","t","s","s"), w1= c(1,0,1,1,1,1), w2 = c(1,0,1,0,0,0), w3 = c(1,1,1,1,1,1)), row.names = c(NA, 6L), class = "data.frame")
                                                                               
5
  • @Ronak thanks . basically t1..t3 are fixed they do not change. So for example at t1 all 3 individuals took some measures. Pnum 1 reported w, pnum 2 and pnum 3 reported o. This means that pnum 2 and pnum3 reported the same measurement. However, when I look at w1 I could observe that they were not together when they did as at w1 pnum 2 is 0 and pnum3 is 1. In other words, even though the measurement are common for pnum 2 and pnum3 as they were not taken together I don't want to save the case. Thanks. Commented Sep 11, 2020 at 5:58
  • What have you tried so far? Commented Sep 11, 2020 at 9:07
  • @jay.sf thanks created 2 separate data frames one for t1..t3 and one for w1..w3. Than I merged the two based on index and pnum and I set a condition using ifelse function. Commented Sep 11, 2020 at 9:08
  • My question implied that you might want to show the code. Commented Sep 11, 2020 at 9:09
  • @jay.sf thanks I will adopt the code to this example and post it. Commented Sep 11, 2020 at 9:13

1 Answer 1

1

I don't get why in your expected output there is an "s" at [id1, pnum3, t2] - other than that, I think the following might help you:

First of all, pivoting your data to a "longer" format, where you can group by time, helps you to generalize your code.

library(dplyr)
library(tidyr)

df_longer <- df %>%
  pivot_longer(
    cols = matches("^[tw]\\d+$"),
    names_to = c(".value","time"),
    names_pattern = "([tw])(\\d+)"
  )

The above pivots your data to look like this:

> head(df_longer)
# A tibble: 6 x 5
     id  pnum time  t         w
  <dbl> <dbl> <chr> <chr> <dbl>
1     1     1 1     w         1
2     1     1 2     r         1
3     1     1 3     r         1
4     1     2 1     o         0
5     1     2 2     o         0
6     1     2 3     w         1

Now, you can easily group it up and identify those individuals that have given common answers at any given time:

common_answers <- df_longer %>%
  arrange(id, time, pnum) %>%
  filter(w == 1) %>% # throw out if the answer was given individually
  select(-w) %>%  # w not needed anymore
  group_by(id, time, t) %>% # group by selected answer
  filter(n() > 1) %>% # keep only answers given >1 times
  ungroup()

This presents you with only only a filtered set of your data where answers were given commonly in group:

> common_answers
# A tibble: 6 x 4
     id  pnum time  t    
  <dbl> <dbl> <chr> <chr>
1     1     2 3     w    
2     1     3 3     w    
3     2     2 1     s    
4     2     3 1     s    
5     2     2 3     s    
6     2     3 3     s

// ADDITION:
In case you have to rely on the "wide" format in your output, you can obviously retain all data, modify t so that it only retains its value when it is given commonly by >1 subject and then widen your df again:

common_answers_wide <- df_longer %>%
  group_by(id, time, w, t) %>%
  mutate(
    # retain t only when the response has been given by >1 subject
    t = case_when(
      w == 0 ~ "0",
      n() > 1 ~ t,
      T ~ "0"
    )
  ) %>%
  ungroup() %>%
  select(-w) %>%
  pivot_wider(
    names_from = time, names_prefix = "t", names_sort = T,
    values_from = t
  )

That gives you exactly the desired output:

> common_answers_wide
# A tibble: 6 x 5
     id  pnum t1    t2    t3   
  <dbl> <dbl> <chr> <chr> <chr>
1     1     1 0     0     0    
2     1     2 0     0     w    
3     1     3 0     0     w    
4     2     1 0     0     0    
5     2     2 s     0     s    
6     2     3 s     0     s
Sign up to request clarification or add additional context in comments.

1 Comment

@Rstudent was it necessary for you to retain the "wide" format you showed in your expected outcome example? (I'll add a way in how to get to that as well, just for good measure)

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.