I'd like to summarise data by calculating the mean of values in one column conditional on the values in another column. Here's an example:

dat <- data.frame(group = c("A", "A", "A", "A", "B", "B", "B", "B"),
                  xy = c(1:4, 1:4),
                  val = 1:8)
> dat
  group xy val
1     A  1   1
2     A  2   2
3     A  3   3
4     A  4   4
5     B  1   5
6     B  2   6
7     B  3   7
8     B  4   8

The desired output is:

  group     var val
1     A mean1_2 1.5
2     A mean3_4 3.5
3     B mean1_2 5.5
4     B mean3_4 7.5

I thought about combining summarise and case_when in dplyr but that does not work (or I've not used it correctly).

dat %>%
  group_by(group) %>%
  summarise(mean1_2 = case_when(xy %in% 1:2 ~ mean(val)),
            mean3_4 = case_when(xy %in% 3:4 ~ mean(val)))
`summarise()` has grouped output by 'group'. You can override using the `.groups` argument.
# A tibble: 8 x 3
# Groups:   group [2]
  group mean1_2 mean3_4
  <chr>   <dbl>   <dbl>
1 A         2.5    NA  
2 A         2.5    NA  
3 A        NA       2.5
4 A        NA       2.5
5 B         6.5    NA  
6 B         6.5    NA  
7 B        NA       6.5
8 B        NA       6.5

Is there another way? I'd like to avoid spreading the data to wide format.

1 Answer

I'm not sure what your exact condition is, but if the idea is to average consecutive pairs of xy values, you could bin xy and summarise by bin:

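library(dplyr)

dat %>%
  # bin xy into consecutive pairs: 1-2 -> 1, 3-4 -> 2
  mutate(key = ceiling(xy / 2)) %>%
  group_by(group, key) %>%
  # build the label from the xy values in each bin and average val
  summarise(var = paste0("mean", paste(xy, collapse = "_")),
            val = mean(val),
            .groups = "drop") %>%
  select(-key)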

  group var       val
  <chr> <chr>   <dbl>
1 A     mean1_2   1.5
2 A     mean3_4   3.5
3 B     mean1_2   5.5
4 B     mean3_4   7.5
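
The case_when attempt in the question returns one value per row because mean(val) is evaluated over the whole group before case_when picks rows, which is why every row shows 2.5 or 6.5. If the bins are not regular pairs, a possible alternative is to use case_when only to label the rows and then group on that label. This is a minimal sketch, assuming the conditions are exactly xy in 1:2 and xy in 3:4; the name res is only for illustration.

```r
library(dplyr)

dat <- data.frame(group = c("A", "A", "A", "A", "B", "B", "B", "B"),
                  xy = c(1:4, 1:4),
                  val = 1:8)

res <- dat %>%
  # label each row with its bin; rows matching no condition would get NA
  mutate(var = case_when(xy %in% 1:2 ~ "mean1_2",
                         xy %in% 3:4 ~ "mean3_4")) %>%
  # group on the label so mean() only sees the rows of that bin
  group_by(group, var) %>%
  summarise(val = mean(val), .groups = "drop")

res
```

This keeps the data in long format, and each condition is spelled out, so irregular bins only need another case_when line.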