1

I want to delete events whose dates do not form a consecutive daily sequence, as shown in data frame A. For example, events 2 and 4 are not consecutive: some days are missing. I would therefore like either to assign NA to these events (as in the Desire_output column) or to delete them from the data frame.

Event<-c(1,1,1,2,2,2,2,3,3,3,4,4,4,4)
Dates<- as.Date(c("2018-10-22", "2018-10-23", "2018-10-24", "2019-01-03", "2019-01-04", "2019-01-06", "2019-01-07", "2019-05-11", "2019-05-12", "2019-05-13", "2020-02-21", "2020-02-23", "2020-02-27", "2020-02-28"))
Desire_output<- as.Date(c("2018-10-22", "2018-10-23", "2018-10-24", NA, NA, NA, NA, "2019-05-11", "2019-05-12", "2019-05-13", NA, NA, NA, NA)) 
A<- data.frame(Event, Dates, Desire_output)
   Event      Dates Desire_output
1      1 2018-10-22    2018-10-22
2      1 2018-10-23    2018-10-23
3      1 2018-10-24    2018-10-24
4      2 2019-01-03          <NA>
5      2 2019-01-04          <NA>
6      2 2019-01-06          <NA>
7      2 2019-01-07          <NA>
8      3 2019-05-11    2019-05-11
9      3 2019-05-12    2019-05-12
10     3 2019-05-13    2019-05-13
11     4 2020-02-21          <NA>
12     4 2020-02-23          <NA>
13     4 2020-02-27          <NA>
14     4 2020-02-28          <NA>

Or just delete them from the dataframe:

   Event      Dates Desire_output
1      1 2018-10-22    2018-10-22
2      1 2018-10-23    2018-10-23
3      1 2018-10-24    2018-10-24
8      3 2019-05-11    2019-05-11
9      3 2019-05-12    2019-05-12
10     3 2019-05-13    2019-05-13

Any good ideas on how to approach this problem?

2
  • Either na.omit(A) or subset(A, !is.na(Desire_output)) or dplyr::filter(A, !is.na(Desire_output)) Commented May 11, 2020 at 22:07
  • 1
    No, the Desire_output column is only an example of the result I would like to get. It is not in the real data frame. Commented May 11, 2020 at 22:08

3 Answers

2

We can do a grouped filter: group by 'Event' and keep only the groups in which every difference between consecutive 'Dates' is equal to 1 day.

library(dplyr)
A  %>% 
    group_by(Event) %>% 
    filter(all(diff(Dates) == 1))
    # or with difference between lead and current element
    #filter(all((lead(Dates, default = last(Dates)) - Dates) <2))
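
To see why events 2 and 4 fail this check, it can help to look at the day gaps per event. The following is only a small sketch in base R; it rebuilds the question's data without the hand-written Desire_output column, and the name `gaps` is made up for illustration.

```r
# Rebuild the example data from the question
Event <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4)
Dates <- as.Date(c("2018-10-22", "2018-10-23", "2018-10-24",
                   "2019-01-03", "2019-01-04", "2019-01-06", "2019-01-07",
                   "2019-05-11", "2019-05-12", "2019-05-13",
                   "2020-02-21", "2020-02-23", "2020-02-27", "2020-02-28"))
A <- data.frame(Event, Dates)

# Gap in days between consecutive dates, computed separately for each event
gaps <- lapply(split(A$Dates, A$Event), function(x) as.numeric(diff(x)))
gaps
# $`1`: 1 1
# $`2`: 1 2 1
# $`3`: 1 1
# $`4`: 2 4 1
```

Only events 1 and 3 have gaps that are all equal to 1, so only those groups pass `all(diff(Dates) == 1)`. This assumes the dates are already sorted within each event, as they are in the example.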

Or with base R

i1 <- with(A, as.logical(ave(as.numeric(Dates), Event,
      FUN = function(x)  all(diff(x) == 1))))
A[i1,]
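
The question also asks for the alternative of keeping every row and assigning NA to the events with gaps. A minimal sketch of that variant, reusing the same `ave()` check; the column name `Output` and the variable `consecutive` are assumptions for illustration, not part of the original data:

```r
# Rebuild the example data from the question
Event <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4)
Dates <- as.Date(c("2018-10-22", "2018-10-23", "2018-10-24",
                   "2019-01-03", "2019-01-04", "2019-01-06", "2019-01-07",
                   "2019-05-11", "2019-05-12", "2019-05-13",
                   "2020-02-21", "2020-02-23", "2020-02-27", "2020-02-28"))
A <- data.frame(Event, Dates)

# TRUE for every row of an event whose consecutive dates are all 1 day apart
consecutive <- as.logical(ave(as.numeric(A$Dates), A$Event,
                              FUN = function(x) all(diff(x) == 1)))

# Copy the dates, then set the rows of events with gaps to NA
A$Output <- A$Dates
A$Output[!consecutive] <- NA
A
```

With the example data, `Output` should match the Desire_output column from the question: the dates are kept for events 1 and 3 and are NA for events 2 and 4.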


1

We can subtract the previous date from the current date and check whether all of the differences within each Event are less than or equal to 1. Using dplyr:

library(dplyr)

A %>%
  group_by(Event) %>%
  filter(all(Dates - lag(Dates, default = first(Dates)) <= 1))

#  Event Dates      Desire_output
#  <dbl> <date>     <date>       
#1     1 2018-10-22 2018-10-22   
#2     1 2018-10-23 2018-10-23   
#3     1 2018-10-24 2018-10-24   
#4     3 2019-05-11 2019-05-11   
#5     3 2019-05-12 2019-05-12   
#6     3 2019-05-13 2019-05-13   

and the same logic in data.table :

library(data.table)
setDT(A)[, .SD[all(Dates - shift(Dates, fill = first(Dates)) <= 1)], Event]


1

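An additional option: within each 'Event', drop the group if any gap between consecutive 'Dates' is larger than 1 day. lag() returns NA for the first row of each group, so na.rm = TRUE is used to ignore it.

library(dplyr)

A %>% 
  group_by(Event) %>% 
  filter(!any(Dates - lag(Dates) > 1, na.rm = TRUE))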
