Disclaimer: I am going to come out of this looking silly.

I have a data frame with a date column of class POSIXct. I am trying to remove the rows that fall on specific dates (public holidays). I tried this:

modelset.nonholiday <- modelset[!modelset$date == as.POSIXct("2013-12-31") |
                                !modelset$date == as.POSIXct("2013-07-04") |
                                !modelset$date == as.POSIXct("2014-07-04") |
                                !modelset$date == as.POSIXct("2013-11-28") |
                                !modelset$date == as.POSIXct("2013-11-29") |
                                !modelset$date == as.POSIXct("2013-12-24") |
                                !modelset$date == as.POSIXct("2013-12-25") |
                                !modelset$date == as.POSIXct("2014-02-14") |
                                !modelset$date == as.POSIXct("2014-04-20") |
                                !modelset$date == as.POSIXct("2014-05-26"), ]

The above didn't work. It returns the data frame with only the first date removed. So I tried:

modelset[!modelset$date %in% c("2013-12-31", "2013-07-04", "2014-07-04",
             "2013-11-28", "2013-11-29", "2013-12-24", "2013-12-25", "2014-02-14", 
             "2014-04-20", "2014-05-26"), ]

This didn't work either. I also tried:

`%notin%` <- function(x,y) !(x %in% y) 

modelset[modelset$date %notin% as.POSIXct(c("2013-12-31", "2013-07-04", "2014-07-04",
                 "2013-11-28", "2013-11-29", "2013-12-24", "2013-12-25", "2014-02-14",
                 "2014-04-20", "2014-05-26")), ]

I've referred to "Remove Rows From Data Frame where a Row match a String", "R remove rows containing a certain value", and "Standard way to remove multiple elements from a dataframe", but can't seem to find what I am doing wrong.

> head(modelset)
    date spot.volume.loc spot.volume.nat nat.imp.a loc.imp.a nat.imp.m loc.imp.m branded.leads esi.leads
1 2013-07-01            2988             215     13931    4155.3      5770    1853.7           331       363
2 2013-07-02            3200             218     12589    4651.3      5374    2207.8           293       428
3 2013-07-03            3066             203     10305    3921.0      4754    1759.2           273       325
4 2013-07-04            3153              83      2353    4135.6       999    1912.2           172       184
5 2013-07-05            2959              59      1553    3573.4       815    1662.3           193       246
6 2013-07-06             667              53      2219     456.7       889     214.8           161       203
tv.leads callin.leads total.leads total.imp.a total.imp.m       day week quarter on.off
1      195           41         930     18086.3      7623.7    Monday   26      Q3   1.25
2      192           50         963     17240.3      7581.8   Tuesday   26      Q3   1.00
3      149           38         785     14226.0      6513.2 Wednesday   26      Q3   1.00
4       34            0         390      6488.6      2911.2  Thursday   26      Q3   1.00
5       50           18         507      5126.4      2477.3    Friday   26      Q3   0.75
6       14            9         387      2675.7      1103.8  Saturday   26      Q3   0.50
  • You could try modelset[!as.Date(modelset$date) %in% as.Date(c("2013-12-31", "2013-07-04", "2014-07-04", "2013-11-28", "2013-11-29", "2013-12-24", "2013-12-25", "2014-02-14", "2014-04-20", "2014-05-26")), ] (so that both modelset$date and the date vector are in the same format). Commented Nov 21, 2014 at 18:22
  • Uncanny, but still not working! It returns exactly the same data.frame! Commented Nov 21, 2014 at 18:33
  • Then consider showing a small part of your data.frame so others can reproduce the problem. Commented Nov 21, 2014 at 18:35
  • Sorry, it's working. I was missing a comma. The problem was the difference between modelset$date and the date vector, as correctly identified. Thanks! Commented Nov 21, 2014 at 18:40
  • No idea why this question has been down-voted. The problem was that the date vector and the column values were not in the same format. I didn't know this concept and have learnt it now. Commented Nov 22, 2014 at 16:54
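
To make the point from the comments concrete, here is a minimal sketch on made-up data. The names modelset_demo, holidays, is_holiday and nonholiday_demo are hypothetical stand-ins, not the asker's objects, and the time zone is fixed to UTC only so that the result does not depend on the local setting. It converts the POSIXct column to Date so that both sides of %in% share a class, and it keeps the trailing comma that selects rows.

```r
# Toy stand-in for modelset (hypothetical data, not the asker's)
modelset_demo <- data.frame(
  date = as.POSIXct(c("2013-07-03", "2013-07-04", "2013-07-05"), tz = "UTC"),
  total.leads = c(785, 390, 507)
)

# Holidays stored as Date, not as character strings
holidays <- as.Date(c("2013-07-04", "2013-12-25"))

# Convert the POSIXct column to Date so both sides of %in% have the same class
is_holiday <- as.Date(modelset_demo$date, tz = "UTC") %in% holidays

# The trailing comma matters: it selects rows and keeps every column
nonholiday_demo <- modelset_demo[!is_holiday, ]
```

With this toy input, only the 2013-07-04 row should be dropped. If the column was created in a non-UTC time zone, the tz argument of as.Date probably needs to match it.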

2 Answers

With dplyr, and keeping your %notin% approach, you could also write:

library(dplyr)

dates <- 
  as.POSIXct(c("2013-12-31", "2013-07-04", "2014-07-04", "2013-11-28", "2013-11-29", 
               "2013-12-24", "2013-12-25", "2014-02-14", "2014-04-20", "2014-05-26"))

`%notin%` <- function(x,y) !(x %in% y) 

modelset %>%
  filter(date %notin% dates)
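
As a side note on why a helper like %notin% behaves differently from the first attempt in the question: "not in the set" means "differs from every element", which is an AND of inequalities. An OR of negated equalities is TRUE for any value that differs from at least one of the dates, so with two or more distinct dates it is TRUE everywhere. A small sketch in base R, with hypothetical toy values (x, d1, d2):

```r
# Toy dates; d1 and d2 play the role of the holidays
x  <- as.Date(c("2013-07-03", "2013-07-04", "2013-12-25"))
d1 <- as.Date("2013-07-04")
d2 <- as.Date("2013-12-25")

# `!` binds more loosely than `==`, so `!x == d1` means `!(x == d1)`
or_of_negations  <- !x == d1 | !x == d2    # every element differs from d1 or d2

# Two equivalent ways to express "x is none of the dates"
and_of_negations <- x != d1 & x != d2
not_in           <- !(x %in% c(d1, d2))
```

Here or_of_negations is TRUE for all three elements, so it would filter nothing, while the other two vectors are TRUE only for 2013-07-03.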

Use the which() function like so:

dat <- as.POSIXct(c("2013-12-31", "2013-07-04", "2014-07-04",
                    "2013-11-28", "2013-11-29", "2013-12-24", "2013-12-25", "2014-02-14",
                    "2014-04-20", "2014-05-26"))

# Use %in% rather than !=, which would recycle the shorter vector element by element
dat[which(!dat %in% as.POSIXct(c("2013-12-31", "2014-07-04")))]

In your case, I believe it would be:

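# Compare Date with Date, and keep the trailing comma so that rows are selected
modelset <- modelset[which(!as.Date(modelset$date) %in% as.Date(c("2013-12-31", "2013-07-04", "2014-07-04",
         "2013-11-28", "2013-11-29", "2013-12-24", "2013-12-25", "2014-02-14",
         "2014-04-20", "2014-05-26"))), ]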

which() returns the indices at which a logical vector is TRUE. Used inside the brackets, those indices select the rows to keep.

1 Comment

It's also possible to subset a data.frame by logicals, so which would be possible, but not necessary.
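
A short sketch of that point, using hypothetical toy vectors rather than the asker's data: indexing by the logical vector and indexing by which() give the same result when there are no missing values, and differ only in how NA is handled.

```r
# Without missing values, both forms select the same elements
x    <- c(10, 20, 30, 40)
keep <- !(x %in% c(20, 40))
by_logical <- x[keep]
by_which   <- x[which(keep)]

# With an NA in the condition, logical indexing returns an NA entry,
# while which() silently drops that position
y      <- c(10, NA, 30)
cond   <- y != 10
y_logical <- y[cond]
y_which   <- y[which(cond)]
```

In the first case both results are the elements 10 and 30. In the second, y_logical has an NA followed by 30, whereas y_which contains only 30, which may or may not be what you want when the date column has missing entries.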
