
I have a data frame with a date column. Each date is the day a particular poll was actually taken. However, the website adds results to its table as they come in, which is not necessarily on the day the poll was taken. So for example:

  • 20/01/2018
  • 21/01/2018
  • 20/01/2018
  • 19/01/2018

So the date at the top (20/01/2018) came in after the ones below it. But the poll just below it is dated the 21st, and that is the date that poll was taken, so the earliest the top one could have been added is the 21st. Thus the list becomes:

  • 21/01/2018
  • 21/01/2018
  • 20/01/2018
  • 19/01/2018

and now my column is sorted. I need to do this for like 50 variables! Suggestions? I want to adjust my dates column so that, going from the bottom of the column to the top, any date that has a later date below it becomes that later date too.

  • Can you run str(the_name_of_your_data_frame) and paste the output into the answer? (I can make it a code block if your editor can't perform the 4-space indent) Commented Nov 11, 2018 at 20:34
  • 'data.frame': 574 obs. of 5 variables: $ Poll : chr "NBC News/Wall St. JrnlNBC/WSJ" "CNNCNN" "Rasmussen ReportsRasmussen" "GallupGallup" ... $ Approve : num 46 41 50 40 44 40 44 41 44 45 ... $ Disapprove: num 52 57 49 54 52 53 52 53 56 54 ... $ Spread : chr "-6" "-16" "+1" "-14" ... $ Date : POSIXct, format: "2018-11-03" "2018-11-03" ... Commented Nov 11, 2018 at 20:37
  • Far better to run dput(the_name_of_your_data). Easier to reproduce that way Commented Nov 11, 2018 at 20:37
  • @ConorNeilson I suggested that in the deleted version of this question. Commented Nov 11, 2018 at 20:39
  • So, Date is an actual POSIXct date. name_of_your_df <- name_of_your_df[order(name_of_your_df$Date),] Commented Nov 11, 2018 at 20:41

1 Answer


Maybe there is a prettier way, but this should give the desired output:

data$Date <- as.POSIXct(rev(cummax(rev(as.numeric(data$Date)))), origin = "1970-01-01")

The idea is that you want a rolling maximum from the bottom up: for example, once 2018-01-02 has been reached, the rows above it cannot have a date "smaller" than 2018-01-02. This is what the cummax function does. It carries the maximum date reached so far and overwrites earlier/smaller dates. Since you want it to go from the bottom up, you have to reverse your date column via rev and then reverse it back after your call of cummax. Because cummax only works for numeric input, I transformed your date column to numeric and back to a date at the end. When converting back, you may also need to pass the time zone via the tz argument of as.POSIXct; otherwise the dates can display shifted, as came up in the comments.
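As a small worked example, here is the same reverse / cummax / reverse pattern applied to the four dates from the question. This is only a sketch: the vector names (dates, filled_num, filled) are made up for illustration, and UTC is an assumed time zone, so substitute whatever your Date column actually uses.

```r
# The four dates from the question, top to bottom.
# tz = "UTC" is an assumption for this example only.
dates <- as.POSIXct(c("2018-01-20", "2018-01-21", "2018-01-20", "2018-01-19"),
                    tz = "UTC")

# Running maximum from the bottom up: reverse, take cummax, reverse back.
# cummax needs numeric input, so work on seconds since the epoch.
filled_num <- rev(cummax(rev(as.numeric(dates))))

# Convert back to POSIXct, stating the time zone explicitly so the
# displayed dates do not shift.
filled <- as.POSIXct(filled_num, origin = "1970-01-01", tz = "UTC")

print(format(filled, "%d/%m/%Y"))
# "21/01/2018" "21/01/2018" "20/01/2018" "19/01/2018"
```

The top entry moves from the 20th to the 21st and the other rows are unchanged, which matches the list in the question.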


3 Comments

wait it went funny haha
What is your output? Did you change date to the name of your dataframe?
Ya I got it I had to state the timezone. Thanks so much !
