Sorting names() based on dates

Question

I'm trying to sort columns for individual patients based on the dates in those columns in R. I made an example data set; however, it does not return dates but long numbers (no idea why). Forgive my perhaps silly way of creating the data frame :)

dd<- 
data.frame(rbind(
c(as.POSIXct(as.Date("01/01/2008", format="%d/%m/%Y")),
as.POSIXct(as.Date("01/01/2009", format="%d/%m/%Y")),
as.POSIXct(as.Date("01/01/2011", format="%d/%m/%Y")),
as.POSIXct(as.Date("01/01/2010", format="%d/%m/%Y")))
,
c(as.POSIXct(as.Date("01/01/2002", format="%d/%m/%Y")),
as.POSIXct(as.Date("01/01/2001", format="%d/%m/%Y")),
as.POSIXct(as.Date("01/01/2006", format="%d/%m/%Y")),
as.POSIXct(as.Date("01/01/2004", format="%d/%m/%Y")))
))
dd$patient[1] <- 1
dd$patient[2] <- 2
names(dd) <- c("date1", "date2", "date3", "date4", "patient")

What I am after is a list of column names per patient, sorted by the dates within those columns. Thus,

Patient 1 : date1, date2, date4, date3

Patient 2 : date2, date1, date4, date3

EDIT:

So, one more thing. What if one date is missing... thus:

dd <- data.frame(
  patient = 1:2,
  date1 = as.Date(c("01/01/2008","01/01/2002"),format="%d/%m/%Y"),
  date2 = as.Date(c("01/01/2009","01/01/2001"),format="%d/%m/%Y"),
  date3 = as.Date(c("01/01/2011","01/01/2006"),format="%d/%m/%Y"),
  date4 = as.Date(c("01/01/2010","01/01/2004"),format="%d/%m/%Y")
)

dd[2,2]<- NA

Matthew's answer gives:

> t(apply(dd, 1, function(x) c(x[1], names(x[-1])[order(x[-1])])))
     patient                                
[1,] "1"     "date1" "date2" "date4" "date3"
[2,] "2"     "date2" "date4" "date3" "date1"

So the column name of the missing data point is included at the end of the sorted list of dates. But I'd like it not to be there... thus:

     patient                                
[1,] "1"     "date1" "date2" "date4" "date3"
[2,] "2"     "date2" "date4" "date3"

Why are you using POSIXct when you don't have a time component? Avoid POSIXct if you don't need H:M:S, else you are likely to run into issues with daylight saving and timezones.
– Joshua Ulrich, Commented Jan 3, 2013 at 0:46
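
On the long numbers in the question's example: a minimal sketch, assuming the cause is that rbind() builds a plain numeric matrix and drops the POSIXct class, so only the underlying seconds since 1970-01-01 remain. Building the data frame column by column with as.Date(), as in the edited question, keeps the date class. The names m and dd_dates are illustrative only.

```r
# rbind() on date-time vectors returns a bare numeric matrix: the class is lost
m <- rbind(
  c(as.POSIXct(as.Date("01/01/2008", format = "%d/%m/%Y")),
    as.POSIXct(as.Date("01/01/2009", format = "%d/%m/%Y"))),
  c(as.POSIXct(as.Date("01/01/2002", format = "%d/%m/%Y")),
    as.POSIXct(as.Date("01/01/2001", format = "%d/%m/%Y")))
)
is.numeric(m)            # TRUE: plain numbers (seconds since the epoch)
inherits(m, "POSIXct")   # FALSE: the date-time class did not survive rbind()

# Column-wise construction keeps each column as a Date
dd_dates <- data.frame(
  patient = 1:2,
  date1 = as.Date(c("01/01/2008", "01/01/2002"), format = "%d/%m/%Y"),
  date2 = as.Date(c("01/01/2009", "01/01/2001"), format = "%d/%m/%Y")
)
class(dd_dates$date1)    # "Date"
```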

Matthew Lundberg · Accepted Answer · 2013-01-04 04:19:00Z

Here's an application of apply to iterate through the data frame:

t(apply(dd, 1, function(x) c(x[length(x)], names(x)[order(x[-length(x)])])))

##      patient                                
## [1,] "1"     "date1" "date2" "date4" "date3"
## [2,] "2"     "date2" "date1" "date4" "date3"

It might make more sense if patient were the first column, rather than the last.

Using @thelatemail's definition instead of yours:

t(apply(dd, 1, function(x) c(x[1], names(x[-1])[order(x[-1])])))

##      patient                                
## [1,] "1"     "date1" "date2" "date4" "date3"
## [2,] "2"     "date2" "date1" "date4" "date3"
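
A short sketch of why this works on Date columns: apply() first converts the data frame with as.matrix(), which turns every cell into a character string, and dates are formatted as "YYYY-MM-DD", which sorts chronologically even as text. The variable names m, row1 and sorted are illustrative only.

```r
dd <- data.frame(
  patient = 1:2,
  date1 = as.Date(c("01/01/2008", "01/01/2002"), format = "%d/%m/%Y"),
  date2 = as.Date(c("01/01/2009", "01/01/2001"), format = "%d/%m/%Y"),
  date3 = as.Date(c("01/01/2011", "01/01/2006"), format = "%d/%m/%Y"),
  date4 = as.Date(c("01/01/2010", "01/01/2004"), format = "%d/%m/%Y")
)

# This is what each call of the anonymous function sees: character data
m <- as.matrix(dd)
is.character(m)      # TRUE
m[1, "date1"]        # "2008-01-01"

# Ordering the ISO-formatted strings of the first row
row1 <- m[1, ]
sorted <- names(row1[-1])[order(row1[-1])]
sorted               # "date1" "date2" "date4" "date3"
```

Character columns holding dates in another layout, such as "01/01/2008", would not sort chronologically as text, so the approach depends on the columns being real Date objects.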

For the edited question, you cannot represent it in a data frame or a matrix as-is unless you use NA for the missing value, which would be reasonable. But instead, here is how you would get a list as a return value, as a list can have variable-length entries:

apply(dd, 1, function(x) c(x[1], names(x[-1][!is.na(x[-1])])[order(x[-1][!is.na(x[-1])])]))

## [[1]]
## patient                                 
##     "1" "date1" "date2" "date4" "date3" 
##
## [[2]]
## patient                         
##     "2" "date2" "date4" "date3"
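
If a rectangular result is preferred, the NA-padding option mentioned above could look roughly like this. It is only a sketch, not part of the original answer: order(..., na.last = NA) drops the missing dates, and each row is then padded with NA to a fixed width. The names date_cols, sorted_names and result are made up for illustration.

```r
dd <- data.frame(
  patient = 1:2,
  date1 = as.Date(c("01/01/2008", "01/01/2002"), format = "%d/%m/%Y"),
  date2 = as.Date(c("01/01/2009", "01/01/2001"), format = "%d/%m/%Y"),
  date3 = as.Date(c("01/01/2011", "01/01/2006"), format = "%d/%m/%Y"),
  date4 = as.Date(c("01/01/2010", "01/01/2004"), format = "%d/%m/%Y")
)
dd[2, 2] <- NA   # patient 2 has no date1

date_cols <- setdiff(names(dd), "patient")

sorted_names <- t(vapply(seq_len(nrow(dd)), function(i) {
  # unlist() gives a named numeric vector (days since 1970-01-01)
  dates <- unlist(dd[i, date_cols])
  # na.last = NA removes missing dates from the ordering
  nm <- names(dates)[order(dates, na.last = NA)]
  # pad so every patient has the same number of entries
  c(nm, rep(NA_character_, length(date_cols) - length(nm)))
}, character(length(date_cols))))

result <- data.frame(patient = dd$patient, sorted_names)
result
```

Here the second row ends in NA instead of listing date1, so the output stays a regular table.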

thelatemail · Accepted Answer · 2013-01-03 05:53:36Z


Another attempt using by:

dd <- data.frame(
  patient = 1:2,
  date1 = as.Date(c("01/01/2008","01/01/2002"),format="%d/%m/%Y"),
  date2 = as.Date(c("01/01/2009","01/01/2001"),format="%d/%m/%Y"),
  date3 = as.Date(c("01/01/2011","01/01/2006"),format="%d/%m/%Y"),
  date4 = as.Date(c("01/01/2010","01/01/2004"),format="%d/%m/%Y")
)

by(dd,dd$patient,function(x) names(x[,order(x)]))

Resulting in:

dd$patient: 1
[1] "patient" "date1"   "date2"   "date4"   "date3"  
------------------------------------------------------------ 
dd$patient: 2
[1] "patient" "date2"   "date1"   "date4"   "date3"

To edit it to get rid of the first "patient" column, this will work:

by(dd,dd$patient,function(x) c(x[,1],names(x)[-1][order(unlist(x[,-1]))]))

Resulting in:

dd$patient: 1
[1] "1"     "date1" "date2" "date4" "date3"
------------------------------------------------------------------------------ 
dd$patient: 2
[1] "2"     "date2" "date1" "date4" "date3"
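
If one table is more convenient than the printed by() output, the per-patient vectors can be bound row-wise. This is a sketch under the assumption that every patient has the same number of non-missing dates; res and tab are illustrative names, and unlist() is used so that order() receives a plain numeric vector.

```r
dd <- data.frame(
  patient = 1:2,
  date1 = as.Date(c("01/01/2008", "01/01/2002"), format = "%d/%m/%Y"),
  date2 = as.Date(c("01/01/2009", "01/01/2001"), format = "%d/%m/%Y"),
  date3 = as.Date(c("01/01/2011", "01/01/2006"), format = "%d/%m/%Y"),
  date4 = as.Date(c("01/01/2010", "01/01/2004"), format = "%d/%m/%Y")
)

# One character vector per patient: id first, then column names by date
res <- by(dd, dd$patient, function(x) {
  c(x$patient, names(x)[-1][order(unlist(x[-1]))])
})

# Stack the vectors into a character matrix, one row per patient
tab <- do.call(rbind, res)
tab
```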

edited Jan 3, 2013 at 5:53

answered Jan 3, 2013 at 1:04


5 Comments

Luc Over a year ago

Great! That works well... but is there also a way to get ONLY the rows that I am interested in? I basically need a table of patient number followed by a series of column names (not dates); I am not interested in the actual dates. I can obviously sort the result in Excel, but I was wondering if there would be an R way.

thelatemail Over a year ago

@Luc - what rows are you interested in? There is nothing in your question referencing the selection of particular rows. I am not sure what else you are requesting here.

Luc Over a year ago

instead of:
dd$patient: 1
[1] "patient" "date1" "date2" "date4" "date3"
------------------------------------------------------------
dd$patient: 2
[1] "patient" "date2" "date1" "date4" "date3"
to have only:
patient1 "date1" "date2" "date4" "date3"
patient2 "date2" "date1" "date4" "date3"

Luc Over a year ago

grrr... formatting in comments is terrible... I'll fill in an 'answer'

Luc Over a year ago

Ah, Matthew's answer gives the exact formatting that I need. All good. Thank you very much for your help!
