
I'm trying to sort columns for individual patients based on the dates in those columns in R. I made an example data set; however, it contains long numbers rather than dates (no idea why). Forgive my perhaps silly way of creating the data frame :)...

dd<- 
data.frame(rbind(
c(as.POSIXct(as.Date("01/01/2008", format="%d/%m/%Y")),
as.POSIXct(as.Date("01/01/2009", format="%d/%m/%Y")),
as.POSIXct(as.Date("01/01/2011", format="%d/%m/%Y")),
as.POSIXct(as.Date("01/01/2010", format="%d/%m/%Y")))
,
c(as.POSIXct(as.Date("01/01/2002", format="%d/%m/%Y")),
as.POSIXct(as.Date("01/01/2001", format="%d/%m/%Y")),
as.POSIXct(as.Date("01/01/2006", format="%d/%m/%Y")),
as.POSIXct(as.Date("01/01/2004", format="%d/%m/%Y")))
))
dd$patient[1] <- 1
dd$patient[2] <- 2
names(dd) <- c("date1", "date2", "date3", "date4", "patient")

What I am after is a list of column names per patient, sorted by the dates within those columns. Thus,

Patient 1 : date1, date2, date4, date3

Patient 2 : date2, date1, date4, date3

EDIT:

So, one more thing. What if one date is missing... thus:

dd <- data.frame(
  patient = 1:2,
  date1 = as.Date(c("01/01/2008","01/01/2002"),format="%d/%m/%Y"),
  date2 = as.Date(c("01/01/2009","01/01/2001"),format="%d/%m/%Y"),
  date3 = as.Date(c("01/01/2011","01/01/2006"),format="%d/%m/%Y"),
  date4 = as.Date(c("01/01/2010","01/01/2004"),format="%d/%m/%Y")
)

dd[2,2]<- NA

Matthew's answer gives:

> t(apply(dd, 1, function(x) c(x[1], names(x[-1])[order(x[-1])])))
     patient                                
[1,] "1"     "date1" "date2" "date4" "date3"
[2,] "2"     "date2" "date4" "date3" "date1"

So the column name of the missing data point is included at the end of the sorted list of dates. But I'd like it not to be there... thus:

   patient                                
[1,] "1"     "date1" "date2" "date4" "date3"
[2,] "2"     "date2" "date4" "date3"
  • Why are you using POSIXct when you don't have a time component? Avoid POSIXct if you don't need H:M:S, else you are likely to run into issues with daylight saving and timezones. Commented Jan 3, 2013 at 0:46
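As for the "long numbers" in the original example: a likely explanation is that rbind() builds a plain numeric matrix and drops the POSIXct class, so what is left is the underlying count of seconds since 1970-01-01 UTC. A minimal sketch of that effect (the names a, b, m and recovered are only illustrative, not from the question):

```r
# Two POSIXct values built the same way as in the question
a <- as.POSIXct(as.Date("01/01/2008", format = "%d/%m/%Y"))
b <- as.POSIXct(as.Date("01/01/2002", format = "%d/%m/%Y"))

# rbind() returns a plain matrix; the POSIXct class is gone
m <- rbind(a, b)
is.numeric(m)
inherits(m, "POSIXct")

# The raw seconds can still be converted back to a date
recovered <- as.Date(as.POSIXct(m[1, 1], origin = "1970-01-01", tz = "UTC"))
format(recovered)
```

Building the data frame column by column with as.Date(), as in the question's edit, avoids the problem because each column keeps its Date class.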

2 Answers


Here's an application of apply to iterate through the data frame:

t(apply(dd, 1, function(x) c(x[length(x)], names(x)[order(x[-length(x)])])))

##      patient                                
## [1,] "1"     "date1" "date2" "date4" "date3"
## [2,] "2"     "date2" "date1" "date4" "date3"

It might make more sense if patient were the first column, rather than the last.

Using @thelatemail's definition instead of yours:

t(apply(dd, 1, function(x) c(x[1], names(x[-1])[order(x[-1])])))

##      patient                                
## [1,] "1"     "date1" "date2" "date4" "date3"
## [2,] "2"     "date2" "date1" "date4" "date3"
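A note on why this works, as far as I understand apply: it converts the data frame to a matrix first, and because of the Date columns that matrix is character. The dates become "YYYY-MM-DD" strings, which happen to sort chronologically as text. A small check under that assumption, using the data frame from the question's edit without the NA (m and res are illustrative names):

```r
dd <- data.frame(
  patient = 1:2,
  date1 = as.Date(c("01/01/2008", "01/01/2002"), format = "%d/%m/%Y"),
  date2 = as.Date(c("01/01/2009", "01/01/2001"), format = "%d/%m/%Y"),
  date3 = as.Date(c("01/01/2011", "01/01/2006"), format = "%d/%m/%Y"),
  date4 = as.Date(c("01/01/2010", "01/01/2004"), format = "%d/%m/%Y")
)

# The matrix that apply() iterates over is character, not Date
m <- as.matrix(dd)
typeof(m)
m[1, "date1"]

# Same call as above; order() is comparing date strings here
res <- t(apply(dd, 1, function(x) c(x[1], names(x[-1])[order(x[-1])])))
res
```

This relies on the ISO date format; dates stored as text in another layout such as "01/01/2008" would not sort correctly this way.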

For the edited question, you cannot represent it in a data frame or a matrix as-is unless you use NA for the missing value, which would be reasonable. But instead, here is how you would get a list as a return value, as a list can have variable-length entries:

apply(dd, 1, function(x) c(x[1], names(x[-1][!is.na(x[-1])])[order(x[-1][!is.na(x[-1])])]))

## [[1]]
## patient                                 
##     "1" "date1" "date2" "date4" "date3" 
##
## [[2]]
## patient                         
##     "2" "date2" "date4" "date3" 
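If a rectangular result is preferred after all, one option is to drop the missing dates with order(..., na.last = NA) and then pad the shorter rows with NA. This is only a sketch of that idea; sorted, width, padded and result are names I made up for illustration:

```r
dd <- data.frame(
  patient = 1:2,
  date1 = as.Date(c("01/01/2008", "01/01/2002"), format = "%d/%m/%Y"),
  date2 = as.Date(c("01/01/2009", "01/01/2001"), format = "%d/%m/%Y"),
  date3 = as.Date(c("01/01/2011", "01/01/2006"), format = "%d/%m/%Y"),
  date4 = as.Date(c("01/01/2010", "01/01/2004"), format = "%d/%m/%Y")
)
dd[2, "date1"] <- NA

# Per patient: column names ordered by date, with missing dates dropped
sorted <- lapply(seq_len(nrow(dd)), function(i) {
  row <- unlist(dd[i, -1])              # dates as plain numbers, named by column
  names(row)[order(row, na.last = NA)]  # na.last = NA removes the NA entries
})

# Pad shorter rows with NA so everything fits in one character matrix
width <- max(lengths(sorted))
padded <- t(sapply(sorted, function(v) c(v, rep(NA_character_, width - length(v)))))
result <- cbind(patient = dd$patient, padded)
result
```

The lapply over row indices is used instead of apply so that the return type stays a list even when no dates are missing.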


Another attempt using by:

dd <- data.frame(
  patient = 1:2,
  date1 = as.Date(c("01/01/2008","01/01/2002"),format="%d/%m/%Y"),
  date2 = as.Date(c("01/01/2009","01/01/2001"),format="%d/%m/%Y"),
  date3 = as.Date(c("01/01/2011","01/01/2006"),format="%d/%m/%Y"),
  date4 = as.Date(c("01/01/2010","01/01/2004"),format="%d/%m/%Y")
)

by(dd,dd$patient,function(x) names(x[,order(x)]))

Resulting in:

dd$patient: 1
[1] "patient" "date1"   "date2"   "date4"   "date3"  
------------------------------------------------------------ 
dd$patient: 2
[1] "patient" "date2"   "date1"   "date4"   "date3"  

To show the patient id instead of the "patient" column name, order only the date columns and index the date column names with the result:

by(dd,dd$patient,function(x) c(x[,1],names(x)[-1][order(unlist(x[,-1]))]))

Resulting in:

dd$patient: 1
[1] "1"     "date1" "date2" "date4" "date3"
------------------------------------------------------------------------------ 
dd$patient: 2
[1] "2"     "date2" "date1" "date4" "date3"
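If a single table is wanted rather than the printed by output, the per-patient results can be bound together with do.call(rbind, ...). A minimal sketch, assuming one row per patient and the dd defined at the top of this answer (res, vals and tab are illustrative names):

```r
dd <- data.frame(
  patient = 1:2,
  date1 = as.Date(c("01/01/2008", "01/01/2002"), format = "%d/%m/%Y"),
  date2 = as.Date(c("01/01/2009", "01/01/2001"), format = "%d/%m/%Y"),
  date3 = as.Date(c("01/01/2011", "01/01/2006"), format = "%d/%m/%Y"),
  date4 = as.Date(c("01/01/2010", "01/01/2004"), format = "%d/%m/%Y")
)

# Order the date columns only, working on a plain named vector
res <- by(dd, dd$patient, function(x) {
  vals <- unlist(x[, -1])    # dates as numbers, named date1..date4
  names(vals)[order(vals)]
})

# One row per patient, one column per rank position
tab <- do.call(rbind, res)
tab
```

Unlisting the row first avoids calling order() on a data frame and keeps the patient id out of the comparison.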

5 Comments

Great! That works well... but is there also a way to get ONLY the rows that I am interested in? I basically need a table of patient number followed by a series of column names (not dates); I am not interested in the actual dates. I can obviously sort the result in Excel, but I was wondering if there is an R way.
@Luc - what rows are you interested in? There is nothing in your question referencing the selection of particular rows. I am not sure what else you are requesting here.
Instead of: dd$patient: 1 [1] "patient" "date1" "date2" "date4" "date3" ------------------------------------------------------------ dd$patient: 2 [1] "patient" "date2" "date1" "date4" "date3" to have only: patient1 "date1" "date2" "date4" "date3" patient2 "date2" "date1" "date4" "date3"
Grrr... formatting in comments is terrible... I'll fill in an 'answer'.
Ah, Matthew's answer gives the exact formatting that I need. All good. Thank you very much for your help!
