I have a data manipulation problem that I hope someone can help me with. I have a large dataset with 10 columns: "id" plus 3 sets of similar variables ("type", "startdate", and "enddate"). An example is shown below.
  id type1 startdate1 enddate1   type2 startdate2 enddate2   type3 startdate3 enddate3
1  1 A     2006-08-20 2006-12-06 W     2006-08-01 2007-08-29 P     2007-08-18 2007-09-27
2  2 A     2006-01-05 2007-07-02 NA    NA         NA         Q     2008-01-15 2008-02-07
I would like to obtain the following cleaned and sorted dataset:
  id type1 startdate1 enddate1   type2 startdate2 enddate2   type3 startdate3 enddate3
1  1 W     2006-08-01 2007-08-29 A     2006-08-20 2006-12-06 P     2007-08-18 2007-09-27
2  2 A     2006-01-05 2007-07-02 Q     2008-01-15 2008-02-07 NA    NA         NA
Within each row/observation, I would like to sort the three (type, startdate, enddate) groups in ascending order of "startdate". For row 1, the second group has an earlier "startdate" (2006-08-01) than the first group (2006-08-20), so it should move to the first position.
For row 2, the second group is entirely NA, so I would like to push it to the end, which moves the third group up into the second position.
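To make the goal concrete, here is a minimal sketch of the reordering in base R. It assumes the date columns are stored as Date (my real data may need as.Date() first), and the names df, sorted, starts, and ord are only for illustration. It relies on order() sorting Date values directly and placing NA last by default. It loops over every row, so I doubt it is efficient enough for a big dataset.

```r
# Example data reconstructed from the tables above (dates assumed to be Date class)
df <- data.frame(
  id         = 1:2,
  type1      = c("A", "A"),
  startdate1 = as.Date(c("2006-08-20", "2006-01-05")),
  enddate1   = as.Date(c("2006-12-06", "2007-07-02")),
  type2      = c("W", NA),
  startdate2 = as.Date(c("2006-08-01", NA)),
  enddate2   = as.Date(c("2007-08-29", NA)),
  type3      = c("P", "Q"),
  startdate3 = as.Date(c("2007-08-18", "2008-01-15")),
  enddate3   = as.Date(c("2007-09-27", "2008-02-07")),
  stringsAsFactors = FALSE
)

sorted <- df
for (i in seq_len(nrow(df))) {
  # Start dates of the three groups in this row
  starts <- c(df$startdate1[i], df$startdate2[i], df$startdate3[i])
  # Ascending order of start date; groups with an NA start date go last
  ord <- order(starts, na.last = TRUE)
  # Copy group ord[k] of the original row into position k of the result
  for (k in 1:3) {
    for (v in c("type", "startdate", "enddate")) {
      sorted[i, paste0(v, k)] <- df[i, paste0(v, ord[k])]
    }
  }
}

print(sorted)
```

This keeps each (type, startdate, enddate) group together without converting the dates to numeric or pasting the columns into one string, but I do not know whether there is a faster, vectorized way to do the same thing.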
Any tips on how I can do this efficiently?
Should I convert the "startdate" and "enddate" columns to numeric? If so, how should I handle the NAs?
Is it wise to apply the paste() function to (type, startdate, enddate) for each of the 3 sets?
Appreciate any help! Thank you in advance!