I have this input:
startdate enddate val1 val2
2015-10-13 07:00:02 2015-10-19 00:00:00 45 1900
in which one line specifies a date range that spans multiple days, and I want to split the range into separate time periods, each one being a subset of a day (each one on a separate line), to facilitate parallel processing of the (multi-day) range.
The output should be
2015-10-13 07:00:02 2015-10-13 23:59:59 45 1900
2015-10-14 00:00:01 2015-10-14 23:59:59 45 1900
2015-10-15 00:00:01 2015-10-15 23:59:59 45 1900
2015-10-16 00:00:01 2015-10-16 23:59:59 45 1900
2015-10-17 00:00:01 2015-10-17 23:59:59 45 1900
2015-10-18 00:00:01 2015-10-18 23:59:59 45 1900
2015-10-19 00:00:01 2015-10-19 00:00:00 45 1900
where the data after the end time (val1 and val2) are replicated on each line.
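A minimal Python sketch of the split described above, offered as one possible approach rather than a definitive solution. The function name `split_by_day` is hypothetical, and the sketch assumes each record is whitespace-separated in the order shown (start date, start time, end date, end time, val1, val2). It reproduces the boundaries exactly as listed in the expected output (later days start at 00:00:01, earlier days end at 23:59:59), including the final row whose start is later than its end.

```python
from datetime import datetime, time, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

def split_by_day(line):
    """Split one 'start end val1 val2' record into one record per calendar day."""
    d1, t1, d2, t2, val1, val2 = line.split()
    start = datetime.strptime(f"{d1} {t1}", FMT)
    end = datetime.strptime(f"{d2} {t2}", FMT)

    rows = []
    day = start.date()
    while day <= end.date():
        # The first day keeps the original start; later days start at 00:00:01.
        if day == start.date():
            seg_start = start
        else:
            seg_start = datetime.combine(day, time(0, 0, 1))
        # The last day keeps the original end; earlier days end at 23:59:59.
        if day == end.date():
            seg_end = end
        else:
            seg_end = datetime.combine(day, time(23, 59, 59))
        # val1 and val2 are copied unchanged onto every output row.
        rows.append(f"{seg_start.strftime(FMT)} {seg_end.strftime(FMT)} {val1} {val2}")
        day += timedelta(days=1)
    return rows

if __name__ == "__main__":
    for row in split_by_day("2015-10-13 07:00:02 2015-10-19 00:00:00 45 1900"):
        print(row)
```

Since each input line is handled independently, the same function could be applied line by line to records exported from the table.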
- The input records actually come from a Hive table, and the output records will also be stored in a split table.
Modifications:
The date split is fine. I also need to split the val2 value according to the split dates.
If the date difference is 2, the record is split into 2 rows, which should be:
- row 1:
ratio = time spent on the 1st day (i.e. end - start on day 1) / val1
val2 = ratio * val2
- row 2:
ratio = time spent on the 2nd day (i.e. end - start on day 2) / val1
val2 = ratio * val2
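A rough Python sketch of the proportional split, under stated assumptions. The function name `split_val2` and the parameter `total_seconds` are hypothetical. The formula above divides by val1; because the unit of val1 is not given here, the sketch defaults the denominator to the summed duration of all segments (so the parts add up to val2) and lets a different denominator be passed in. Segments whose end is before their start are counted as zero seconds, which is also an assumption.

```python
from datetime import datetime

def split_val2(segments, val2, total_seconds=None):
    """Return one share of val2 per (start, end) segment, weighted by duration.

    segments      -- list of (start, end) datetime pairs, one per output row
    val2          -- the value to distribute across the rows
    total_seconds -- denominator of the ratio; assumed to be the summed
                     segment duration when not given
    """
    # Duration of each segment in seconds; negative spans are clamped to 0.
    durations = [max((end - start).total_seconds(), 0.0) for start, end in segments]
    if total_seconds is None:
        total_seconds = sum(durations)
    if total_seconds == 0:
        return [0.0] * len(durations)
    # ratio = time spent in the segment / denominator; share = ratio * val2
    return [val2 * d / total_seconds for d in durations]

if __name__ == "__main__":
    segments = [
        (datetime(2015, 10, 13, 12, 0, 0), datetime(2015, 10, 13, 23, 59, 59)),
        (datetime(2015, 10, 14, 0, 0, 1), datetime(2015, 10, 14, 12, 0, 0)),
    ]
    print(split_val2(segments, 1900))
```

If val1 does hold the total time in seconds, passing `total_seconds=val1` would follow the formula as written.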
How can I script this?