added 80 characters in body
Source Link
AAA
  • 13
  • 2
  • 6

I have this input:

      startdate             end date         val1    val2
2015-10-13 07:00:02 2015-10-19 00:00:00      45      1900

in which one line specifies a date range that spans multiple days, and I want to split the range into separate time periods, each one being a subset of a day (each one on a separate line), to facilitate parallel processing of the (multi-day) range.

The output should be

2015-10-13 07:00:02 2015-10-13 23:59:59      45      1900
2015-10-14 00:00:01 2015-10-14 23:59:59      45      1900
2015-10-15 00:00:01 2015-10-15 23:59:59      45      1900
2015-10-16 00:00:01 2015-10-16 23:59:59      45      1900
2015-10-17 00:00:01 2015-10-17 23:59:59      45      1900
2015-10-18 00:00:01 2015-10-18 23:59:59      45      1900
2015-10-19 00:00:01 2015-10-19 00:00:00      45      1900

where the data after the end time (val1 and val2) are replicated on each line.
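One possible way to do the per-day split is sketched below in Python, as an illustration rather than a definitive solution. The function names `split_by_day` and `split_line` are made up for this example. The sketch assumes each later day starts at 00:00:00 instead of the 00:00:01 shown in the desired output, so that the last row (which ends at 2015-10-19 00:00:00) does not start after it ends; adjust the boundary if 00:00:01 is really what is wanted.

```python
from datetime import datetime, time, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def split_by_day(start, end):
    """Yield (segment_start, segment_end) pairs, one per calendar day touched.

    Assumption: every day except the last ends at 23:59:59 and the
    following segment starts one second later, at 00:00:00.
    """
    cur = start
    while cur.date() < end.date():
        day_end = datetime.combine(cur.date(), time(23, 59, 59))
        yield cur, day_end
        cur = day_end + timedelta(seconds=1)
    # Last (or only) day: runs up to the real end time.
    yield cur, end


def split_line(line):
    """Split one input record into per-day records, replicating val1/val2."""
    d1, t1, d2, t2, *vals = line.split()
    start = datetime.strptime(f"{d1} {t1}", FMT)
    end = datetime.strptime(f"{d2} {t2}", FMT)
    tail = "      ".join(vals)
    return [
        f"{s.strftime(FMT)} {e.strftime(FMT)}      {tail}"
        for s, e in split_by_day(start, end)
    ]


if __name__ == "__main__":
    record = "2015-10-13 07:00:02 2015-10-19 00:00:00      45      1900"
    for row in split_line(record):
        print(row)
```

For the sample record this yields seven rows, one for each day from 2015-10-13 to 2015-10-19, with val1 and val2 copied unchanged onto every row.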

The input records come from a Hive table, and the output records will be stored in a separate split table.

Modifications:

The date split is fine, but val2 also needs to be split across the per-day rows. For example, if the range spans 2 days, it is split into 2 rows, which should be:

  • row 1:

ratio = time spent on the 1st day (i.e. end - start on day 1) / val1

val2 = ratio * val2

  • row 2:

ratio = time spent on the 2nd day (i.e. end - start on day 2) / val1

val2 = ratio * val2
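The val2 split described above could look roughly like the following Python sketch. The unit of val1 is not stated, so this example makes a different, explicit assumption: the ratio for each row is that day's share of the total duration of the range (seconds on that day / total seconds), which makes the per-row val2 values add up to the original val2. If val1 is meant to be the total time spent in some unit, the denominator would need to change accordingly. The names `day_segments` and `apportion` are hypothetical, and the segments here are half-open (each day runs up to the next midnight) so that no second is lost between rows.

```python
from datetime import datetime, time, timedelta


def day_segments(start, end):
    """Yield half-open (seg_start, seg_end) pairs cut at each midnight."""
    cur = start
    while cur.date() < end.date():
        midnight = datetime.combine(cur.date() + timedelta(days=1), time.min)
        yield cur, midnight
        cur = midnight
    if cur < end:  # skip a zero-length tail when end falls exactly on midnight
        yield cur, end


def apportion(start, end, val2):
    """Return (seg_start, seg_end, share_of_val2) for each day in the range.

    Assumption: ratio = seconds in the segment / total seconds in the range.
    """
    total = (end - start).total_seconds()
    if total == 0:
        return [(start, end, float(val2))]
    return [
        (s, e, val2 * (e - s).total_seconds() / total)
        for s, e in day_segments(start, end)
    ]


if __name__ == "__main__":
    parts = apportion(datetime(2015, 10, 13, 7, 0, 2),
                      datetime(2015, 10, 19, 0, 0, 0), 1900)
    for s, e, share in parts:
        print(s, e, round(share, 2))
```

With half-open segments the sample range produces six rows instead of seven, because the range ends exactly at midnight on 2015-10-19 and no time is spent on that day.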

How can I script this?


Attempted to clarify.
Post Reopened by G-Man Says 'Reinstate Monica', Archemar, don_crissti, Stephen Kitt, countermode
Post Closed as "Needs details or clarity" by n.st, HalosGhost, Jeff Schaller, techraf, Satō Katsura
added 62 characters in body
Wildcard
added 40 characters in body
n.st