1

I have a table over which I have built an ETL service. Goods records (arrival / departure) go to the table. I have done that my table will be erased. When the item identifier arrives in the database for the second time, both records are deleted.

label   cost   time
x2       29    14/5/2020 01:00:00
x3       20    14/5/2020 01:02:00
x2       29    15/5/2020 03:12:02

Now ETL service remove records (every 30s):

label   cost   time
x3       20    14/5/2020 01:02:00

I delete it using the function:

with todelete as (
      select *, count(*) over (partition by label) as cnt, ROW_NUMBER() over (partition by label order by time DESC) as r_number
      from Table1
     )
delete from todelete
    where cnt >= 2 

I just need to make a small change right now. When, for example, during those 30 seconds I get the second record in the table, but also the third that the goods have arrived again.

        label   cost   time
        x2       29    14/5/2020 01:00:00
        x3       20    14/5/2020 01:02:00
        x2       29    15/5/2020 03:12:02
        x2       29    15/5/2020 03:12:22

In this case, I would lose the last record of the arrival of the goods. So I need to adjust the delete function. I've tried this, but it doesn't work properly.

with todelete as (
      select *
              , count(*) over (partition by label) as cnt
              , ROW_NUMBER() over (partition by label order by time DESC) as r_number
      from Table1
     )
delete from todelete
    where (cnt = 2 OR cnt = 4) OR (cnt = 3 AND r_number > 1);

Would anyone help me to modify that function please.

UPDATE: Exactly table: Step1:

label   cost   time
x2       29    14/5/2020 01:00:00
x3       20    14/5/2020 01:02:00
x2       29    15/5/2020 03:12:02

Step2: Delete row with function:

with todelete as (
      select *, count(*) over (partition by label) as cnt, ROW_NUMBER() over (partition by label order by time DESC) as r_number
      from Table1
     )
delete from todelete
    where cnt >= 2 

My goal is described above, but one more thing that affects is when it comes to moving with a change in price like this:

  label   cost   time
    x2       29    14/5/2020 01:00:00
    x3       20    14/5/2020 01:02:00
    x2       30    15/5/2020 03:12:22

And then ETL deleted the old record and left the one with the new price.

        label   cost   time
        x3       20    14/5/2020 01:02:00
        x2       30    15/5/2020 03:12:22

This fact complicates it even more.

3
  • I didn't get the point. During the time ETL process tries to delete the records with cnt >= 2 will delete all rows for a label where it has more than one record. How does it make difference by adding (cnt = 2 OR cnt = 4) OR (cnt = 3 AND r_number > 1) ? Commented Sep 10, 2020 at 6:40
  • 1
    @Sujitmohanty30 Because if the records for departure and arrival arrive during those 30 seconds, all records will be deleted after going through the ETL. And I need the third record to be preserved. Commented Sep 10, 2020 at 7:18
  • Ahh You are right. Completely make sense and thanks for your nice clarifications. Commented Sep 10, 2020 at 8:33

1 Answer 1

2

why don't you try something like this.

Explanation:

Notice that I changed from DESC to ASC in the row_number() function's order by clause.

with this I delete the first arriving rows till the rows are even. if Cnt=3 then (cnt/2)*2=2 as cnt is integer(note integer/2 gives integer not a fraction)

when cnt=4 then (cnt/2)*2 gives back 4.

with todelete as (
      select *, count(*) over (partition by label) as cnt, ROW_NUMBER() over (partition by label order by time ASC) as r_number
      from Table1
     )
delete from todelete 
    where cnt > 1 and r_number between 1 and (cnt/2)*2
Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.