SQL Delete specific rows based on date and criteria [duplicate]

Question

I've got a table that has duplicate data that needs to be cleaned up. Consider the following example:

CREATE TABLE #StackOverFlow
(
    [ctrc_num] int, 
    [Ctrc_name] varchar(6),
    [docu] bit, 
    [adj] bit, 
    new bit, 
    [some_date] datetime
);
    
INSERT INTO #StackOverFlow
    ([ctrc_num], [Ctrc_name], [docu], [adj], [new], [some_date])
VALUES
    (12345, 'John R', null, null, 1, '2023-12-11 09:05:13.003'),
    (12345, 'John R', 1, null, 0, '2023-12-11 09:05:12.987'),
    (12345, 'John R', null, null, 1, '2023-12-11 09:05:12.947'),
    (56789, 'Sam S', null, null, 1, '2023-12-11 09:05:13.003'),
    (56789, 'Sam S', null, null, 1, '2023-12-11 09:05:12.987'),
    (56789, 'Sam S', 1, null, 0, '2023-12-11 09:05:12.947'),
    (78945, 'Pat P', null, null, 1, '2023-12-11 09:05:13.003'),
    (78945, 'Pat P', null, null, 1, '2023-12-11 09:05:12.987'),
    (78945, 'Pat P', null, null, 1, '2023-12-11 09:05:12.947');

This gives me:

[ctrc_num]  [Ctrc_name] [docu]  [adj]   [new]   [some_date]
-----------------------------------------------------------------------
12345        John R     NULL    NULL    1       2023-12-11 09:05:13.003
12345        John R     1       NULL    0       2023-12-11 09:05:12.987
12345        John R     NULL    NULL    1       2023-12-11 09:05:12.947
56789        Sam S      NULL    NULL    1       2023-12-11 09:05:13.003
56789        Sam S      NULL    NULL    1       2023-12-11 09:05:12.987
56789        Sam S      1       NULL    0       2023-12-11 09:05:12.947
78945        Pat P      NULL    NULL    1       2023-12-11 09:05:13.003
78945        Pat P      NULL    NULL    1       2023-12-11 09:05:12.987
78945        Pat P      NULL    NULL    1       2023-12-11 09:05:12.947

What I need to do is delete the duplicates from the table. Within a group of duplicates, if any record has new = 0, delete the records where new = 1. If all records have new = 1, keep the newest record and delete the older ones.

The result should look like this:

[ctrc_num]  [Ctrc_name] [docu]  [adj]  [new]    [some_date]
-----------------------------------------------------------------------
12345        John R     1       NULL    0       2023-12-11 09:05:12.987
56789        Sam S      1       NULL    0       2023-12-11 09:05:12.947
78945        Pat P      NULL    NULL    1       2023-12-11 09:05:13.003

I've tried ROW_NUMBER:

;WITH RankedByDate AS
(
    SELECT 
        ctrc_num, Ctrc_name,
        docu, adj, new, some_date,
        ROW_NUMBER() OVER (PARTITION BY Ctrc_num, Ctrc_name, [docu],[adj], [new] 
                           ORDER BY some_date DESC) AS rNum
    FROM 
        #StackOverFlow
)
SELECT * 
FROM RankedByDate

This separates out the rows with new = 0, but the rows with new = 1 are still there, just numbered within their own partition.

Grouping gives me the records that are duplicated, but no way to delete the ones that need to go:

SELECT [ctrc_num]
    ,[Ctrc_name]
    ,[docu]
    ,[adj]
    ,[new]
FROM 
    #StackOverFlow
GROUP BY 
    [ctrc_num]
    ,[Ctrc_name]
    ,[docu]
    ,[adj]
    ,[new]
HAVING 
    COUNT(*) > 1

What constitutes a duplicate? Same [ctrc_num] and [Ctrc_name]?
– PM 77-1, Commented Dec 15, 2023 at 16:09
There are no duplicate rows since no two rows are equal. Therefore you must specify what you mean by duplicate. Also, which value of the rows not building the duplicate do you want to keep?
– Olivier Jacot-Descombes, Commented Dec 15, 2023 at 16:18
Unless there can be more than one new = 0, your logic can be summarized as remove all rows partitioned by ctrc_num order by new, some_date desc where row_number > 1. It shouldn't be very hard to come up with sql corresponding to the above.
– siggemannen, Commented Dec 15, 2023 at 16:21

Thom A · Accepted Answer · 2023-12-15 17:01:25Z

2

Break the problem down into its parts.

"If new is 0, delete the records where new is 1"

delete from #StackOverFlow
where [new] = 1
and [ctrc_num] in (select [ctrc_num]
                   from #StackOverFlow
                   where [new] = 0);

"If all records have new = 1 keep the newest record and delete the older ones"

Use a CTE to add a row number, ordered by date descending and partitioned by [ctrc_num], so that the "first" record in each group is the one you want to keep. If there is only one row in a group, that is the one you want to keep anyway. Then delete everything else:
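```
;with cte as
(
    select 
         [ctrc_num]
         -- number the rows within each [ctrc_num], newest first
         ,ROW_NUMBER() OVER (PARTITION BY [ctrc_num] ORDER BY [some_date] DESC) as rw
    from #StackOverFlow
)
-- rw = 1 is the newest row in each group; delete the rest
DELETE FROM cte where rw <> 1;
```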
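Because the second delete only gives the right result after the first one has run, it may help to run both inside one transaction and look at the outcome before making it permanent. The following is a minimal sketch, assuming the #StackOverFlow sample table from the question; the explicit transaction and the final check query are additions for illustration, not part of the steps above.

```sql
BEGIN TRANSACTION;

-- Step 1: where a ctrc_num has a new = 0 row, remove its new = 1 rows
DELETE FROM #StackOverFlow
WHERE [new] = 1
  AND [ctrc_num] IN (SELECT [ctrc_num]
                     FROM #StackOverFlow
                     WHERE [new] = 0);

-- Step 2: keep only the newest remaining row per ctrc_num
WITH cte AS
(
    SELECT [ctrc_num],
           ROW_NUMBER() OVER (PARTITION BY [ctrc_num]
                              ORDER BY [some_date] DESC) AS rw
    FROM #StackOverFlow
)
DELETE FROM cte WHERE rw <> 1;

-- Check what is left: one row per ctrc_num is expected
SELECT * FROM #StackOverFlow ORDER BY [ctrc_num];

-- Replace COMMIT with ROLLBACK to undo both deletes
COMMIT TRANSACTION;
```

With the sample data this should leave one row per ctrc_num, matching the result shown in the question.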

edited Dec 15, 2023 at 17:01

Thom A♦


answered Dec 15, 2023 at 16:20

CHill60


6 Comments

kool_kris Over a year ago

This is exactly what I was looking for. I was hoping I would be able to eliminate the duplicate without having to break it into more than one part, but this works.

siggemannen Over a year ago

you can write this as subquery too, no need for CTE.
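The subquery form mentioned in this comment could look like the sketch below. It assumes SQL Server, which allows a DELETE to target a derived table built on a single base table; the alias t is a hypothetical name chosen for illustration.

```sql
-- Same "keep the newest row per ctrc_num" delete, written with a
-- derived table instead of a CTE
DELETE t
FROM (
    SELECT [ctrc_num],
           ROW_NUMBER() OVER (PARTITION BY [ctrc_num]
                              ORDER BY [some_date] DESC) AS rw
    FROM #StackOverFlow
) AS t
WHERE t.rw <> 1;
```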

CHill60 Over a year ago

@TN - Why? In step 1 I deleted any records where new = 1 if there was a subsequent new = 0. So either there is only a single record per [ctrc_num] and new = 1 OR there is/are 1+ records for a [ctrc_num] where new = 0. Sorting by new only becomes relevant if trying to do both steps at once.

CHill60 Over a year ago

@kool_kris - as you will see from siggemannen's solution, it is possible to do what you want in a single query. But when you are trying to figure out how to do something it is good practice to break it down first. See also "SQL Antipatterns" by Bill Karwin - Chapter 18 "Spaghetti Query" - "Solve a Complex Problem in One Step". You can always merge the "bits" together afterwards - once you have something working. Personally, I'd rather have three simple queries I can follow than one complex one that has me puzzled :-)

CHill60 Over a year ago

@TN - Phew. I did scratch my head for a while though - at least you made me think :-)


T N · Accepted Answer · 2023-12-18 16:39:37Z

2

It is possible to do what you want in a single query.

;with cte as(
    select [ctrc_num], [Ctrc_name], [docu],[adj], [new], [some_date]
    ,ROW_NUMBER() over(partition by [ctrc_num] -- group by [ctrc_num]
        order by [new], --0 then 1
        [some_date] desc --newest first
        ) rn
    from #StackOverFlow
)
delete cte
where rn>1
;

select * from #StackOverFlow
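Before running a delete like this, it can be useful to preview which rows it would remove. The sketch below reuses the same ROW_NUMBER logic but swaps the DELETE for a SELECT; it is an illustrative addition that assumes the #StackOverFlow sample table and is meant to be run before the delete.

```sql
-- Preview only: lists the rows the single-query delete would remove
WITH cte AS (
    SELECT [ctrc_num], [Ctrc_name], [docu], [adj], [new], [some_date],
           ROW_NUMBER() OVER (PARTITION BY [ctrc_num]
                              ORDER BY [new],             -- 0 before 1
                                       [some_date] DESC   -- newest first
                             ) AS rn
    FROM #StackOverFlow
)
SELECT *
FROM cte
WHERE rn > 1
ORDER BY [ctrc_num], rn;
```

Against the untouched sample data this lists two rows for each ctrc_num, which are the rows the delete takes out.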

edited Dec 18, 2023 at 16:39

T N


answered Dec 15, 2023 at 16:32

Alex Kudryashev

