TSQL - How do I remove rows that have duplicate columns?

Question

Imagine this table:

| 1 | 2 | 1 | 2 | 1 |
| 1 | 2 | 3 | 2 | 1 |
| 1 | 2 | 3 | 4 | 1 |

How would I go about removing rows that contain duplicate values, where a match between the first and last columns does not count? I could very well be thinking about this in an awkward way. For example:

First row: 1 2 1 2 1. It has two 2s and three 1s. I want to remove it because it has two 2s, and the middle 1 also appears in the first or last column of the row.

Second row: 1 2 3 2 1. It has two 2s. I want to remove it because of those two 2s.

Third row: 1 2 3 4 1. This one is fine: the duplicate values in the first and last columns do not matter, and the values in between are all different.

I can imagine a few awkward ways to do this, but since SQL is not my strongest skill, I'd like to hear the pros' opinions :)

I don't completely understand your criteria. The second row does have duplicated values in col2 and col4.
– Lamak, Commented Jan 29, 2014 at 16:51
Thanks Lamak, fixed it. I should've spotted my mistake, but I've been looking at the thing all day... MCP, so far nothing :( I thought I'd try Stack Overflow first.
– Evangelos Aktoudianakis, Commented Jan 29, 2014 at 16:55

Anon · Accepted Answer · 2014-01-29 18:47:14Z


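DELETE
FROM MyTable
WHERE
  -- Expect 4 distinct values when the first and last columns match, 5 otherwise;
  -- fewer distinct values than that means some middle value is repeated.
  CASE [Col1] WHEN [Col5] THEN 4 ELSE 5 END
  > (SELECT COUNT(DISTINCT v) FROM (VALUES ([Col1]),([Col2]),([Col3]),([Col4]),([Col5])) AS t(v));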

edited Jan 29, 2014 at 18:47

answered Jan 29, 2014 at 16:58

Anon



4 Comments

Evangelos Aktoudianakis Over a year ago

Thank you for this. I neglected to say that the number of columns in the table is variable: the table itself is the product of a dynamic query, so there could be 2, 3, or 30 in-between columns. Can you explain the last t(v) part?

Anon Over a year ago

The VALUES clause creates an ad-hoc table from a list of values. The t(v) gives that ad-hoc table the alias t and gives its only column the alias v. This lets me reference them.
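To see what the VALUES trick computes for each row, the sketch below runs the same correlated subquery as a SELECT instead of a DELETE. It assumes a hypothetical temporary table #MyTable (mirroring MyTable in the answer) with int columns Col1 to Col5, loaded with the three rows from the question.

```sql
-- Hypothetical sample data: the three rows from the question.
CREATE TABLE #MyTable (Col1 int, Col2 int, Col3 int, Col4 int, Col5 int);
INSERT INTO #MyTable (Col1, Col2, Col3, Col4, Col5)
VALUES (1, 2, 1, 2, 1),
       (1, 2, 3, 2, 1),
       (1, 2, 3, 4, 1);

-- For each outer row m, VALUES builds a five-row derived table aliased t
-- whose only column is aliased v; COUNT(DISTINCT t.v) is the number of
-- distinct values across that row's five columns.
SELECT m.Col1, m.Col2, m.Col3, m.Col4, m.Col5,
       CASE m.Col1 WHEN m.Col5 THEN 4 ELSE 5 END AS ExpectedDistinct,
       (SELECT COUNT(DISTINCT t.v)
        FROM (VALUES (m.Col1), (m.Col2), (m.Col3), (m.Col4), (m.Col5)) AS t(v)) AS ActualDistinct
FROM #MyTable AS m;
```

For the three sample rows the actual distinct counts are 2, 3 and 4 against an expected 4, so the DELETE removes the first two rows and keeps the third. COUNT(DISTINCT ...) ignores NULLs, so this check assumes the columns hold no NULLs.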

Anon Over a year ago

You should post the query that produces your output. I suspect what you want to do will be much easier if you do it at an earlier stage in the logic (i.e. before a dynamic pivot)

Lamak Over a year ago

@EvangelosAktoudianakis I don't get how this answer meets your needs. It doesn't work with an unknown number of columns, and it also doesn't check that the values of the columns must be different than the first or last column
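Because the column count varies, one way to generalize Anon's query is to generate it with dynamic SQL from the table's metadata. This is a sketch under several assumptions, not code from the thread: SQL Server 2017 or later (for STRING_AGG ... WITHIN GROUP), a hypothetical table dbo.PivotResult standing in for the generated output, the first and last columns by column_id being the ones allowed to match, all columns sharing a comparable type, and no NULLs.

```sql
-- Hypothetical stand-in for the dynamically generated table (six columns here).
DROP TABLE IF EXISTS dbo.PivotResult;
CREATE TABLE dbo.PivotResult (A int, B int, C int, D int, E int, F int);
INSERT INTO dbo.PivotResult (A, B, C, D, E, F)
VALUES (1, 2, 3, 4, 5, 1),   -- first = last, middle values distinct: keep
       (1, 2, 3, 2, 5, 1),   -- middle value 2 repeats: delete
       (1, 2, 3, 4, 5, 6);   -- all values distinct: keep

DECLARE @object nvarchar(520) = QUOTENAME(N'dbo') + N'.' + QUOTENAME(N'PivotResult');
DECLARE @values nvarchar(max), @first nvarchar(258), @last nvarchar(258),
        @n int, @sql nvarchar(max);

-- One "([col])" row constructor per column, in column order, plus the column count.
SELECT @values = STRING_AGG(CAST(N'(' + QUOTENAME(c.name) + N')' AS nvarchar(max)), N',')
                     WITHIN GROUP (ORDER BY c.column_id),
       @n = COUNT(*)
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(@object);

-- The first and last columns (by position) are the ones allowed to match.
SELECT TOP (1) @first = QUOTENAME(c.name)
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(@object)
ORDER BY c.column_id;

SELECT TOP (1) @last = QUOTENAME(c.name)
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(@object)
ORDER BY c.column_id DESC;

-- Same shape as Anon's query: expect n - 1 distinct values when first = last,
-- n otherwise, and delete the rows that fall short.
SET @sql = N'DELETE FROM ' + @object
         + N' WHERE CASE ' + @first + N' WHEN ' + @last
         + N' THEN ' + CAST(@n - 1 AS nvarchar(10))
         + N' ELSE ' + CAST(@n AS nvarchar(10)) + N' END'
         + N' > (SELECT COUNT(DISTINCT v) FROM (VALUES ' + @values + N') AS t(v));';

EXEC sys.sp_executesql @sql;
```

Running PRINT @sql before the EXEC is a quick way to inspect the generated statement. As Anon notes, filtering before the dynamic pivot may still be simpler.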

Lamak · Accepted Answer · 2014-01-29 17:01:38Z


This might look cumbersome, but it does the job:

SELECT *
FROM YourTable
WHERE Col2 NOT IN (Col1,Col3,Col4,Col5)
AND Col3 NOT IN (Col1,Col2,Col4,Col5)
AND Col4 NOT IN (Col1,Col2,Col3,Col5)

Here is a sqlfiddle for you to try.
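To try the query locally instead, here is a minimal setup. It assumes a hypothetical temporary table #YourTable (standing in for YourTable) with int columns Col1 to Col5, loaded with the three rows from the question.

```sql
-- Hypothetical sample table holding the three rows from the question.
CREATE TABLE #YourTable (Col1 int, Col2 int, Col3 int, Col4 int, Col5 int);
INSERT INTO #YourTable (Col1, Col2, Col3, Col4, Col5)
VALUES (1, 2, 1, 2, 1),
       (1, 2, 3, 2, 1),
       (1, 2, 3, 4, 1);

-- Keep a row only if every middle column differs from all other columns
-- in the same row, including the first and last ones.
SELECT *
FROM #YourTable
WHERE Col2 NOT IN (Col1, Col3, Col4, Col5)
  AND Col3 NOT IN (Col1, Col2, Col4, Col5)
  AND Col4 NOT IN (Col1, Col2, Col3, Col5);
```

Only the third row (1, 2, 3, 4, 1) is returned. Note that this selects the rows to keep rather than deleting the others.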

answered Jan 29, 2014 at 17:01

Lamak


3 Comments

Evangelos Aktoudianakis Over a year ago

Thank you. The problem with this approach is that my table is generated by a random query, so the number of columns differs. I either need a static query or one that can be recursively created with minimal trouble :(

Lamak Over a year ago

@EvangelosAktoudianakis you should've added that to your question instead of making us give you an answer that isn't suited for your needs

Evangelos Aktoudianakis Over a year ago

My apologies, I am new to Stack Overflow and definitely not great at explaining a problem that's been bothering me for some time. My thanks for your time anyway.

Andriy M · Accepted Answer · 2014-01-29 19:04:41Z

Another option:

DELETE FROM atable
WHERE EXISTS (
  SELECT 1
  FROM (
    SELECT col1
    UNION
    SELECT col5
    UNION ALL
    SELECT col2
    UNION ALL
    SELECT col3
    UNION ALL
    SELECT col4
  ) AS s (col)
  HAVING COUNT(*) > COUNT(DISTINCT col)
);

For every row, the five columns are turned into rows of a derived table. The col1 and col5 values are combined with UNION, which eliminates duplication between them, and the other columns are appended with UNION ALL. The row count of the resulting set is then compared with the number of distinct values in it. If the two differ, the row is deleted.
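To see that comparison row by row, the same derived table can be evaluated with CROSS APPLY instead of inside EXISTS. This sketch assumes a hypothetical temporary table #atable (mirroring atable) with int columns col1 to col5 holding the question's three rows.

```sql
-- Hypothetical sample data: the three rows from the question.
CREATE TABLE #atable (col1 int, col2 int, col3 int, col4 int, col5 int);
INSERT INTO #atable (col1, col2, col3, col4, col5)
VALUES (1, 2, 1, 2, 1),
       (1, 2, 3, 2, 1),
       (1, 2, 3, 4, 1);

-- UNION and UNION ALL are evaluated left to right, so only a col1/col5
-- duplicate is collapsed before the middle columns are appended.
SELECT a.col1, a.col2, a.col3, a.col4, a.col5,
       s.TotalValues, s.DistinctValues,
       CASE WHEN s.TotalValues > s.DistinctValues THEN 'delete' ELSE 'keep' END AS Verdict
FROM #atable AS a
CROSS APPLY (
  SELECT COUNT(*) AS TotalValues, COUNT(DISTINCT u.col) AS DistinctValues
  FROM (
    SELECT a.col1
    UNION
    SELECT a.col5
    UNION ALL
    SELECT a.col2
    UNION ALL
    SELECT a.col3
    UNION ALL
    SELECT a.col4
  ) AS u (col)
) AS s;
```

Each sample row yields 4 values, with 2, 3 and 4 distinct respectively, so the first two rows are flagged for deletion. A NULL in any column also makes the counts differ, because COUNT(*) counts it and COUNT(DISTINCT ...) does not.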
