
Imagine this table:

| 1 | 2 | 1 | 2 | 1 |
| 1 | 2 | 3 | 2 | 1 |
| 1 | 2 | 3 | 4 | 1 |

How would I go about removing rows that contain duplicate values, ignoring a duplicate between the first and last columns? I could very well be thinking about this in an awkward way. For example:

First row: 1 2 1 2 1. It has two 2s and three 1s. I want to remove it because it has two 2s, and because the middle 1 also appears in the first and last columns of the row.

Second row: 1 2 3 2 1. It has two 2s, so I want to remove it.

Third row: this one is fine. The duplicate values in the first and last columns do not matter, and the values in between are all different!

I can imagine a few awkward ways to do this, but since SQL is not my strongest quality, I'd like to hear the pros' opinions :)
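To experiment with the rule described above, the sample table could be set up roughly as follows. This is only a sketch: the names MyTable and Col1 to Col5 are borrowed from the first answer below, and the INT column type is an assumption, since the question does not state the real types.

```sql
-- Assumed names and types (MyTable, Col1..Col5, INT); the question does not specify them.
CREATE TABLE MyTable (Col1 INT, Col2 INT, Col3 INT, Col4 INT, Col5 INT);

INSERT INTO MyTable (Col1, Col2, Col3, Col4, Col5)
VALUES (1, 2, 1, 2, 1),  -- should be removed: two 2s, and the middle 1 repeats the edges
       (1, 2, 3, 2, 1),  -- should be removed: two 2s
       (1, 2, 3, 4, 1);  -- should stay: only the first and last columns match
```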

3 Comments
  • Would you be able to share what you have tried? Commented Jan 29, 2014 at 16:47
  • I don't completely understand your criteria. The second row does have duplicated values in col2 and col4. Commented Jan 29, 2014 at 16:51
  • Thanks Lamak - fixed it. I should've spotted my mistake, but I've been looking at the thing all day... MCP, so far nothing :( I thought I'd try stackoverflow first. Commented Jan 29, 2014 at 16:55

3 Answers

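-- Expect 4 distinct values when Col1 = Col5, otherwise 5;
-- delete any row that has fewer distinct values than expected.
DELETE
FROM MyTable
WHERE
  CASE [Col1] WHEN [Col5] THEN 4 ELSE 5 END
  > (SELECT COUNT(DISTINCT v) FROM ( VALUES ([Col1]),([Col2]),([Col3]),([Col4]),([Col5]) ) t(v) )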

4 Comments

Thank you for this - I neglected to say that the number of columns in the table is variable. The table itself is a product of a dynamic query. So the in between columns could be 2, or 3, or 30. Can you explain the last t(v) part?
The VALUES clause creates an ad-hoc table from a list of values. The t(v) gives that ad-hoc table the alias t and gives its only column the alias v. This lets me reference them.
You should post the query that produces your output. I suspect what you want to do will be much easier if you do it at an earlier stage in the logic (i.e. before a dynamic pivot)
@EvangelosAktoudianakis I don't get how this answer meets your needs. It doesn't work with an unknown number of columns, and it also doesn't check that the values of the columns must be different than the first or last column
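To see the VALUES / t(v) explanation from the comments in isolation, here is a minimal standalone sketch that uses literal values (the second sample row from the question) instead of column references. The output column names total_values and distinct_values are made up for illustration.

```sql
-- t is the alias of the ad-hoc table; v is the alias of its only column.
SELECT COUNT(*)          AS total_values,     -- 5 values were listed
       COUNT(DISTINCT v) AS distinct_values   -- 1, 2 and 3 are the distinct ones
FROM (VALUES (1), (2), (3), (2), (1)) AS t(v);
```

In the DELETE above, the literals are replaced by the columns of the current row, so the count is computed once per row.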

This might look cumbersome, but it does the job:

SELECT *
FROM YourTable
WHERE Col2 NOT IN (Col1,Col3,Col4,Col5)
AND Col3 NOT IN (Col1,Col2,Col4,Col5)
AND Col4 NOT IN (Col1,Col2,Col3,Col5)

Here is a sqlfiddle for you to try.

3 Comments

Thank you. The problem with this approach is that my table is generated by a random query, so the number of columns differs. I need either a static query, or one that can be created recursively with minimal trouble :(
@EvangelosAktoudianakis you should've added that to your question instead of making us give you an answer that isn't suited for your needs
My apologies, I am new to Stack Overflow and definitely not great at explaining a problem that's been bothering me for some time. My thanks for your time anyway.
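Since the number of columns varies, one possible direction (echoing the earlier suggestion to do the check before the pivot) is to test the data while it is still in long form, where the column count no longer matters. The sketch below is purely hypothetical: it assumes a table LongTable(row_id, pos, val) holding one value per row and column position, which is not something the question shows.

```sql
-- Hypothetical long-form input: LongTable(row_id, pos, val), one record per cell.
WITH marked AS (
  SELECT row_id,
         val,
         -- Flag the first and last positions of each row as "edge" cells.
         CASE WHEN pos = MIN(pos) OVER (PARTITION BY row_id)
                OR pos = MAX(pos) OVER (PARTITION BY row_id)
              THEN 1 ELSE 0 END AS is_edge
  FROM LongTable
)
SELECT row_id                      -- rows that break the rule
FROM marked
GROUP BY row_id
HAVING COUNT(DISTINCT CASE WHEN is_edge = 1 THEN val END)  -- edge values, deduplicated
     + COUNT(CASE WHEN is_edge = 0 THEN 1 END)             -- every middle value
     > COUNT(DISTINCT val);                                -- distinct values overall
```

The edge values are deduplicated against each other only, so a matching first and last column is tolerated while any other repetition makes the left-hand side exceed the distinct count.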

Another option:

DELETE FROM atable
WHERE EXISTS (
  SELECT 1
  FROM (
    SELECT col1
    UNION
    SELECT col5
    UNION ALL
    SELECT col2
    UNION ALL
    SELECT col3
    UNION ALL
    SELECT col4
  ) AS s (col)
  HAVING COUNT(*) > COUNT(DISTINCT col)
);

For every row, the five columns are turned into rows of a virtual dataset. col1 and col5 are combined with UNION, which eliminates duplication between them, and the other columns are added with UNION ALL. The row count of the resulting set is then compared with the number of distinct values; if the two differ, the row is deleted.
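Before running the DELETE, the same comparison can be previewed with a SELECT. This is a sketch that reuses atable and col1 to col5 from the statement above; the aliases a, u and s and the output column names are invented for illustration, and CROSS APPLY assumes SQL Server.

```sql
-- Show both counts per row; rows where they differ are the ones the DELETE would remove.
SELECT a.*,
       s.total_values,
       s.distinct_values
FROM atable AS a
CROSS APPLY (
  SELECT COUNT(*)            AS total_values,
         COUNT(DISTINCT col) AS distinct_values
  FROM (
    SELECT a.col1
    UNION               -- collapses col1 and col5 when they are equal
    SELECT a.col5
    UNION ALL           -- middle columns are kept as-is, duplicates included
    SELECT a.col2
    UNION ALL
    SELECT a.col3
    UNION ALL
    SELECT a.col4
  ) AS u (col)
) AS s;
```

With the three sample rows from the question, the first two rows should show differing counts and the third should show matching ones.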
