I have a table that looks like this:

col1     col2      col3
 x         y       0.1
 y         x       0.1
 y         z       0.2
 z         y       0.2

.......

(x,y,0.1) is equivalent to (y,x,0.1), so one of them has to be removed.

Basically, the table is like a symmetric matrix, and I need to get rid of all the entries on one side of the diagonal (either above or below). The table has 100 million entries, so the result will have 50 million entries.

3 Answers

3

Well, if you know that both entries are there, you can do:

delete from t
    where col1 > col2;
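To sanity-check the statement above on the sample rows from the question, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in for MySQL (the question does not use SQLite), and the column types are guesses.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; column types are guessed.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (col1 text, col2 text, col3 real)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [("x", "y", 0.1), ("y", "x", 0.1), ("y", "z", 0.2), ("z", "y", 0.2)],
)

# Of each mirrored pair, drop the row whose col1 sorts after its col2.
conn.execute("delete from t where col1 > col2")

remaining = conn.execute(
    "select col1, col2, col3 from t order by col1, col2"
).fetchall()
print(remaining)
```

One row per pair is left, namely the one with col1 < col2. Note that this only gives the intended result when every row really has its mirrored partner.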

If some of them might already be missing and you want to keep the other one:

delete from t
   where col1 > col2 and
         exists (select 1
                 from (select col1, col2
                       from t
                      ) t2
                 where t2.col1 = t.col2 and t2.col2 = t.col1
                );

The "double" select is a hack to get around the limitation in MySQL that you cannot directly reference the modified table in subqueries used in delete.
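The logic of the "keep unpaired rows" variant can be checked with a small sketch. It assumes SQLite as a stand-in for MySQL; SQLite does not have the restriction mentioned above, so a plain correlated `exists` is enough here and the double select is not needed. The row (z, w, 0.3) is made-up test data with no mirrored partner.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; column types are guessed.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (col1 text, col2 text, col3 real)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        ("x", "y", 0.1), ("y", "x", 0.1),
        ("y", "z", 0.2), ("z", "y", 0.2),
        ("z", "w", 0.3),  # hypothetical row whose partner (w, z) is missing
    ],
)

# Delete a row only if col1 > col2 AND its swapped partner exists.
conn.execute(
    """
    delete from t
    where col1 > col2
      and exists (select 1
                  from t as t2
                  where t2.col1 = t.col2 and t2.col2 = t.col1)
    """
)

remaining = conn.execute(
    "select col1, col2, col3 from t order by col1, col2"
).fetchall()
print(remaining)
```

The unpaired row (z, w, 0.3) survives even though its col1 > col2, because the `exists` check finds no (w, z) row.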

EDIT:

As Ypercube points out, the join clause is perhaps better:

delete t
    from t join
         t t2
         on t2.col2 = t.col1 and t2.col1 = t.col2 and
            t.col1 > t.col2;

I actually find the subquery version easier to understand.

2 Comments

You can use a join in the from clause of the delete. Much better than the double nesting hack.
col1 and col2 are foreign keys, and not every entry necessarily has a "duplicate".
1

Try multiple-table DELETE.

The syntax is not easy. Something like this (assuming your table is named tbl):

DELETE tbl FROM tbl, tbl AS t2
    WHERE tbl.col1 = t2.col2 
        AND tbl.col2 = t2.col1 
        AND tbl.col3 = t2.col3
        AND tbl.col1 > tbl.col2

2 Comments

As far as I understand, it'll delete all rows!
@user2407394 I forgot the tbl.col1 > tbl.col2 part. As far as I have tested, it removes one row for each pair that has col1 and col2 swapped, and leaves orphan rows untouched.
1

The solution from Sylvain should work. Here is an alternative using a subquery.

delete from mytable where (col1, col2) in (select col2, col1 from mytable where col1 > col2);
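A quick sketch of what this statement does on sample data, assuming SQLite (3.15 or later, for row values) as a stand-in for MySQL. On MySQL itself, the restriction on referencing the modified table in a subquery, mentioned in the first answer, may apply. The row (z, w, 0.3) is made-up test data with no partner.

```python
import sqlite3

# Assumption: in-memory SQLite (3.15+ for row values) stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (col1 text, col2 text, col3 real)")
conn.executemany(
    "insert into mytable values (?, ?, ?)",
    [
        ("x", "y", 0.1), ("y", "x", 0.1),
        ("y", "z", 0.2), ("z", "y", 0.2),
        ("z", "w", 0.3),  # hypothetical row whose partner (w, z) is missing
    ],
)

# The subquery lists the swapped form of every row with col1 > col2,
# so the rows that get deleted are the partners with col1 < col2.
conn.execute(
    """
    delete from mytable
    where (col1, col2) in (select col2, col1
                           from mytable
                           where col1 > col2)
    """
)

remaining = conn.execute(
    "select col1, col2, col3 from mytable order by col1, col2"
).fetchall()
print(remaining)
```

Compared with the other answers, this keeps the opposite half of each pair (the rows with col1 > col2), and unpaired rows are still left alone.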
