I am migrating data from an old training tracking system and I am using MySQL to format the data for the new training tracking system.

I have one table with 5.5 million rows which is the master set of data from the old system. I also have a table with around 49,000 rows that had been migrated previously.

What I am trying to do is compare the two tables and remove from the master table the records that have already been migrated, so that I don't create duplicate records in the new system.

For the comparison I need to match on 3 fields (employee_id, course_code, and completion_date). I am using the following SQL statement, but it just sits and spins. I can't tell whether it is working and just taking a long time, or whether it is not working at all.

DELETE master_data.*
FROM master_data
INNER JOIN alreadyMigrated
ON master_data.employee_id = alreadyMigrated.employee_id 
AND master_data.course_code = alreadyMigrated.course_code
AND master_data.completion_date = alreadyMigrated.completion_date;

I also don't know if indexes would help. Any help would be appreciated. Thanks.

1 Answer

You just need to specify the table name, not the columns.

DELETE master_data
       FROM master_data
       INNER JOIN alreadyMigrated
          ON master_data.employee_id = alreadyMigrated.employee_id AND 
             master_data.course_code = alreadyMigrated.course_code AND 
             master_data.completion_date = alreadyMigrated.completion_date;
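Before running the delete, it can help to check how many rows the join actually matches. The query below is a minimal sketch that reuses the same join condition as the DELETE above; the alias rows_to_delete is just an illustrative name. If alreadyMigrated contains duplicate rows for the same three values, this count can be higher than the number of rows the DELETE removes, and without indexes the count may itself be slow.

```sql
-- Preview: count the master_data rows that also appear in alreadyMigrated.
-- Uses the same three-column match as the DELETE statement.
SELECT COUNT(*) AS rows_to_delete
FROM master_data
INNER JOIN alreadyMigrated
   ON master_data.employee_id = alreadyMigrated.employee_id
  AND master_data.course_code = alreadyMigrated.course_code
  AND master_data.completion_date = alreadyMigrated.completion_date;
```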

5 Comments

So you do not need a WHERE clause. I did not know that, thanks. Also, why DELETE master_data rather than DELETE FROM?
@AdrianCornish Not really, but a WHERE clause is needed for any other conditions. You need to name the table after DELETE because you are joining tables, and since the table has no alias, you use its name.
So it is a bit like how DELETE FROM tablename deletes all records because they all match; the join predicate is used to limit the scope.
So with the master data table having 5.5 million records and the alreadyMigrated table having 49,000 how long should this take? I understand that the server and server load factor into this time. I'm just looking for a rough estimate. Hours, days, weeks...
With proper indexes in place and the MySQL config set up for this amount of data, it shouldn't take more than a few minutes. Yes, it depends on hardware and overall server load. You should create covering indexes for this on (employee_id, course_code, completion_date); building them might take a while.
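As a rough sketch of the indexing suggestion, assuming neither table already has an index on these columns: the index names idx_master_match and idx_migrated_match are hypothetical, and the column order simply follows the comment above. The EXPLAIN on the equivalent SELECT is one way to see whether MySQL plans to use the indexes for the join before the DELETE is run.

```sql
-- Composite indexes covering the three columns used in the join.
-- Index names are illustrative; pick any names that fit your schema.
CREATE INDEX idx_master_match
    ON master_data (employee_id, course_code, completion_date);

CREATE INDEX idx_migrated_match
    ON alreadyMigrated (employee_id, course_code, completion_date);

-- Inspect the join plan using the same predicate as the DELETE.
EXPLAIN
SELECT master_data.employee_id
FROM master_data
INNER JOIN alreadyMigrated
   ON master_data.employee_id = alreadyMigrated.employee_id
  AND master_data.course_code = alreadyMigrated.course_code
  AND master_data.completion_date = alreadyMigrated.completion_date;
```

If the plan shows a full scan of both tables with no key in use, the indexes are probably not being picked up, for example because the column types differ between the two tables.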