I have a set of approximately 1.1 million unique IDs and I need to determine which of them do not have a corresponding record in my application's database. The IDs also come from a database, but a different one from the one my application uses. I am using PHP and MySQL and have plenty of memory: PHP is running on a server with 15 GB of RAM, and MySQL runs on its own server with 7.5 GB of RAM.
Normally I'd simply load all the IDs in one query and then use them with the IN clause of a SELECT query to do the comparison in one shot.
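For a smaller set, that usual approach would look roughly like the sketch below. The table name `records`, the column name `external_id`, and the function names are placeholders for illustration, not my real schema, and it assumes a PDO connection.

```php
<?php
// Build a SELECT with one "?" placeholder per ID.
// `records` and `external_id` are hypothetical names.
function buildInQuery(int $count): string
{
    $placeholders = implode(',', array_fill(0, $count, '?'));
    return "SELECT external_id FROM records WHERE external_id IN ($placeholders)";
}

// Return the IDs from $ids that have no matching row.
function findMissingIds(PDO $pdo, array $ids): array
{
    if ($ids === []) {
        return []; // an empty IN () list is not valid SQL
    }
    $stmt = $pdo->prepare(buildInQuery(count($ids)));
    $stmt->execute(array_values($ids));
    $found = $stmt->fetchAll(PDO::FETCH_COLUMN);

    // Whatever was asked for but not returned is missing.
    return array_values(array_diff($ids, $found));
}
```

This works fine with a few thousand IDs, since everything goes through a single query and a single `array_diff` call.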
So far, my attempts at this scale have produced scripts that either take an unbearably long time to run or spike the CPU to 100%.
What's the best way to load such a large data set and do this comparison?