I ran a query on our SQL Server 2012 instance that returned no results. I discovered that this was incorrect and that I should have gotten 16 records. I changed the query and now get the expected answer, but I am at a loss to understand why my original query did not work as expected.
So my ORIGINAL query which returned no results was:
    SELECT
        WPB.[ID number]
    FROM
        [Fact].[REPORT].[WPB_LIST_OF_IDS] WPB
    WHERE
        [ID number] NOT IN (SELECT DISTINCT IdNumber
                            FROM MasterData.Dimension.Customer DC)
The reworked query is this:
    SELECT
        WPB.[ID number]
    FROM
        [Fact].[REPORT].[WPB_LIST_OF_IDS] WPB
    LEFT JOIN
        MasterData.Dimension.Customer DC ON WPB.[ID number] = DC.IdNumber
    WHERE
        DC.IdNumber IS NULL
Can anyone tell me WHY the first query (which incidentally runs in fractions of a second vs the 2nd which takes a minute) does not work? I don't want to repeat this mistake in the future!
ID fields aren't indexed. In any case, if you want help with SQL you should provide table schemas, indexes, sample data and desired output. If you want help with performance you should first check the execution plan.
`Select distinct IdNumber` causes an unnecessary `DISTINCT` operation. You don't care how many `1` are returned, you only care whether there are any or none. The query optimizer will either ignore that `distinct` or end up performing a useless sort/distinct operation.
If *any* `IdNumber` entry in a Dimension table is NULL you have a very serious problem. Dimensions shouldn't have nulls, they should have explicit records for `Missing`, `NotApplicable`, `NotFound` values. Again, without schema and data people can only guess.
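If the guess about NULLs is right, it would explain the empty result. `x NOT IN (a, b, NULL)` expands to `x <> a AND x <> b AND x <> NULL`, and the last comparison evaluates to UNKNOWN rather than TRUE, so the `WHERE` clause rejects every row. The sketch below reproduces this with two table variables; the names `@WPB` and `@Customer` and the sample values are hypothetical stand-ins for the real tables, and it assumes (the question does not confirm this) that at least one `IdNumber` in `MasterData.Dimension.Customer` is NULL.

```sql
-- Hypothetical stand-ins for [WPB_LIST_OF_IDS] and Dimension.Customer.
DECLARE @WPB TABLE ([ID number] VARCHAR(20) NOT NULL);
DECLARE @Customer TABLE (IdNumber VARCHAR(20) NULL);

INSERT INTO @WPB VALUES ('A1'), ('B2');
INSERT INTO @Customer VALUES ('A1'), (NULL);  -- the NULL is the assumed culprit

-- 1) Original pattern: returns no rows.
--    'B2' <> NULL is UNKNOWN, so NOT IN never evaluates to TRUE.
SELECT W.[ID number]
FROM @WPB W
WHERE W.[ID number] NOT IN (SELECT C.IdNumber FROM @Customer C);

-- 2) NOT EXISTS: returns 'B2'.
--    It only tests whether a matching row exists; NULLs never match.
SELECT W.[ID number]
FROM @WPB W
WHERE NOT EXISTS (SELECT 1
                  FROM @Customer C
                  WHERE C.IdNumber = W.[ID number]);

-- 3) NOT IN with NULLs filtered out of the subquery: also returns 'B2'.
SELECT W.[ID number]
FROM @WPB W
WHERE W.[ID number] NOT IN (SELECT C.IdNumber
                            FROM @Customer C
                            WHERE C.IdNumber IS NOT NULL);
```

A quick way to check the assumption against the real data is `SELECT COUNT(*) FROM MasterData.Dimension.Customer WHERE IdNumber IS NULL`. If that returns a non-zero count, the `NOT EXISTS` form is generally the safer habit, since it behaves like the `LEFT JOIN ... IS NULL` version whether or not NULLs are present.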