4

In a table T with 20 million rows and a unique constraint on integer column uk, the following query induces SQL Server 2019 to perform a scan of all index entries instead of a single seek:

select max(uk) 
from T 
where uk <= @x

@x has a value that is close to the maximum for column uk, but that should be beside the point. The engine should always do a single seek in a situation like that, period.

Curiously enough, in SSMS the plan for this query looks exactly like the plan on the development server where the query works as intended, in every detail. The only difference is the actual number of rows read, which is 1 on the development server and 20 million on staging.

Note: the table sizes are similar but their query histories are wildly divergent. And, yes, statistics were updated during the bug hunt, but to no avail.

For now I've reformulated the query like this in order for production to proceed:

select top 1 uk 
from T 
where uk <= @x 
order by 1 desc

That results in the expected single seek, and the execution plan is only marginally less efficient than the plan with MAX() would normally be.

Even though the problem is fixed for this particular query, our application contains many more subsidiary queries just like it and I don't relish the idea that any one of them could suddenly start misbehaving like this one.

What could possibly induce such an egregious mis-fire of the SQL engine? How can we guard against it?

Here is the requested table structure:

create table hlg.Anfragen
(
    tx_utc          datetime2(4) not null, -- vom Server vereindeutigt, da gleichzeitig Primärschlüssel
    tx_id           int          not null, -- als externe Korrelations-Id (kann später evtl. entfallen)

    -- ... more fields here (avg. row size 46.2 bytes in Disk Usage per Table report)

    constraint PK_hlg_Anfragen_TxUTC
        primary key clustered (tx_utc) with (fillfactor = 100),

    constraint UK_hlg_Anfragen_TxId
        unique (tx_id)
)

UPDATE: looking at the XML execution plan shows ScanDirection="BACKWARD" as expected on the development server but ScanDirection="FORWARD" on staging, even though the predicate is 'LE' in both cases and all the other details seem to be the same as well (like Index="[UK_hlg_Anfragen_TxId]").

The XML execution plan for an invocation of the stored procedure is now at https://pastetheplan.com. I have not been able to reproduce the problem with minimal ad hoc queries, only by executing the stored procedure. The offending query is the third one from the top.

4
  • 1
    Please show the full DDL schema (including indexes) for the table, and share the execution plans for both queries via pastetheplan.com Commented Sep 25 at 13:51
  • Is the table partitioned? Commented Sep 25 at 13:51
  • 1
    It's possible we could give you better advice if you used PasteThePlan to show us the actual execution plans from both servers. Please edit yoiur question. Commented Sep 25 at 13:53
  • @ Charlieface & @ O. Jones: I have been able to post the execution plan by remoting into our office. Commented Sep 25 at 15:26

1 Answer 1

6

This does look like a parameter sniffing issue.

In your case it estimates that the WHERE tx_id <= @BisTxId will return 1 row anyway. Not the 19,821,936 it actually reads.

So therefore it believes the cost of that seek operator is so low anyway there is no cost benefit of applying the ScalarGbAggToTop transformation. This will be based on the parameter value "sniffed" when the execution plan is initially compiled.

You could add a RECOMPILE hint to get the plan compiled on each execution so it takes account of the actual runtime parameter value or add an OPTION (OPTIMIZE FOR (@BisTxId UNKNOWN)) hint, so it just uses a generic guess rather than estimates based on the first parameter values seen, but if you are changing the query text anyway probably will be best just to rewrite the query (as you have already done) so the logical specification more closely matches your desired plan shape and you are not dependent on this transformation being applied.

An example repro showing the parameter sniffing impact is at DB Fiddle

Sign up to request clarification or add additional context in comments.

4 Comments

Thank you, and I admire how you were able to sniff out the problem (pun intended :-) ) so promptly and repro it succinctly on DB Fiddle. I'd just gotten to the point where I could repro the problem on my development server by reloading the DB from a backup. Simply recreating or altering the procedure was not enough, because the table in the development DB had experienced several 100k inserts with much higher id values but lower timestamps (i.e. primary key values) than those at the 'working end' of the table, which must have been enough to keep the planner from going down the wrong path.
@ Martin Smith: I can understand where this optimisation is coming from - if things look right, go straight for the first or last index block and start a scan (expecting the result to be on the first and only page accessed) for a single LRead instead of doing an index descent, which on non-trivial tables usual costs something like 3 or 4 LReads. However, for me as a programmer, a plan with a single index descent is already a perfectly perfect plan that executes in fractions of a millisecond unless there are physical reads involved.
Mitigating the optimisation by way of OPTION (RECOMPILE) tends to cost something on the order of 10 ms, which is a pessimisation by more than an order of magnitude. If we didn't have TOP 1 queries with a suitable ORDER BY clause as a workaround, I would regard this optimisation as majorly iffy ...
It does consider the transformation rule for the case in the Fiddle But it is choosing between a plan with (according to its estimates) Forward Seek of 1 row -> Stream aggregate and one with Backward Seek of 1 row -> Top -> Stream aggregate And the seeks and aggregates are costed identically. The Top operator introduces a miniscule extra operator cost of 0.0000001 - which is enough to make that alternative seem more expensive. Despite in reality that plan being much more resilient in the face of incorrect estimates.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.