2

I have the following query:

WITH RankedCallNotes AS (
    SELECT
        Id,
        CreatedOn,
        CandidateId,
        Text,
        ROW_NUMBER() OVER (PARTITION BY CandidateId ORDER BY CreatedOn DESC) AS rn
    FROM
        [dbo].[CandidateCallNotes]
)

SELECT 
    c.Id,
    c.FirstName,
    c.LastName,
    c.AvailableDate,
    c.Email,
    cm0.Modality AS PrimaryModality,
    cm1.Modality AS Modality2,
    c.Phone,
    s.State AS HomeState,
    STRING_AGG(cert.CertName, ', ') AS Certificates,
    c.ActiveStateLicenseId,
    r.Name AS Rep,
    n.CreatedOn AS MostRecentCallTsp,
    n.Text AS MostRecentCall
FROM [dbo].[Candidates] c
    LEFT JOIN 
        (
            SELECT cm.CandidateId,
                cm.ModalityId,
                m.Name AS Modality,
                cm."Order" 
            FROM [dbo].[CandidateModalities] cm
            JOIN [dbo].[Modalities] m ON m.Id = cm.ModalityId
            WHERE "Order" = 0
        ) AS cm0 ON c.Id = cm0.CandidateId
    LEFT JOIN 
        (
            SELECT cm.CandidateId,
                cm.ModalityId,
                m.Name AS Modality,
                cm."Order" 
            FROM [dbo].[CandidateModalities] cm
            JOIN [dbo].[Modalities] m ON m.Id = cm.ModalityId
            WHERE "Order" = 1
        ) AS cm1 ON c.Id = cm1.CandidateId
    LEFT JOIN
        [dbo].[States] s ON c.HomeStateId = s.Id
    LEFT JOIN
        [dbo].[CandidateModalities] cm ON cm.CandidateId = c.Id
    LEFT JOIN
        (
            SELECT CertName,
                CandidateModalityId 
            FROM [dbo].[CandidateCertifications]
            WHERE CertName IS NOT NULL
            AND CertName <> ''
        ) cert ON cm.Id = cert.CandidateModalityId
    LEFT JOIN
        [dbo].[Reps] r ON c.RepId = r.Id
    LEFT JOIN
        RankedCallNotes n ON n.CandidateId = c.Id
        AND n.rn = 1
GROUP BY
    c.Id,
    c.FirstName,
    c.LastName,
    c.AvailableDate,
    c.Email,
    cm0.Modality,
    cm1.Modality,
    c.Phone,
    s."State",
    c.ActiveStateLicenseId,
    r.Name,
    n.CreatedOn,
    n.Text

It takes around 40 seconds to run so I'm trying to figure a way to optimize it if I can. For context, the Candidates table is around 40,000 rows, the CandidateModalities table is around 40,000 rows, and the CandidateCallNotes table is around 200,000 rows.

Here's the data plan: https://www.brentozar.com/pastetheplan/?id=0ctqpalAIr

What I've noticed:

  1. There's a Filter operation that the plan says is taking 36983 ms to complete which seems really problematic.
  2. There's a Sort operation that has a warning attached reading Operator used tempdb to spill data during execution with spill level {0} and {1} spilled thread(s) and its actual elapsed time is 36907 ms.
  3. If I remove portions requiring aggregate functions (either for Certificates or for CandidateCallNotes), the execution time improves to <1 second so I assume my issues primarily lie there.

I'm extremely new to query optimization and am not sure exactly what I should be looking for within this plan, how to interpret that, and then what to do to address the issue. I assume indexes would help but I have Azure's automatic indexing in place and want to make sure I don't screw up what it's already doing.

2
  • 2
    Do the CertName-CSV in a subquery, this way, you won't need any group BYs or heavy joins Commented Sep 9 at 18:47
  • If I remove portions requiring aggregate functions (either for Certificates or for CandidateCallNotes) I don't see any aggregates for CandidateCallNotes Commented Sep 9 at 20:18

2 Answers 2

3

Lots to chew on here. Primarily, your indexing is awful, but you can also improve the query:

  • Don't group on the outside, and instead push aggregation into subqueries.
    Because when you group on the outside and you are not groupig by primary keys then it's going to force the server to do a separate aggregation step for the whole resultset.

  • Combine the two subqueries on CandidateModalities and its left join into one single subquery, pre-grouped as I suggested.

  • CertName IS NOT NULL is superfluous.

WITH RankedCallNotes AS (
    SELECT
        ccn.Id,
        ccn.CreatedOn,
        ccn.CandidateId,
        ccn.Text,
        ROW_NUMBER() OVER (PARTITION BY ccn.CandidateId ORDER BY ccn.CreatedOn DESC) AS rn
    FROM
        dbo.CandidateCallNotes ccn
)

SELECT 
    c.Id,
    c.FirstName,
    c.LastName,
    c.AvailableDate,
    c.Email,
    cm.PrimaryModality,
    cm.Modality2,
    c.Phone,
    s.State AS HomeState,
    cm.Certificates,
    c.ActiveStateLicenseId,
    r.Name AS Rep,
    n.CreatedOn AS MostRecentCallTsp,
    n.Text AS MostRecentCall
FROM dbo.Candidates AS c
LEFT JOIN (
    SELECT
        cm.CandidateId,
        MIN(CASE WHEN cm."Order" = 0 THEN cm.Name END) AS PrimaryModality,
        MIN(CASE WHEN cm."Order" = 1 THEN cm.Name END) AS Modality2,
        STRING_AGG(cert.CertName, ', ') AS Certificates
    FROM dbo.CandidateModalities cm
    JOIN dbo.Modalities m ON m.Id = cm.ModalityId
    -- what is the relationship here? Maybe worth adding another aggregation level
    LEFT JOIN dbo.CandidateCertifications cert
        ON cm.Id = cert.CandidateModalityId
        AND cert.CertName <> ''
    GROUP BY
        cm.CandidateId
) AS cm ON c.Id = cm.CandidateId
LEFT JOIN
    dbo.States s ON c.HomeStateId = s.Id
LEFT JOIN
    dbo.Reps r ON c.RepId = r.Id
LEFT JOIN
    RankedCallNotes n ON n.CandidateId = c.Id
    AND n.rn = 1;

Then you need better indexing.

  • CandidateModalities needs indexing over the two FK columns, although I'm not sure which column order is going to work out better.
CandidateModalities (CandidateId, ModalityId) INCLUDE ("Order", Name)  -- could swap the key order as well
  • Most of CandidateCertifications has a blank CertName, you should therefore use a filtered index.
CandidateCertifications (CandidateModalityId) INCLUDE (CertName) WHERE (CertName <> N'')
  • CandidateCallNotes should be indexed by the CreatedOn column
CandidateCallNotes (CandidateId, CreatedOn DESC) INCLUDE (Text)
  • Finally, it may be worth trying an OUTER APPLY for the RankedCallNotes
OUTER APPLY (
    SELECT TOP (1)
        ccn.*
    FROM
        CandidateCallNotes ccn
    WHERE
        ccn.CandidateId = c.Id
    ORDER BY
        ccn.CreatedOn DESC
) AS n
Sign up to request clarification or add additional context in comments.

Comments

2

There's lots of sorts in the plan, and the time taken is mostly for IO sorts and spilling to Tempdb. With enough memory all the sorts will be in-memory, so scaling up to get more memory should help.

Or look at the tables and consider making the clustered indexes have matching leading columns so the merge joins don't require sorts. Or ensuring that there are indexes that would support nested loop joins instead of the merge joins. You can hint loop joins to test, or use the FAST N hint to try to get a streaming plan.

<WaitStats>
   <Wait WaitType="IO_COMPLETION" WaitTimeMs="18294" WaitCount="974" />
   <Wait WaitType="IO_QUEUE_LIMIT" WaitTimeMs="16533" WaitCount="1071" />
   <Wait WaitType="RESOURCE_GOVERNOR_IDLE" WaitTimeMs="3177" WaitCount="312" />
   <Wait WaitType="SOS_SCHEDULER_YIELD" WaitTimeMs="2953" WaitCount="291" />
   <Wait WaitType="RESERVED_MEMORY_ALLOCATION_EXT" WaitTimeMs="34" WaitCount="44759" />
   <Wait WaitType="ASYNC_NETWORK_IO" WaitTimeMs="32" WaitCount="3" />
   <Wait WaitType="MEMORY_ALLOCATION_EXT" WaitTimeMs="5" WaitCount="1715" />
</WaitStats>
<QueryTimeStats ElapsedTime="40616" CpuTime="2648" />

IO_COMPLETION
Occurs while waiting for I/O operations to complete. This wait type generally represents non-data page I/Os. Data page I/O completion waits appear as PAGEIOLATCH_* waits.

IO_QUEUE_LIMIT
Occurs when the asynchronous IO queue for the Azure SQL Database has too many IOs pending. Tasks trying to issue another IO are blocked on this wait type until the number of pending IOs drop below the threshold. The threshold is proportional to the DTUs assigned to the database.

https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-db-wait-stats-azure-sql-database?view=azuresqldb-current

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.