Hi folks!
I had few performance issue in my past projects using Entity Framework Core to retrieve and summarize data. Sometimes my EF queries were not converted into SQL in the best way, and it causes performance issues. I would like to share with you some example of task and will be happy to hear your thoughts about it.
Task requirements
There is a simple SQL Server database that contains 3 tables:
The query should return all orders (including OrderType name and count of items where ItemType == 1) if the order contains any item where ItemType = 1 and with Order.CreatedAt > 12/05/2024.
Option 1: EF Query
Here is the simple EF query for it:
var test = await _context.Orders
.Include(t => t.OrderType)
.Where(t => t.CreatedAt > new DateTime(2024, 12, 05)
&& t.Items.Any(i => i.ItemType == 1))
.Select(t => new
{
t.Id,
t.Address,
t.CreatedAt,
CategoryName = t.OrderType.Name,
ItemCount = t.Items.Count(i => i.ItemType == 1),
})
.OrderBy(t => t.CreatedAt)
.Take(1000)
.ToListAsync(cancellationToken);
Pros:
- Easy to refactor or extend
- Fully integrated with EF and C#
- Strongly typed and expressive
Cons:
- Translates into a large SQL query under the hood
- May become inefficient for large datasets
- Harder to optimize with SQL indexes or query plans
And this query demonstrates performance issue (5.7s). When I look at the generated SQL query, I understood that using OrderItems is not an optimal solution. For some reason, this table is used twice in this query:
SELECT [o2].[Id], [o2].[Address], [o2].[CreatedAt], [o1].[Name] AS [CategoryName], (
SELECT COUNT(*)
FROM [OrderItems] AS [o3]
WHERE [o2].[Id] = [o3].[OrderId] AND [o3].[ItemType] = 1) AS [ItemCount]
FROM (
SELECT TOP(@__p_0) [o].[Id], [o].[Address], [o].[CreatedAt], [o].[TypeId]
FROM [Orders] AS [o]
WHERE [o].[CreatedAt] > '2024-12-05T00:00:00.0000000' AND EXISTS (
SELECT 1
FROM [OrderItems] AS [o0]
WHERE [o].[Id] = [o0].[OrderId] AND [o0].[ItemType] = 1)
ORDER BY [o].[CreatedAt]
) AS [o2]
INNER JOIN [OrderTypes] AS [o1] ON [o2].[TypeId] = [o1].[Id]
ORDER BY [o2].[CreatedAt]
Here is an execution plan for it:
Note: I did not create indexes to see the difference between the different approaches.
Option 2: SQL View + EF
I was not able to change this query for better performance and that's why I created SQL view that does the aggregation directly in the database (as I wanted):
CREATE VIEW dbo.vw_Orders_ItemsCount
AS
SELECT
t.id as OrderId,
t.address,
c.Name AS OrderType,
t.CreatedAt,
i.ItemType,
COUNT(i.id) AS ItemsCount
FROM dbo.Orders AS t
JOIN dbo.OrderTypes AS c ON c.Id = t.TypeId
JOIN dbo.OrderItems AS i ON i.OrderId = t.Id
GROUP BY
t.id,
t.address,
c.Name,
t.CreatedAt,
i.ItemType;
GO
And here is simple EF query for it:
var test = await _context.OrderItemTypeView
.Where(v => v.CreatedAt > new DateTime(2024, 12, 05) && v.ItemType == 1)
.OrderBy(v => v.CreatedAt)
.Take(1000)
.ToListAsync(cancellationToken);
It provides 2x better execution time - 3.1s. (I had not created any non-clustered indexes)
Pros:
- Simpler LINQ in C#
- Easier to optimize with indexes, execution plans, etc.
Cons:
- Less flexibility (changing logic requires DB migration)
- View becomes an additional layer to maintain
What Do You Think?
Is there a better way to write the EF query (without SQL view) above to make it cleaner or more performant?
Top comments (0)
Some comments may only be visible to logged-in visitors. Sign in to view all comments.