26

I have a tree struture in a single table. The table is a tree of categories that can be nested endlessly. Each category has a ProductCount column that tells how many products are directly in the category (not summing child categories).

Id  | ParentId | Name      | ProductCount
------------------------------------
1   | -1       | Cars      | 0
2   | -1       | Bikes     | 1
3   | 1        | Ford      | 10
4   | 3        | Mustang   | 7
5   | 3        | Focus     | 4

I would like to make a sql query that for each row/category gives me the number of products including the ones in the child categories.

The output for the table above should be

Id  | ParentId | Name      | ProductCount | ProductCountIncludingChildren
--------------------------------------------------------------------------
1   | -1       | Cars      | 0            | 21
2   | -1       | Bikes     | 1            | 1
3   | 1        | Ford      | 10           | 21
4   | 3        | Mustang   | 7            | 7
5   | 3        | Focus     | 4            | 4

I know I probably should use CTE, but cant quite get it working the way it should.

Any help is appreciated!

2
  • What you have tried so far? Post your query... Commented Jun 24, 2014 at 19:25
  • Tried CTE, but could not get it to sum correct Commented Jun 24, 2014 at 19:27

5 Answers 5

30

You can use a recursive CTE where you in the anchor part get all rows and in the recursive part join to get the child rows. Remember the original Id aliased RootID from the anchor part and do sum aggregate in the main query grouped by RootID.

SQL Fiddle

MS SQL Server 2012 Schema Setup:

create table T
(
  Id int primary key,
  ParentId int,
  Name varchar(10),
  ProductCount int
);

insert into T values
(1, -1, 'Cars',    0),
(2, -1, 'Bikes',   1),
(3,  1, 'Ford',    10),
(4,  3, 'Mustang', 7),
(5,  3, 'Focus',   4);

create index IX_T_ParentID on T(ParentID) include(ProductCount, Id);

Query 1:

with C as
(
  select T.Id,
         T.ProductCount,
         T.Id as RootID
  from T
  union all
  select T.Id,
         T.ProductCount,
         C.RootID
  from T
    inner join C 
      on T.ParentId = C.Id
)
select T.Id,
       T.ParentId,
       T.Name,
       T.ProductCount,
       S.ProductCountIncludingChildren
from T
  inner join (
             select RootID,
                    sum(ProductCount) as ProductCountIncludingChildren
             from C
             group by RootID
             ) as S
    on T.Id = S.RootID
order by T.Id
option (maxrecursion 0)

Results:

| ID | PARENTID |    NAME | PRODUCTCOUNT | PRODUCTCOUNTINCLUDINGCHILDREN |
|----|----------|---------|--------------|-------------------------------|
|  1 |       -1 |    Cars |            0 |                            21 |
|  2 |       -1 |   Bikes |            1 |                             1 |
|  3 |        1 |    Ford |           10 |                            21 |
|  4 |        3 | Mustang |            7 |                             7 |
|  5 |        3 |   Focus |            4 |                             4 |
Sign up to request clarification or add additional context in comments.

2 Comments

This recursive CTE has very bad scaling, because it essentially copies the leaf value to all parents, immediate and further up the tree (e.g. copies ProductCount from Mustang to each of Ford and Cars). I tried it on a data set of about 200 and the CTE result set ballooned to about 100k rows, and it took about half a minute.
@Elaskanator thanks for trying I want to do something similar for around 3 million set. Just getting goosebumps thinking about my CTE result set.
8

This is the same concept as Tom's answer, but less code (and way faster).

with cte as
(
  select v.Id, v.ParentId, v.Name, v.ProductCount, 
  cast('/' + cast(v.Id as varchar) + '/' as varchar) Node
  from Vehicle v
  where ParentId = -1
  union all
  select v.Id, v.ParentId, v.Name, v.ProductCount,  
  cast(c.Node + CAST(v.Id as varchar) + '/' as varchar)
  from Vehicle v
  join cte c on v.ParentId = c.Id
)

select c1.Id, c1.ParentId, c1.Name, c1.ProductCount, 
c1.ProductCount + SUM(isnull(c2.ProductCount, 0)) ProductCountIncludingChildren
from cte c1
left outer join cte c2 on c1.Node <> c2.Node and left(c2.Node, LEN(c1.Node)) = c1.Node
group by c1.Id, c1.ParentId, c1.Name, c1.ProductCount
order by c1.Id

SQL Fiddle (I added some extra data rows for testing)

4 Comments

When casting to varchar without specifying the string length you will get the default of 30 characters. It could be enough but I think it is better to be explicit about what string length you actually want to use.
That's true. I don't know what his actual data looks like, so I didn't concern myself with details like that.
Well, he did say that "The table is a tree of categories that can be nested endlessly." Which of course is not literally true but it could make the tree pretty deep.
I'll admit this isn't an ideal solution. Your answer is the best one so far.
1

Actually this could be a good use of HIERARCHYID in SQL Server..

CREATE TABLE [dbo].[CategoryTree]
(
    [Id] INT,
    [ParentId] INT,
    [Name] VARCHAR(100),
    [ProductCount] INT
)
GO

INSERT [dbo].[CategoryTree]
VALUES
    (1, -1, 'Cars', 0),
    (2, -1, 'Bikes', 1),
    (3, 1, 'Ford', 10),
    (4, 3, 'Mustang', 7),
    (5, 3, 'Focus', 4)
    --,(6, 1, 'BMW', 100)
GO

Query

WITH [cteRN] AS (
    SELECT *,
        ROW_NUMBER() OVER (
            PARTITION BY [ParentId] ORDER BY [ParentId]) AS [ROW_NUMBER]
    FROM  [dbo].[CategoryTree]
),
[cteHierarchy] AS (
    SELECT CAST(
            CAST(hierarchyid::GetRoot() AS VARCHAR(100))
            + CAST([ROW_NUMBER] AS VARCHAR(100))
            + '/' AS HIERARCHYID
        ) AS [Node],
        *
    FROM [cteRN]
    WHERE [ParentId] = -1
    UNION ALL
    SELECT CAST(
            hierarchy.Node.ToString()
            + CAST(RN.[ROW_NUMBER] AS VARCHAR(100)
        ) + '/' AS HIERARCHYID),
        rn.*
    FROM [cteRN] rn
    INNER JOIN [cteHierarchy] hierarchy
        ON rn.[ParentId] = hierarchy.[Id]
)
SELECT x.[Node].ToString() AS [Node],
    x.[Id], x.[ParentId], x.[Name], x.[ProductCount],
    x.[ProductCount] + SUM(ISNULL(child.[ProductCount],0))
        AS [ProductCountIncludingChildren]
FROM [cteHierarchy] x
LEFT JOIN [cteHierarchy] child
    ON child.[Node].IsDescendantOf(x.[Node]) = 1
    AND child.[Node] <> x.[Node]
GROUP BY x.[Node], x.[Id], x.[ParentId], x.[Name], x.[ProductCount]
ORDER BY x.[Id]

Result

Results screenshot

2 Comments

Note that most of the query is just about setting up the HierarchyId "Node" column. If you could store the data with the HierarchyId column then final query should be pretty quick..
For the actual issue in this post the solution above works just as well and is alot less complicated, but using HierarchyId allows you to sum per level which is alot better imo.
0

This wont be optimal but it works, however it involves 2 CTEs. 1 main CTE and a CTE in a table valued function to sum up the values for each sub tree.

The first CTE

;WITH cte 
AS 
(
SELECT 
   anchor.Id,
   anchor.ParentId,
   anchor.Name,
   anchor.ProductCount,
   s.Total AS ProductCountIncludingChildren
FROM
testTable anchor 
    CROSS APPLY SumChild(anchor.id) s
WHERE anchor.parentid = -1
UNION ALL
SELECT 
   child.Id,
   child.ParentId,
   child.Name,
   child.ProductCount,
   s.Total AS ProductCountIncludingChildren
  FROM
cte 
  INNER JOIN testTable child on child.parentid = cte.id
  CROSS APPLY SumChild(child.id) s
 )
 SELECT * from cte 

AND the function

CREATE FUNCTION SumChild 
(
@id int

)
RETURNS TABLE
AS
 RETURN  
 (
 WITH cte 
 AS 
 (
   SELECT 
     anchor.Id,
     anchor.ParentId,
     anchor.ProductCount
   FROM
      testTable anchor 
   WHERE anchor.id = @id 
   UNION ALL
SELECT 
      child.Id,
      child.ParentId,
      child.ProductCount
    FROM
   cte 
     INNER JOIN testTable child on child.parentid = cte.id
)
SELECT SUM(ProductCount) AS Total from CTE
 )
GO

Which results in:

Results in SSMS

from the source table

Source table

Apologies about formatting.

Comments

-1

I couldn't come up with a good T-SQL, set based answer, but I did come up with an answer: The temp table mimics your table structure. The table variable is a work table.

--Initial table
CREATE TABLE #products (Id INT, ParentId INT, NAME VARCHAR(255), ProductCount INT)
INSERT INTO #products
        ( ID,ParentId, NAME, ProductCount )
VALUES  ( 1,-1,'Cars',0),(2,-1,'Bikes',1),(3,1,'Ford',10),(4,3,'Mustang',7),(5,3,'Focus',4)

--Work table
DECLARE @products TABLE (ID INT, ParentId INT, NAME VARCHAR(255), ProductCount INT, ProductCountIncludingChildren INT)
INSERT INTO @products
        ( ID ,
          ParentId ,
          NAME ,
          ProductCount ,
          ProductCountIncludingChildren
        )
SELECT  Id ,
        ParentId ,
        NAME ,
        ProductCount,
        0
FROM #products

DECLARE @i INT
SELECT @i = MAX(id) FROM @products

--Stupid loop - loops suck
WHILE @i > 0
    BEGIN
        WITH cte AS (SELECT ParentId, SUM(ProductCountIncludingChildren) AS ProductCountIncludingChildren FROM @products GROUP BY ParentId)
        UPDATE p1
        SET p1.ProductCountIncludingChildren = p1.ProductCount + isnull(p2.ProductCountIncludingChildren,0)
        FROM @products p1
        LEFT OUTER JOIN cte p2 ON p1.ID = p2.ParentId
        WHERE p1.ID = @i

        SELECT @i = @i - 1
    END

SELECT *
FROM @products

DROP TABLE #products

I'd be very interested to see a better, set based approach. The problem I ran into is that when you use recursive cte's, you start with the parent and work toward the children - this doesn't really work for getting a sum at the parent levels. You'd have to do some kind of backward recursive cte.

3 Comments

You can start at the bottom of a tree and work up in a recursive CTE by using something like SELECT leafNodes.* FROM [dbo].[CategoryTree] leafNodes LEFT JOIN [dbo].[CategoryTree] children ON children.[ParentId] = leafNodes.[Id] WHERE children.[Id] IS NULL as the anchor
The problem is that you can't use GROUP BY and aggregation in the recursive member of a CTE. The only thing I could think of was a recursive CTE in a scalar function which is essentially the same as using a loop.
I think I had the same idea as you, but used a tabled value function( which is unnecessary see above - I also noted it wasn't optimal). Id also thought about walking from the bottom up, summing as I went but couldnt work out how to do that quickly.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.