THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Adam Machanic

Adam Machanic, Boston-based SQL Server developer, shares his experiences with programming, monitoring, and performance tuning SQL Server. And the occasional battle with the query optimizer.

Swinging From Tree to Tree Using CTEs, Part 1: Adjacency to Nested Sets

I'm not sure how many times over the last several years I've seen the same tired article titles... "Climbing Trees in SQL," "Climbing Up the SQL Tree," or maybe, "Naked Coeds Playing in the Trees!" ... Oh wait, I think that last one might be something else.

But anyway, the point is, I'm going to adhere to that standard.  But I'm adding a Tarzan-esque theme to this post, because we're not going to just climb a tree.  We're going to swing about between trees.  Different types of tree representations are appropriate for different scenarios.  Which is why, as I pointed out in my recent SearchSQLServer article on recursive Common Table Expressions, we have so many different ways of representing them.  Adjacency Lists, Materialized Paths, Nested Sets, and Nested Intervals spring to mind.  And there are probably others.

My article shows how to use the CTEs, in conjunction with a dynamically generated materialized path, to manipulate an Adjacency List, getting many of the benefits associated with using the Nested Sets Model.  And that's great.  But the Nested Sets Model itself might be useful in your endeavors.  So in this post, I will show how to extend the CTE discussed in that article.

The end-result CTE in the article, which can be run in the AdventureWorks database, looks something like this (renamed for the sake of this post):

WITH EmployeeLevels AS
(
    SELECT
        EmployeeId,
        CONVERT(VARCHAR(MAX), EmployeeId) AS thePath,
        1 AS Level
    FROM HumanResources.Employee
    WHERE ManagerId IS NULL

    UNION ALL

    SELECT
        e.EmployeeId,
        x.thePath + '.' + CONVERT(VARCHAR(MAX), e.EmployeeId) AS thePath,
        x.Level + 1 AS Level
    FROM EmployeeLevels x
    JOIN HumanResources.Employee e on e.ManagerId = x.EmployeeId
)
thePath is the materialized path, a '.' delimited breadcrumb from the root to each node.  We also end up with a level, which represents how many nodes away from the root in the hierarchy each employee sits (very important for those upwardly-mobile junior execs, no doubt!)

I'm going to assume that readers of this post are familiar with the Nested Sets Model, as popularized by Joe Celko.  If not, read the link.

After staring at the Nested Sets for a while, I arrived at the following mathematical properties:

  • Value of Lft for the root node is 1
  • Value of Rgt for the root node is 2 * (Number of nodes)
  • Value of Lft for any node is ((Number of nodes visited) * 2) - (Level of current node)
  • Value of Rgt for any node is (Lft value) + ((Number of subnodes) * 2) + 1
I think the only factor here that requires further explanation is "(Number of nodes visited)".  By this, I mean the number of nodes that would have been visited (including the current node) if one were doing a preorder traversal of the tree. Luckily, this number is quite easy to determine.  The row number for each row, as determined by ordering by the materialized path, is this number.  I encourage readers to validate this with pencil and paper if it doesn't quite make sense reading on the screen.  Draw a simple tree and traverse, starting from the lefthand side of the root node.

But how to translate that into T-SQL?  Luckily, SQL Server 2005, in addition to CTEs, also includes the very useful ROW_NUMBER() function.  So to get the row number, we simply need to add another CTE into the chain (did you know that successive CTEs can use the results of previous CTEs?):

WITH EmployeeLevels AS
(
    SELECT
        EmployeeId,
        CONVERT(VARCHAR(MAX), EmployeeId) AS thePath,
        1 AS Level
    FROM HumanResources.Employee
    WHERE ManagerId IS NULL

    UNION ALL

    SELECT
        e.EmployeeId,
        x.thePath + '.' + CONVERT(VARCHAR(MAX), e.EmployeeId) AS thePath,
        x.Level + 1 AS Level
    FROM EmployeeLevels x
    JOIN HumanResources.Employee e on e.ManagerId = x.EmployeeId
),
EmployeeRows AS
(
    SELECT
         EmployeeLevels.*,
         ROW_NUMBER() OVER (ORDER BY thePath) AS Row
    FROM EmployeeLevels
)
We now have current level and number of nodes visited, which gives us the Lft value for each node.  But how to determine the number of subnodes, in order to get the Rgt value?  Luckily, the materialized path also gives us that capability...

For any given node, number of subnodes can be determined by counting all nodes whose path value is LIKE the current path value + '.%'.

The resultant query should be fairly obvious at this point:

WITH EmployeeLevels AS
(
    SELECT
        EmployeeId,
        CONVERT(VARCHAR(MAX), EmployeeId) AS thePath,
        1 AS Level
    FROM HumanResources.Employee
    WHERE ManagerId IS NULL

    UNION ALL

    SELECT
        e.EmployeeId,
        x.thePath + '.' + CONVERT(VARCHAR(MAX), e.EmployeeId) AS thePath,
        x.Level + 1 AS Level
    FROM EmployeeLevels x
    JOIN HumanResources.Employee e on e.ManagerId = x.EmployeeId
),
EmployeeRows AS
(
    SELECT
         EmployeeLevels.*,
         ROW_NUMBER() OVER (ORDER BY thePath) AS Row
    FROM EmployeeLevels
)
SELECT
     ER.EmployeeId,
     ER.thePath,
     ER.Level,
     ER.Row,
     (ER.Row * 2) - ER.Level AS Lft,
     ((ER.Row * 2) - ER.Level) +
        (
            SELECT COUNT(*) * 2
            FROM EmployeeRows ER2
            WHERE ER2.thePath LIKE ER.thePath + '.%'
        ) + 1 AS Rgt
FROM EmployeeRows ER
ORDER BY thePath
... And that's it!  We have now converted the Adjacency List tree into a Nested Sets tree.

In the next installment, I will show how to determine the Nested Intervals encoding from an Adjacency List tree, also using recursive CTEs.

Questions?


Published Wednesday, July 12, 2006 10:47 PM by Adam Machanic
Filed under: ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Shree Periakaruppan said:

Wow..  This is the best solution that I  have seen for generating nested sets using SQL Server 2005!

Great work! and thanks a lot!

August 20, 2007 5:10 PM
 

Andy Bellenie said:

This model is excellent for converting a non-ordered adjacency model to nested sets. However how could you get it to work when there is an explicit sort order for the children underneath any particular parent? Ideally you'd be able to add an ORDER BY clause the the second SELECT statement but of course SQL doesn't allow you to do this.

I'd very much appreciate your comments on this issue.

September 3, 2008 1:20 PM
 

Adam Machanic said:

In our previous installment , we saw how to convert Adjacency Lists into Nested Sets using a CTE. In

January 25, 2009 2:59 PM
 

Ashwini said:

Simple Great! I was badly stuck with this problem, was abt to build the .Net tree...

But your solution saved my time and lots of work

Thank you so much

March 17, 2009 9:33 AM
 

John said:

Great tip!!!

Right now, the above CTE will not work if it extends beyond 32767 (its rowcount > 32767). Can you provide a way to do this.

Thanks in advance.

December 5, 2009 2:29 AM
 

John said:

Great tip!!!

Right now, the above CTE will not work if recursion extends beyond 32767 (its rowcount > 32767). Can you provide a way to do this.

Thanks in advance.

Sorry ... double post

Updated 'it' -> resursion

December 5, 2009 2:30 AM
 

Adam Machanic said:

Hi John,

Do you perchance have a cycle in your tree? There should be no reason for the CTE to recurse that many times, unless you have an INCREDIBLY deep hierarchy. If that's the case, I would love to hear about it--I've never heard of any real-world case of a hierarchy with more than a couple of hundred levels. What kind of data are you working with?

Here's code to test for cycles:

--Traverse up the hierarchy toward the

--leaf node

;WITH e AS

(

   SELECT

       e0.EmployeeId,

       e0.ManagerId

   FROM HumanResources.Employee AS e0

   WHERE

       NOT EXISTS

       (

           SELECT *

           FROM HumanResources.Employee AS e1

           WHERE

               e1.ManagerId = e0.EmployeeId

       )

   UNION ALL

   SELECT

       e.EmployeeId,

       et.ManagerId

   FROM HumanResources.Employee et

   JOIN e ON e.ManagerId = et.EmployeeId

   WHERE

       et.ManagerId IS NOT NULL

       AND e.ManagerId <> e.EmployeeId

)

SELECT *

FROM e

WHERE

   e.ManagerId = e.EmployeeId

December 5, 2009 2:22 PM
 

John said:

I was trying to create a Nested Set of book categories heirarchy from

BrowseNodes.com.

Here is the web address for reference:

http://www.browsenodes.com/node--2000.html

It uses Adjacency List model to create the heirarchy. Yes, the heirarchy is incredibly deep. Amazing category heirarchy.

Again, thanks.

December 5, 2009 10:48 PM
 

Adam Machanic said:

Hi John,

I can't figure out how to download the full categories hierarchy, but I've done this for a real book Web site before, using data for around 5 million books, provided by Muze. I can tell you that in that case the hierarchy wasn't even close to that deep--maybe 10 levels at most. Are you certain you're not seeing the result of a cycle, or some other issue?

December 6, 2009 11:58 AM
 

John said:

Hi Adam,

Thank you very much for your continuous help.

I ran your query for tree cycle test and returned 0 row.

I've implemented Celko's procedure of converting AL to NS model. And it took 31 min. Its implmented in SQL/PCM and is not using the latest technology such as CTE.  

I've read some books saying cursors are far better than CTE when doing recursion. I wonder if this is where CTE can be better.

Again, Thanks a lot.

December 6, 2009 3:41 PM
 

John said:

BTW, my test is on tree pruning, removing categories with empty books.

The less dummy books in the tree, the faster it is. When I created large test books the slower it is.

thanks

December 6, 2009 3:55 PM
 

bobweston said:

Adam,

I'm struggling with some performance issues on SQL 2005 related to the final example in your article, which by the way is the most complete I've found, and could really use some help.  We're using it to completely rebuild the LFT and RGT columns in our employee table each night as the data is completely refreshed daily.  

Our structure has 10 levels and 450K nodes and are taking up to 14 hours to complete on our development server.  The EmployeeID and ManagerID columns are indexed and otherwise perform ok, but I can't for the life of me get this to complete during our maintenance window.

Do you have any ideas?  We've just migrated to SQL 2005 and I'm still learning about CTE and could use some assistance.  We employed a different method on SQL 2000 using an ActiveX script in a DTS package and the process took less than 2 hours on a much less equipped server.

Thanks in Advance,

Bob

February 21, 2010 4:58 AM
 

Adam Machanic said:

Hi Bob,

Grab the code samples for "Expert SQL Server 2005 Development", here: http://apress.com/book/downloadfile/3498

... check out Chapter11.SQL, starting on line 1306. That method may be faster than the one in this post.

... Or maybe not. While researching the book I discovered that recursive CTEs are quite slow and found that I could get much better results using a temp table indexed on the "level" column. With only 450K nodes I think you could process the entire thing in a matter of minutes with a well-tuned solution, so I'm thinking that you might want to ditch the CTEs altogether.

By the way, why rebuild the whole thing every night instead of simply maintaining it in-place? Again, check out my book, or at least the samples therein :-) ... and another by the way, I also discovered through extensive testing that Nested Sets is much slower than an optimized materialized path solution. But it may be too late for you to reconsider the design at this stage?

Best of luck!

February 22, 2010 12:04 AM
 

bobweston said:

Thanks for the tip, Adam!  I'll give it a shot tonight and let you know how it goes :)

Bob

March 2, 2010 8:44 PM
 

bobweston said:

Adam,

I've meant to get back here and thank you for sharing that second example.  That took the processing time from over 14 hours to approximately 5 minutes for our whole package!

WHAT A DIFFERENCE!  Thanks again!

Take care,

Bob

March 11, 2010 2:19 PM
 

Paul Russo said:

Cool - very nice - We use nested sets to hold our tree data (speed of access) -  but this is hard to maintain - so convert from NS to AL - do our maintenance and then convert back to NS using the above.  Thanks again, Paul,

February 24, 2012 6:32 AM
 

Jackson said:

Great piece of code.. this helped me in huge way :) Thanks Adam :)

August 10, 2012 3:51 PM
 

miki said:

Left and Right indexes don't work correct

declare @T2 table (RowId int , Product_Id int , Parent_ID int , LogicOrder int );

insert into @T2(RowId , Product_Id , Parent_ID , LogicOrder )

select 1,0,-1,0 UNION

select 2,1,0,3 UNION

select 3,2,0,2 UNION

select 4,3,0,1 UNION

select 5,4,1,4 UNION

select 6,5,1,3 UNION

select 7,6,1,2 UNION

select 8,7,1,1 UNION

select 9,8,7,2 UNION

select 10,9,7,1 ;

LEFT    RIGHT

1 20

2 3

4 5

6 19

7 12

9 10

11 12

13 14

14 15

16 17

September 9, 2012 5:42 AM
 

Adam Machanic said:

@miki: Looks perfect on this end, ignoring your LogicOrder column:

0 0 1 20

1 0.1 2 15

4 0.1.4 3 4

5 0.1.5 5 6

6 0.1.6 7 8

7 0.1.7 9 14

8 0.1.7.8 10 11

9 0.1.7.9 12 13

2 0.2 16 17

3 0.3 18 19

... if you want to use the LogicOrder, build your path based on a ROW_NUMBER OVER (ORDER BY LogicOrder) instead:

0 1 1 20

3 1.1 2 3

2 1.2 4 5

1 1.3 6 19

7 1.3.1 7 12

9 1.3.1.1 8 9

8 1.3.1.2 10 11

6 1.3.2 13 14

5 1.3.3 15 16

4 1.3.4 17 18

September 9, 2012 11:36 AM
 

drsql said:

Excellent! I am building some demo hierarchies for a presentation, and it was taking forever to load a 50K node/9 level nested set model 1 row at a time, and using this code I can load it in seconds.

It definitely works with > 33000 nodes, as I just used it with a 55000 node tree, and next am going to try with a 500K one

September 5, 2013 11:42 PM

Leave a Comment

(required) 
(required) 
Submit

About Adam Machanic

Adam Machanic is a Boston-based SQL Server developer, writer, and speaker. He focuses on large-scale data warehouse performance and development, and is author of the award-winning SQL Server monitoring stored procedure, sp_WhoIsActive. Adam has written for numerous web sites and magazines, including SQLblog, Simple Talk, Search SQL Server, SQL Server Professional, CoDe, and VSJ. He has also contributed to several books on SQL Server, including "SQL Server 2008 Internals" (Microsoft Press, 2009) and "Expert SQL Server 2005 Development" (Apress, 2007). Adam regularly speaks at conferences and training events on a variety of SQL Server topics. He is a Microsoft Most Valuable Professional (MVP) for SQL Server, a Microsoft Certified IT Professional (MCITP), and an alumnus of the INETA North American Speakers Bureau.

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement