In a previous blog post, I had discussed a method of shredding XML to a table with HIERARCHYID, and realized that it had a dependency that I was not too keen about: The XML data required an “id” attribute in order to create the hierarchy. I had sorted out a way to inject a unique attribute ID into all the nodes (I’ll discuss this in a follow up post), but having to modify the original XML didn’t have much appeal. But, upon reading another post by my fellow blogger, Adam Machanic, I realized it could be done without this requirement. Using the technique that Adam presented, I can generate unique paths to be parsed into a HIERARCHYID column.
SET @x = '<a someAttribute="1"><b><c>abc</c><c anotherAttribute="2">def</c></b><b><c>abc</c><c>def</c></b></a>'
DECLARE @T TABLE (NodeName VARCHAR(255), Attributes XML, NodeText VARCHAR(MAX), HierarchyNode HIERARCHYID)
;WITH N (Node, NodeName, Attributes, NodeText, HierarchyPath)
AS
( SELECT
CAST(Expr.query('.') AS XML) -- Node
, CAST(Expr.value('local-name(.)', 'varchar(255)') AS VARCHAR(255)) -- NodeName
, CASE WHEN Expr.value('count(./@*)', 'INT') > 0
THEN Expr.query('<a>{for $a in ./@* return $a}</a>')
ELSE NULL END -- Attributes
, CAST(Expr.value('./text()[1]', 'varchar(max)') AS VARCHAR(MAX)) -- NodeText
, CAST('/' AS VARCHAR(1000)) -- HierarchyPath
FROM @x.nodes('/*[1]') AS Res(Expr)
UNION ALL
SELECT
Expr.query('.') -- Node
, CAST(Expr.value('local-name(.)', 'varchar(255)') AS VARCHAR(255)) -- NodeName
, CASE WHEN Expr.value('count(./@*)', 'INT') > 0
THEN Expr.query('<a>{for $a in ./@* return $a}</a>')
ELSE NULL END -- Attributes
, CAST(Expr.value('./text()[1]', 'varchar(max)') AS VARCHAR(MAX)) -- NodeText
, CAST(N.HierarchyPath
+ CAST(DENSE_RANK() OVER (ORDER BY Expr) AS VARCHAR(1000))
+ '/' AS VARCHAR(1000)) -- HierarchyPath
FROM N CROSS APPLY Node.nodes('*/*') AS Res(Expr)
)
INSERT INTO @T (NodeName, Attributes, NodeText, HierarchyNode)
SELECT NodeName, Attributes, NodeText, CAST(HierarchyPath AS HIERARCHYID)
FROM N
ORDER BY CAST(HierarchyPath AS HIERARCHYID)
SELECT * FROM @T
For this example, I simple grab the node name, the node text, and the attributes (when they exist) as a simple XML value of the format:
<a [attribute1=”attribute value” [attribute2=”attribute value”]...] />
Of course, these values could also be shredded into the hierarchy. One way of doing this would be to add an additional column to the results that represents the type of entry in the hierarchy (node versus attribute). My challenge to you is to create that solution.
Have fun!