Jonathan Kehayias and Paul Randall posted more than a year ago great articles on how to monitor historical deadlocks using Extended Events system_health default trace. Both tried to fix on the fly bug in xml output that caused failures in xml validation. Today I've found out that their version isn't bulletproof either. So here is the fixed one:
SELECT CAST(xest.target_data as XML) xml_data, *
INNER JOIN sys.dm_xe_sessions xes on xes.[address] = xest.event_session_address
xest.target_name = 'ring_buffer' AND
xes.name = 'system_health'
;WITH CTE( event_name, event_time, deadlock_graph )
event_xml.value('(./@name)', 'varchar(1000)') as event_name,
event_xml.value('(./@timestamp)', 'datetime') as event_time,
event_xml.value('(./data[@name="xml_report"]/value)', 'varchar(max)') as deadlock_graph
CROSS APPLY xml_data.nodes('//event[@name="xml_deadlock_report"]') n (event_xml)
WHERE event_xml.value('@name', 'varchar(4000)') = 'xml_deadlock_report'
SELECT event_name, event_time,
WHEN CHARINDEX('<victim-list/>', deadlock_graph) > 0 THEN
REPLACE(deadlock_graph, '<victim-list/>', '<deadlock><victim-list>'),
REPLACE(deadlock_graph, '<victim-list>', '<deadlock><victim-list>'),
AS XML) AS DeadlockGraph
ORDER BY event_time DESC
The difference as you can see is in the check whether
'<victim-list>' node is empty (
CHARINDEX('<victim-list/>', deadlock_graph) > 0). My system_health session caught some weird deadlocks. According to trace they're caused by intra-query parallelism when threads of the same session deadlock one another. Until now it is rather a rare type of deadlock but nothing strange - Bart Duncan and Craig Freedman blogged about it long ago. What is really strange is that this deadlock isn't caught by 1222 trace flag - error log remained empty. And - probably connected to previous item - victim list is empty. So instead of opening node
'<victim-list>' we receive an empty one:
'<victim-list/>'. I'm not sure such a deadlock deserves any attention if it occurs once in a while but I sure want to be able to parse xml_deadlock_report events without failures.