THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Paul White: Page Free Space

A technical SQL Server blog from New Zealand. See also my articles on SQLperformance.com

Cardinality Estimation Bug with Lookups

Estimated row counts on key or RID lookups where a filtering predicate is applied can be wrong in SSMS execution plans.  This error does not affect the optimizer’s ultimate plan selection, but it does look odd.  There are other cases where estimated row counts are inconsistent (for defensible reasons) but the behaviour shown in this post in certainly a bug.

SQL Server 2005

The following AdventureWorks sample query displays TransactionHistory information for ProductID #1 and the last three months of 2003 (if you are using a more recent version of AdventureWorks, you will need to change the year from 2003 to 2007):

SELECT
    th.ProductID,
    th.TransactionID,
    th.TransactionDate
FROM Production.TransactionHistory AS th 
WHERE 
    th.ProductID = 1 
    AND th.TransactionDate BETWEEN '20030901' AND '20031231';

The query plan is:

Plain Plan

The execution flow is pretty straightforward.  The plan seeks a non-clustered index on the ProductID column to find rows where ProductID = 1.  This non-clustered index is keyed on ProductID alone, but the index also includes the TransactionID clustering key (to point to the parent row in the base table).  The index does not cover the query (the TransactionDate column is not present in the index) so a Key Lookup is required for the TransactionDate column.  The nested loops join drives a look up process such that each row from the Index Seek on ProductID = 1 results in a lookup in the clustered index to find the TransactionDate value for that row.  A final Filter operator passes only rows that meet the date range condition.

The next diagram shows the same plan with cardinality estimates and extra details for the Key Lookup:

SQL Server 2005 plan

The cardinality estimate at the Index Seek is exactly correct.  The table is not very large, there are up-to-date statistics associated with the index, so this is as expected.  If you run a query to find all rows in the TransactionHistory table for ProductID #1, 45 rows will indeed be returned.

The estimate for the Key Lookup is exactly correct.  Each lookup into the Clustered Index to find the Transaction Date is guaranteed to return exactly one row (each non-clustered index entry is associated with exactly one base table row).  The Key Lookup is expected to be executed 45 times (shown circled).

The estimate for the Inner Join is exactly correct – 45 rows from the seek joining to one row each time from the lookup, gives a total of 45 rows.

The Filter estimate is also good: SQL Server estimates that of the 45 rows entering the filter, 16.9951 rows will match the specified range of transaction dates.  In reality, only 11 rows are produced by this query, but that small difference in estimates is quite normal and certainly nothing to worry about here.  You might like to keep that estimate of 16.9951 rows in mind, however.

SQL Server 2008 onward

The same query executed against an identical copy of AdventureWorks on SQL Server 2008 (or R2, or 2012) produces a quite different execution plan:

SQL Server 2008 plan

Instead of an explicit Filter to find rows with the requested date range, the optimizer has pushed the date predicate down to the Key Lookup (the Key Lookup now has a predicate on Transaction Date).  This is a good optimization in general terms; it makes sense to filter rows as early as possible.  Unfortunately, it has made a bit of a mess of the cardinality estimates in the process.  The post-Filter estimate of 16.9951 rows seen in the 2005 plan has been moved to the Key Lookup.  Instead of estimating one row per lookup, the plan now suggests that 16.9951 rows will be produced by each clustered index lookup – clearly not right!

To my way of thinking, the execution plan cardinality estimates should look something like this:

image

As shown, I personally prefer to see the inner side of a nested loops join estimate the number of rows per execution, but I understand I may be in a minority on this point.  If the estimate were aggregated over the expected number of executions, the inner-side estimate would be also be 16.9951 (45 executions * 0.37767 per execution).

Cause

The query tree produced by the 2008+ optimizer looks much the same as the 2005 version – the explicit Filter is still present.  However, a post-optimization rewrite occurs in the ‘copy out’ phase that removes the Filter and incorporates it in the Key Lookup seek, resulting in a residual predicate on that operator.  This rewrite applies to regular scans and seeks too (so all residual predicates on scans and seeks are a result of this late rewrite).

I would like to thank Dima Piliugin (twitter | blog) for introducing me to undocumented trace flag 9130, which disables the rewrite from Filter + (Scan or Seek) to (Scan or Seek) + Residual Predicate.  Enabling this flag retains the Filter in the final execution plan, resulting in a SQL Server 2008+ plan that mirrors the 2005 version, with correct estimates.

The bug is that this rewrite that does not correctly update cardinality estimates when the filter is pushed down to a lookup.  I should stress that the rewrite occurs after all cost-based decisions have been made, so the incorrect estimate just looks odd and makes plan analysis harder than it ought to be.  Cardinality estimates with regular scans and seeks appear to work correctly, as far as I can tell.  The bug applies to both RID lookups and Key Lookups where a residual predicate is applied.

Workarounds

Using the trace flag is not a workaround because (a) it is (very) undocumented and unsupported; and (b) it results in a less efficient plan where rows are filtered much later than is optimal.  One genuine workaround is to provide a covering non-clustered index (avoiding the lookup avoids the problem):

CREATE INDEX nc1 
ON Production.TransactionHistory (ProductID) 
INCLUDE (TransactionDate);

With the Transaction Date filter applied as a residual predicate in the same operator as the seek, the final estimate is again as expected:

image

We could also force the use of the ultimate covering index (the clustered one) with a suitable table index hint:

SELECT
    th.ProductID,
    th.TransactionID,
    th.TransactionDate
FROM Production.TransactionHistory AS th WITH (INDEX(1))
WHERE 
    th.ProductID = 1 
    AND th.TransactionDate BETWEEN '20030901' AND '20031231';

image

Again, the estimate is 16.9951 as expected.

Fixed in SQL Sentry Plan Explorer build 7.2.42.0

After this post was published on October 15 2012 I was contacted by the SQL Sentry people to see if a good workaround could be incorporated in their free Plan Explorer product.  I was happy to provide a little testing and general feedback during a process that ultimately resulted in a new build being released on 31 October 2012.  If only SSMS limitations could be addressed so quickly!  Once you upgrade to the new version, the plan displayed for our test query is:

image

This is exactly the representation we would expect if the SQL Server bug did not exist (note that Plan Explorer aggregates estimated inner-side executions for you, and rounds to integer).  Well done to the SQL Sentry team, especially Brooke Philpott (@MacroMullet).

Summary

Providing a covering non-clustered index to avoid lookups for all possible queries is not always practical, and scanning the clustered index will rarely be the optimal choice either.  Nevertheless, these are the best workarounds we have today in SSMS.

Watch out for poor cardinality estimates when a predicate is applied as part of a lookup.

This particular cardinality estimation issue does not affect the final plan choice (the internal estimates are correct) but it does look odd and will confuse people when analysing query plans in SSMS.  If you think this situation should be improved, please vote for this Connect item.  It will be interesting to see how long it takes to catch up with Plan Explorer.

© 2012 Paul White – All Rights Reserved

twitter: @SQL_Kiwi
email: SQLkiwi@gmail.com

Published Monday, October 15, 2012 8:25 PM by Paul White

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Alejandro Mesa said:

Good catch, Paul!

You explained it very well and I wonder if you filed a connect item for us to vote?

I will ask Adam if he could add the "Like" thinging to this web site :))

--

AMB

October 15, 2012 9:26 AM
 

Paul White said:

Hi Alejandro,

Thank you.  The connect item is at the link at the end of the post.  In case you can't see it in your reader, the link is:

https://connect.microsoft.com/SQLServer/feedback/details/767395/cardinality-estimation-error-with-pushed-predicate-on-a-lookup

October 15, 2012 9:36 AM
 

Alejandro Mesa said:

Thanks, Paul.

I already voted 1 up.

--

AMB

October 15, 2012 4:59 PM
 

SQL and the like said:

As we are all aware, there are a number of traceflags.  Some documented, some semi-documented and

December 6, 2012 4:30 AM

Leave a Comment

(required) 
(required) 
Submit
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement