THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Paul White: Page Free Space

A technical SQL Server blog from New Zealand.

Incorrect Results Caused By Adding an Index

It’s not often that a SQL Server bug really surprises me or makes me wonder how it was never spotted before, but this is one of those.

To get right into it, say you have the following two tables:

CREATE PARTITION FUNCTION PF (integer)
AS RANGE RIGHT
FOR VALUES (1000, 2000, 3000, 4000, 5000);
 
CREATE PARTITION SCHEME PS
AS PARTITION PF
ALL TO ([PRIMARY]);
 
-- Partitioned
CREATE TABLE dbo.T1
(
    T1ID    integer NOT NULL,
    SomeID  integer NOT NULL,
 
    CONSTRAINT [PK dbo.T1 T1ID]
        PRIMARY KEY CLUSTERED (T1ID)
        ON PS (T1ID)
);
 
-- Not partitioned
CREATE TABLE dbo.T2
(
    T2ID    integer IDENTITY (1,1) NOT NULL,
    T1ID    integer NOT NULL,
    
    CONSTRAINT [PK dbo.T2 T2ID]
        PRIMARY KEY CLUSTERED (T2ID)
        ON [PRIMARY]
);

Load table T1 with 4,999 rows. All of the rows have a SomeID value of 1234, and the T1ID primary key is sequentially numbered from 1 to 4,999:

INSERT dbo.T1 (T1ID, SomeID)
SELECT N.n, 1234
FROM dbo.Numbers AS N
WHERE N.n BETWEEN 1 AND 4999;

Table T2 gets 999 rows, generated by adding T1ID values from T1 that divide exactly by 5:

INSERT dbo.T2 (T1ID)
SELECT T1ID
FROM dbo.T1
WHERE T1ID % 5 = 0;

More visually, T1 looks like this (T1ID goes up to 4,999):

T1 data

And T2 looks like this (T2ID goes up to 999):

T2 data

The test query simply counts the rows that match between the two tables when joined on T1ID:

SELECT COUNT_BIG(*)
FROM dbo.T1 AS T1
JOIN dbo.T2 AS T2
    ON T2.T1ID = T1.T1ID
WHERE T1.SomeID = 1234;

The execution plan features a merge join:

Merge Join Execution Plan

The correct result (999) is returned and everyone is happy:

Correct Result

Enter the Index

Now someone comes along and adds a new index to table T1:

CREATE NONCLUSTERED INDEX [dbo.T1 SomeID]
ON dbo.T1 (SomeID DESC);

This is a perfectly reasonable index, apparently essential for some crucial query or other. Let’s run our COUNT_BIG(*) query again:

SELECT COUNT_BIG(*)
FROM dbo.T1 AS T1
JOIN dbo.T2 AS T2
    ON T2.T1ID = T1.T1ID
WHERE T1.SomeID = 1234;

The execution plan looks similar:

Execution Plan with New Index

But the result is wrong! There are still 999 matches in the underlying data.

Wrong Result

An Even Simpler Query

With the new index still in place, we run this query:

SELECT T1ID
FROM dbo.T1 
WHERE SomeID = 1234
ORDER BY T1ID ASC;

This query should obviously return all the T1IDs from 1 to 4,999 in ascending order. Instead, we get:

Wrongly Ordered Result Part 1

The list starts at 4000 not 1! Also, out-of-order rows are found further down:

Wrongly Ordered Result Part 2

The results are not ordered by T1ID despite the ORDER BY T1TD ASC clause. Quite astonishing.

Cause

Both problems are caused by a bug in the query optimizer, which is present in all versions of SQL Server from 2008 to 2014 CTP 1 inclusive. The bug produces a query plan that does not provide the ordering guarantees the optimizer thinks it does, leading to incorrect results. A partitioned table is required to reproduce the bug.

The sneaky aspect to it is that the index which causes the problem could be added at any time, without the original query-writer’s knowledge. Equally, data changes could mean that a query plan that used to use a hash or nested loops join suddenly recompiles to choose a merge join. Since a merge join requires sorted input, the opportunity for suddenly incorrect (incomplete!) results is obvious (and an example was shown above).

There is no trace flag that I am aware of that fixes this issue.

I have opened a Connect item for this bug, and written more about the detailed explanation in a guest post on SQLperformance.com

Resolution

The fix for this issue is now available and documented in a Knowledge Base article. Please note the fix requires a code update and trace flag 4199, which enables a range of other query processor changes. It is unusual for an incorrect-results bug to be fixed under 4199. I asked for clarification on that and the response was:

Even though this problem involves incorrect results like other hotfixes involving the Query Processor we have only enabled this fix under trace flag 4199 for SQL Server 2008, 2008 R2, and 2012. However, this fix is “on” by default without the trace flag in SQL Server 2014 RTM.

Paul White

Twitter : @SQL_Kiwi

© 2013 All Rights Reserved

Published Wednesday, August 21, 2013 5:11 AM by Paul White

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Wes Brown said:

Wow,

Now that is a bug. Since you say it is present from 2008 to now I'm assuming someone besides yourself verified this on 2005 outside your lab too?

Like you I'm trying to rap my head around this one. Seems trivial and easily reproducible, and I am having a hard time believing that they missed this one. Have you pinged Conor yet?

August 20, 2013 11:21 AM
 

pmbAustin said:

So, you weren't explicit, but is a requirement to surface the bug using partitioned data sets?  I assume so.  If you remove the partitioning, the problem goes away, correct?  Just trying to be absolutely clear about the root cause and the scope of the issue.

August 20, 2013 11:47 AM
 

Paul White said:

Hi Wes,

Yeah it is a bit of shocker IMHO. I haven't had anyone else verify it on 2005 (and I wish I had a 'lab' :) but the partitioning model was completely different in 2005 so it makes intuitive sense to me. More details in the other blog post I linked to at the end there. Let Conor know, not sure how close he is to the optimizer these days, but we'll see.

Cheers,

paul

August 20, 2013 11:53 AM
 

Paul White said:

pmbAustin,

Yes a partitioned table is essential; updated the post to make that clear. More details in the other blog posting, if you're interested.

Paul

August 20, 2013 11:56 AM
 

Wes W said:

Thanks for discovering, documenting, and sharing Paul.

Though you stated this is persistent through 2014 CTP1, I just had to test with 2012 SP1 CU5 (currently latest SP & CU) and also confirmed this bug is present.

Good to know for someone who uses partitioning quite a bit!!!

August 20, 2013 1:43 PM
 

Wes Brown said:

I'll put aside some time this weekend to test. I've got SQL Server 7 through 2014 in my lab and post the results here.

-wes

August 21, 2013 10:38 AM
 

Paul White said:

@WesB

I'll be interested to hear how you get on with testing table partitioning pre-2005 :-D Appreciated anyway, all joking aside.

@WesW

Thanks!

August 21, 2013 10:48 AM
 

Uri Dimant said:

Hi Paul

I have just tested it on SS2005 (SP4) get 999 rows for both queries.

August 25, 2013 4:43 AM
 

manishkumar1980 said:

Thanks Paul. always great now i changed my code to below

Inner hash JOIN dbo.T2 AS T2

   ON T2.T1ID = T1.T1ID

WHERE T1.SomeID = 1234;

Using above code solves the problem.

August 27, 2013 12:27 AM
 

red8rain said:

Hi Paul,

i see that you only used inner join; does this have the same affect if it was a left/right, full, cross join?

August 27, 2013 11:56 PM
 

Paul White said:

The problem is that the optimizer produces a plan that it thinks guarantees a particular row order, but it doesn't.

Using a plan that doesn't rely on that order (e.g. a hash join) won't show a problem. Using something that does rely on the order (like merge join) will. It doesn't matter if the merge join is an inner join or not.

This post is a quick illustration of a couple of the issues that might occur. For all the details, see http://www.sqlperformance.com/2013/08/t-sql-queries/incorrect-results-with-merge-join

August 28, 2013 2:30 AM

Leave a Comment

(required) 
(required) 
Submit
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement