THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

SELECT Hints, Tips, Tricks FROM Hugo Kornelis WHERE RDBMS = 'SQL Server'

The Beatles versus the Stones

Here’s a nice brain teaser, just before the end of the year. Despite the title, it is related to SQL Server, not to music!

 

A common misconception amongst SQL Server users is that a clustered index on a table will ensure that data is returned in the order implied by that index. I have lost count of the number of times I had to disprove this notion.

 

Of course, there are many cases where the rows returned by a query will be in the order of the clustered index. Here’s a quick illustration, using the lineup that The Beatles had during most of the 60s.

 

CREATE TABLE Beatles

     (LastName varchar(20) NOT NULL PRIMARY KEY CLUSTERED,

      FirstName varchar(20) NOT NULL UNIQUE NONCLUSTERED);

INSERT INTO Beatles (LastName, FirstName)

SELECT 'Lennon', 'John'

UNION ALL

SELECT 'McCartney', 'Paul'

UNION ALL

SELECT 'Harrison', 'George'

UNION ALL

SELECT 'Starr', 'Ringo';

SELECT LastName FROM Beatles;

DROP TABLE Beatles;

go

 

LastName

--------------------

Harrison

Lennon

McCartney

Starr

 

The results of this query are in alphabetical order of last name, the column used in the clustered index. Apparently, this is one of the very many cases where the order of the rows is implied by the clustered index, allowing the misconception that this is always the case to spread even further.

 

But an interesting thing happens if I use the exact same table definition to old and query the lineup of that other famous rock group of the 60s:

 

CREATE TABLE Stones

     (LastName varchar(20) NOT NULL PRIMARY KEY CLUSTERED,

      FirstName varchar(20) NOT NULL UNIQUE NONCLUSTERED);

INSERT INTO Stones (LastName, FirstName)

SELECT 'Jagger', 'Mick'

UNION ALL

SELECT 'Jones', 'Brian'

UNION ALL

SELECT 'Richards', 'Keith'

UNION ALL

SELECT 'Watts', 'Charlie'

UNION ALL

SELECT 'Wyman', 'Bill';

SELECT LastName FROM Stones;

DROP TABLE Stones;

go

 

LastName

--------------------

Wyman

Jones

Watts

Richards

Jagger

 

In this case, the names are returned in random order. That makes this a great example to really disprove the notion of a clustered index guaranteeing any output order. (Dare I say that we now finally have solid proof that the Stones are better than the Beatles? Or will that make me subject to loads of flames?)

 

What’s intriguing in this case is the difference in behaviour for the two examples. Apart from the table name, the two code snippets are exactly the same – and even renaming the tables won’t change the results. So here’s the brain teaser that I’ll leave you to ponder over your glass of champagne: what is the reason that the Beatles are, but the Stones are not returned in clustered index order?

 

I’ll post the answer in a few days.

 

 

On a more personal note, I want to apologize for not posting any new stuff during the last two months. I still have some good ideas in my scratchpad, but I need some time to polish them up to blog quality – and time is the one thing I have been lacking for the past two months. The bad news is that I will probably be short on time for the next month as well, but things are looking more sunny after that.

 

And with this being my last post of the year, I’ll also grab this opportunity to wish all readers of sqlblog.com a very great 2007, with lots of love and luck in your personal lifes, and lots of interesting SQL challenges and enticing performance gains at work.

Published Sunday, December 31, 2006 1:49 PM by Hugo Kornelis

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Adam Machanic said:

Nice one, Hugo -- that is a great example!  Took me a moment to figure out, but it makes perfect sense.  Have a great New Years!

December 31, 2006 10:31 AM
 

Kalen Delaney said:

This is great! I'll admit, I didn't figure it out by just looking at the code. I had to run it, and start doing some of my usual query analysis, but I got it pretty quickly after that. I'd love to use this as an example in my classes, if that's ok with you? It makes me wish I was still a college teacher, and I could actually use this as a pop quiz. ;-)

December 31, 2006 11:18 AM
 

Denis The SQl Menace said:

Oh yes, I remember this thread "insert into tbl1 select * from tbl2order by field1 doesnt work!"

from the microsoft.public.sqlserver.programming newsgroup(http://tinyurl.com/y8umhm)

There were 121 messages in total going back and forth until Hugo set the op straight

Denis

Denis

 ---

- Edited by admin to remove long link -
 

December 31, 2006 12:02 PM
 

AaronBertrand said:

The answer isn't too obvious, unless you've been dealing with this misconception a lot.  :-)  It's a very good, succinct example that I'm sure will be referenced in many such discussions in the future.

December 31, 2006 3:49 PM
 

Dave Markle said:

Awesome.  3 minutes!  Let me pat myself on the back... There.  Done.  I feel a lot better.  

Honestly, I haven't been fighting this misconception much.  I find myself fighting far too many "wtfs" -- subqueries like "(SELECT TOP 100 PERCENT FROM foo ORDER BY bar)".  I am a bit shocked every time I see that one.  And the fact that it usually seems to "work".  And the fact that the SQL 2000 EM query designer generates these sorts of hideous abominations.

January 1, 2007 8:56 AM
 

Hugo Kornelis said:

Kalen: Of course you can use this in your classes. The pop quiz idea is closer than you may think. I had intended to demonstrate these queries at the end of a session I delivered last month and give a prize to the first to correctly explain the behaviour, but I didn't have enough time left :-(  That's when I decided to post it here instead.

Dave: Allow me to add my back-pattings as well. Figuring this out in just 3 minutes is really good! re "TOP 100 PERCENT ... ORDER BY": you'll probably stop complaining about EM's view designer adding this once you find out that SSMS' view designer does the same - even though the "trick" doesn't even work anymore on SQL Server 2005!

January 1, 2007 4:56 PM
 

Dave Markle said:

Wow.  Thanks Hugo!  A truly disturbing revelation indeed.  Since it's so totally off-topic, I did some more investigation of it on my blog (http://executioniseverything.blogspot.com/2007/01/order-by-in-views-sql-server-2005.html).  If you play with the numbers a bit in the TOP clause, it really makes you wonder what the "proper" behavior should be!

January 1, 2007 8:07 PM
 

Hugo Kornelis said:

Hi Dave,

The "proper" behaviour is to use the ORDER BY only to determine which rows are included in the TOP xxx subcollection. A view can't be ordered, by definition, so SQL Server is free to return rows in any order it sees fit.

SSMS should not even allow you to attempt to define an order for a view. I have submitted a bug for this to Connect: https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=249248.

January 2, 2007 8:05 AM
 

Denis The SQL Menace said:

Yes it does make sense but I think (some) people will say

No problem I will just add

where lastname > '';

and that fixes that 'problem'

January 2, 2007 10:53 AM
 

Ahmed Charles said:

Well, I thought I'd try to understand why the optimizer does what it does and came up with the following:

CREATE TABLE Stones

    (LastName varchar(20) NOT NULL PRIMARY KEY CLUSTERED,

     FirstName varchar(20),

     LastNameAgain AS CAST(LastName AS char(1597)));

CREATE UNIQUE INDEX IX_FirstName ON Stones(FirstName) INCLUDE (LastNameAgain);

In SQL Server 2005 (9.00.3033.00), the adding the char(1597) column to the nonclustered index, but not the clustered index changes the behavior. A char(1596) column doesn't. I guess this approach is another way to "solve the problem" without requiring that you change all your queries. :)

January 3, 2007 2:10 PM
 

Hugo Kornelis said:

Denis, Ahmed,

For the record - there is no "problem", so nothing needs to be fixed. The query has no ORDER BY, so any ordering in the returned data is correct. My post is just about the strange observation that the Beatles "happen to be" returned in clustered index order, while the Stones are not.

I'll get back to the reasons why Denis' where clause and Ahmed's (very impressive!! - kudos for finding that) computed column change the behaviour of the Stones at a later time. I can't get into that right now without giving away the answer, and I'll postpone that a bit, so that new readers can scratch their heads a bit (and others can continue to tear out what's left of their hair <evil grin>).

January 3, 2007 5:37 PM
 

Ahmed Charles said:

I agree, there is no "problem" (unfortunately sarcasm is fairly hard to express in text). I also agree that people should scratch there heads for awhile, though I'm sure it'll make sense in the end.

January 3, 2007 6:52 PM
 

Chuck Boyce said:

Hugo, my friend,

There's simply no comparison between the Beatles and the Stones.  I love the Stones, btw..."Jumping Jack Flash", "Happy", "Miss You".  The Stones rock, man.

But the Beatles???  Are you kidding me???????

"Dear Prudence"

"Rain"

"Baby, You're a Rich man"

That awesome Rickenbacker John Lennon played?

The Stones are at the top of the 20th century and certainly one of the best bands of the 60's and 70's, but the Beatles are the most significant Pop group in a millennia.

January 4, 2007 5:22 PM
 

Hugo Kornelis said:

Hi Chuck,

Heheh! I expected comments llike yours - in fact, I am surprised I didn't get a big flame war about which band was better within hours of posting <g>.

I won't get into this debate. In fact, I even carefully avoided any indication that either band might be better or worse, or even that either behaviour of any of the tables might be better or worse!

January 4, 2007 6:21 PM
 

ShimMaster said:

Results are being returned in order by the non clustered index.

Remove the non clustered index and the results are returned in the correct order.

Optimizer chooses to do an index scan when clustered index only exists and via non clustered  index when both indexes exist?

Very interesting.

January 16, 2007 10:01 AM
 

Hugo Kornelis said:

Hi ShimMaster,

Congratulations! You figured out the correct answer.

But I must object to the usage of the words "correct order" when refering to a query that has no ORDER BY. All orders are correct - they are only different (or rather, they <b>appear</b> to be different).

I tried to publish a new blog post with a detailed explanations and some follow-up to the observations made by Denis and Ahmed, but experienced some technical difficulties. I'll try again later today.

January 16, 2007 2:21 PM
 

SELECT Hints, Tips, Tricks FROM Hugo Kornelis WHERE RDBMS = 'SQL Server' said:

I have just managed to overcome the technical difficulties. The full explanation of this seemingly weird behaviour is now online.

January 16, 2007 3:47 PM
 

Denis Gobo said:

A year in review, The 21 + 1 best blog posts on SQLBlog Best posts according to me, it might have been

December 27, 2007 4:11 PM
 

Pollus said:

Very nice article ! Short and using a nice title !

December 28, 2007 1:12 PM
 

Sanchay said:

Got it 1 minute, but don't know the reason for behavior, why its considering FirstName at all?

March 21, 2013 3:27 AM
 

Maniac said:

This fixes the problem:

SELECT LastName FROM Stones WITH (Index(1));

November 21, 2014 7:55 AM

Leave a Comment

(required) 
(required) 
Submit

About Hugo Kornelis

Hugo is co-founder and R&D lead of perFact BV, a Dutch company that strives to improve analysis methods and to develop computer-aided tools that will generate completely functional applications from the analysis deliverable. The chosen platform for this development is SQL Server. In his spare time, Hugo likes to visit the SQL Server newsgroups, in order to share and enhance his knowledge of SQL Server.

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement