THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Kalen Delaney

Geek City: Clustered or Nonclustered? Why not both?

I had been thinking of this post all day, and then noticed that Denis wrote a post with almost the same name. I was worried he might have written about something similar, but it turns out not to be the case.

A group of colleagues have been having a discussion about this topic recently, which was spurred by the fact that the  Microsoft supplied Northwind database has duplicates of many of its indexes. If Microsoft does this, many new database users might assume it's a good idea. It's not! Having two identical indexes gives you no additional benefit, but does give you lots of additional overhead when those indexes need to be maintained. In SQL Server 2008, you will be able to create a policy to detect and prohibit this behavior if you choose. But that of course, is still in the future.

But what if the indexes are not quite identical? What if one is a clustered index and one is nonclustered? My colleagues report seeing this behavior frequently when someone declares a Primary Key on a column, and then doesn't realize that automatically builds an index (clustered by default), so she then builds a nonclustered on the same key.  Is this necessarily all bad? The clustered index is useful when most of the columns of many of the rows need to be returned, based on the value in the PK column, or when the data needs to be returned sorted by the PK. But what if you only need a few rows, and you need only key columns? For example, what about a count(*) query?

In general, when satisfying a count(*) query, the SQL Server optimizer will choose the index with the smallest number of pages at the leaf level. A nonclustered index typically will have far fewer leaf level rows that a clustered, but still contains an entry for every single row, so the count(*) value will be accurate. So consider this example:

USE AdventureWorks;
-- create a big table by copying another one
IF EXISTS (SELECT * FROM sys.tables WHERE name = 'newdetails')
     DROP TABLE newdetails;
SELECT * INTO newdetails FROM Sales.SalesOrderDetail;

-- Build a PK and a NC index on the same column
ALTER TABLE newdetails
         ADD CONSTRAINT PK_Detail PRIMARY KEY (SalesOrderDetailID);
CREATE UNIQUE INDEX UNQ_Detail ON newdetails (SalesOrderDetailID);

-- Look at the plan; the optimizer will choose the NC index
SET SHOWPLAN_TEXT ON;
SELECT count(*) FROM newdetails
WHERE SalesOrderDetailID BETWEEN 1000 and 2000;
SET SHOWPLAN_TEXT OFF;

-- Look at the performance; the nonclustered is performing better
-- Even though the NC index will be chosen without the hint,
--   i included it here to make it more obvious which index is chosen
SET STATISTICS IO ON;
SELECT count(*) FROM newdetails WITH (INDEX = UNQ_Detail)
WHERE SalesOrderDetailID BETWEEN 1000 and 2000
SELECT count(*) FROM newdetails WITH (INDEX = PK_Detail)
WHERE SalesOrderDetailID BETWEEN 1000 and 2000
SET STATISTICS IO OFF;

I am absolutely not recommending that you always duplicate your PK index with another one, but rather illustrating that it is not always completely pointless or harmful to do so. Choosing the best indexes takes a lot of careful consideration of your data distribution and your query usage patterns, and there is no one-size-fits-all answer.

I hope this is useful to you,

~Kalen

Published Thursday, April 24, 2008 3:51 PM by Kalen Delaney
Filed under: ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Greg Linwood said:

It is definitely useful at times to have a NCIX version of the keys in a CIX. You've provided one example; another very common one is where tables are joined together via the keys without accessing other columns and the physical join operator is merge. Range scanning NCIX keys is much more efficient than range scanning CIX keys as no other columns are in the NCIX leaf pages so the scan requires less logical reads / IO. This is quite common in queries that are filtering via inner join but only returning columns from one table, eg:

select t1.*

from t1

inner join t2 on t1.key = t2.key

Duplicate NCIXs on the other hand are completely useless & do nothing but harm - hence I started a recent MVP private forum discussion to encourage MS to disallow this in a future release of SQL Server.

Not only do duplicate NCIXs cause performance degradation, they also waste disk space & needlessly increase backup time. They are also a cause of dealocking.

A connect item was registered last year & has a few votes already. I'd encourage others to add their votes as well:

https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=313227&wa=wsignin1.0

April 24, 2008 7:13 PM
 

Steve Lindsay said:

I am getting out of the SQL Server 2005 Index tuning wizard a NCIX when there is a CIX on the same column and it is also the primary key.  Any thoughts on this?  

Steve

July 14, 2008 4:16 PM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Favorite Non-technical Sites or Blogs

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement