Greg Linwood, a fellow SQL Server MVP, has started a series of articles in which he attempts to prove that having a clustered index on each table is not a good practice. However, he has failed to include the effects of fragmentation into account, so I decided to run some tests for myself. One of those test had rather upsetting results.
First a bit of background, for those of you who have never read a book by Kalen Delaney. A table without clustered index (also known as a “heap”) can be read in two ways: using a bookmark lookup or a table scan.
A bookmark lookup is used when a nonclustered index can first be used to narrow down the number of rows to read. Each entry in the nonclustered index holds a pointer to the row in the heap; this pointer is called the RID (short for Row ID), and consists of a file number, a page number, and a slot number. Using the RID, the storage engine can directly fetch the required page and read the row.
A table scan is used when there is no nonclustered index or when statistics indicate that the nonclustered index is not selective enough. A table scan is driven by the IAM (Index Allocation Map), basically a bitmap that indicates which pages in the database are allocated for this table. Table scans always tend to be slow, because each page in the table has to be read and each row processed. This is offset somewhat by the fact that the pages are read in order of page number, so if there is no file fragmentation, movement of the disk heads will be minimal and SQL Server can use the highly efficient read-ahead technology rather than the usual random I/O. Or at least, that’s what I (and many others) always assumed.
Before moving on to the issue of fragmentation, let’s first see this in action:
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'dbo.Persons') AND type = N'U')
DROP TABLE dbo.Persons;
go
CREATE TABLE dbo.Persons
(PersonID int NOT NULL IDENTITY,
FName varchar(20) NOT NULL,
LName varchar(30) NOT NULL,
Email varchar(7000) NOT NULL);
go
-- Insert approx. 20,000 rows
INSERT INTO dbo.Persons (FName, LName, Email)
SELECT LEFT(FirstName, 20), LEFT(LastName, 30), EmailAddress
FROM AdventureWorks.Person.Contact;
go
-- Dummy table scan to force creation of statistics
DECLARE @Dummy int;
SELECT @Dummy = COUNT(*)
FROM dbo.Persons
WHERE Email LIKE 'brenda20@adventure-works.com%';
go
-- Completely clear out the cache
CHECKPOINT;
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;
DBCC FREEPROCCACHE WITH NO_INFOMSGS;
go
-- Do table scan and show pages read, then show #pages in table.
SET STATISTICS IO ON;
SELECT *
FROM dbo.Persons
WHERE Email LIKE 'brenda20@adventure-works.com%';
SET STATISTICS IO OFF;
DBCC CHECKTABLE('dbo.Persons');
go
In case you are wondering about the “dummy table scan” – on my first tests, I got only logical reads, which was odd considering I had just forced the cache to be cleaned. Fellow MVP Steve Kass found the cause: the first time the Email column is used in a search condition, statistics have to be computed, which is done by scanning the table. The dummy scan ensures that the statistics are created before I clear out the cache.
On my SQL Server 2005 server, I got this output (edited for readability):
PersonID FName LName Email
----------- -------------------- ------------------------------ -----------------------------
14575 Brenda Lopez brenda20@adventure-works.com
Table 'Persons'. Scan count 1, logical reads 149, physical reads 0, read-ahead reads 165, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
DBCC results for 'Persons'.
There are 19972 rows in 149 pages for object "Persons".
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
There are 149 pages used by the table, and 149 pages are read. This is as expected. I must admit that I don’t know why there are 16 extra read-ahead reads, though.
Updates of rows with variable-length columns are one of the main causes of fragmentation in a heap. SQL Server will try to perform an update in place. This is always possible if the new data occupies less or the same number of bytes as the old data, but if the new data needs more room and there’s not enough space left in the page, the row has to move. SQL Server finds a page with enough free space or allocates a new page, then stores the new data of the row in that page. The data at the original location is replaced with a so-called “forwarding pointer”: a RID that points to the new location of the row. Because of this forwarding pointer, there is no need to change RID pointers to the row in existing nonclustered indexes; when trying to read this row, they can simply follow the forwarding pointer to its new location. And at the new location, a “back pointer” to the original location is added, so that if the row has to move again, the original forwarding pointer can be updated rather than forming an endless chain of forwarding pointers.
Run this code now to simulate how a table might get fragmented over time as its contents are regularly updated. Note that this code does 5,000 random updates; one out of four rows on average will be updated, many rows will never be touched, and a few might even be updated twice or more.
-- Simulate update activity to create fragmentation.
-- Temporarily add index to improve performance.
CREATE UNIQUE NONCLUSTERED INDEX ix_PersonID ON dbo.Persons(PersonID);
-- Do 5,000 random updates to email column, increasing it's length.
DECLARE @cnt int;
SET @cnt = 1;
WHILE @cnt < 5000
BEGIN;
SET @cnt = @cnt + 1;
UPDATE dbo.Persons
SET Email = LEFT(Email + Email, 7000)
WHERE PersonID = CAST(RAND() * 20000 AS int);
END;
-- Drop the index, we no longer need it.
DROP INDEX dbo.Persons.ix_PersonID;
go
There now should be plenty of forwarding pointers in the table. This means that many bookmark lookups will now take two reads instead of just one. But how are table scans affected by these forwarding pointers? I see two possible methods:
1. Follow the IAM from start to finish. Ignore forwarding pointers. Process all rows, both with and without back pointers. Processing is still completely sequential.
2. Follow the IAM from start to finish. When encountering a forwarding pointer, immediately fetch the page that the pointer points to and process that row. When encountering a row with a back pointer, ignore it (it is already or will be processed when the corresponding forwarding pointer is read).
If the table is read WITH (NOLOCK) or in TRANSACTION ISOLATION LEVEL READ UNCOMMITTED, the second method has to be used. Otherwise, a row that is moved during the scan might be included twice or not at all (Paul Randal explains this in a bit more detail in his post about heaps). But in all other cases, method 1 can safely be used since the table lock issued by the table scan effectively prevents any modification while the scan is running.
So let’s see what method SQL Server picks:
-- Do table scan and show pages read, then show #pages in table.
CHECKPOINT;
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;
SET STATISTICS IO ON;
SELECT *
FROM dbo.Persons
WHERE Email LIKE 'brenda20@adventure-works.com%';
SET STATISTICS IO OFF;
DBCC CHECKTABLE('dbo.Persons');
go
The relevant part of the output on my system (note that numbers vary slightly between runs, due to the use of RAND() in the simulated updates):
Table 'Persons'. Scan count 1, logical reads 1892, physical reads 0, read-ahead reads 222, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
DBCC results for 'Persons'.
There are 19972 rows in 174 pages for object "Persons".
The number of pages occupied by the table has grown a fair bit, but that was to be expected after doubling 5,000 email addresses. The troubling part of the output is the number of logical reads. Apparently, the average page in the table is read 10 times! This shows that, even though the default READ COMMITTED isolation level makes it unnecessary, SQL Server will follow method 1.
Reading the same page will of course significantly impact performance. This gets even worse when the size of the table exceeds the available server memory so that not all pages fit in the data cache. Since the order in which pages are accessed will be completely random (for instance, the first page holds forwarding pointers to pages 6632, 91, 93055, 17, 494, and 220568; the second page holds forwarding pointers to pages 902, 14, 6632, 1905408, and 3, etc.), pages requested will probably just have been removed from cache so that they have to be read from disk once more. Because of the forwarding pointers (and an apparent oversight from Microsoft), a table scan on a heap might come with an amount of physical I/O that is a multiple of the actual size of the table. And there is no easy way to fix or prevent this, since heaps (unlike table scans) can not be unfragmented. The only option is to periodically copy over all data to a new table, as that is the only way to get rid of the fragmentation.
Bottom line: Beware of table scans, and beware of the effects of fragmentation on a heap. Even though I agree with Greg that some tables don’t need a clustered index, those are the exception. So if you choose to use rules of thumb, make sure that having a clustered index for each table is one of them. It might not be optimal, but it’s a lot better than not having a clustered index for any table!