I had a very educational exchange the other day on Twitter (also known as “getting schooled”) by my SQL compatriots Argenis Fernandez (@DBArgenis) and Hugo Kornelis (@Hugo_Kornelis). A guy called Chris Switalski (@crswit) asked a really interesting question on #sqlhelp, which I proceeded to answer incorrectly, which led to a correction by my friends, which in turn took me back into the SQL File Layout Viewer, which quickly uncovered a bug there.
So, learn from your friends on Twitter!
Anyway, here’s the question, and corrected demo using an updated SQL File Layout Viewer build.
When you create a clustered index on a heap, the data from the heap is rewritten and ordered, and the upper levels of the index are added. If you drop that clustered index, what happens? Does the set of leaf-level pages stay in place, and the index pages just get unlinked and deallocated? Or is the data rewritten again? Does it stay in the same logical or physical order, even though the ordering requirement is removed?
I had said that I imagine the data is rewritten in both cases, but that we could look and see to be sure. I was totally wrong – the leaf level of a clustered index is made of data pages, and a heap is made of data pages too, as Argenis pointed out. It turns out that, since they are not different structures down at that level, in order to drop a clustered index, the server will indeed just remove the upper levels of the index and leave all the leaf level data pages in place. Cool optimization! It turns out that my original release of the File Layout Viewer doesn’t always display this situation correctly, so I had to make some changes. This demo, though, I believe shows the behavior correctly.
First, we make a test database and create a heap, and populate it with some dummy data:
IF EXISTS ( SELECT 1 FROM sys.databases WHERE name = 'HeapTest' ) DROP DATABASE HeapTest;
CREATE DATABASE HeapTest;
ALTER DATABASE HeapTest SET RECOVERY SIMPLE;
ALTER DATABASE HeapTest MODIFY FILE ( NAME = N'HeapTest', SIZE = 102400KB );
ALTER DATABASE HeapTest MODIFY FILE ( NAME = N'HeapTest_log', SIZE = 10240KB );
CREATE TABLE SampleData (
Data uniqueidentifier DEFAULT NEWID(),
Padding char(100) DEFAULT ( CAST( CHECKSUM( NEWID() ) as CHAR(100) ) )
INSERT SampleData DEFAULT VALUES;
If we run the File Layout Viewer, we see a nice, contiguous set of data pages (type 1) in teal:
Next, we’ll add a NON clustered index, just so we can track what happens to that structure too:
CREATE NONCLUSTERED INDEX Nonclust ON SampleData ( Padding );
That shows up in the next part of the data file, in bright blue:
Now if we make a clustered index on the heap, it will have to be rewritten into another region of the file, so that it can be stored in the right logical order. We should expect it to “move” in the visualization down into the empty part of the file. The non clustered index will also be rewritten, because it will switch from using physical row locations to point to data in the heap to using the clustering key to point to data in the new clustered index. Let’s see what it looks like:
CREATE CLUSTERED INDEX Clust ON SampleData ( Data );
The new clustered index and new version of the NON clustered index are both created in another region of the data file:
You can see, in lighter gray, the regions that were emptied where the two structures had been stored, and the two new structures now in the center of the graphic.
Now for the part of the question I got wrong: what happens when we drop that clustered index?
DROP INDEX SampleData.Clust;
This is really neat: the leaf level of the clustered index (remember data pages are data pages) stays right where it is, and the upper level index pages are just deallocated/unlinked.
The NON clustered index does have to be rebuilt yet again to change those clustered index keys out for row location pointers that point to the rows in the heap, so those pages do get rewritten into yet another region of the file:
So, to summarize:
When you add or remove a clustered index from a table, any NON clustered indexes always have to be rebuilt - and will be automatically - to exchange row pointers for clustering keys in the non clustered index itself. This generally takes care of itself, but it can be a performance issue, or cause longer running transactions or a large amount of logging.
When you CREATE a clustered index on a heap, the data has to be rewritten to enforce the logical order of the index.
When you DROP a clustered index, however, the server can keep all the leaf level pages in place and just strip away the index levels above.