I know it's been a while since I had a technical post, and I've got 3 of them started and saved as drafts. I keep waiting for spare time. Now that PASS is over, and I've crawled partway out from under the TO-DO pile that accumulated while I was preparing for PASS, I'm going to pull one of the drafts and and get to work on it.
After I had posted this, I found that I had written a blog post on almost the exact same topic about a year and half ago! It uses a different example, and the last detail showing a way to remove forwarded records in SQL Server 2008 was not mentioned back then. So since I just spent many hours working on this, I decided to post it anyway.
I received an email a few weeks ago regarding using ALTER TABLE to defrag a heap, and my immediate thought was that it did no such thing, because by definition, a heap cannot be fragmented. SQL Server defrag operation is only concerned with logical fragmentation, which occurs when the logical and physical order do not match. And as you should know, heaps have no logical order. Heaps are just a bunch of rows.
In an index, there is an order, based on the index key(s). So suppose you have an index on LastName. SQL Server guarantees the leaf level will have all your LastName values in logical order, but does not guarantee a physical order. Logical order means that SQL Server can easily access the data in order, by following pointers. The page with "Abbott" and "Anderson" will point to the page with "Barkley" and "Bollinger", which will point to the page with "Charleston" and "Chung". So we can access the data in A-B-C order. When you first create the index, the physical order will be as close as possible to logical order, so the A's might be on page 112 and the B's on 113 and the C's on 114. However, as you add, update or remove data, the physical order may not be maintained. Pages may have to be split, so if the B page gets full, a new one may be allocated at page 358, where half the B's will move to. Then the first B page, 113, will point to the second B page at 358, and 358 will point to the C page, 114. Page splits due to full pages are the main cause of logical fragmentation.
When you have a heap, there is no logical ordering and there is no splitting of pages.
But I thought a bit more, and realized that there are some other issues that one might consider to be 'fragmentation' in a heap. One issue might be whether the extents belonging to the table are contiguous or not. In SQL 2000, you could use DBCC SHOWCONTIG to look at a value called "Extent Scan Fragmentation" to see how contiguous the pages were, and in SQL 2005 and 2008, when you look at sys.dm_db_index_physical_stats, a high value number for fragment_count or a low value for avg_fragment_size_in_pages can indicate lack of contiguousness. (And I know, the BOL page actually talks about fragmentation for heap, but since the definition seems to suggest an ordering, mentioning a next and previous page, I tent to ignore the issue.) But whether contiguousness is a good thing or a bad thing is not what I'm going to discuss here.
Also, whether fragmentation itself is something you have to worry about is not something I'm going to discuss here. What I really want to tell you about is something that can occur in heaps that can be MUCH MUCH worse than fragmentation, giving worse performance for a table scan than the most fragmented table you can imagine. I'm talking about forwarding pointers. A forwarding pointer appears when you update one or more variable length columns in a row in a heap, and increase the size of the row so it no longer fits on the original page. SQL Server will not split the page, as splitting never happens in heaps. Instead, the new enlarged row is moved to another page, and a small forwarding pointer is left behind. The forwarding pointer basically is just the file, page and row number address of the new location. The enlarged row at the new location is called a forwarded record, and has information in the row header that indicates the row has been forwarded, and it also includes a back pointer to the original page.
Let’s look at an example to see these forwarding pointers. I'll create a table with two variable length columns. After I populate the table with five rows, which will fill the page, I’ll update one of the rows to make its third column much longer. The row will no longer fit on the original page and will have to move. I can use DBCC IND to get the page numbers used by the table and DBCC PAGE to look at the full page.
USE tempdb;
GO
CREATE TABLE bigrows
( a int IDENTITY ,
b varchar(1600),
c varchar(1600));
GO
INSERT INTO bigrows
VALUES (REPLICATE('a', 1600), '');
INSERT INTO bigrows
VALUES (REPLICATE('b', 1600), '');
INSERT INTO bigrows
VALUES (REPLICATE('c', 1600), '');
INSERT INTO bigrows
VALUES (REPLICATE('d', 1600), '');
INSERT INTO bigrows
VALUES (REPLICATE('e', 1600), '');
GO
Now let's look at the header for this page. DBCC IND will give us the page number:
DBCC IND (tempdb, bigrows, -1);
-- Note the FileID and PageID from the rows where PageType = 1
-- and use those values with DBCC PAGE (I got FileID 1 and PageID 164)
DBCC TRACEON(3604);
GO
DBCC PAGE(tempdb, 1, 164, 1);
GO
I'm not going to step through all the output to DBCC PAGE, but note the value in the header m_freeCnt = 11. This means there are only 11 free bytes on the entire page. Pretty full, right?
Now let's update one of the rows. I'll change column c to be 1600 bytes instead of 0, and since there are only 11 free bytes on the page, the row will be WAY too big.
UPDATE bigrows
SET c = REPLICATE('x', 1600)
WHERE a = 3;
GO
If you look at DBCC IND again, you'll now see there is a second data page for the table. I'll let you explore the contents of the new page on your own, but I will show you what is on the original page where the forwarded row used to be:
Slot 2, Offset 0x1feb, Length 9, DumpStyle BYTE
Record Type = FORWARDING_STUB Record Attributes =
Memory Dump @0x61ADDFEB
00000000: 04ea1100 00010000 00†††††††††††††††††.........
Note that there are only 9 bytes total, and the record type indicates the row has moved, because the type is FORWARDING_STUB, which means it has a forwarding pointer.
I told you that forwarding pointers can be a really bad thing. Let's see why that is. I'm going to build a much bigger table now, that is a copy of a big table in AdventureWorks. I'll then ALTER the table to add a new column and fill it with 100 bytes in every single row of the table.
USE AdventureWorks
GO
IF EXISTS (SELECT 1 FROM sys.tables
WHERE name = 'Details')
DROP TABLE Details;
GO
--Look at the fragmentation data for the SalesOrderDetail2 table.
-- Note the avg_fragmentation_in_percent value
CREATE TABLE dbo.Details
([SalesOrderID] [int] NOT NULL,
[SalesOrderDetailID] [int] NOT NULL,
[CarrierTrackingNumber] [nvarchar](25) NULL,
[OrderQty] [smallint] NOT NULL,
[ProductID] [int] NOT NULL,
[SpecialOfferID] [int] NOT NULL,
[UnitPrice] [money] NOT NULL,
[UnitPriceDiscount] [money] NOT NULL);
GO
INSERT INTO dbo.Details
SELECT [SalesOrderID]
,[SalesOrderDetailID]
,[CarrierTrackingNumber]
,[OrderQty]
,[ProductID]
,[SpecialOfferID]
,[UnitPrice]
,[UnitPriceDiscount]
FROM [AdventureWorks].[Sales].[SalesOrderDetail];
GO
-- sys.dm_db_index_physical_stats shows us the fullness of the pages in avg_page_space_used_in_percent,
-- and it also returns a value showing the number of forwarded records.
-- Note that pages are very full, but there are no forwarded records yet.
-- 856 pages
SELECT object_name(object_id) as Object, page_count, avg_page_space_used_in_percent, forwarded_record_count
FROM sys.dm_db_index_physical_stats (DB_ID(N'AdventureWorks'),
object_id ('dbo.Details'), null, null, 'DETAILED');
GO
SET STATISTICS IO ON
GO
-- A table scan takes as many reads as there are pages, i.e. 856 in this case
SELECT * FROM dbo.Details;
GO
SET STATISTICS IO OFF
GO
-- Now add a new fixed width column and note that this is a
-- metadata only change
-- The data pages are not modified
-- There is no change in the fullness of the pages
ALTER TABLE dbo.Details
ADD notes CHAR(100);
GO
SELECT object_name(object_id) as Object, page_count, avg_page_space_used_in_percent, forwarded_record_count
FROM sys.dm_db_index_physical_stats (DB_ID(N'AdventureWorks'),
object_id ('dbo.Details'), null, null, 'DETAILED');
GO
-- The data pages are not affected until we run the following update.
-- Every row on every page will get an additional 100 bytes in the notes field
-- added to it
UPDATE dbo.Details
SET notes = 'notes';
GO
-- note there are LOTS of forwarded records now (84408),
-- and many more pages the table (2709)
SELECT object_name(object_id) as Object, page_count, avg_page_space_used_in_percent, forwarded_record_count
FROM sys.dm_db_index_physical_stats (DB_ID(N'AdventureWorks'),
object_id ('dbo.Details'), null, null, 'DETAILED');
GO
SET STATISTICS IO ON
GO
-- The number of reads is not just the number of pages as we would expect for
-- a scan of a heap, but is equal to the
-- number of pages PLUS the number of forwarded records:
-- 84408 + 2709 = 87117
-- During a scan, the forwarded pointers are followed for EACH row, and then
-- SQL Server goes back to the original position to continue the scan
SELECT * FROM dbo.Details;
GO
SET STATISTICS IO OFF
GO
So if you find a lot of forwarded records in a heap that is scanned frequently, you'll want to get rid of them. There is no background process that goes through and cleans up forwarding pointers for you. However, if a rows is updated to reduce the length of the variable length column and it becomes small enough to fit back in the original position, it will move back (assuming other rows haven't filled the space the row used to occupy.) Also, if you shrink a data file, forwarding pointers can be removed, but this is not recommended as a way to remove forwarding pointers, it is just a side-effect.
Since forwarding pointers can only occur in heaps, you can get rid of them by creating a clustered index on the table.
In addition, SQL Server 2008 allows us the option to rebuild tables. The motivation for this feature is to allow us to change the compression setting for a table, but we can use the REBUILD option without defining compression. So I can remove all forwarding pointers from my table as follows:
ALTER TABLE dbo.Details REBUILD;
GO
SELECT object_name(object_id) as Object, page_count, avg_page_space_used_in_percent, forwarded_record_count
FROM sys.dm_db_index_physical_stats (DB_ID(N'AdventureWorks'),
object_id ('dbo.Details'), null, null, 'DETAILED');
GO
Notice that I still have a much larger number of pages than I did originally (2379 instead of 811) but, after all, the rows are all much larger! More importantly, there are NO forwarded records. If I do another table scan, the number of logical reads should be only the same as the number of pages.
SET STATISTICS IO ON
GO
SELECT * FROM dbo.Details;
GO
SET STATISTICS IO OFF
GO
I am recommending to my clients that any scripts that check for fragmentation values for the purpose of rebuilding an index, also inspect the forwarded record count for heaps.
~Kalen