<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://www2.sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Search results matching tag 'indexes'</title><link>http://www2.sqlblog.com/search/SearchResults.aspx?o=DateDescending&amp;tag=indexes&amp;orTags=0</link><description>Search results matching tag 'indexes'</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>March Update to Rules-Driven Maintenance</title><link>http://www2.sqlblog.com/blogs/merrill_aldrich/archive/2013/03/11/march-update-to-rules-driven-maintenance.aspx</link><pubDate>Tue, 12 Mar 2013 02:56:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:48191</guid><dc:creator>merrillaldrich</dc:creator><description>&lt;p&gt;This month I have a minor update to the Rules-Driven Maintenance code I originally &lt;a href="http://sqlblog.com/blogs/merrill_aldrich/archive/2012/08/01/rules-driven-maintenance.aspx"&gt;posted&lt;/a&gt; back in August 2012. This update has just two enhancements, but they are nice ones, I think:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;Much improved handling for instances that use database snapshots.&lt;/li&gt;    &lt;li&gt;Ability to do intra-day differential backups purely by setting maintenance times and limits in the policy table.&lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;The code posted here is cumulative, and replaces entirely the original code, but please refer back to &lt;a href="http://sqlblog.com/blogs/merrill_aldrich/archive/2012/08/01/rules-driven-maintenance.aspx"&gt;that original blog post&lt;/a&gt; for description, instructions and details. This update can be installed right over an existing deployment, or be installed all by itself as a complete solution but – as always – please test and be cautious.&lt;/p&gt;  &lt;p&gt;Enjoy!&lt;/p&gt;</description></item><item><title>Public Release, SQL Server File Layout Viewer</title><link>http://www2.sqlblog.com/blogs/merrill_aldrich/archive/2013/03/01/public-release-sql-server-file-layout-viewer.aspx</link><pubDate>Fri, 01 Mar 2013 21:36:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:47991</guid><dc:creator>merrillaldrich</dc:creator><description>&lt;h2&gt;Version 1.0 is Now Available!&lt;/h2&gt;  &lt;p&gt;I’ve been working off and on, as my real job permits, on this visualization tool for SQL Server data files. This is an educational or exploratory tool where you can more readily &lt;i&gt;see&lt;/i&gt; how the individual data pages in MDF/NDF files are organized, where your tables and indexes live, what effect operations like index rebuild or index reorganize have on the physical layout of the data pages.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/FileLayoutViewerR1_6399E49C.png"&gt;&lt;img title="FileLayoutViewerR1" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;display:inline;padding-right:0px;" border="0" alt="FileLayoutViewerR1" width="1028" height="494" src="http://sqlblog.com/blogs/merrill_aldrich/FileLayoutViewerR1_thumb_228B6538.png"&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;The viewer will scan a whole database, using only SQL and DBCC commands, and will render a color-coded representation of all the data pages represented in colored bands. Each partition of each index or heap in the database is assigned a color, so that you can see where all the bits and pieces of an object are located in the files. Above the colored bands there are grayscale or white pixels that show the page type in SQL Server (most are white, which are data pages. Unused/empty regions of the file show as gray). In the image above, for example, all the bright green areas are one index, all the purple areas are one index, and so on.&lt;/p&gt;  &lt;p&gt;There is mouse-over functionality. If you move the mouse cursor over the graph, then details about each page populate the text fields at right, including the object and index the page belongs to, the page type, whether the page represents a fragment, where the previous and next pages are for the same object, etc.&lt;/p&gt;  &lt;h2&gt;Why?&lt;/h2&gt;  &lt;p&gt;Why create something like this? I am a visual person, and I have a theory that many issues we have in computing come down to not being able to see what’s going on. This is especially true as we learn about unfamiliar technology – we have to develop a mental model of structures like B-trees or linked lists or files in order to understand what’s happening. I hope this tool, combined with other knowledge, will help people form an accurate understanding of how data file internals work in SQL Server, faster than working purely in the abstract with tools like DBCC Page or DBCC Ind.&lt;/p&gt;  &lt;h2&gt;Instructions&lt;/h2&gt;  &lt;ol&gt;   &lt;li&gt;Download the tool and unzip it. The package includes both an executable and the source code. If you don’t want the source, the .exe file is a standalone program and will run all on its own, so you are welcome to discard the source folder.&lt;/li&gt;    &lt;li&gt;Validate you have the required prerequisites from the Prereq’s section below.&lt;/li&gt;    &lt;li&gt;Locate a non-production/test database to analyze. The database can be local or on a remote server. I suggest something of a reasonable size, because scanning a really huge data set can take quite a long time.&lt;/li&gt;    &lt;li&gt;Run SQLFileLayoutViewer.exe and select a database to scan. If the database is on a remote server, type the SQL Server name/instance name into the dialog.&lt;/li&gt;    &lt;li&gt;Click Analyze.&lt;/li&gt;    &lt;li&gt;Examine the resulting graph, and mouse over it with the cursor to view detailed information about each page.&lt;/li&gt; &lt;/ol&gt;  &lt;h2&gt;Disclaimer&lt;/h2&gt;  &lt;p&gt;This is a freeware tool provided for your fun, education and entertainment. However, there is no warranty of any kind and you use it at your sole risk. The tool is free but offered under the GNU General Public License 3. If successful, and people are interested, I’ll move this work to some sort of open source project.&lt;/p&gt;  &lt;h2&gt;Prerequisites&lt;/h2&gt;  &lt;p&gt;The app requires .NET Framework 4.0 and the SQL Server management tools. I’ve tested it on Windows 7, Windows Server 2008 R2 and Windows 8. It can be run against a database on a local or remote SQL instance. I believe it will work on any database in SQL Server 2005 or later, but have not tested every possible scenario.&lt;/p&gt;  &lt;h2&gt;Risks?&lt;/h2&gt;  &lt;p&gt;I believe this tool to be relatively risk free, but I would avoid running it against live production data. The tool’s data collection is simple: it will issue a few system table selects to get things like object names, and then it will execute a DBCC PAGE statement against every page in the database. All other processing after that is done locally in the application itself. It does not modify the database.&lt;/p&gt;  &lt;h2&gt;Bugs?&lt;/h2&gt;  &lt;p&gt;I would love to hear about bugs you come across, or additional features you think would be valuable. Please contact me through this site. Note that I am a DBA first, and an amateur .NET developer a distant second, so please be gentle.&lt;/p&gt;  &lt;p&gt;Enjoy!&lt;/p&gt;</description></item><item><title>Geek City: Join With Me!</title><link>http://www2.sqlblog.com/blogs/kalen_delaney/archive/2013/02/14/index-intersection.aspx</link><pubDate>Thu, 14 Feb 2013 22:16:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:47727</guid><dc:creator>Kalen Delaney</dc:creator><description>&lt;p&gt;I remember one of the most surprising changes in SQL Server &lt;strike&gt;2000&lt;/strike&gt; 2005 was how the graphical plans showed the use of a nonclustered index to seek, and the plan included something that looked like a JOIN to find rows in the base table.&amp;nbsp; Here’s an example. Although I used SQL Server 2008R2, the graphical plan will be pretty similar to what it looked like back in SQL Server &lt;strike&gt;2000&lt;/strike&gt; 2005. My code will make a copy of a table in the &lt;em&gt;AdventureWorks2008&lt;/em&gt; database, and then build an index on one of the columns.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Courier New"&gt;USE AdventureWorks2008;        &lt;br&gt;GO&lt;br&gt;IF object_id('dbo.Sales') IS NOT NULL         &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; DROP TABLE dbo.Sales;         &lt;br&gt;GO         &lt;br&gt;SELECT * INTO dbo.Sales FROM Sales.SalesOrderHeader;         &lt;br&gt;GO         &lt;br&gt;CREATE INDEX Sales_SalesPersonID_index on dbo.Sales(SalesPersonID);         &lt;br&gt;GO&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Now look at the estimated graphical execution plan for this query, that searches for a particular value for the &lt;em&gt;SalesPersonID&lt;/em&gt; column:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Courier New"&gt;SELECT * FROM dbo.Sales        &lt;br&gt;WHERE SalesPersonID = 280;         &lt;br&gt;GO&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;You should see something like this:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/kalen_delaney/image_0F9109CC.png"&gt;&lt;img style="border-width:0px;padding-top:0px;padding-right:0px;padding-left:0px;display:inline;background-image:none;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/kalen_delaney/image_thumb_47634DEF.png" width="507" height="260"&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;It shows a JOIN even though there is only one table involved! What’s with that? It took me a few minutes to figure out what I was seeing, and I must admit that over the years since then I have come to appreciate this graphical representation. You’ll have a problem if you think of a JOIN as only finding matching rows between two tables, but can be solved if you realize that a JOIN can be used when finding matches between any two ROWSETS, i.e. any two sets of rows. There is a set of rows in the nonclustered index on &lt;em&gt;SalesPersonID&lt;/em&gt;, and a set of rows in the table. The index seek finds all the index rows with a &lt;em&gt;SalesPersonID&lt;/em&gt; value of 280, but those index rows only contain an index key value and a row pointer (RowID or RID). Because the query is requesting all the columns in the table, not just the &lt;em&gt;SalesPersonID&lt;/em&gt; value,&amp;nbsp; SQL Server must find the rows in the&lt;em&gt; dbo.Sales&lt;/em&gt; table that have a matching value for the RowID. Internally, it is doing a join, even though there is only one table. It is joining index rows with table rows.&lt;/p&gt;  &lt;p&gt;If the table had a clustered index, the plan would look almost exactly the same, with one difference. In SQL Server, if a table has a clustered index, nonclustered index rows do not contain RowIDs, they contain the clustered key value to be used a pointer to the row. SQL Server can take this clustered key from the nonclustered index row and look up in the clustered index to find the desired row. So I create a clustered index on &lt;em&gt;SalesOrderNumber&lt;/em&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Courier New"&gt;CREATE UNIQUE CLUSTERED INDEX Sales_ident on dbo.Sales(SalesOrderNumber);        &lt;br&gt;GO&lt;/font&gt;       &lt;br&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Now when I run the same query as above, I get the following plan:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/kalen_delaney/image_6D58FE45.png"&gt;&lt;img style="border-width:0px;padding-top:0px;padding-right:0px;padding-left:0px;display:inline;background-image:none;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/kalen_delaney/image_thumb_7EC95F1D.png" width="513" height="237"&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;The only difference in the two plans above is the icon for the lookup into the base table.&amp;nbsp; One is looking up into a heap using a RowID, which is an address containing the File ID, the page number and the slot or row number on the page. The other icon represents taking a key value from the nonclustered index row and looking it up by seeking through the clustered index. If you look again at these two icons, which one looks like a picture of directly addressing a row, and which looks like a picture of following another index into the table itself?&lt;/p&gt;    &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/kalen_delaney/image_7DF0F933.png"&gt;&lt;img style="border-width:0px;padding-top:0px;padding-right:0px;padding-left:0px;display:inline;background-image:none;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/kalen_delaney/image_thumb_4B20F2BF.png" width="220" height="88"&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;The icon on the left is used for a RID lookup, and the icon on the right is used for a clustered index key lookup. I know that nobody asked me, but I think they got these backwards.&lt;/p&gt;  &lt;p&gt;But wait, there’s more… &lt;/p&gt;  &lt;p&gt;Prior to SQL Server 7, there was basically a rule of thumb with query plans that SQL Server would use one index (at most) per table per query. It could never (well, hardly ever) use two indexes on the same table in a single query. That changed in SQL Server 7 when a new algorithm called Index Intersection was introduced. &lt;/p&gt;  &lt;p&gt;I’m going to build another index on &lt;em&gt;SalesOrderDate&lt;/em&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Courier New"&gt;CREATE INDEX Sales_OrderDate_index on dbo.Sales(OrderDate);        &lt;br&gt;GO&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;This index will be used if I look for all orders place on July 1, 2002. The query plan for the following query is shown:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;     &lt;br&gt;&lt;font face="Courier New"&gt;SELECT * FROM dbo.Sales        &lt;br&gt;WHERE OrderDate = '2002-07-01';&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/kalen_delaney/image_4A488CD5.png"&gt;&lt;img style="border-width:0px;padding-top:0px;padding-right:0px;padding-left:0px;display:inline;background-image:none;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/kalen_delaney/image_thumb_09A64066.png" width="491" height="238"&gt;&lt;/a&gt;&lt;/p&gt;    &lt;p&gt;We saw that the index on &lt;em&gt;SalesPersonID&lt;/em&gt; is useful when looking for the value 280, and the index on &lt;em&gt;SalesOrderDate&lt;/em&gt; is useful when looking for '2002-07-01'. But what if one query looks for rows that satisfy both conditions? &lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Courier New"&gt;SELECT * FROM dbo.Sales       &lt;br&gt;WHERE OrderDate = '2002-07-01'         &lt;br&gt;AND SalesPersonID = 280&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Here’s the plan:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/kalen_delaney/image_68B2CDBE.png"&gt;&lt;img style="border-width:0px;padding-top:0px;padding-right:0px;padding-left:0px;display:inline;background-image:none;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/kalen_delaney/image_thumb_07F57492.png" width="573" height="239"&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;We see two JOINs ! First, SQL Server finds the sets of index rows meeting each of the WHERE filter conditions and those two index row sets are joined together, using the clustered key value as the join column. Then only after SQL Server finds rows that meets both conditions, it takes the common clustered index key and uses it to find the row in the underlying table. &lt;/p&gt;  &lt;p&gt;But wait, there’s more!&lt;/p&gt;  &lt;p&gt;Both of these queries use a table scan, because the filter conditions are not selective enough. There are way more rows with a &lt;em&gt;SalesPersonID&lt;/em&gt; value of 289 than there were for a value of 280, and there are more rows with an &lt;em&gt;OrderDate&lt;/em&gt; of March 1, 2004 than there are with an &lt;em&gt;OrderDate&lt;/em&gt; of July 1, 2002.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Courier New"&gt;SELECT * FROM dbo.Sales        &lt;br&gt;WHERE OrderDate = '2004-03-01' ;        &lt;br&gt;GO&lt;/font&gt;&lt;/p&gt;    &lt;p&gt;&lt;font face="Courier New"&gt;SELECT * FROM dbo.Sales        &lt;br&gt;WHERE SalesPersonID = 289;         &lt;br&gt;GO&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;    &lt;p&gt;Both of the above queries show the same plan. Because the table now has a clustered index, the plan shows a clustered index scan, but it is exactly the same work as a table scan:&lt;/p&gt;    &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/kalen_delaney/image_18F9A275.png"&gt;&lt;img style="border:0px currentColor;padding-top:0px;padding-right:0px;padding-left:0px;display:inline;background-image:none;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/kalen_delaney/image_thumb_787262C2.png" width="368" height="98"&gt;&lt;/a&gt;&lt;/p&gt;    &lt;p&gt;However, if you look at the plan for a query that uses both filter conditions, we don’t get a clustered index scan.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Courier New"&gt;SELECT * FROM dbo.Sales        &lt;br&gt;WHERE OrderDate = '2004-03-01'         &lt;br&gt;AND SalesPersonID = 289;&lt;/font&gt; &lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/kalen_delaney/image_65BD690B.png"&gt;&lt;img style="border:0px currentColor;padding-top:0px;padding-right:0px;padding-left:0px;display:inline;background-image:none;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/kalen_delaney/image_thumb_44C9F664.png" width="568" height="243"&gt;&lt;/a&gt;&lt;/p&gt;      &lt;p&gt;Even though each filter alone is not very restrictive, they can both be used for index intersection. The reason SQL Server usually doesn’t choose index seeks on non-restrictive filters is because the cost of then accessing the base table for all the qualifying rows can be very high. But even though each individual index returns a lot of rows, the optimizer’s estimate of the number of rows that meet BOTH conditions is very small, and therefore the number of lookups into the base table is quite small and affordable. &lt;/p&gt;  &lt;p&gt;So, what can you learn? Just because you think an column might not be selective enough to build a nonclustered index on, don’t discount it.&amp;nbsp; If that column will frequently be used in queries with multiple conditions, it just might turn out to be more useful than you thought! &lt;/p&gt;  &lt;p&gt;So make sure you test many possible choices of indexes and combinations of indexes before you know what the best choices are.&lt;/p&gt;  &lt;p&gt;Have fun!&lt;/p&gt;  &lt;p&gt;&lt;font color="#8064a2" size="4"&gt;&lt;strong&gt;~Kalen&lt;/strong&gt;&lt;/font&gt;&lt;/p&gt;</description></item><item><title>Visualizing Data File Layout III</title><link>http://www2.sqlblog.com/blogs/merrill_aldrich/archive/2013/01/29/visualizing-data-file-layout-iii.aspx</link><pubDate>Tue, 29 Jan 2013 05:45:50 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:47372</guid><dc:creator>merrillaldrich</dc:creator><description>&lt;p&gt;This is part three of a blog series illustrating a method to render the file structure of a SQL Server database into a graphic visualization.&lt;/p&gt;  &lt;p&gt;Previous Installments:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/archive/2013/01/22/visualizing-data-file-layout-i.aspx"&gt;Part 1&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/archive/2013/01/23/visualizing-data-file-layout-ii.aspx"&gt;Part 2&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Those that have been reading this series might be be thinking, “Is he going to go there?” Well, the answer is “Yes.” This is the &lt;strong&gt;GUID clustered index post&lt;/strong&gt; that had to be. It’s inevitable with this tool.&lt;/p&gt;  &lt;p&gt;If you follow SQL Server at all, you are probably aware of the &lt;a href="http://www.sqlskills.com/blogs/kimberly/guids-as-primary-keys-andor-the-clustering-key/"&gt;long-standing&lt;/a&gt; &lt;a href="http://www.codinghorror.com/blog/2007/03/primary-keys-ids-versus-guids.html"&gt;debate&lt;/a&gt; about whether it is wise, desirable, smart, useful, or what have you, to identify rows using GUIDs. I won’t take a position on that, but I will show here, I hope objectively, a few things that the visualizer shows about file layout vs. distributed inserts, distributed inserts being one of the main challenges around using GUIDs as clustering keys. Just to recap the argument very, very briefly:&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Advantages&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;GUID keys can be generated at the client, which saves a round-trip to the database server to create a collection of related rows.&lt;/p&gt;  &lt;p&gt;GUID keys can make certain architectures like sharding, or peer to peer replication, or merging multiple source databases, simpler.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Disadvantages&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;GUID keys are wider, therefore they take more space in memory and on disk. The additional space is multiplied by their presence in both clustered and non-clustered indexes if they are a clustering key.&lt;/p&gt;  &lt;p&gt;GUID keys don’t only take more space in RAM and on disk because of their width. They also cause &lt;em&gt;distributed inserts&lt;/em&gt; into the clustered index – that is, new rows are added to any and all pages in the index. Each time a row has to be added, the target page must be read into memory, and at a checkpoint, the &lt;em&gt;whole&lt;/em&gt; changed page (both existing and new rows) must be written to disk. This has two effects: &lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;The amount of RAM and disk IO required for inserts is probably much higher, as pages with &lt;em&gt;existing&lt;/em&gt; data must come into cache, get changed, and then be written back out again. Essentially, large parts of the table have to be &lt;em&gt;rewritten&lt;/em&gt; to disk to append rows to pages that have data already.&lt;/li&gt;    &lt;li&gt;The pages that store the index will individually fill up, and have to split such that half the existing rows are written back out to the “old” page and half written out to a “new” page in a different location on disk. This causes the pages to be less full, the same number of rows to require more space on disk and in RAM, and the resulting index to be massively fragmented on disk.&lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;I am not writing to argue these points, which have I think been established by both sides of the debate, only to see if the visualizer shows these effects clearly. Most of the argument isn’t actually about these facts (they are all true, as far as I know) but rather which are more important, and I think that is the main source of debate on the issue.&lt;/p&gt;  &lt;h2&gt;Visual Example of Distributed Inserts&lt;/h2&gt;  &lt;p&gt;It’s very easy to create an example of this with a small sample database. I created one called “VizDemo2.” VizDemo2 has a slightly modified structure to illustrate what’s going on here – I need two tables that are stored separately on disk, so that they cannot interfere with one another. The simplest way to do that is with a couple of file groups containing one file each. So here’s the structure:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;I created the database with a 50MB, single file, Primary file group&lt;/li&gt;    &lt;li&gt;I added a file group FG1 with one 75MB file&lt;/li&gt;    &lt;li&gt;I added a second file group FG2 with one 75MB file&lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;When the database is empty, the visualizer shows only the system pages at the start of each file, as shown here:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/VizDemo2_01_5BB678D6.png"&gt;&lt;img title="VizDemo2_01" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;display:inline;padding-right:0px;" border="0" alt="VizDemo2_01" src="http://sqlblog.com/blogs/merrill_aldrich/VizDemo2_01_thumb_18DA345C.png" width="1028" height="494" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;To that database I added two sample tables identical in structure but with different clustering keys:&lt;/p&gt;  &lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;USE &lt;/span&gt;VizDemo2
&lt;span style="color:blue;"&gt;GO

CREATE TABLE &lt;/span&gt;dbo&lt;span style="color:gray;"&gt;.&lt;/span&gt;SampleCustomersInt  &lt;span style="color:gray;"&gt;( 
    &lt;/span&gt;id &lt;span style="color:blue;"&gt;int identity&lt;/span&gt;&lt;span style="color:gray;"&gt;( &lt;/span&gt;1&lt;span style="color:gray;"&gt;, &lt;/span&gt;1 &lt;span style="color:gray;"&gt;) NOT NULL &lt;/span&gt;&lt;span style="color:blue;"&gt;PRIMARY KEY CLUSTERED&lt;/span&gt;&lt;span style="color:gray;"&gt;, 
    &lt;/span&gt;buncha &lt;span style="color:blue;"&gt;char&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;500&lt;span style="color:gray;"&gt;) &lt;/span&gt;&lt;span style="color:blue;"&gt;default &lt;/span&gt;&lt;span style="color:red;"&gt;'A'&lt;/span&gt;&lt;span style="color:gray;"&gt;, 
    &lt;/span&gt;big &lt;span style="color:blue;"&gt;char&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;500&lt;span style="color:gray;"&gt;) &lt;/span&gt;&lt;span style="color:blue;"&gt;default &lt;/span&gt;&lt;span style="color:red;"&gt;'B'&lt;/span&gt;&lt;span style="color:gray;"&gt;, 
    &lt;/span&gt;vals &lt;span style="color:blue;"&gt;char&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;500&lt;span style="color:gray;"&gt;)  &lt;/span&gt;&lt;span style="color:blue;"&gt;default &lt;/span&gt;&lt;span style="color:red;"&gt;'C'
&lt;/span&gt;&lt;span style="color:gray;"&gt;) &lt;/span&gt;&lt;span style="color:blue;"&gt;ON &lt;/span&gt;[FG1]&lt;span style="color:gray;"&gt;;
&lt;/span&gt;&lt;span style="color:blue;"&gt;GO&lt;/span&gt;&lt;span style="color:blue;"&gt;
CREATE TABLE &lt;/span&gt;dbo&lt;span style="color:gray;"&gt;.&lt;/span&gt;SampleCustomersGuid  &lt;span style="color:gray;"&gt;( 
    &lt;/span&gt;id &lt;span style="color:blue;"&gt;uniqueidentifier &lt;/span&gt;&lt;span style="color:gray;"&gt;NOT NULL &lt;/span&gt;&lt;span style="color:blue;"&gt;PRIMARY KEY CLUSTERED DEFAULT &lt;/span&gt;&lt;span style="color:magenta;"&gt;NEWID&lt;/span&gt;&lt;span style="color:gray;"&gt;(), 
    &lt;/span&gt;buncha &lt;span style="color:blue;"&gt;char&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;500&lt;span style="color:gray;"&gt;) &lt;/span&gt;&lt;span style="color:blue;"&gt;default &lt;/span&gt;&lt;span style="color:red;"&gt;'A'&lt;/span&gt;&lt;span style="color:gray;"&gt;, 
    &lt;/span&gt;big &lt;span style="color:blue;"&gt;char&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;500&lt;span style="color:gray;"&gt;) &lt;/span&gt;&lt;span style="color:blue;"&gt;default &lt;/span&gt;&lt;span style="color:red;"&gt;'B'&lt;/span&gt;&lt;span style="color:gray;"&gt;, 
    &lt;/span&gt;vals &lt;span style="color:blue;"&gt;char&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;500&lt;span style="color:gray;"&gt;)  &lt;/span&gt;&lt;span style="color:blue;"&gt;default &lt;/span&gt;&lt;span style="color:red;"&gt;'C'
&lt;/span&gt;&lt;span style="color:gray;"&gt;)&lt;/span&gt;&lt;span style="color:blue;"&gt;ON &lt;/span&gt;[FG2]&lt;span style="color:gray;"&gt;;
&lt;/span&gt;&lt;span style="color:blue;"&gt;GO&lt;/span&gt;&lt;/pre&gt;


&lt;p&gt;I’ll populate the two tables and we can see what the file layout looks like afterward:&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;INSERT &lt;/span&gt;dbo&lt;span style="color:gray;"&gt;.&lt;/span&gt;SampleCustomersInt &lt;span style="color:blue;"&gt;DEFAULT VALUES&lt;/span&gt;&lt;span style="color:gray;"&gt;;
&lt;/span&gt;&lt;span style="color:blue;"&gt;INSERT &lt;/span&gt;dbo&lt;span style="color:gray;"&gt;.&lt;/span&gt;SampleCustomersGuid &lt;span style="color:blue;"&gt;DEFAULT VALUES&lt;/span&gt;&lt;span style="color:gray;"&gt;;
&lt;/span&gt;&lt;span style="color:blue;"&gt;GO &lt;/span&gt;20000&lt;/pre&gt;


&lt;h2&gt;Compare&lt;/h2&gt;

&lt;p&gt;After inserts, the resulting graphic does show some facts we know to be true:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/VizDemo2_02_6193AD20.png"&gt;&lt;img title="VizDemo2_02" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;display:inline;padding-right:0px;" border="0" alt="VizDemo2_02" src="http://sqlblog.com/blogs/merrill_aldrich/VizDemo2_02_thumb_7A25DB21.png" width="1028" height="494" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;First, the data in the integer-clustered index takes about eight bands of the diagram, while storing the same data in a GUID clustered index has required about twelve bands of data pages. The database itself supports that impression with space allocation – it reports these figures:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/VizDemo2_03_50B199E1.png"&gt;&lt;img title="VizDemo2_03" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;display:inline;padding-right:0px;" border="0" alt="VizDemo2_03" src="http://sqlblog.com/blogs/merrill_aldrich/VizDemo2_03_thumb_49262A74.png" width="737" height="75" /&gt;&lt;/a&gt;&lt;/p&gt;





&lt;p&gt;Part of the extra space required is the width of the key, but part of it is the empty space on each page resulting from page splits. If a page that needs a new row is too full, then half the rows from that page are moved to a net-new page, half left in place, and the new row added to one or the other of the resulting pages. Afterward, they are often both partly empty.&lt;/p&gt;

&lt;p&gt;Second, the whole graphic in the GUID clustered index area is a dark blue that the visualizer uses to show fragmentation – in fact, the object is almost perfectly fragmented, with practically no contiguous pages at all. The sequence of pages in the leaf level of the index is still a linked list, as always, but it it is physically stored in essentially random order on disk.&lt;/p&gt;

&lt;h2&gt;Does Re-Indexing Help?&lt;/h2&gt;

&lt;p&gt;The next question is whether we can combat these problems by doing huge amounts of index maintenance – if we rewrite the GUID index, will that make it take less space, or make it more efficient? The answer is, “well, sort of, temporarily.”&lt;/p&gt;

&lt;p&gt;First, re-indexing will put the table in “GUID” order. Whether that really helps or not is debatable, perhaps. It would enable read-ahead for the index, which is otherwise clobbered by the fragmentation. Having the table in “GUID” order might or might not be of any help to performance. Second, re-indexing will make the pages denser, or less dense, depending on the fill factor applied. For the sake of demonstration, let’s re-index with the default fill factor, because I think that happens a lot out in the world, and it may tell us something:&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;ALTER INDEX &lt;/span&gt;&lt;span style="color:gray;"&gt;ALL &lt;/span&gt;&lt;span style="color:blue;"&gt;ON &lt;/span&gt;dbo&lt;span style="color:gray;"&gt;.&lt;/span&gt;SampleCustomersGuid &lt;span style="color:blue;"&gt;REBUILD&lt;/span&gt;&lt;span style="color:gray;"&gt;;&lt;/span&gt;&lt;/pre&gt;


&lt;p&gt;After re-indexing, this is a view just of the second file group with the GUID clustered table (note that I scrolled down in the display):&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/VizDemo2_04_3266E2EB.png"&gt;&lt;img title="VizDemo2_04" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;display:inline;padding-right:0px;" border="0" alt="VizDemo2_04" src="http://sqlblog.com/blogs/merrill_aldrich/VizDemo2_04_thumb_1B3B686D.png" width="1028" height="494" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The arrow shows where the data was moved from the old data pages into a new region of the file. And, sure enough, it’s not fragmented (note the lighter color) and it takes less space in the file.&lt;/p&gt;

&lt;p&gt;That might sound good, but if this is a real database, inserts probably will continue. In the int clustered case, as we know, new data will be appended to the end of the page sequence, but in this case, new data will have to be inserted into most of the existing pages on disk. Those are all full now, and will have to be split 50/50 to create new pages for the new data, both the old and new pages will have to be written out, and the new pages by definition can’t be in index order with the existing pages.&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;INSERT &lt;/span&gt;dbo&lt;span style="color:gray;"&gt;.&lt;/span&gt;SampleCustomersGuid &lt;span style="color:blue;"&gt;DEFAULT VALUES&lt;/span&gt;&lt;span style="color:gray;"&gt;;
&lt;/span&gt;&lt;span style="color:blue;"&gt;GO &lt;/span&gt;20000&lt;/pre&gt;


&lt;p&gt;What we get after more rows are added to the table is what a layperson might call a “hot mess:”&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/VizDemo2_05_1D0BBE34.png"&gt;&lt;img title="VizDemo2_05" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;display:inline;padding-right:0px;" border="0" alt="VizDemo2_05" src="http://sqlblog.com/blogs/merrill_aldrich/VizDemo2_05_thumb_7A4A8676.png" width="1028" height="494" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here everything is fragmented – back to that dark blue – even the pages we just re-indexed a moment ago, because they all split. The table has &lt;em&gt;more than&lt;/em&gt; doubled in size, even though we just doubled the number of rows, because the individual pages contain less data.&lt;/p&gt;

&lt;p&gt;Would appropriate fill factor be a workaround? In some measure, yes, but it really only combats the issue. The write activity on the table, even with a low fill factor, will still be higher as more existing pages have to be flushed at checkpoints. The pages will still be less dense, and therefore take up more space on disk and in cache. In short – maybe helpful but no silver bullet.&lt;/p&gt;

&lt;p&gt;What about Sequential GUIDs? Here I will venture my opinion. Sequential GUIDs have never made sense to me. They solve one part of this problem – the distributed insert part – but &lt;em&gt;at the expense of the very things GUIDs might be good for&lt;/em&gt;, namely not demanding a visit to the database to generate an identifier. If you have to come to the database, you already lost this whole argument. Use an integer and solve the rest of the problem at the same time. I can only see it as a sort of band-aid for existing systems that could not be refactored, but, like a bad SUV that combines the worst properties of a car and a truck, it feels like a really poor compromise to me.&lt;/p&gt;

&lt;p&gt;I hope this helps to illustrate some of the physical database design challenges that surround the use of GUID cluster keys. In the next installment I’m planning to demonstrate the interleaving of objects, which is one argument for multiple file groups.&lt;/p&gt;</description></item><item><title>Visualizing Data File Layout II</title><link>http://www2.sqlblog.com/blogs/merrill_aldrich/archive/2013/01/23/visualizing-data-file-layout-ii.aspx</link><pubDate>Wed, 23 Jan 2013 23:40:02 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:47268</guid><dc:creator>merrillaldrich</dc:creator><description>&lt;p&gt;Part 2 of a blog series visually demonstrating the layout of objects on data pages in SQL Server&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/archive/2013/01/22/visualizing-data-file-layout-i.aspx"&gt;Part 1&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;In Part 1 of this series, I introduced a little demo app that renders the layout of pages in SQL Server files by object. Today I’ll put that app through its paces to show, in vivid color (well, teal, anyway) the destructive power of the famous&lt;strong&gt; Re-Index Then Shrink&lt;/strong&gt; anti-pattern for index maintenance.&lt;/p&gt;  &lt;p&gt;This one is very easy to demo, so let’s go!&lt;/p&gt;  &lt;p&gt;First, I created a demo database &lt;strong&gt;VizDemo1&lt;/strong&gt;, with a single 200 MB data file. Into that database I placed a canonical table – highly simplified for this example – clustered on an ever-increasing integer, using identity():&lt;/p&gt;  &lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;USE &lt;/span&gt;VizDemo1
&lt;span style="color:blue;"&gt;GO

CREATE TABLE &lt;/span&gt;dbo&lt;span style="color:gray;"&gt;.&lt;/span&gt;SampleCustomers  &lt;span style="color:gray;"&gt;( 
    &lt;/span&gt;id &lt;span style="color:blue;"&gt;int identity&lt;/span&gt;&lt;span style="color:gray;"&gt;( &lt;/span&gt;1&lt;span style="color:gray;"&gt;, &lt;/span&gt;1 &lt;span style="color:gray;"&gt;) NOT NULL &lt;/span&gt;&lt;span style="color:blue;"&gt;PRIMARY KEY CLUSTERED&lt;/span&gt;&lt;span style="color:gray;"&gt;, 
    &lt;/span&gt;buncha &lt;span style="color:blue;"&gt;char&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;500&lt;span style="color:gray;"&gt;) &lt;/span&gt;&lt;span style="color:blue;"&gt;default &lt;/span&gt;&lt;span style="color:red;"&gt;'A'&lt;/span&gt;&lt;span style="color:gray;"&gt;, 
    &lt;/span&gt;big &lt;span style="color:blue;"&gt;char&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;500&lt;span style="color:gray;"&gt;) &lt;/span&gt;&lt;span style="color:blue;"&gt;default &lt;/span&gt;&lt;span style="color:red;"&gt;'B'&lt;/span&gt;&lt;span style="color:gray;"&gt;, 
    &lt;/span&gt;vals &lt;span style="color:blue;"&gt;char&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;500&lt;span style="color:gray;"&gt;)  &lt;/span&gt;&lt;span style="color:blue;"&gt;default &lt;/span&gt;&lt;span style="color:red;"&gt;'C'
&lt;/span&gt;&lt;span style="color:gray;"&gt;);
&lt;/span&gt;&lt;span style="color:blue;"&gt;GO
&lt;/span&gt;&lt;/pre&gt;


&lt;p&gt;Then we populate that table with some dummy data:&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;INSERT &lt;/span&gt;dbo&lt;span style="color:gray;"&gt;.&lt;/span&gt;SampleCustomers &lt;span style="color:blue;"&gt;DEFAULT VALUES&lt;/span&gt;&lt;span style="color:gray;"&gt;;

&lt;/span&gt;&lt;span style="color:blue;"&gt;GO &lt;/span&gt;40000&lt;/pre&gt;


&lt;p&gt;And finally, fire up the little visualizer app and process the database:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_01_4683F23D.png"&gt;&lt;img title="VizDemo1_01" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;float:none;padding-top:0px;padding-left:0px;margin-left:auto;border-left:0px;display:block;padding-right:0px;margin-right:auto;" border="0" alt="VizDemo1_01" src="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_01_thumb_53EA0543.png" width="1028" height="494" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The small color bands at the top left corner of the image are the system tables and such that are in every “empty” database to make it run. The blue/green/teal area is the new table we created and populated with sample data, and the gray area represents empty regions in the file.&lt;/p&gt;

&lt;p&gt;As expected, the table started writing into the first available space, and, because the cluster key is increasing, pages were allocated to the end of the page sequence in order, and we end up with a crazy-perfect, contiguous linked list on disk.&lt;/p&gt;

&lt;p&gt;You can see small darker bars at intervals within the table – most of the pages in the index are “type 1” pages, which are the leaf-level/rows in the clustered index. Those bars are “type 2” index pages that have the upper level(s) of the index. The reason they are darker is that those are a disruption in the leaf level linked list, and the app shades such disruptions as a way to see fragmentation. The list has to “hop over” those pages and then continue on the other side. It’s technically fragmentation, but at this point not harmful at all – but remember that darker color that shows a break in the page order.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;A side note: in the midst of the gray area you can see one orange line (and another in the sea of teal). Those are “type 11” PFS pages, which happen on a fixed interval in every file. I don’t think they ever move – they track file allocation and free space metadata. They are like rocks in the stream…&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now, what happens if we re-index this bad boy? Well, a re-index operation has to write all the pages for the object into new, blank pages in the file, and then abandon the old pages. I run:&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:green;"&gt;-- This &amp;quot;moves&amp;quot; all the data toward the end of the file, into free areas
&lt;/span&gt;&lt;span style="color:blue;"&gt;ALTER INDEX &lt;/span&gt;&lt;span style="color:gray;"&gt;ALL &lt;/span&gt;&lt;span style="color:blue;"&gt;ON &lt;/span&gt;dbo&lt;span style="color:gray;"&gt;.&lt;/span&gt;SampleCustomers &lt;span style="color:blue;"&gt;REBUILD&lt;/span&gt;&lt;span style="color:gray;"&gt;;
&lt;/span&gt;&lt;span style="color:blue;"&gt;GO&lt;/span&gt;&lt;/pre&gt;


&lt;p&gt;Then re-analyze the file. As expected, the table has “moved” toward the end of the file, and left free space toward the beginning. It’s still not fragmented, because we had enough room, and it was written in order into that new area by the rebuild:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_02_61501849.png"&gt;&lt;img title="VizDemo1_02" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;float:none;padding-top:0px;padding-left:0px;margin-left:auto;border-left:0px;display:block;padding-right:0px;margin-right:auto;" border="0" alt="VizDemo1_02" src="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_02_thumb_3EFB1381.png" width="1028" height="494" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see the gray area near the top is all the “abandoned” pages where the index was, and the data has all moved down into the free area. Ah, but that seems wasteful to some people, am I right? All that empty space – the file could be smaller!&lt;/p&gt;

&lt;p&gt;Let’s see the damage that Shrink File does. Imagine that I do this:&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;DBCC &lt;/span&gt;SHRINKFILE &lt;span style="color:gray;"&gt;(&lt;/span&gt;&lt;span style="color:red;"&gt;N'VizDemo1' &lt;/span&gt;&lt;span style="color:gray;"&gt;, &lt;/span&gt;70&lt;span style="color:gray;"&gt;)
&lt;/span&gt;&lt;span style="color:blue;"&gt;GO
&lt;/span&gt;&lt;/pre&gt;


&lt;p&gt;First, before we shrink, let’s just scroll down and look at the end of the file:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_03_4E9DAF43.png"&gt;&lt;img title="VizDemo1_03" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;float:none;padding-top:0px;padding-left:0px;margin-left:auto;border-left:0px;display:block;padding-right:0px;margin-right:auto;" border="0" alt="VizDemo1_03" src="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_03_thumb_10A41E85.png" width="1028" height="494" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We have two conditions – the gray part of the file is sort of OK to shrink. There’s just a lone PFS page out there, and removing that does no harm. But once we get into that blue area, the data has to be moved back up into the beginning of the file. Here’s where the problem lies, as I learned from Mr. Paul Randal – the shrink routine will move a page at a time back into that free space, starting from the end, going backward. That makes the pages land in approximately &lt;em&gt;reverse order&lt;/em&gt; from the correct index order. Perfect fragmentation. Let’s see if this tool proves him right. Shrink, then re-analyze:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_04_595D9749.png"&gt;&lt;img title="VizDemo1_04" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;float:none;padding-top:0px;padding-left:0px;margin-left:auto;border-left:0px;display:block;padding-right:0px;margin-right:auto;" border="0" alt="VizDemo1_04" src="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_04_thumb_500462C6.png" width="1028" height="494" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Yep, it’s not immediately apparent, perhaps, but that teal color is a darker shade that indicates every page is a fragment boundary in most of the index – perfect fragmentation! Here’s a better view:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_04a_51D4B88D.png"&gt;&lt;img title="VizDemo1_04a" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;float:none;padding-top:0px;padding-left:0px;margin-left:auto;border-left:0px;display:block;padding-right:0px;margin-right:auto;" border="0" alt="VizDemo1_04a" src="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_04a_thumb_665A080B.png" width="408" height="284" /&gt;&lt;/a&gt;&lt;/p&gt;



&lt;p&gt;So, how can we clean that up? Well, with a rebuild. But … we need that bigger file. In fact, practically any database in production needs this overhead of available space to be able to perform index maintenance. It’s not “wasted” space at all.&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;ALTER INDEX &lt;/span&gt;&lt;span style="color:gray;"&gt;ALL &lt;/span&gt;&lt;span style="color:blue;"&gt;ON &lt;/span&gt;dbo&lt;span style="color:gray;"&gt;.&lt;/span&gt;SampleCustomers &lt;span style="color:blue;"&gt;REBUILD&lt;/span&gt;&lt;span style="color:gray;"&gt;;
&lt;/span&gt;&lt;span style="color:blue;"&gt;GO&lt;/span&gt;&lt;/pre&gt;

&lt;p&gt;After the rebuild, the index is back toward the end of the file, but it’s also back in order:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_05_13DB27CF.png"&gt;&lt;img title="VizDemo1_05" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;float:none;padding-top:0px;padding-left:0px;margin-left:auto;border-left:0px;display:block;padding-right:0px;margin-right:auto;" border="0" alt="VizDemo1_05" src="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_05_thumb_1C5E8719.png" width="1028" height="494" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, in light of this information, imagine nightly re-indexing on a database with … &lt;strong&gt;AutoShrink!&lt;/strong&gt; &amp;lt;shudder&amp;gt;&lt;/p&gt;</description></item><item><title>Visualizing Data File Layout I</title><link>http://www2.sqlblog.com/blogs/merrill_aldrich/archive/2013/01/22/visualizing-data-file-layout-i.aspx</link><pubDate>Tue, 22 Jan 2013 22:50:34 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:47250</guid><dc:creator>merrillaldrich</dc:creator><description>&lt;p&gt;Part 1 of a blog series visually demonstrating the layout of objects on data pages in SQL Server&lt;/p&gt;  &lt;p&gt;Some years ago a gentleman called &lt;a href="http://sqlblogcasts.com/blogs/danny/default.aspx"&gt;Danny Gould&lt;/a&gt; created a free tool called &lt;a href="http://internalsviewer.codeplex.com/"&gt;Internals Viewer for SQL Server&lt;/a&gt;. I’m a visual sort of guy, and I always thought it would be fun and educational to make a simple visualizer, like the one he created, in order to view how objects are laid out in SQL Server files, and to use it to demonstrate how operations like re-index and shrink affect the layout of files.&lt;/p&gt;  &lt;p&gt;To that end, and a little bit reinventing the wheel truth be told, I spent this past holiday creating a simple .NET app that renders the file layout of a database into a color-coded bitmap:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/FileLayoutViewer01_48CAE32A.png"&gt;&lt;img title="FileLayoutViewer01" style="border-left-width:0px;border-right-width:0px;background-image:none;border-bottom-width:0px;padding-top:0px;padding-left:0px;display:inline;padding-right:0px;border-top-width:0px;" border="0" alt="FileLayoutViewer01" src="http://sqlblog.com/blogs/merrill_aldrich/FileLayoutViewer01_thumb_586D7EEC.png" width="824" height="376" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Fig 1&lt;/p&gt;  &lt;p&gt;The app can scan the pages in a database, grab the header output from DBCC PAGE, parse that, and create a structure with a few key bits of information about every page. It then renders a bitmap from those structures showing a few things (Fig 1):&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;Each data object (index or table) partition is identified with a unique partition ID in SQL Server. Those IDs are used in this tool to color-code the output by object, from a color lookup table. Each color in the example screenshot represents the pages that are dedicated to a single partition of an object. This screenshot shows AdventureWorks, which doesn’t use the Enterprise Edition partitioning feature, so for this case each color represents one object – every object having exactly one partition in Standard Edition (or in databases that don’t use partitioning).&lt;/p&gt;    &lt;p&gt;Unallocated pages are shown as gray gaps. These are regions that are part of the physical file(s), but not used to store anything.&lt;/p&gt;    &lt;p&gt;The app flags pages at the end of any fragment of an object using a darker colored band, so it will reveal any non-contiguous structures in the data file(s). Sometimes these happen at the end of a region of the file where one object is stored, but, interestingly, sometimes these can happen in the middle – as shown in the image above where a dark band interrupts a continuous region of the same color.&lt;/p&gt;    &lt;p&gt;The app has some very basic mouse-over capability where you can run the mouse over the image and the text fields at right will reveal information about the pages, including the object schema.table.index and partition, and also whether the page represents a fragmentation boundary.&lt;/p&gt;    &lt;p&gt;Finally, the app shows what page types are located where in the file using the narrower white/gray/black bands. White represents data or index pages, while other shades of gray or black indicate other kinds of system pages, per Paul Randal’s excellent blog post &lt;a href="http://www.sqlskills.com/blogs/paul/inside-the-storage-engine-anatomy-of-a-page/"&gt;here&lt;/a&gt;.&lt;/p&gt; &lt;/blockquote&gt;  &lt;h2&gt;The Pixels Already Tell a Story&lt;/h2&gt;  &lt;p&gt;So, what can we learn about the sample database in this image? Here are a few things:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;The part of the file shown in the bitmap is fairly dense. There aren’t big regions of unallocated space in the file. A gap in the allocated pages looks like this (enlarged):      &lt;br /&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/FileLayoutViewer02_6C869B75.png"&gt;&lt;img title="FileLayoutViewer02" style="border-left-width:0px;border-right-width:0px;background-image:none;border-bottom-width:0px;padding-top:0px;padding-left:0px;display:inline;padding-right:0px;border-top-width:0px;" border="0" alt="FileLayoutViewer02" src="http://sqlblog.com/blogs/merrill_aldrich/FileLayoutViewer02_thumb_1354B1B6.png" width="206" height="194" /&gt;&lt;/a&gt;       &lt;br /&gt;Empty Region       &lt;br /&gt;&amp;#160; &lt;br /&gt;&lt;/li&gt;    &lt;li&gt;Objects in the file are not contiguous, and may “hop around.” That is, if you follow the linked list of pages that compose an index, a bunch of them will be in a row, and then there will be a page that links to the next page composing the index but it’ll be in a different location in the file. I’ve called these “frag boundaries” – pages that do link to another page, but where that next logical page isn’t the next physical page in the file. In the graphic the frag boundary pages are colored with a darker dithered pattern. You can mouse over these and look in the text fields at the right in the app, and see the page they link to.      &lt;br /&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/FileLayoutViewer03_495BB6DC.png"&gt;&lt;img title="FileLayoutViewer03" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;display:inline;padding-right:0px;" border="0" alt="FileLayoutViewer03" src="http://sqlblog.com/blogs/merrill_aldrich/FileLayoutViewer03_thumb_7029CD1C.png" width="238" height="188" /&gt;&lt;/a&gt;       &lt;br /&gt;Fragment Boundaries       &lt;br /&gt;&amp;#160; &lt;br /&gt;      &lt;br /&gt;Sometimes the end of a fragment will be adjacent to pages from another object, but it can be the case that there’s a fragment boundary in the middle of the pages for one object – it’s just that the linked list goes up to that point in the file, but then the next page in the index (in index order) isn’t the next page in the file, even though the next page in the file is part of the same object. Imagine a page split in the “middle” of an index – the existing page with half the rows stays in place, and a new page with the other half of the rows gets created in the middle of the logical index order but possibly stored in some other location in the physical file.       &lt;br /&gt;&lt;/li&gt;    &lt;li&gt;Right at the very beginning of the file there’s a special sequence of metadata pages that describe the database, allocations, and so on (again, well documented by Paul Randal). In our diagram this shows up as a series of pages at top left with varying page type (the gray and white) indicators:      &lt;br /&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/FileLayoutViewer04_020660EA.png"&gt;&lt;img title="FileLayoutViewer04" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;display:inline;padding-right:0px;" border="0" alt="FileLayoutViewer04" src="http://sqlblog.com/blogs/merrill_aldrich/FileLayoutViewer04_thumb_28D4772A.png" width="268" height="206" /&gt;&lt;/a&gt;       &lt;br /&gt;Database and file metadata pages &lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;In the next installment, I’ll run some test databases through this and we can see what more severe fragmentation looks like, the effect of GUID cluster keys, shrink, and how the data moves around in a re-index operation.&lt;/p&gt;  &lt;p&gt;Here’s a short demo video of the mouse-over working (quality is You-Tube limited):&lt;/p&gt;  &lt;div id="scid:5737277B-5D6D-4f48-ABFC-DD9C333F4C5D:a5c43597-8d84-4a1a-b897-db2f19356e72" class="wlWriterEditableSmartContent" style="float:none;padding-bottom:0px;padding-top:0px;padding-left:0px;margin:0px;display:inline;padding-right:0px;"&gt;&lt;div&gt;&lt;a href="http://www.youtube.com/watch?v=q47a0L0_9Ws" target="_new"&gt;&lt;img src="http://sqlblog.com/blogs/merrill_aldrich/videof83e47a6713d_7D8FE022.jpg" style="border-style:none;" alt=""&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="width:448px;clear:both;font-size:.8em;"&gt;Animated Screen Cap of Mouse-over&lt;/div&gt;&lt;/div&gt;</description></item><item><title>Update to Rules-Driven Maintenance</title><link>http://www2.sqlblog.com/blogs/merrill_aldrich/archive/2013/01/16/update-to-rules-driven-maintenance.aspx</link><pubDate>Wed, 16 Jan 2013 18:34:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:47183</guid><dc:creator>merrillaldrich</dc:creator><description>&lt;p&gt;Back in August I &lt;a href="http://sqlblog.com/blogs/merrill_aldrich/archive/2012/08/01/rules-driven-maintenance.aspx"&gt;posted&lt;/a&gt; a first version of a rules-driven solution for backups, index and statistics maintenance and integrity checks. The system in general has been working well, and has saved my team a huge amount of time and effort. We are coming to the anniversary of its use in production soon.&lt;/p&gt;  &lt;p&gt;Today I offer an update that contains a few enhancements, performance improvements and a bug fix.&lt;/p&gt;  &lt;p&gt;To recap, this is a system or framework to manage many small databases across many instances on many servers in a policy-based/automated way. The details about how to deploy and how to use the system are in the &lt;a href="http://sqlblog.com/blogs/merrill_aldrich/archive/2012/08/01/rules-driven-maintenance.aspx"&gt;original post&lt;/a&gt;, but at a high level:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;First deploy the code across a collection of servers using the included PowerShell script &lt;b&gt;DeployMaintenance.ps1&lt;/b&gt;. That will create an administrative database on every instance (or use one that you provide) and populate it with the required tables, procedures, etc. The deployment script will also make a handful of SQL Agent jobs that handle all the maintenance on all the databases.&lt;/li&gt;    &lt;li&gt;Visit the instances. If there is maintenance already in place, disable or remove it so as not to perform this work twice. If necessary, change or override the default preferences for this solution in the admin database for each type of maintenance. This includes the time of maintenance windows, whether to use features like differential backups or Litespeed, whether there are databases that should be treated as exceptions, with different settings, etc.&lt;/li&gt;    &lt;li&gt;Enable the new policy jobs in SQL Agent. &lt;/li&gt;    &lt;li&gt;Monitor to make sure things are running smoothly.&lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;The theory behind this project is to save work configuring servers (efficiency), to ensure that the maintenance is in fact deployed and working for ALL databases in the environment (quality), to ensure that it works in a truly consistent way everywhere (consistency), and to make it so that the servers don’t have to be touched by a DBA when changes happen, such as adding or dropping databases (resiliency).&lt;/p&gt;  &lt;h4&gt;Changes&lt;/h4&gt;  &lt;p&gt;The updates in this 1.1 version include&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Better reporting of errors. The jobs now write log files to the default error log location for SQL Server.&lt;/li&gt;    &lt;li&gt;Better recovery from errors. The jobs in some cases continue on error – for example, one failed backup will not stop the backup job and leave other databases without backups.&lt;/li&gt;    &lt;li&gt;Performance enhancements for examining backup history and system counters (mainly % log used).&lt;/li&gt;    &lt;li&gt;One bug fix for the log backup logic on a server that uses both log shipping and Litespeed.&lt;/li&gt; &lt;/ul&gt;  &lt;h4&gt;Disclaimers&lt;/h4&gt;  &lt;p&gt;The code for the system is posted here, and you are welcome to download it and try it out. Please note that this is not a commercial solution, and that while you may test or deploy this, you do so at your sole risk. Edits to the code almost certainly will be required for your environment. Please read, understand and test the code thoroughly before you even &lt;i&gt;imagine&lt;/i&gt; using this in production. I do not want to cause you or your employer any pain. License for using this solution is GPL 3.0: &lt;a href="http://opensource.org/licenses/GPL-3.0"&gt;http://opensource.org/licenses/GPL-3.0&lt;/a&gt;. You may use and modify this code, but not sell it.&lt;/p&gt;  &lt;p&gt;This has been run on SQL Server versions from 2005 to 2008 R2. It probably works on 2012, but I have not tested it thoroughly.&lt;/p&gt;  &lt;p&gt;I run only case-insensitive collation; if you are on a case-sensitive system, then you have my sympathy, and you might have to edit a few things to make this work.&lt;/p&gt;  &lt;p&gt;There is coverage for Quest Litespeed backups in the solution; other third-party backup products could be incorporated easily, but have not been. It should be possible to follow the pattern used to handle Litespeed and extend it to any other third-party stored procedures.&lt;/p&gt;  &lt;p&gt;If you haven’t done index maintenance for a while, and you flip this solution on &lt;b&gt;you might blow up your transaction log&lt;/b&gt;. Be sure that you have plenty of log space and plenty of log backup space, and watch the process the first few runs. This is especially true if the combination of &lt;b&gt;Standard Edition of SQL Server&lt;/b&gt; and the preference for &lt;b&gt;Online&lt;/b&gt; index maintenance applies in your environment, because the maintenance process will perform index maintenance using reorganize (defrag), which produces a lot of log.&lt;/p&gt;  &lt;p&gt;Lastly, this is &lt;b&gt;a solution for the type of environment that has many small databases&lt;/b&gt;, not the type with a few big, performance intensive databases. Databases that are large and require detailed attention may not be suitable for a one-size-fits-all solution like this.&lt;/p&gt;</description></item><item><title>Geek City: Build a Big Table with a Columnstore Index</title><link>http://www2.sqlblog.com/blogs/kalen_delaney/archive/2012/04/26/Build-a-Big-Table-with-a-Columnstore-Index.aspx</link><pubDate>Thu, 26 Apr 2012 18:18:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:43013</guid><dc:creator>Kalen Delaney</dc:creator><description>&lt;p&gt;I was looking all over to find a big table to use for my columnstore 
examples in my SQL Server 2012 book, and the only one I found was from someone 
at Microsoft and was not publicly available.  When I demonstrate code in my 
writings, I like to have the code available for anyone to use, so that was no 
use. &lt;/p&gt;
&lt;p&gt;Finally I realized I was just going to have to do it myself. I actually based 
the script on some work by the awesome Benjamin Nevarez (&lt;a href="http://www.benjaminnevarez.com/"&gt;blog&lt;/a&gt;| &lt;a href="http://twitter.com/BenjaminNevarez"&gt;twitter&lt;/a&gt;), but I needed to make a 
couple of changes to the table structure, and then the loop for adding the data 
took some tweeking to make sure that uniqueness was possible, in case you need 
to test out how the metadata changes when a columnstore index is built on a 
table with a unique clustered index compared to a nonunique clustered index 
compared to a heap. &lt;/p&gt;
&lt;p&gt;I have just finished the chapter on indexes, and decided to make this script 
available. The initial table data is based on the data in Microsoft 
AdventureWorksDW2012 sample database that you can download &lt;a href="http://msftdbprodsamples.codeplex.com/releases/view/55330"&gt;here&lt;/a&gt;.  
(Note that the DW version is the first one under “Other Available Downloads”’; 
it’s not the one under “Recommended Download”. )&lt;/p&gt;
&lt;p&gt;Here is the section of the script that populates most of the 
&lt;em&gt;FactInternetSalesBIG&lt;/em&gt; table. (The original 60K rows were just copied 
from the original &lt;em&gt;FactInternetSales&lt;/em&gt; table. Note that all the statements 
have to be run in a single batch because the local variable’s scope is the 
batch, and because the GO 9 applies to the single batch that precedes it. Of 
course, you can change the number of iterations to end up with a different size 
table. Mine ends up at just over 30 million rows.  In addition to modifying the 
&lt;em&gt;SalesOrderNumber&lt;/em&gt; value on each iteration, I also changed the value of the 
Revision column to indicate which pass through the insert loop was being 
executed. &lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;br&gt;&lt;font size="2" face="Courier New"&gt;&lt;br&gt;-- Copy the new big table into itself 
9 times&lt;br&gt;DECLARE @RevisionNumber nchar(2);&lt;br&gt;SELECT @RevisionNumber = 
RevisionNumber + 1 FROM RevisionNumberValue;&lt;br&gt;SELECT @RevisionNumber as 
RevisionNumber;&lt;br&gt;INSERT INTO dbo.FactInternetSalesBig WITH (TABLOCK)&lt;br&gt;    
SELECT ProductKey&lt;br&gt;      ,OrderDateKey&lt;br&gt;      ,DueDateKey&lt;br&gt;      
,ShipDateKey&lt;br&gt;      ,CustomerKey&lt;br&gt;      ,PromotionKey&lt;br&gt;      
,CurrencyKey&lt;br&gt;      ,SalesTerritoryKey&lt;br&gt;      ,SalesOrderNumber + 
@RevisionNumber&lt;br&gt;      ,SalesOrderLineNumber&lt;br&gt;      
,@RevisionNumber&lt;br&gt;      ,OrderQuantity&lt;br&gt;      ,UnitPrice&lt;br&gt;      
,ExtendedAmount&lt;br&gt;      ,UnitPriceDiscountPct&lt;br&gt;      ,DiscountAmount&lt;br&gt;      
,ProductStandardCost&lt;br&gt;      ,TotalProductCost&lt;br&gt;      ,SalesAmount&lt;br&gt;      
,TaxAmt&lt;br&gt;      ,Freight&lt;br&gt;      ,CarrierTrackingNumber&lt;br&gt;      
,CustomerPONumber &lt;br&gt;      ,OrderDate&lt;br&gt;      ,DueDate&lt;br&gt;      ,ShipDate&lt;br&gt;  
FROM dbo.FactInternetSalesBig;&lt;br&gt;  UPDATE RevisionNumberValue SET 
RevisionNumber = RevisionNumber + 1;&lt;br&gt;GO 9&lt;/font&gt;&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Make sure you have enough log space. My log grew from 1GB to just over 8GB 
when running the script to build the 30 million row table and columnstore index, 
with no clustered index.  &lt;/p&gt;
&lt;p&gt;The downloadable script has commented options to build either a clustered or  
nonclustered index before you build the columnstore index. I suggest building 
the clustered index before the columnstore index,  because building the 
clustered index&amp;nbsp;will rebuild any existing nonclustered indexes, including my 
columnstore index. Building the columnstore index can take a while, on my system 
it was about 10 minutes. (Building the original 30+ million row table took even longer.)&lt;/p&gt;&lt;p&gt;After I finish the chapter on partitioning, I'll have a script to partition this table, but feel free to do that yourself!&lt;/p&gt;
&lt;p&gt;Have fun!&lt;/p&gt;
&lt;p&gt;&lt;font color="#d16349" size="4"&gt;~Kalen&lt;/font&gt;&lt;/p&gt;</description></item><item><title>Speaking at the Charlotte Users Group</title><link>http://www2.sqlblog.com/blogs/andrew_kelly/archive/2011/08/26/speaking-at-the-charlotte-users-group.aspx</link><pubDate>Sat, 27 Aug 2011 00:33:55 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:38084</guid><dc:creator>Andrew Kelly</dc:creator><description>&lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;I will be giving a presentation in between Earthquakes and Hurricanes in Charlotte NC. this Wednesday the 31st of August on &lt;font color="#0000ff"&gt;Understanding Indexes in SQL Server 2008&lt;/font&gt;. If you are in the area please drop by and say hello. You can find out more information and register for the event &lt;a href="http://august2011cssug.eventbrite.com/" target="_blank"&gt;here&lt;/a&gt;. &lt;/p&gt;  &lt;p&gt;Andy&lt;/p&gt;</description></item><item><title>SQL Server, Seeks, and Binary Search</title><link>http://www2.sqlblog.com/blogs/paul_white/archive/2011/08/08/sql-server-seeks-and-binary-search.aspx</link><pubDate>Mon, 08 Aug 2011 20:10:27 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:37622</guid><dc:creator>Paul White</dc:creator><description>&lt;p&gt;The following table summarizes the results from my last &lt;a href="http://sqlblog.com/blogs/paul_white/archive/2011/08/04/avoiding-uniqueness-for-performance.aspx"&gt;two&lt;/a&gt; blog entries, showing the CPU time used when performing 5 million clustered index seeks:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/paul_white/image_2AB2C931.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="Clustered Index Seek Test Results" border="0" alt="Clustered Index Seek Test Results" src="http://sqlblog.com/blogs/paul_white/image_thumb_41FDCDA2.png" width="315" height="64" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;In test 1, making the clustered index unique improved performance by around 40%. In test 2, making the same change reduced performance by around 70% (on 64-bit systems – more on that later).&amp;#160; As a reminder, both tests use nested loops to join a single-column BIGINT table of numbers to itself:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/paul_white/image_15749DBC.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="Query and Plans" border="0" alt="Query and Plans" src="http://sqlblog.com/blogs/paul_white/image_thumb_52491F8E.png" width="647" height="453" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;In test 1, the table contains numbers from 1 to 5 million inclusive; for test 2, the table contains the even numbers from 2 to 10 million (still 5 million rows).&amp;#160; In case you were wondering, the test 2 results are the same if we populate the table with odd numbers from 1 to 9,999,999 instead of even numbers.&amp;#160; How can we explain these results?&lt;/p&gt;  &lt;h3&gt;Performing an Index Seek&lt;/h3&gt;  &lt;p&gt;Perhaps we need to look a bit deeper into how SQL Server performs an index seek.&amp;#160; Most people know that SQL Server indexes are a &lt;a href="http://en.wikipedia.org/wiki/B%2B_tree"&gt;B+ tree&lt;/a&gt;, so let’s work through an example of locating the row containing the value 1 million, when the table contains values from 1 to 5 million, with a unique clustered index.&lt;/p&gt;  &lt;p&gt;Index seeks start at the index root page, which we can find in a number of ways (e.g. &lt;a href="http://msdn.microsoft.com/en-us/library/ms189051.aspx"&gt;sys.system_internals_allocation_units&lt;/a&gt;).&amp;#160; I happened to use &lt;a href="http://blogs.msdn.com/b/sqlserverstorageengine/archive/2006/12/13/more-undocumented-fun_3a00_-dbcc-ind_2c00_-dbcc-page_2c00_-and-off_2d00_row-columns.aspx"&gt;DBCC IND&lt;/a&gt; to determine that the root of the index was page &lt;strong&gt;117482&lt;/strong&gt; in file 1 (in Denali, we can use a new DMF: &lt;a href="http://sqlgeek.pl/2011/07/26/pl-denali-ctp3-nowe-lepsze-dbcc-ind/"&gt;sys.dm_db_database_page_allocations&lt;/a&gt;).&amp;#160; Anyway, whichever way you find it, we can use &lt;a href="http://support.microsoft.com/kb/83065"&gt;DBCC PAGE&lt;/a&gt; to look at the contents of the root page:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/paul_white/SNAGHTMLd8089b_0F1DA161.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="Index Root Page Contents" border="0" alt="Index Root Page Contents" src="http://sqlblog.com/blogs/paul_white/SNAGHTMLd8089b_thumb_36946DCB.png" width="489" height="213" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;The col1 (key) column shows the lowest key value that can possibly be present in the lower-level page specified by Child Page Id.&amp;#160; We can see that the lowest possible key value on page 117485 is 906305, and the next lowest key value possible is 1132881 on page 117486.&amp;#160; So, if a row containing the value 1 million exists, the next page to look at is &lt;strong&gt;117485&lt;/strong&gt;:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/paul_white/SNAGHTMLdb085d_41DD8208.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="Level 1 Index Contents" border="0" alt="Level 1 Index Contents" src="http://sqlblog.com/blogs/paul_white/SNAGHTMLdb085d_thumb_383513D2.png" width="486" height="232" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;We are now at level 1 of the index, and have to scroll down the output a bit further to see that the next page to look at is &lt;strong&gt;119580&lt;/strong&gt;.&amp;#160; This page is at the leaf level of the index, and we find our target value at position 399:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/paul_white/image_07525C67.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="Leaf level index record" border="0" alt="Leaf level index record" src="http://sqlblog.com/blogs/paul_white/image_thumb_5AC92C80.png" width="429" height="72" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;h3&gt;Binary Search&lt;/h3&gt;  &lt;p&gt;When performing an index seek, SQL Server obviously doesn’t run DBCC PAGE and scroll down the output window to find the next page to look at like we just did.&amp;#160; You may know that SQL Server can use binary search to locate index entries, but it’s worth taking a closer look at the index pages to see how this works in practice:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/paul_white/image_6B61276E.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="SQL Server Page Structure Diagram" border="0" alt="SQL Server Page Structure Diagram" src="http://sqlblog.com/blogs/paul_white/image_thumb_61B8B938.png" width="486" height="478" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;The slot array contains a series of two-byte offsets pointing to the index records.&amp;#160; The index records are not necessarily stored in any particular order on the page; only the slot array entries are guaranteed to be in sorted order.&amp;#160; Note that the sorted slot array entries do not contain the index keys themselves, just offsets.&lt;/p&gt;  &lt;h4&gt;Performance&lt;/h4&gt;  &lt;p&gt;When SQL Server uses binary search to locate an index entry, it starts by checking the middle element of the slot array (shown in red in the example above).&amp;#160; It then follows that offset pointer to the index record, and compares the value it finds there with the sought value.&amp;#160; Assuming the value is not the one we are looking for, the sorted nature of the slot array allows SQL Server to narrow the search range in half after this first comparison.&lt;/p&gt;  &lt;p&gt;If the red index record happens to contain a value &lt;em&gt;higher&lt;/em&gt; than the sought value, we know the value we are looking for is in that half of the slot array that sorts lower than the red entry.&amp;#160; This process of cutting the search space in half continues until we find the record we are looking for, or until it becomes clear that the value does not exist.&lt;/p&gt;  &lt;p&gt;The average and worst case performance of binary search is very close to log&lt;sub&gt;2&lt;/sub&gt;N comparisons to find the item of interest in a sorted list of N items.&amp;#160; For the 476 index entries found in our test example index structure at the leaf level, that means at most 9 comparison operations.&amp;#160; That compares well to a linear search of the sorted slot array, which might require as many as 476 comparisons if we are unlucky, 238 on average.&lt;/p&gt;  &lt;p&gt;None of this offers any particular insight into why a binary search into a unique indexes containing only even (or odd!) numbers should be so slow.&amp;#160; One of the advantages of binary search is that it expects to perform around the same amount of work, regardless of the distribution of the values in the index.&amp;#160; Luckily, binary search is not the only game in town.&lt;/p&gt;  &lt;h3&gt;Linear Interpolation&lt;/h3&gt;  &lt;p&gt;As efficient as binary search is in the general case, in some cases we can do better.&lt;/p&gt;  &lt;p&gt;When searching for the value 1 million, we know from level 1 of the index that the lowest key value that can appear on the leaf level page we are interested in is 999,601.&amp;#160; We also know that the lowest key value on the next page is 1,000,077.&amp;#160; From the header information on our target page, we also know there are 476 entries in the slot array on that page.&amp;#160; Given these facts, we can make an immediate guess at which slot the value 1,000,000 ought to appear in:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;expected slot = (1000077 - 999601) / 476 * (1000000 - 999601) = 399&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;This simple calculation takes us immediately to slot 399.&amp;#160; As we saw earlier, slot 399 does indeed contain the value 1,000,000:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/paul_white/image_5EC35485.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="Slot 399" border="0" alt="Slot 399" src="http://sqlblog.com/blogs/paul_white/image_thumb_323A249F.png" width="429" height="72" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Naturally, the quality of the linear interpolation guess depends on how evenly distributed the values are within the page.&amp;#160; To a lesser extent, it also depends on how closely the minimum key value information at level 1 of the index matches reality.&amp;#160; In both tests shown in earlier blog entries, the values are precisely evenly distributed and the level 1 index key information is spot on.&lt;/p&gt;  &lt;h4&gt;Performance&lt;/h4&gt;  &lt;p&gt;Linear interpolation has the potential to be more efficient even than binary search, finding the target value in one comparison in this example, compared to nine comparisons for binary search.&amp;#160; Even where the data is not perfectly distributed, there is scope for linear interpolation to be superior, simply by applying the technique repeatedly on successively narrower ranges.&lt;/p&gt;  &lt;p&gt;A reasonable practical compromise might be to try linear interpolation two or three times, before falling back to linear or binary search on the remaining range.&amp;#160; The advantage of linear interpolation may not seem much, but consider that SQL Server makes these comparisons for every index seek operation – when doing many millions of seeks, the difference can soon add up.&amp;#160; It is probably not coincidence that the OLTP TPC benchmark tests perform a very great number of singleton index seeks.&lt;/p&gt;  &lt;h3&gt;Linear Regression&lt;/h3&gt;  &lt;p&gt;A natural extension to the idea of linear interpolation is to apply a linear regression (line of best fit) instead:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/paul_white/image_42D21F8D.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="Linear Regression Diagram" border="0" alt="Linear Regression Diagram" src="http://sqlblog.com/blogs/paul_white/image_thumb_6717040F.png" width="646" height="461" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;In the diagram, the blue dotted line represents a linear interpolation based on the values of the smallest and highest values only.&amp;#160; The black line is obtained by a linear regression of &lt;em&gt;all&lt;/em&gt; the data values, and results in a much better fit.&lt;/p&gt;  &lt;p&gt;Since every straight line can be represented by a formula of the form y = mx + b, we can completely specify it by recording the values of the ‘m’ (slope) and ‘b’ (y-axis intercept) parameters.&amp;#160; The R&lt;sup&gt;2&lt;/sup&gt; value gives an idea of how good a fit the linear regression line is to the data.&amp;#160; In the present context, the ‘y’ value represents the indexed value, and the ‘x’ value is the slot position where that value can be found.&amp;#160; It’s easy to see that once we have a linear regression line, we can estimate the slot position for a particular sought index value by solving x = (y – b) / m.&lt;/p&gt;  &lt;p&gt;To use linear regression in a database, we might imagine storing the ‘b’ and ‘m’ parameter values somewhere (perhaps in a compact format in the page header), and deciding whether to apply the technique based on the R&lt;sup&gt;2&lt;/sup&gt; value.&lt;/p&gt;  &lt;h3&gt;What Does SQL Server Do?&lt;/h3&gt;  &lt;p&gt;All this is fine in theory, but does SQL Server &lt;em&gt;really&lt;/em&gt; ever use anything except binary search?&amp;#160; To see, I ran tests with a debugger attached to the SQL Server process, and stepped through the calls made while the seek test queries were running.&amp;#160; The call stack below was obtained when running 64-bit SQL Server:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/paul_white/SNAGHTML385906e_329631C7.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="64-bit Call Stack" border="0" alt="64-bit Call Stack" src="http://sqlblog.com/blogs/paul_white/SNAGHTML385906e_thumb_28EDC391.png" width="450" height="225" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;By contrast, the following call stack comes from 32-bit SQL Server:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/paul_white/SNAGHTML38c1c77_25F85EDE.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="32-bit Call Stack" border="0" alt="32-bit Call Stack" src="http://sqlblog.com/blogs/paul_white/SNAGHTML38c1c77_thumb_1C4FF0A8.png" width="459" height="218" /&gt;&lt;/a&gt;&lt;/p&gt;    &lt;p&gt;Interestingly, SQL Server follows the same code paths (at least at the call stack level, which is the best we can do without source code) regardless of whether the index is defined as unique or not, and for all the numeric data types I tested (INT, BIGINT, REAL, FLOAT, DECIMAL with a zero scale).&lt;/p&gt;  &lt;p&gt;The performance problem on x64 servers appears to be most pronounced when the key values have a small fixed gap between them.&amp;#160; As we have seen, performance is &lt;em&gt;very&lt;/em&gt; much worse on x64 servers compared with x86 versions when the table contains just even or odd numbers (i.e. with a gap of 1 between index entries).&amp;#160; If we change the test to multiply the original entries by a factor of ten (to produce a sequence of 10, 20, 30, 40…) the performance penalty all but disappears, and the x64 unique index test is around 30% faster than the non-unique test.&lt;/p&gt;  &lt;h3&gt;Conclusions and Further Reading&lt;/h3&gt;  &lt;p&gt;From the evidence available, my best guess is that 64-bit SQL Server does use linear interpolation and/or linear regression instead of binary search under suitable conditions, but that the implementation has edge-case poor performance where the index keys are separated by a small fixed offset.&amp;#160; It seems that we can achieve best performance by using a unique index where possible, and ensuring that keys are, as far as practicable, sequential and contiguous.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://academic.research.microsoft.com/Publication/2202072"&gt;B-tree Indexes, Interpolation Search, and Skew&lt;/a&gt; – Microsoft Research     &lt;br /&gt;&lt;a href="http://im.ufba.br/pub/MAT152/SemestreLetivo20052/intepolationSearch.pdf"&gt;Interpolation Search – a log log N Search&lt;/a&gt; (PDF)&lt;/p&gt;  &lt;p&gt;© 2011 Paul White&lt;/p&gt;  &lt;p&gt;email: &lt;a href="mailto:SQLkiwi@gmail.com"&gt;SQLkiwi@gmail.com&lt;/a&gt;     &lt;br /&gt;twitter: &lt;a href="http://twitter.com/SQL_Kiwi"&gt;@SQL_Kiwi&lt;/a&gt;&lt;/p&gt;</description></item></channel></rss>