I got a couple of really great questions during my SSWUG Workshop this morning, as I was discussing seeks and scans, and since the answers to the two questions are very related, I decided to address both of them in more detail in this blog post.
Most people think about a seek operation as retrieving just one or a few rows, and a scan as accessing the entire table or index leaf level. And that usually is what I mean when I am just describing SQL Server behavior. But there is also the possibility that SQL Server could perform a 'partial scan', starting at some row in the table or index leaf, and then scanning only to a particular ending row. However, there is no operator in a query plan called 'partial scan'. So how can you tell when you have one?
Before I specifically answer that, let me tell you about the first question in the workshop this morning. I had been showing the Properties window (available from View/Properties Window), pointing out that when you click on one of the icons in a graphical query plan, the Properties window shows the same information that is in the yellow box that pops up, plus additional information. One of the values shown in one called 'Ordered', and when 'Ordered' has the value true, then there is also a value called 'Scan Direction', which might make you think that ordering is only relevant to scans. But you will see True values for 'Ordered' even when the operation is a seek. This is usually an indication that a partial scan is being performed.
And because I was talking about heaps being unordered sets of rows, one of the delegates asked: "Are index seeks against a heap always unordered?"
The answer is, of course, "it depends". If you have a covered query, the index access could be an ordered seek of the covering index, even though the underlying table is a heap. Here is an example from the AdventureWorks database. I make a copy of the Sales.SalesOrderHeader table, so it is a heap, and then build a nonclustered index on the Freight column.
IF EXISTS (SELECT * FROM dbo.SalesOrderHeader)
DROP TABLE dbo.SalesOrderHeader;
SELECT * INTO dbo.SalesOrderHeader
CREATE INDEX ix_Freight ON dbo.SalesOrderHeader(Freight);
Then I look at the plan for a covered query using the ix_Freight index:
WHERE Freight < 100;
And here is the plan showing the index seek:
And the Properties sheet for the Index Seek shows that Ordered - True and Scan Direction = FORWARD.
This is an example of an ordered seek on a nonclustered index, or what we can think of as a partial scan, or range scan.
The other question during the workshop was how to tell if the seek was really a range scan. Part of the answer is to look for the Ordered = True value, as I just illustrated. Another thing to notice in the graphical query plan is the thickness of the line moving rows to the left from the Index Seek, indicating that LOTS of rows are being returned from the Seek. That's another clue that we have a partial scan.
We can also get partial scans with clustered indexes. I'll build a clustered index on my new table.
CREATE CLUSTERED INDEX clus_OrderID ON dbo.SalesOrderHeader(SalesOrderID);
There are 31465 rows in the dbo.SalesOrderHeader table, with SalesOrderID values ranging from 43659 to 75123. I'll SELECT a small range of data:
SELECT * FROM dbo.SalesOrderHeader
WHERE SalesOrderID < 44000
The plan shows a Clustered Index Seek, and the Properties shows Ordered = True.
If I make the range bigger, and look for SalesOrderID < 60000, I still get an Index Seek, and the same when using an upper limit of 75000. If I run a query that returns ALL the rows, I still get an Index Seek.
Both queries are using an Index Seek, but notice the difference in the thickness of the arrows.
It turns out that the only time the optimizer will tell us it is doing a scan is when it knows absolutely that it will be looking at every row in the table or index leaf. And the only way it can know this is if there is no filter condition (NO WHERE clause) or no index on the filter condition. With the index on SalesOrderID, there is no way the optimizer can be absolutely sure that it will have to scan the whole table. Someone might have inserted a value of 100001 since the last time statistics were updated, and we wouldn't want to perform a scan and return a value that is not in the specified range. So the Index Seek in the plan really just means that SQL Server will access the data using the index, but not with a complete scan. There is something in the query that give a starting point, a stopping point or both.
I've got another Workshop for SSWUG on Index Internals, coming up on February 25th. Hopefully, I'll get a lot more great questions during that one!