In my last post, I told you I'd post something technical before I left for Sweden. I almost made it. I wrote this in the Seattle airport, but wasn't able to post it until I got to Stockholm.
For as long as I've been working with SQL Server, I've been hearing/reading questions about how to quickly get a count of the number of rows in a table. We've always had the stored procedure sp_spaceused which would give us a count, but we've always be warned that the count was not guaranteed to be completely accurate. Way back in Sybase versions 3 and 4, that was an enormous understatement. Working in Tech Support, we recommended that people not rely on this value at all, and internally, we referred to the procedure as sp_space_useless.
Things have improved over the years, and in many cases, the sp_spaceused procedure, or a direct SELECT from sysindexes could give a reasonably accurate count, but still, there has never been a guarantee. Starting in SQL Server 2005, sysindexes became only a compatibility view, and the recommended system object was the catalog view sys.partitions. Sys.partitions keeps track of rows per partition, so the following query would return multiple rows for a partitioned table:
SELECT rows FROM sys.partitions
WHERE object_id = object_id('name of table')
AND index_id < 2;
If you want one value, rather than one per partition, you could simply ask for the SUM:
SELECT SUM(rows) FROM sys.partitions
WHERE object_id = object_id('name of table')
AND index_id < 2;
But even in SQL Server 2008, it is not guaranteed that you can get an exact count of rows without actually counting. The only guaranteed way to get an accurate value is to SELECT from the table:
SELECT count(*) FROM [name of table];
But what does it mean to get an accurate count?
And when would the count not be accurate when selecting from sys.partitions? Microsoft doesn't reveal all the cases in which you might not get an accurate value, all they will admit is that the count is not guaranteed. You can actually reproduce one case where you might not want to trust the count, by examining the count in one connection while another connection is changing the number of rows.
Start by creating a copy of a table in the AdventureWorks2008 database:
USE AdventureWorks2008;
GO
IF EXISTS (SELECT 1 FROM sys.tables WHERE name = 'Header')
DROP TABLE Header
GO
SELECT * INTO Header
FROM Sales.SalesOrderHeader;
GO
You should have 31,465 rows in the new table.
Next, get ready to start a transaction to add more rows to the table:
-- Insert more rows:
USE AdventureWorks2008;
GO
BEGIN TRAN
INSERT INTO Header
SELECT [RevisionNumber]
,[OrderDate]
,[DueDate]
,[ShipDate]
,[Status]
,[OnlineOrderFlag]
,[SalesOrderNumber]
,[PurchaseOrderNumber]
,[AccountNumber]
,[CustomerID]
,[SalesPersonID]
,[TerritoryID]
,[BillToAddressID]
,[ShipToAddressID]
,[ShipMethodID]
,[CreditCardID]
,[CreditCardApprovalCode]
,[CurrencyRateID]
,[SubTotal]
,[TaxAmt]
,[Freight]
,[TotalDue]
,[Comment]
,[rowguid]
,[ModifiedDate]
FROM [Sales].[SalesOrderHeader];
GO 10
ROLLBACK TRAN;
GO
In another window, you can run a query to inspect the count:
USE AdventureWorks2008;
GO
SELECT rows FROM sys.partitions
WHERE object_id = object_id('Header');
So now start the INSERT, and while it is executing, run the SELECT repeatedly. You should see the count increasing as more rows are added, and then you can see it DECREASING as the ROLLBACK happens, and the values returned are not always an even multiple of 31,465.
So which of these counts are accurate?
As we've been told, to get a true count, you'll need to use COUNT(*):
SELECT count(*) FROM Header;
However, this query will BLOCK when trying to get a value before the transaction is completely rolled back, at least if you're in the default READ COMMITTED isolation level. Selecting from sys.partitions will NOT block.
If your transactions are short, you won't get long term blocking, but if you have lots and lots of transactions inserting new rows continuously, what would an 'accurate' count really mean? The value returned one second would be completely different from the value returned the next second.
So you need to decide what 'accurate' really means, and maybe you'll decide that sys.partitions is accurate enough.
Inaccurately yours,
~Kalen