For want of a nail the shoe was lost.
For want of a shoe the horse was lost.
For want of a horse the rider was lost.
For want of a rider the battle was lost.
For want of a battle the kingdom was lost.
And all for the want of a horseshoe nail.
I'm now in the middle of a severity-A case that is best depicted by this rhyme. While Microsoft engineers look for the root cause, I'll describe here how a small bug can "kill" a strong server. The story begins when, for some unknown reason, the "rows" column in the sys.partitions catalog view began to show 4.6 billion rows for the clustered index of one of the entity tables in my data warehouse database. COUNT(*) on the same table returns 1 million rows.
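A quick way to see such a mismatch yourself, sketched below; `dbo.Entities` is a placeholder name for the affected table:

```sql
-- Row count as recorded in the catalog (this is what went wild).
SELECT p.index_id, p.rows
FROM sys.partitions AS p
WHERE p.[object_id] = OBJECT_ID('dbo.Entities')
  AND p.index_id IN (0, 1);   -- heap or clustered index only

-- Actual row count, obtained by scanning the table.
SELECT COUNT(*) AS actual_rows
FROM dbo.Entities;
```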
Second step - the daily job that executes the sp_updatestats stored procedure. This procedure updates statistics in the database using sampled data. Say it randomly samples 5% of the table's data and finds n unique values for Column1; the total number of unique values of Column1 in the table is then estimated as roughly 20*n. But how does SQL Server know how many rows there are per unique value of Column1? Very simple: it takes the rows column from sys.partitions and divides it by that 20*n number.
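In other words, the per-value estimate is just rows × density, and both numbers can be inspected directly. A sketch, with a placeholder table and statistic name (`IX_Entities_Column1` is assumed, not from the actual case):

```sql
-- Refresh statistics the same way the daily job does.
EXEC sp_updatestats;

-- "All density" in the density vector is 1 / (number of distinct values),
-- so the optimizer's per-value estimate is roughly:
--   rows-per-value = [sys.partitions rows] * [All density]
-- With rows = 4,600,000,000 instead of 1,000,000, the estimate is
-- inflated by a factor of 4,600.
DBCC SHOW_STATISTICS ('dbo.Entities', 'IX_Entities_Column1')
WITH DENSITY_VECTOR;
```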
Third step - the query: it joins our table to another one with WHERE Column1 = x. The optimizer compiles the query and builds an execution plan based on the existing statistics (i.e., 4.6 billion rows in the table). The expected number of rows from our table is in the millions, so the optimizer chooses a HASH JOIN as the best way to perform the JOIN operation.
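The query looked roughly like this (all names are placeholders); with the inflated statistics, the estimated row count for the Entities side jumps into the millions and the plan switches to a Hash Match operator:

```sql
-- Turn on the actual plan (or SET STATISTICS PROFILE ON) and compare
-- EstimateRows for dbo.Entities against the real count.
SELECT e.Column1, o.SomeColumn
FROM dbo.Entities AS e
JOIN dbo.OtherTable AS o
    ON o.EntityId = e.EntityId
WHERE e.Column1 = 42;   -- the "Column1 = x" predicate from the story
```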
Now that the optimizer has decided on an execution plan, it requests a memory grant for query execution, sized according to the estimate of millions of rows. Remember that the memory grant for query execution isn't taken from the data buffer cache, so on 32-bit systems there isn't much of it to go around. For example, I executed a single query as described earlier and, while it was running, checked the memory usage of my session.
So we have 252 MB granted while less than 1 MB is actually used. Total available memory is 1.6 GB, so you can calculate yourself how many concurrent queries the system supports in this state. That is exactly the situation we had: 5 sessions were running and 65 were waiting for a memory grant (if you monitor wait types, you will see a lot of RESOURCE_SEMAPHORE waits).
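The granted-versus-used gap and the waiting queue are both visible in the DMVs (SQL Server 2005 and later); a minimal sketch:

```sql
-- Sessions that hold a memory grant vs. sessions still queued for one.
-- granted_memory_kb is NULL while a query waits in the queue.
SELECT session_id,
       requested_memory_kb,
       granted_memory_kb,
       used_memory_kb,
       wait_time_ms
FROM sys.dm_exec_query_memory_grants
ORDER BY wait_time_ms DESC;

-- The waiters accumulate under the RESOURCE_SEMAPHORE wait type.
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'RESOURCE_SEMAPHORE';
```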
P.S. DBCC UPDATEUSAGE fixes the initial problem. So while Microsoft engineers look for the reason sys.partitions went wild, I monitor it in order to fix the problem before it "kills" the server.
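The fix and the watchdog, sketched with placeholder database and table names:

```sql
-- Recompute the page and row counts stored in the catalog by
-- actually counting the rows.
DBCC UPDATEUSAGE ('MyDW', 'dbo.Entities') WITH COUNT_ROWS;

-- Watchdog: compare the catalog's row count against reality so the
-- drift can be caught before the next statistics update picks it up.
SELECT p.rows AS catalog_rows,
       (SELECT COUNT(*) FROM dbo.Entities) AS actual_rows
FROM sys.partitions AS p
WHERE p.[object_id] = OBJECT_ID('dbo.Entities')
  AND p.index_id = 1;
```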