THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Sarah Henwood

Connection Failures, SQL 2005 Appears Unresponsive

Here is a situation one of my client's recently encountered on SQL Server 2005.  It was incredibly painful and a challenge to determine what was happening, plus it's an issue that is supposed to be fixed in SP2.  So wanted to share the details of the symptoms in case you experience it too.

 My client is on a SP2+ build.  They recently conducted a successfull migration from SQL Server 2000 to 2005.  They have a very high heavy rate of OLTP activity.  Things were chugging along just fine for a couple of hours after migration, and then all of a sudden web servers started throwing connectivity errors.  A local connection to the server could not even be made.  After establishing a connection using the DAC, first thing checked was the state of the SPID's on SQL - I fully expected to see excessive waits due to blocking or some other type of wait.  However, in this case it was strange - everything looked as I would expect for a normal functioning system - a small handful of SPID's were waiting on something with short waits you would expect to see at any period in time, but nothing excessive or abnormal looking (no high cpu, no indication of any runaway or expensive queries etc.)  The majority of SPID had a zero waittype and none were waiting on working thread.  This is a very key symptom of the issue we encountered - that SPID do not look problematic and you do not have a blocking or other wait type of bottleneck.  There also are no network connectivity issues or anything else that is typical of this type of symptom. 

We then issued a DBCC FREESYSTEMCACHE ('TokenAndPermUserStore')  and then immediately, the issue cleared and connections could be made to SQL and processing continue.  (Some notice resolution with DBCC FREEPROCCACHE - it is because freeing the procedure cache also dumps this store.  For this issue, you only need to dump the tokenandpermuserstore.)  I was very surprised since my client was already on SP2.  So it appears there still is a different flavor of this issue out there with some varying symptoms.  (RE:  http://support.microsoft.com/kb/927396)

 My client has a support case open with Microsoft and is still being investigated and debugged.  In the meantime, the workaround was to monitor the size of the TokenAndPermUserStore and free it when it got over 40MB.  Initially we started with a size of 100MB but still sporadically encountered the problem.  (I suspect it may be more related to number of entries, but size is what we were able to work around with.)  Below is a query you can use that will report the size and number of objects in the store if you want to track and trend.  You can add in additional logic if you need to free the cache if it is over a certain size or number of objects.

  SELECT SUM(single_pages_kb + multi_pages_kb) as CacheSize, entries_count,entries_in_use_count
  FROM sys.dm_os_memory_clerks
  WHERE name = 'TokenAndPermUserStore'

 Hope this helps!  And I will post with any new updates.

Published Monday, July 23, 2007 6:30 AM by sarahhen
Anonymous comments are disabled
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement