Memory configuration and investigating memory conditions is still something that seems to cause a lot of confusion. There are numerous articles, blog posts etc. that talk about SQL Server memory, virtual memory, AWE and other factors but not much about the one thing that I personally think give a great understanding of the different counters we have learned to use to look at when investigating SQL Server memory.
In this post I will talk a little bit about some mechanisms in the Buffer pool – i.e. the part of SQL Server memory where, above all, your data and index pages go and also at the performance monitor counters you can use when assessing the state of the Buffer pool.
(When I refer to pages, I am exclusively talking about data and index pages. For log records, there is a special Log Manager and I will talk about how log records are written in another post.)
First, let’s look at some Buffer pool concepts. A simplistic view of the Buffer pool looks something like this:
POOLS AND CACHE STORES
Pools are used to cache homogenous, stateless data. All entries in the pool are considered equal - for example a pool of connections or a pool of network buffers.
Cache stores are used to store stateful data and provide a set of memory allocation interfaces that can be reused by different consumers. For example the Procedure Cache which is divided into several different cache stores – e.g. one for ad-hoc sql plans, another for plans for stored procedures/functions/triggers and a separate for external stored procedures.
The Free List
SQL Server keeps a minimum number of free pages on the Free List to be able to serve incoming requests without any delay. The amount of free pages that SQL Server tries to keep in the Free list (“Min Free“ in DBCC MEMORYSTATUS output) is calculated based on the size of the Buffer pool and the amount of incoming requests (Page Life Expectancy, the expected lifetime of a page in cache, being one indicator).
Writing and Freeing buffer pages
SQL Server uses a Least Recently Used (LRU) algorithm for aging pages in the Buffer pool. Basically a counter is incremented every time a page is referenced and decremented every time the lazy writer process sweeps the page.
Any worker thread will at certain points (for example when an asynchronous IO is issued) check the memory status of the Buffer pool to make sure that a healthy number of free pages are available for new incoming requests. If the number of free buffers is not satisfactory, two things can happen:
If the upper limit of the Buffer pool has been reached (the limit being either “max server memory” or available memory in the operating system; both reflected in the SQL Server Memory Manager:Target Server Memory counter):
- the lazywriter process (although run by the worker thread) will sweep parts of the Buffer pool (it keeps track of where it left off the last sweep) and based on when the page was last used either leave it or – in case the page is dirty – flush it to disk and then put the page on the free list. A dirty page will not be flushed until the corresponding log records have been written to disk (for more information, see Write-ahead Logging in Books Online).
If the upper limit of the Buffer pool has not yet been reached:
- it will commit more reserved pages into the buffer pool rather than to move a page that has been used previously to the Free List.
This is how the Page Life Expectancy can stay very high on a server with little workload (or plenty of memory available) and also why Process:Private Bytes (sqlservr.exe) and SQL Server Memory Manager: Total Server Memory (more on these counters later) will keep growing even on a server with little activity.
The lion’s part of writing and freeing buffer pages is actually done by “normal” worker threads.
Is a system thread that flushes out batches of dirty aged buffers and puts the corresponding buffers on the Free List in the same manner as described above. The main purpose of the Lazywriter is to maintain the Free List.
Is another system process that wakes up every minute and checks each database to see whether it has exceeded the Recovery interval. Its purpose is to flush dirty pages in order to keep the number transactions that have to be rolled forward during a potential recovery to a minimum (or rather to ensure that the sum of the recovery cost value for all transactions does not exceed the number of minutes specified in Recovery Interval). However, the Checkpoint process does NOT put buffers back on the free list.
Is a special write mechanism used for non-logged IO operations such as BULK INSERT and SELECT INTO. Its purpose is to avoid so called Buffer pool trashing (which is what happens when large amounts of data that is unlikely to be re-used gets read into cache).
Now that we have some background – let’s look at the performance counters related to the buffer pool. You need to look at these counters altogether to get the complete picture. For instance, Page Life Expectancy is quite often all over the place - even dips to zero occasionally - even if there is no memory issue.
SQL Server Buffer Manager:
Free List stalls/sec
Page Life Expectancy
We already know that most writing and freeing buffers is made by individual worker threads. This means that on a very healthy system, Lazy writes/sec and Checkpoint pages/sec will be close to zero. However, if the worker threads cannot quite keep up because of the workload, you will see Lazy writes/sec climb up.
You might also see Free Pages jumping up and down occasionally and that in itself is not a problem. SQL server can usually recover quickly from a zero Free pages situation. However, if you see Free List Stalls, you can be pretty sure that your workload would benefit from more memory or from tuning to reduce the memory requirements (whether that is index tuning, query tuning or maybe even just an update of statistics).
To summarize, if Free Pages and Page Life Expectancy are intermittently dipping but you do not see any increase in Lazy writes/sec, the amount of memory available in the Buffer pool is probably enough for the workload (as measured by SQL Server Statistics: Batches/sec).
The Page Writes/sec and Page Reads/sec gives you information of how much SQL Server is contributing to hard page faults on the system. Because I typically look at to me unknown workloads, I find it hard to say anything about the numbers as they are. I usually look at it in comparison to some benchmark numbers (or in comparison to hard page faults overall on the server).
You might wonder why I have not included Buffer Manager: Buffer Cache hit ratio. The reason is simple; not all memory consumption will reflect in Buffer Cache Hit Ratio because not all, although most, memory is about data and index pages; you have things such as workspace memory or memory consumed by the optimizer. A high Buffer Cache Hit Ration doesn’t necessarily mean that all is well although a low cache hit ratio indicates issues with the Buffer pool (or rather the utilization of memory available in the buffer pool). Having said this, on a finely tuned OLTP system it works quite well as an indicator because you are typically not processing large amounts of data (that would require large workspace memory).
If you look in Performance Monitor, you will notice you have a Buffer Partition and Buffer Node. This is because some structures are per processor and per NUMA node respectively. (Aside; on NUMA you will have one Lazywriter per NUMA node).
There is another group of counters that concerns memory that requires special attention. On a finely tuned OLTP system, you normally don’t have issues with this but you should keep an eye on these counters as well when you are investigating the Buffer pool.
SQL Server Memory Manager:
Memory Grants Outstanding
Memory Grants Pending
Certain memory intensive queries require Memory Grants before executing and in order to balance the need for these queries with other user activity, SQL Server throttles the number of concurrent queries of this kind. Queries that require SORT or HASH operators will need a memory grant, or memory reservation, from the Memory Broker. The number of pending grants should be zero. If it’s not, it is a sign that some tuning or optimization should take place.
Lastly, a recap on the memory counters for SQL Server overall memory:
Process:Private Bytes:sqlservr.exe – all committed memory including the Buffer pool
Process:Working Set: sqlservr.exe - the part of committed memory (Private Bytes) which is mapped to physical RAM
SQL Server Memory Manager : Total Server Memory - the amount of committed memory used by the Buffer pool
SQL Server Memory Manager : Target Server Memory – the amount of memory reserved for the Bufferpool. This number is set at SQL Server startup but will be adjusted if the server gets into a low-memory situation.