
Merrill Aldrich

Stats on Stats

I have been working on new features for the Rules-Driven Maintenance Solution, including per-index maintenance preferences and more selective processing for statistics. SQL Server stats is a topic I knew about at a high level, but lately I have had to take a deeper dive into the DMVs and system views and wrap my head around things like sys.dm_db_stats_properties() and sysindexes.rowmodctr.

Many people have published great stuff on this, and I am indebted to Erin Stellato (both before and after she joined SQLskills), Kendra Little, Grant Fritchey, Ted Krueger, Microsoft, and others. So this post is no earth-shaking new discovery; these are just some notes from my studies.

First, if you want to move beyond SQL Server’s auto update statistics and a nightly run of sp_updatestats, it’s important to understand where things stand with the statistics objects in the database. I wanted statistics metadata queries that:

  1. Show the age of the statistics values
  2. Show how heavily the underlying table or index has been modified since a statistics update
  3. Show the size of the indexes
  4. Allow me to create a formula relating modifications to size
  5. Work with table partitioning
  6. Work with multiple schemas

Most of these are straightforward except for the “how heavily has the table been modified” bit. As these other authors point out, there are two places to find that information: in older versions, the best option is the rowmodctr column in the legacy sysindexes view, and in newer versions the management function sys.dm_db_stats_properties(). The latter only shipped after RTM, in the service packs for SQL Server 2008 R2 (SP2) and 2012 (SP1).
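If you need to decide programmatically which of the two to use, a quick existence check works. This is just a minimal sketch, not part of the original solution:

-- A minimal sketch: detect whether sys.dm_db_stats_properties() exists on
-- this instance (it shipped in 2008 R2 SP2 and 2012 SP1), so the right one
-- of the two queries below can be chosen.
IF EXISTS ( SELECT 1 FROM sys.all_objects
            WHERE name = N'dm_db_stats_properties' AND schema_id = SCHEMA_ID( N'sys' ) )
    PRINT 'Use the sys.dm_db_stats_properties() version';
ELSE
    PRINT 'Use the legacy sys.sysindexes.rowmodctr version';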

So, there are really two queries – no way around it:

Update 21 Sept 2013: I found some improvements for these queries and updated them in place.

/* Legacy sysindexes example */

SELECT  statsSummary.schemaName,
        statsSummary.tableName,
        statsSummary.statsName,
        statsSummary.object_id,
        statsSummary.stats_id,
        statsSummary.statsUpdate,
        statsSummary.statsType,
        statsSummary.indexName,
        statsSummary.index_id,
        statsSummary.modCounter,
        rowCounts.row_count 
FROM (
        SELECT  sc.name schemaName,
                o.name tableName,
                s.name statsName,
                s.object_id,
                s.stats_id,
                STATS_DATE(o.object_id, s.stats_id) statsUpdate,
                CASE WHEN i.name IS NULL THEN 'COLUMN' ELSE 'INDEX' END AS statsType,
                ISNULL( i.name, ui.name ) AS indexName,
                ISNULL( i.index_id, ui.index_id ) AS index_id, 
                si.rowmodctr AS modCounter
        FROM sys.stats s
            INNER JOIN sys.objects o ON s.object_id = o.object_id
            INNER JOIN sys.schemas sc ON o.schema_id = sc.schema_id
            LEFT JOIN sys.indexes i ON s.object_id = i.object_id AND s.stats_id = i.index_id

            -- Getting rowmodctr from sysindexes for legacy versions
            LEFT JOIN sys.sysindexes si ON s.object_id = si.id AND s.stats_id = si.indid

            -- If the statistics object is not on an index, get the underlying table
            LEFT JOIN sys.indexes ui ON s.object_id = ui.object_id AND ui.index_id IN ( 0, 1 )
    ) AS statsSummary
    INNER JOIN (
        SELECT object_id, index_id, SUM(row_count) row_count
        FROM sys.dm_db_partition_stats
        GROUP BY object_id, index_id
        HAVING SUM( row_count ) > 0
    ) AS rowCounts ON statsSummary.object_id = rowCounts.object_id 
        AND statsSummary.index_id = rowCounts.index_id

/* 2008 R2 SP2 and higher example */

SELECT  statsSummary.schemaName,
        statsSummary.tableName,
        statsSummary.statsName,
        statsSummary.object_id,
        statsSummary.stats_id,
        statsSummary.statsUpdate,
        statsSummary.statsType,
        statsSummary.indexName,
        statsSummary.index_id,
        statsSummary.modCounter,
        rowCounts.row_count 
FROM (
        SELECT  sc.name schemaName,
                o.name tableName,
                s.name statsName,
                s.object_id,
                s.stats_id,
                STATS_DATE(o.object_id, s.stats_id) statsUpdate,
                CASE WHEN i.name IS NULL THEN 'COLUMN' ELSE 'INDEX' END AS statsType,
                ISNULL( i.name, ui.name ) AS indexName,
                ISNULL( i.index_id, ui.index_id ) AS index_id, 
                sp.modification_counter AS modCounter
        FROM sys.stats s
            INNER JOIN sys.objects o ON s.object_id = o.object_id
            INNER JOIN sys.schemas sc ON o.schema_id = sc.schema_id
            LEFT JOIN sys.indexes i ON s.object_id = i.object_id AND s.stats_id = i.index_id

            -- If the statistics object is not on an index, get the underlying table
            LEFT JOIN sys.indexes ui ON s.object_id = ui.object_id AND ui.index_id IN ( 0, 1 )

            CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    ) AS statsSummary
    INNER JOIN (
        SELECT object_id, index_id, SUM(row_count) row_count
        FROM sys.dm_db_partition_stats
        GROUP BY object_id, index_id
        HAVING SUM( row_count ) > 0
    ) AS rowCounts ON statsSummary.object_id = rowCounts.object_id 
        AND statsSummary.index_id = rowCounts.index_id

Using one or the other of these, it’s possible to make a list of statistics objects, compare the modification count to the size of the underlying table, and then build update statements to act accordingly. This can be useful for large tables where reaching the auto update threshold would take too long, or to prevent updating statistics on tables where there have been only a few modifications but sp_updatestats would update the statistics anyway, possibly with a lower sample rate than one would like. The classic example is rebuilding an index, which produces stats with perfect sampling, and then replacing those perfect stats with worse ones by updating statistics with default sampling.
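For example, here is a minimal sketch of that idea (not the finished maintenance solution), assuming 2008 R2 SP2 / 2012 SP1 or later, that generates UPDATE STATISTICS commands for statistics whose modification counter exceeds an arbitrary example threshold of 10 percent of the rows:

-- A minimal sketch, assuming sys.dm_db_stats_properties() is available:
-- build UPDATE STATISTICS commands where modifications exceed an arbitrary
-- example threshold of 10 percent of the table's rows as of the last update.
SELECT  'UPDATE STATISTICS '
            + QUOTENAME( sc.name ) + '.' + QUOTENAME( o.name )
            + ' ' + QUOTENAME( s.name ) + ';' AS update_command,
        sp.rows,
        sp.modification_counter,
        sp.last_updated
FROM sys.stats s
    INNER JOIN sys.objects o ON s.object_id = o.object_id
    INNER JOIN sys.schemas sc ON o.schema_id = sc.schema_id
    CROSS APPLY sys.dm_db_stats_properties( s.object_id, s.stats_id ) sp
WHERE o.is_ms_shipped = 0
    AND sp.rows > 0
    AND sp.modification_counter > 0.1 * sp.rows    -- example threshold only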

The Problem with Rowmodctr

Ted Krueger has a really nice demo showing some cases when auto updates will happen here. Ted does a great job explaining the 500-plus-20-percent rule that SQL Server seems to use for auto update (note, though, the xevents code in the post is only for SQL Server 2012 or later). But there is one gotcha with the way the server counts modifications to a table.
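For reference, the rule of thumb works out to something like this. It’s just a sketch of the commonly cited formula, not documented internals:

-- Rule-of-thumb check (a sketch of the commonly cited formula, not
-- documented internals): for a table with more than 500 rows, statistics
-- are treated as stale once the modification counter reaches
-- 500 + 20 percent of the row count.
DECLARE @rows bigint = 10000,
        @modCounter bigint = 2600;

SELECT CASE WHEN @modCounter >= 500 + ( 0.20 * @rows )
            THEN 'stale - eligible for auto update'
            ELSE 'not yet stale' END AS rule_of_thumb;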

Ted doesn’t do this, but in other contexts it’s easy to throw around the 500-plus-20-percent “of the table is modified” rule without looking closely at what that means. For example, does that rule mean the server ensures that 20 percent of the distinct rows in the table (that is, a number of rows comprising 20 percent of the total row count) get modified? Probably not. How do repeated updates to the same rows count? What about data in the table/index that isn’t a key, but just an included column?

Second, it’s important to note that, for auto update, the statistics update is not actually triggered by the changes to the data. I used to think it was triggered by a select statement against the table where the statistics have been invalidated by modification, but that, as it turns out, is only partly true. Really, the statistics are only updated when the optimizer needs to use them to build a new query plan. There are cases where a query is already cached and the threshold for auto update has been exceeded, but the statistics are not updated, because the optimizer has not requested an update in order to optimize a new query.

The developers for SQL Server obviously have more access to internal data and counters than we do. Reading between the lines on this type of statistics update vs. table modification DMV issue, I get the sense they are a little bit circumspect about making all this data fully exposed and documented. So, we have what we have, which is STATS_DATE(), sys.dm_db_stats_properties().modification_counter and sys.sysindexes.rowmodctr.
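For a quick, one-off look at a single table, STATS_DATE() alone gives the age of each statistics object. A trivial sketch, using the demo table created below as the example:

-- Age of each statistics object on one table (a sketch; substitute any table):
SELECT  s.name AS stats_name,
        STATS_DATE( s.object_id, s.stats_id ) AS last_updated
FROM sys.stats s
WHERE s.object_id = OBJECT_ID( N'dbo.SomeData' );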

Here’s a demo showing some of the nuances of this, and how the rowmodctr or modification_counter act in perhaps counterintuitive ways.

First, set up a test database with views that contain slightly tweaked versions of the stats metadata queries, so we can call them easily, and a test table and indexes:

CREATE DATABASE StatsDemo;
GO

USE StatsDemo
GO

CREATE VIEW dbo.stats_metadata AS

SELECT  statsSummary.schemaName,
        statsSummary.tableName,
        statsSummary.statsName,
        statsSummary.object_id,
        statsSummary.stats_id,
        statsSummary.statsUpdate,
        statsSummary.statsType,
        statsSummary.indexName,
        statsSummary.index_id,
        statsSummary.modCounter,
        rowCounts.row_count 
FROM (
        SELECT  sc.name schemaName,
                o.name tableName,
                s.name statsName,
                s.object_id,
                s.stats_id,
                STATS_DATE(o.object_id, s.stats_id) statsUpdate,
                CASE WHEN i.index_id IS NULL THEN 'COLUMN' ELSE 'INDEX' END AS statsType,
                ISNULL( i.name, ui.name ) AS indexName,
                ISNULL( i.index_id, ui.index_id ) AS index_id,
                sp.modification_counter AS modCounter
        FROM sys.stats s
            INNER JOIN sys.objects o ON s.object_id = o.object_id
            INNER JOIN sys.schemas sc ON o.schema_id = sc.schema_id

            -- If a statistics object is on an index, get that index:
            LEFT JOIN sys.indexes i ON s.object_id = i.object_id AND s.stats_id = i.index_id

            -- If the statistics object is not on an index, get the underlying table:
            LEFT JOIN sys.indexes ui ON s.object_id = ui.object_id AND ui.index_id IN ( 0, 1 )
            OUTER APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    ) AS statsSummary
    INNER JOIN (
        SELECT object_id, index_id, SUM(row_count) row_count
        FROM sys.dm_db_partition_stats
        GROUP BY object_id, index_id
        HAVING SUM( row_count ) > 0
    ) AS rowCounts ON statsSummary.object_id = rowCounts.object_id 
        AND statsSummary.index_id = rowCounts.index_id
WHERE schemaName = 'dbo'

GO

CREATE VIEW dbo.stats_metadata_legacy AS

SELECT  statsSummary.schemaName,
        statsSummary.tableName,
        statsSummary.statsName,
        statsSummary.object_id,
        statsSummary.stats_id,
        statsSummary.statsUpdate,
        statsSummary.statsType,
        statsSummary.indexName,
        statsSummary.index_id,
        statsSummary.modCounter,
        rowCounts.row_count 
FROM (
        SELECT  sc.name schemaName,
                o.name tableName,
                s.name statsName,
                s.object_id,
                s.stats_id,
                STATS_DATE(o.object_id, s.stats_id) statsUpdate,
                CASE WHEN i.index_id IS NULL THEN 'COLUMN' ELSE 'INDEX' END AS statsType,
                ISNULL( i.name, ui.name ) AS indexName,
                ISNULL( i.index_id, ui.index_id ) AS index_id,
                ISNULL( si.rowmodctr, usi.rowmodctr ) AS modCounter
        FROM sys.stats s
            INNER JOIN sys.objects o ON s.object_id = o.object_id
            INNER JOIN sys.schemas sc ON o.schema_id = sc.schema_id

            -- If a statistics object is on an index, get that index:
            LEFT JOIN sys.indexes i ON s.object_id = i.object_id AND s.stats_id = i.index_id
            LEFT JOIN sys.sysindexes si ON i.object_id = si.id AND i.index_id = si.indid

            -- If the statistics object is not on an index, get the underlying table:
            LEFT JOIN sys.indexes ui ON s.object_id = ui.object_id AND ui.index_id IN ( 0, 1 )
            LEFT JOIN sys.sysindexes usi ON ui.object_id = usi.id AND ui.index_id = usi.indid
    ) AS statsSummary
    INNER JOIN (
        SELECT object_id, index_id, SUM(row_count) row_count
        FROM sys.dm_db_partition_stats
        GROUP BY object_id, index_id
        HAVING SUM( row_count ) > 0
    ) AS rowCounts ON statsSummary.object_id = rowCounts.object_id 
        AND statsSummary.index_id = rowCounts.index_id
WHERE schemaName = 'dbo'
GO

CREATE TABLE dbo.SomeData(
    id int IDENTITY(1,1) NOT NULL,
    val1 nvarchar(50) NULL,
    val2 nvarchar(50) NULL,
    val3 nvarchar(50) NULL,
 CONSTRAINT PK_SomeData PRIMARY KEY CLUSTERED  (
        id ASC
    )
);

CREATE NONCLUSTERED INDEX IX_Val1 ON dbo.SomeData (
    val1 ASC
);
GO

Next, check the statistics views. They should be empty, because our test table starts with no rows, and those views filter out zero-row tables:

SELECT * FROM dbo.stats_metadata
SELECT * FROM dbo.stats_metadata_legacy

Now we will put a row in the test table. This will invalidate the statistics, and then when the optimizer needs to optimize a query that uses the key columns in the indexes – not when the row is inserted – the stats will auto-update. Look closely at the results at each stage, and the timestamps on the stats update column. Also note that the placement of GO is important, because we need the text of the select statements/batches to be identical later in the demo:

-- From zero to more than zero rows

INSERT INTO dbo.SomeData (
    val1,
    val2,
    val3
) VALUES (
    'A', 
    'B', 
    'C' 
)
GO

SELECT * FROM dbo.stats_metadata
SELECT * FROM dbo.stats_metadata_legacy
GO

SELECT val1 FROM dbo.SomeData WHERE val1 = 'A'
GO

SELECT * FROM dbo.stats_metadata
SELECT * FROM dbo.stats_metadata_legacy
GO

SELECT val1 FROM dbo.SomeData WHERE id = 1
GO

SELECT * FROM dbo.stats_metadata
SELECT * FROM dbo.stats_metadata_legacy
GO

I get this, which is expected:

[Screenshot: StatsUpdateResults]

The mod counter and row_count columns make sense, and the statsUpdate dates support the idea that a stat for an index is updated when the optimizer needs to use it for a query and the data in the table has been changed.

Now things get less intuitive, though.

-- From 1 to 20 rows

INSERT INTO dbo.SomeData (
    val1,
    val2,
    val3
) VALUES (
    'A', 
    'B', 
    'C' 
)
GO 19

SELECT val1 FROM dbo.SomeData WHERE val1 = 'A'
GO
SELECT val1 FROM dbo.SomeData WHERE id = 1
GO

SELECT * FROM dbo.stats_metadata
SELECT * FROM dbo.stats_metadata_legacy
GO

So far, so good. 20 rows does not trigger an update. At this point the mod counters show ‘19’ changes, which makes sense.

Now, what happens if we do a lot of updates, but just to a single row:

UPDATE dbo.SomeData SET val1 = 'Z' WHERE id = 1
GO 1000

SELECT val1 FROM dbo.SomeData WHERE val1 = 'A'
GO
SELECT val1 FROM dbo.SomeData WHERE id = 1
GO

SELECT * FROM dbo.stats_metadata
SELECT * FROM dbo.stats_metadata_legacy
GO

[Screenshot: StatsUpdateResults2]

OK, now wait a second. We updated the table 1000 times (more than 500 plus 20 percent). We updated a column that is in both the nonclustered index and the clustered index. Why is the mod counter 19 for the clustered index but 1019 for the nonclustered? How is the mod counter so much higher than the actual number of rows in the table? And why no statistics update? The date stamp did not change. Lots of questions.

Consider: the updated column is a key in the nonclustered index but just a plain old column in the clustered index. So it appears (and this makes sense) that modifications are tracked against the key columns related to statistics, but not other columns. The mod counter for a given index is bumped only when we change a key value of that index, and just for that index.

Next, consider whether the select query is being optimized at this point, or is in cache from before. Hmm. More on this later.

Lastly, I want to point out that the mod counter has been bumped 1,000 times, even though we only ever updated one row, and 999 of those updates didn’t even change the value.

So: the modification counter is not sensitive to which rows in a table are changed. It’s not really “20 percent of the rows” in that rule of thumb. What it really seems to mean is that each time an insert, update, or delete is performed, the counter is incremented by the number of rows affected by that operation. It doesn’t matter whether they were different rows, or even whether the values actually changed. This makes sense, because actually tracking how many distinct rows changed, with before and after values, would carry a lot of overhead. I believe what happens is roughly, for each statement, “take the @@ROWCOUNT where the statement changed a key column in the index, and add that to the modification counter.”

If you need more proof, here’s a query that does change a value on each iteration, but for the same row:

UPDATE dbo.SomeData 
SET val1 = CASE val1 WHEN 'A' THEN 'Z' ELSE 'A' END 
WHERE id = 1
GO 1000

SELECT val1 FROM dbo.SomeData WHERE val1 = 'A'
GO
SELECT val1 FROM dbo.SomeData WHERE id = 1
GO

SELECT * FROM dbo.stats_metadata
SELECT * FROM dbo.stats_metadata_legacy
GO

Still no stats update. Same pattern in the stats metadata.

How about, then, changing more of the rows?

UPDATE dbo.SomeData 
SET val1 = CASE val1 WHEN 'A' THEN 'Z' ELSE 'A' END 
--where id = 1
GO 1000

SELECT val1 FROM dbo.SomeData WHERE val1 = 'A'
GO
SELECT val1 FROM dbo.SomeData WHERE id = 1
GO

SELECT * FROM dbo.stats_metadata
SELECT * FROM dbo.stats_metadata_legacy
GO

Now this is really odd. My statistics metadata shows no update to the stats, 22,019 modifications for the nonclustered index on this 20-row table, but still only 19 changes for the clustered index. Huh?

When you think about it, though, it does make sense. The mod counter is just incremented with the number of affected rows for keys in the index. If a key is not affected, then the counter doesn’t increment. Therefore the statistics for the clustered index are still valid – no keys have changed. There have been no changes to those keys since we populated the table, even though we have changed other data. Statistics about non-key columns are not relevant, so changes to them are not relevant.
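A quick way to test that reasoning, as a hypothetical side check that is not part of the original demo: val3 is not a key in any index and has no statistics on it, so if the reasoning above holds, neither mod counter should move after these updates.

-- Hypothetical side check: val3 is not a statistics key, so neither
-- mod counter should be incremented by these updates.
UPDATE dbo.SomeData SET val3 = 'Y' WHERE id = 1
GO 100

SELECT * FROM dbo.stats_metadata
SELECT * FROM dbo.stats_metadata_legacy
GO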

Now let’s bulk up the table, past those milestones of 500 rows and so on:

-- From 20 to 499 rows

INSERT INTO dbo.SomeData (
    val1,
    val2,
    val3
) VALUES (
    'A', 
    'B', 
    'C' 
)
GO 479


SELECT val1 FROM dbo.SomeData WHERE val1 = 'A'
GO
SELECT val1 FROM dbo.SomeData WHERE id = 1
GO

SELECT * FROM dbo.stats_metadata
SELECT * FROM dbo.stats_metadata_legacy
GO

-- From 499 to 500 rows

INSERT INTO dbo.SomeData (
    val1,
    val2,
    val3
) VALUES (
    'A', 
    'B', 
    'C' 
)
GO 1


SELECT val1 FROM dbo.SomeData WHERE val1 = 'A'
GO
SELECT val1 FROM dbo.SomeData WHERE id = 1
GO

SELECT * FROM dbo.stats_metadata
SELECT * FROM dbo.stats_metadata_legacy
GO

-- From 500 to 5000 rows

INSERT INTO dbo.SomeData (
    val1,
    val2,
    val3
) VALUES (
    'A', 
    'B', 
    'C' 
)
GO 4500

SELECT val1 FROM dbo.SomeData WHERE val1 = 'A'
GO
SELECT val1 FROM dbo.SomeData WHERE id = 1
GO

SELECT * FROM dbo.stats_metadata
SELECT * FROM dbo.stats_metadata_legacy
GO

[Screenshot: StatsUpdateResults3]

Would it surprise you to learn that there has still been no update to statistics? Or that the mod counter now shows 26,999 changes to this 5,000-row table?
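The 26,999 figure, at least, is consistent with the counting model above: it’s the running total of rows affected by every operation that touched the val1 key.

   19   inserts taking the table from 1 to 20 rows
 1000   single-row updates to val1 (first GO 1000)
 1000   single-row updates to val1 (second GO 1000)
20000   updates to val1 across all 20 rows (third GO 1000)
  479   inserts taking the table from 20 to 499 rows
    1   insert taking the table to 500 rows
 4500   inserts taking the table to 5000 rows
-----
26999   total modifications counted against IX_Val1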

The answer? Well, I took a look in the plan cache:

SELECT UseCounts, Cacheobjtype, Objtype, text, query_plan
FROM sys.dm_exec_cached_plans 
    CROSS APPLY sys.dm_exec_sql_text(plan_handle)
    CROSS APPLY sys.dm_exec_query_plan(plan_handle)
WHERE text like '%SELECT%FROM%SomeData%'

In my run of this demo, there are nine (!) versions of the two test select statements above sitting in cache; some are ad hoc, some are auto-parameterized, but all of them are already optimized.

If I do this (only in test, please!):

DBCC FREEPROCCACHE

And then I do this:

SELECT val1 FROM dbo.SomeData WHERE val1 = 'A'
GO
SELECT val1 FROM dbo.SomeData WHERE id = 1
GO

SELECT * FROM dbo.stats_metadata
SELECT * FROM dbo.stats_metadata_legacy
GO

I do finally get a statistics update, and the mod counters reset to 0:

[Screenshot: StatsUpdateResults4]

(This does not mean you should free the proc cache on your server; this is just a demo.)
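On a real system, the targeted equivalent would be to update the statistics themselves rather than flush the cache; plans that depend on them are then recompiled on their next use. A minimal sketch:

-- Targeted alternative (a sketch): update statistics for the table, or for
-- one statistics object, instead of clearing the plan cache.
UPDATE STATISTICS dbo.SomeData WITH FULLSCAN;
-- or just one statistics object:
UPDATE STATISTICS dbo.SomeData IX_Val1 WITH FULLSCAN;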

So, what are the takeaways here?

  1. Rowmodctr or modification_counter does not track the “percentage of the table that has been modified,” a common misconception. I think what they really track is the cumulative number of rows affected by DML operations that touched key values in the index.
  2. The counters know nothing about whether the modifications were to one row repeatedly or to many rows. They are not perfect – and they can’t tell you, I don’t think, “how much” of the table is changed.
  3. Auto update stats is triggered by query optimization. That is, when the optimizer takes on a “new” query and the statistics it needs have been invalidated, it requests a statistics update (assuming auto update stats is on), and that update is collected just before the new query is optimized. On a real, running system this probably happens very quickly, but as this demo shows, there are cases where a plan can sit in cache and no optimization occurs, so the stats never update.

This was very educational for me, and it is shaping how I plan to move the rules-based maintenance from “any row changed” for stats to something, hopefully, more accurate. The mod counter values are so coarse, though, that I’m finding it can’t be an exact science.

Published Wednesday, September 18, 2013 4:19 PM by merrillaldrich


Comments

 

Scott Duncan said:

I think "Now this is really odd. My statistics metadata shows no update to the stats and 22,019 modifications to the 20 row nonclustered index, but still 19 changes to the nonclustered index. Huh?" should probably be "Now this is really odd. My statistics metadata shows no update to the stats and 22,019 modifications to the 20 row nonclustered index, but still 19 changes to the CLUSTERED index. Huh?"

Presumably the same findings apply to columns that have statistics but are not part of an index. I.e. Rowmodctr or modification_counter tracks DML operations on the column but the statistics do not get updated unless a query is executed against it that requires a new query plan to be generated.

September 18, 2013 7:36 PM
 

merrillaldrich said:

Yes, that was a typo - good eye, and thank you. I think your assumption on non-index/column stats is correct, but I was planning to prove that out next.

September 18, 2013 8:20 PM
 

Merrill Aldrich said:

Happy Fall! It’s a beautiful October here in Minneapolis / Saint Paul. In preparation for my home town

October 9, 2013 7:19 PM
