THE SQL Server Blog Spot on the Web

Welcome to - The SQL Server blog spot on the web Sign in | |
in Search

Kalen Delaney

  • Geek City: How old are my statistics?

    This post is basically to answer a question asked in class this week: How can we get the last statistics update date for ALL user tables in a database?

    After working on the query for a while, I realized that the new metadata function I posted about here can give you that info easily:

    SELECT object_name(sp.object_id) as object_name,name as stats_name, sp.stats_id, 
        last_updated, rows, rows_sampled, steps, unfiltered_rows, modification_counter
    FROM sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE sp.object_id > 100;

    But for those of you not yet running SQL Server 2008R2 Service Pack 2 or SQL Server 2012 Service Pack 1 (as long as it’s 2005 or later), I’ll show you the other query I had worked out:

    SELECT schema_name(schema_id) AS SchemaName,  object_name(o.object_id) AS ObjectName, AS IndexName, index_id, o.type,
        STATS_DATE(o.object_id, index_id) AS statistics_update_date
    FROM sys.indexes i join sys.objects o
           on i.object_id = o.object_id
    WHERE o.object_id > 100 AND index_id > 0
      AND is_ms_shipped = 0;

    There are a few slight differences in the output between the two queries, in addition to the fact that the query using sys.dm_db_stats_properties() adds the extra columns for number of rows sampled, the column modification counter, etc. The function does does  not return the schema_id or the type of object (e.g. is it a table or an indexed view). Both those pieces of information can be determined by joining with the sys.objects view, but I have left that as an exercise for the reader/student.


    I hope you find this useful!


  • Geek City: sp_cacheobjects for SQL Server 2012

    In a post about 4 1/2 years ago, I gave you my version of a replacement for the old pre-2005 pseudotable syscacheobjects. I called it sp_cacheobjects and created it as a view in the master database. With the sp_ prefix, the view can be accessed from any database.

    When testing this on SQL Server 2012, I noticed that I almost always got a lot more rows back than I was expecting. Even when I added a WHERE clause to limit the database to only the database I was working in, I STILL got way too many rows back. It turns out that in SQL Server 2012, SQL Server is frequently running background queries checking for both filetable and fulltext activites, and these background queries, that run in all databases, are cached like any other queries. To get the same kinds of results from this view that I got in SQL Server 2008, I needed to add a few extra filters, so I am including the modified version of my view here.

    -- Create a view to show most of the same information as SQL Server 2000's syscacheobjects
    -- This script has been updated for SQL Server 2012 to remove plans dealing with filetable and fulltext activities,
    -- and to also not return queries from system databases.

    -- by Kalen Delaney, 2012

    -- Feel free to remove those filters from the WHERE clause at the bottom
    USE master
    IF EXISTS (SELECT 1 FROM sys.views WHERE name = 'sp_cacheobjects')
        DROP VIEW sp_cacheobjects;
    CREATE VIEW sp_cacheobjects(bucketid, cacheobjtype, objtype, objid, dbid, dbidexec, uid, refcounts, 
                            usecounts, pagesused, setopts, langid, date_first, dateformat, status, lasttime, maxexectime, avgexectime, lastreads,
                            lastwrites, sqlbytes, sql)

                SELECT            pvt.bucketid, CONVERT(nvarchar(19), pvt.cacheobjtype) as cacheobjtype, pvt.objtype,
                                        CONVERT(int, pvt.objectid)as object_id, CONVERT(smallint, pvt.dbid) as dbid,
                                        CONVERT(smallint, pvt.dbid_execute) as execute_dbid, 
                                        CONVERT(smallint, pvt.user_id) as user_id,
                                        pvt.refcounts, pvt.usecounts, pvt.size_in_bytes / 8192 as size_in_bytes,
                                        CONVERT(int, pvt.set_options) as setopts, CONVERT(smallint, pvt.language_id) as langid,
                                        CONVERT(smallint, pvt.date_format) as date_format, CONVERT(int, pvt.status) as status,
                                        CONVERT(bigint, 0), CONVERT(bigint, 0), CONVERT(bigint, 0), 
                                        CONVERT(bigint, 0), CONVERT(bigint, 0),
                                        CONVERT(int, LEN(CONVERT(nvarchar(max), fgs.text)) * 2), CONVERT(nvarchar(3900), fgs.text)

                FROM (SELECT ecp.*, epa.attribute, epa.value
                            FROM sys.dm_exec_cached_plans ecp
                    OUTER APPLY sys.dm_exec_plan_attributes(ecp.plan_handle) epa) as ecpa
                       PIVOT (MAX(ecpa.value) for ecpa.attribute IN ([set_options],[objectid],[dbid],
                              [dbid_execute],[user_id],[language_id],[date_format],[status])) as pvt
                           OUTER APPLY sys.dm_exec_sql_text(pvt.plan_handle) fgs
             WHERE cacheobjtype like 'Compiled%'
             AND text NOT LIKE '%filetable%'
             AND text NOT LIKE '%fulltext%'
             AND pvt.dbid between 5 and 32766;


    I hope you find this useful!


  • Did You Know? What settings to always change


    A week ago, I taught my SQL Server 2012 Internals class to a great group of very interactive students. Even though a dozen of them were taking the class remotely, there were still lots of really great questions and and lots of discussion.

    One of the students asked if I could summarize all the settings that I recommended changing from the default, right out of the box. I said I’d try to put a list together by the end of the week, but I didn’t make it. So I said I would put it together and blog it.

    I think it sounded during the week like there were more changes than there really are. Going back through my notes, I only found three settings, all of them instance-wide configuration settings, that I recommend always changing. Of course, depending on your hardware, your workload and your data, you may make more changes. But the short list here contains the options that I always make sure are changed from the default on any system I am working on.  So here they are:

    1. Remote admin connections

    This option doesn’t apply to all remote administrative connections, despite the name, but only to connections made using the DAC (Dedicated Administrator Connection).  This option has a value of either 0 or 1, and 0 is the default. I recommend changing it to 1, which allow someone working at a machine other than the machine where your SQL Server is installed to make a DAC connection.  It might be too late to change it to 1 once you realize you need it! This value also needs to be set to 1 when you are connecting to a clustered SQL Server, which is always considered a remote connection.

    2.  Optimize for ad hoc workloads

    This also is a two-valued option, with a default of 0. I recommend changing it to 1.  There may be some very edge-case scenarios for leaving it at 0, but they’re pretty rare so you’re better off changing it to 1. In most cases, this can save you lots of memory, because single-use ad hoc query plans will now only use 300 bytes of memory instead of a minimum of 16K. Some plans use more, a lot more. I’ve seen SQL Server instances with tens of thousands of single-use ad hoc plans. I’ll let you do the math.

    3. Show advanced options

    By default, only about 16 of the configuration options are viewable and settable with sp_configure. If you want to see all your options, including ‘optimize for ad hoc workloads’, you need this option set to 1. The default is 0. 


    As a bonus, I’ll give you another list. These are options that usually should be left at the default value, so you should verify regularly that no one has changed them.  Two of them are instance-wide options, and two are database options.

    1. Auto create statistics and auto update statistics

    These are database options that have the default value of ON, and should be left that way. There may be cases you want to update statistics more often than auto statistics provides for, but that doesn’t mean you should turn the automatic updating off.  And there may be a few edge cases where you want to turn off the automatic update of statistics. Make sure you have a very good reason if you do so. But before you do, try turning on auto update statistics async to see if that helps whatever problems are leading you to consider turning automatic updates off. Also, take a look at the procedure sp_autostats. It allows to turn off automatic updates just for particular tables or indexes.  

    2.  Max degree of parallelism

    I’m not going to tell you what you should set this configuration option value to. There are a lot of recommendations out there, on other people’s blogs. But I’m just going to say make sure it is not set to 1 if you have more than one processor available to your SQL Server.  The default is 0, which means ALL SQL Server’s processors are available for parallel queries. In some (if not most?) cases, leaving it at the default is ok, but again, don’t set it to 1.  If you find particular queries do not perform well when run in parallel, you can use the MAXDOP hint for those queries. But don’t turn all parallelism off across the entire instance.

    3. C2 audit mode and common criteria compliance enabled

    Keep these configuration options set to 0 unless your business is mandated to have one of these options on. C2 audit mode has been deprecated but’s it still around for now. If you’re mandated to have these options enabled, you should know it. So otherwise keep these set to 0. You WILL notice performance degradation if you enable this.

    4. Autoshrink

    This is a database option that should never have been invented. Pretend it doesn’t exist, unless someone has set it to ON in one of your databases, then set it back to OFF. 


    So these are the options that I specifically call out in my class as having general best practice values. I talk about other options as well, but most of the others have an ‘it depends’ answer for what values you should use. And what ‘it depends’ on is what I spend a lot of my class talking about.

    Feel free to let me know if there are options on your list… that you always change, or always make sure are unchanged!




  • T-SQL Tuesday: What kind of Bookmark are you using?


    I’m glad there is no minimum length requirement for T-SQL Tuesday blog posts, because this one will be short. I was in the classroom for almost 11 hours today, and I need to be back tomorrow morning at 7:30.

    Way long ago, back in SQL 2000 (or was it earlier?) when a query indicated that SQL Server was going to use a nonclustered index to get row pointers, and then look up those rows in the underlying table, the plan just had a very linear look to it. The operator that indicated going from the nonclustered leaf to the data row was called a ‘Bookmark Lookup’, and it just looked a simple, single operator. Those of us that did troubleshooting of query plans knew that it could hide a multitude of sins, but to many people it looked very innocuous.

    Then in the next version, that simple, single Bookmark Lookup Operator was replaced by something that looked like a JOIN! When I first saw query plans showing a JOIN when doing a nonclustered index lookup, I was almost distraught, but it turns out that was only because it was new and different. The more I thought about it, the more I realized it was a Good Thing.

    Some people might get confused because they think of a JOIN as an operation that finds matches rows between two tables. But in fact, a JOIN can be used to find matches between any two sets of rows. And in the case of a nonclustered index lookup, SQL Server is finding rows in the leaf level of a nonclustered index (the ones that meet your filter condition(s) ) , and then is finding the matching rows in the underlying table. Internally, this is a JOIN operation.

    But when we look at the plans, there are two different operators used to show the actual lookup into the underlying table.

    There is a RID Lookup, used when the table is a heap. The nonclustered index contains Row ID, or RID, values (composed of a File ID, a Page ID and a Slot/Row number on the page). This RID is then used to ‘match’ with rows in the underlying table that have the same RID.

    There is also a KEY Lookup, used when the table has a clustered index.  The leaf level of a nonclustered index contains pointers that are the key values for the rows being pointed to. So to find the matching rows in the table, SQL Server takes the clustered key value from the nonclustered index, and then searches for that value in the clustered index, following the clustered index from the root down to its leaf.

    Below are the two icons used for these operations. If I were to give you a quiz, and ask which operator indicates we are finding a row in a table directly using a RID value, and which indicates we are finding a row in a table using a clustered index key, which would you say is which?  Take a moment to think about it.

    image       image

    I don’t know about you, but when I just try to figure out these two icons, I think the one on the left looks like it should be the KEY Lookup, and the one on the right should be the RID Lookup.

    But if you check the page in Books Online, or if you check your own query plans, you see that they are backwards!

    Here is a plan showing a RID Lookup:



    And here is a plan showing a KEY Lookup:


    Fortunately, the graphical query plan tells you what kind of operation it’s performing, so you can just ignore the picture and read the words. But what’s the point of a graphical plan in that case?

    So are they really backwards? Did someone just make a mistake and link in the wrong file when compiling the SQL Server graphical query plan code? Or am I misinterpreting this?

    I still think graphical query plans are one of the best thing ever added to product, so I’m really not complaining, but I’m just sayin….


    Have fun, and Happy T-SQL Tuesday!


  • Geek City: SQL Server 2014 In-Memory OLTP (“Hekaton”) Whitepaper

    Tech Ed isn’t even over yet, and my new Hekaton Whitepaper is already generating questions I can’t answer yet!

    Here are the two questions that were in my inbox this morning:

    1. The whitepaper says “There are other wait types, such as waiting for a log write to complete.”
    Does this mean that writing to the transaction log  will be the main bottleneck of the system?

    Hey folks, the product isn’t even publicly available. How can I, or anyone, know what your main bottleneck will be? You do know that there is ALWAYS a bottleneck, right? Something has to be the limiting factor? But I can’t even begin to make a guess yet as to what kind of limiting factors will be in Hekaton, now called SQL Server In-Memory OLTP. The paper is about the Internals, and that’s what I researched. I did not do any performance testing. But, as I did mention in the paper, log writing is much more efficient for Hekaton tables than for the disk-based tables. So even if it IS the limiting factor, it will not be as limiting as it could potentially be for non-Hekaton tables.


    2.  The whitepaper says “Note that none of the LOB data types are allowed.” 
    We have some 75 columns  which are still ntext, so  it seems we are dead in the water even if we convert to nvarchar(max)?

    As you read in the paper, rows are stored completely differently, and not on the disk-based 8K pages. So there is no mechanism for special pages for LOB or row-overflow data. Also, you’re aware that there is extra overhead for managing and accessing the LOB data, and to make In-Memory OLTP as efficient as possible, those extra-large rows needing extra work are avoided.  And like in my previous answer, without seeing your complete design (no, please don’t send it to me Smile ) there is no way I can tell just how ‘dead in the water’ you might be. I tend to feel there would be a way to redesign your tables, and separate the LOB types (which should be converted to something other than the deprecated ntext soon, anyway) into their own tables. As you read in the paper,  In-Memory OLTP tables can be joined with disk-based tables.

    Please feel free to send more questions, but I most likely will NOT be able to answer every one in a blog post. There will be a update to the paper, with more internals details, for CTP2, and some of the questions will be answered there. Finally, the ultimate plan is a book, like my SQL Server Concurrency book, all about Hekaton… oops, I mean In-Memory OLTP.  Hopefully, the book will contain best practices suggestions gathered from actual research with In-Memory OLTP implementations, plus a big Q&A section to answer questions that weren’t answered elsewhere in the book.



  • Geek City: Document more trace flags?


    It’s been over 5 years since I last blogged about trace flags, so it seems a post on the topic is in order.

    If a trace flag is undocumented, it means it is unsupported and not guaranteed. It can go away at any time, even from one service pack to the next. If a trace flag doesn’t work the way the person who told you about said it should work, there is no help for that. It is undocumented.

    Trace flags change SQL Server’s behavior. They are toggles, that can be turned on or off. When they are on, they force SQL Server to do things differently. Sometimes just a little bit differently, sometimes a LOT differently. Some are very harmless, such as 3604 which allows SQL Server to return output from certain undocumented DBCC commands to return output to the client. You need to use this flag if you’re going to use the DBCC PAGE command, which I talk about a lot in my writings and in my SQL Server Internals classes.

    However, there are other trace flags that change SQL Server behavior a LOT. The SQL Server developers at Microsoft create trace flags for lots of reasons, usually to force some sort of non-standard behavior. This might be so the developers can test the feature during development, or so that the support engineers can turn a feature off – either to confirm that the feature was causing a problem. These are usually not changes that you would want to make to your own SQL Server in a normal environment.

    But with so many undocumented trace flags, the question frequently arises “Why aren’t MORE trace flags documented?”

    Here are some of the reasons:

    -- Once a trace flag is official publicly documented, people will start to use it. And people will also start to misuse it. Microsoft needs to consider whether dealing with the misuse will be worse than the problems due to the lack of that particular trace flag.

    -- Before a trace flag is documented, it must be thoroughly tested. FULLY. Across all SKUs, all security environments, all performance conditions. And that is not easy. A trace flag added for one particular situation or one rare bug is usually not tested to the full extent that documented features are. So it will stay undocumented.

    -- Once a trace flag is added, it must be supported, into the future. And can’t be removed without officially deprecating it, at least one full version in advance. Some trace flags are known to have a limited potential usefulness, documenting them will force them to be maintained even when they’re no longer useful.

    -- Because most of the undocumented trace flags are created for a limited set of circumstances, they are not tested in conjunction with other trace flags. In order to be fully tested, each trace flag would have to be tested in combination with every other trace flag. And if a particular combination of trace flags was not pretty, which of the trace flags involved should be left undocumented? It would be really hard to tell, even after spending all the time on all the testing.

    The bar for Microsoft actually documenting a trace flag is therefore quite high. And sometimes if a feature introduced by trace flag is really needed, it will just become part of the product, or added using a different interface, such as a configuration option or database property.

    But Microsoft does document new trace flags. If you look at the history of Books Online to see the number of documented trace flags, you can see the increase:

    --  SQL Server 2005:               10 trace flags

    -- SQL Server 2008:                13 trace flags

    -- SQL Server 2008 R2:           15 trace flags

    -- SQL Server 2012:                19 trace flags


    So if you read about an undocumented trace flag, and you think it might be useful for you, be very careful when testing it. And if your careful testing shows it IS useful, let Microsoft know.

    But if your testing shows it isn’t useful, or is actually harmful, you don’t need to let Microsoft know. They probably already know. After all, it’s undocumented.



  • Geek City: More statistics info available!


    I just started using a new DMV (one that’s actually an ‘F’ not a ‘V’, as in Function) that gives us more info about distribution statistics. It returns info about the last statistics update date (which is also available with a function STATS_DATE()). It also provides the number of rows sampled when the statistics were last updated. This is available through DBCC SHOW_STATISTICS, and in a blog post a couple of months ago, I showed you how to get that output into a table for your own querying. One of the coolest things this new DMV shows is the row modification counter, which used to be available in sysindexes as a count of rows changed, but in SQL 2005 and later, SQL Server keeps track of changes to each column that has statistics on it. These values were not visible before, but now they are! This new object also reports any filter definition and includes rows for all statistics, whether index statistics or column statistics.

    As a table valued function, sys.dm_db_stats_properties can be used with the CROSS APPLY operator to give information for all statistics in a database, or you can filter to just return  user objects and not any system objects, as shown in the code here:

       sp.object_id, object_name(sp.object_id) as object_name,sp.stats_id, name as stats_name,
        filter_definition, last_updated, rows, rows_sampled, steps, unfiltered_rows, modification_counter
    FROM sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE sp.object_id > 100;

    The new DMV was easy to miss because it wasn’t added until SQL Server 2008R2 Service Pack 2 and SQL Server 2012 Service Pack1. But since you should always be running with the latest service pack, you should have this great DMV if you’re any version after SQL Server 2008.

    Let me know if you find this DMV useful, and what you mainly use it for.

    Have fun!


  • Geek City: A Grammar Geek in the Cloud

    Those of you who know me well know that I am usually a stickler for spelling, grammar and proper word usage. I may have even lost a few friends because of what some people see as obsession. I am not infallible, and I do make typos, but I like to try to correct them if possible as soon as I discover them (or as soon as they are brought to my attention.)

    So now I will admit that I made a mistake in my usage of of the word ‘premise’, when talking about Cloud vs. non-Cloud databases. I was using the term on-premise as the opposite of ‘in the Cloud’, and my friend Cindy corrected me last week in no uncertain terms. So I went online and quickly found this post that proved she was right and I was wrong: 

    I did a bit more searching and found that I am not the only one making this mistake. A Research VP at Gartner actually makes the same mistake in a published post:

    I will admit that I find the word ‘on-premises’ awkward to say, so when I’m talking casually I might end up leaving off the last syllable.

    But anytime I’m writing, I promise to use premise vs. premises appropriately.

    And I expect the same from everyone else.



  • Geek City: Join With Me!

    I remember one of the most surprising changes in SQL Server 2000 2005 was how the graphical plans showed the use of a nonclustered index to seek, and the plan included something that looked like a JOIN to find rows in the base table.  Here’s an example. Although I used SQL Server 2008R2, the graphical plan will be pretty similar to what it looked like back in SQL Server 2000 2005. My code will make a copy of a table in the AdventureWorks2008 database, and then build an index on one of the columns.

    USE AdventureWorks2008;
    IF object_id('dbo.Sales') IS NOT NULL
        DROP TABLE dbo.Sales;
    SELECT * INTO dbo.Sales FROM Sales.SalesOrderHeader;
    CREATE INDEX Sales_SalesPersonID_index on dbo.Sales(SalesPersonID);

    Now look at the estimated graphical execution plan for this query, that searches for a particular value for the SalesPersonID column:

    SELECT * FROM dbo.Sales
    WHERE SalesPersonID = 280;

    You should see something like this:


    It shows a JOIN even though there is only one table involved! What’s with that? It took me a few minutes to figure out what I was seeing, and I must admit that over the years since then I have come to appreciate this graphical representation. You’ll have a problem if you think of a JOIN as only finding matching rows between two tables, but can be solved if you realize that a JOIN can be used when finding matches between any two ROWSETS, i.e. any two sets of rows. There is a set of rows in the nonclustered index on SalesPersonID, and a set of rows in the table. The index seek finds all the index rows with a SalesPersonID value of 280, but those index rows only contain an index key value and a row pointer (RowID or RID). Because the query is requesting all the columns in the table, not just the SalesPersonID value,  SQL Server must find the rows in the dbo.Sales table that have a matching value for the RowID. Internally, it is doing a join, even though there is only one table. It is joining index rows with table rows.

    If the table had a clustered index, the plan would look almost exactly the same, with one difference. In SQL Server, if a table has a clustered index, nonclustered index rows do not contain RowIDs, they contain the clustered key value to be used a pointer to the row. SQL Server can take this clustered key from the nonclustered index row and look up in the clustered index to find the desired row. So I create a clustered index on SalesOrderNumber:

    CREATE UNIQUE CLUSTERED INDEX Sales_ident on dbo.Sales(SalesOrderNumber);

    Now when I run the same query as above, I get the following plan:


    The only difference in the two plans above is the icon for the lookup into the base table.  One is looking up into a heap using a RowID, which is an address containing the File ID, the page number and the slot or row number on the page. The other icon represents taking a key value from the nonclustered index row and looking it up by seeking through the clustered index. If you look again at these two icons, which one looks like a picture of directly addressing a row, and which looks like a picture of following another index into the table itself?


    The icon on the left is used for a RID lookup, and the icon on the right is used for a clustered index key lookup. I know that nobody asked me, but I think they got these backwards.

    But wait, there’s more…

    Prior to SQL Server 7, there was basically a rule of thumb with query plans that SQL Server would use one index (at most) per table per query. It could never (well, hardly ever) use two indexes on the same table in a single query. That changed in SQL Server 7 when a new algorithm called Index Intersection was introduced.

    I’m going to build another index on SalesOrderDate:

    CREATE INDEX Sales_OrderDate_index on dbo.Sales(OrderDate);

    This index will be used if I look for all orders place on July 1, 2002. The query plan for the following query is shown:

    SELECT * FROM dbo.Sales
    WHERE OrderDate = '2002-07-01';


    We saw that the index on SalesPersonID is useful when looking for the value 280, and the index on SalesOrderDate is useful when looking for '2002-07-01'. But what if one query looks for rows that satisfy both conditions?

    SELECT * FROM dbo.Sales
    WHERE OrderDate = '2002-07-01'
    AND SalesPersonID = 280

    Here’s the plan:


    We see two JOINs ! First, SQL Server finds the sets of index rows meeting each of the WHERE filter conditions and those two index row sets are joined together, using the clustered key value as the join column. Then only after SQL Server finds rows that meets both conditions, it takes the common clustered index key and uses it to find the row in the underlying table.

    But wait, there’s more!

    Both of these queries use a table scan, because the filter conditions are not selective enough. There are way more rows with a SalesPersonID value of 289 than there were for a value of 280, and there are more rows with an OrderDate of March 1, 2004 than there are with an OrderDate of July 1, 2002.

    SELECT * FROM dbo.Sales
    WHERE OrderDate = '2004-03-01' ;

    SELECT * FROM dbo.Sales
    WHERE SalesPersonID = 289;

    Both of the above queries show the same plan. Because the table now has a clustered index, the plan shows a clustered index scan, but it is exactly the same work as a table scan:


    However, if you look at the plan for a query that uses both filter conditions, we don’t get a clustered index scan.

    SELECT * FROM dbo.Sales
    WHERE OrderDate = '2004-03-01'
    AND SalesPersonID = 289;


    Even though each filter alone is not very restrictive, they can both be used for index intersection. The reason SQL Server usually doesn’t choose index seeks on non-restrictive filters is because the cost of then accessing the base table for all the qualifying rows can be very high. But even though each individual index returns a lot of rows, the optimizer’s estimate of the number of rows that meet BOTH conditions is very small, and therefore the number of lookups into the base table is quite small and affordable.

    So, what can you learn? Just because you think an column might not be selective enough to build a nonclustered index on, don’t discount it.  If that column will frequently be used in queries with multiple conditions, it just might turn out to be more useful than you thought!

    So make sure you test many possible choices of indexes and combinations of indexes before you know what the best choices are.

    Have fun!


  • Geek City: Accessing Distribution Statistics

    Distribution statistics are one of the most important sources of information that the Query Optimizer uses to determine a good query plan. In this post, I’m not going to tell you everything about distribution statistics. I’m just going to show you a few tricks for getting access to the statistics.

    If you want a deeper understanding of what the statistics keep track of, and you don’t have any of my SQL Server Internals books handy, check out this whitepaper: Statistics Used by the Query Optimizer in Microsoft SQL Server 2008 

    Microsoft does provide us a tool called DBCC SHOW_STATISTICS for examining the distribution statistics.

    Microsoft has gradually been making more of the more of the old DBCC commands available as DMVs, even some undocumented ones. For example, one of my favorites, DBCC IND, has now been replaced in SQL Server 2012  by sys.dm_db_database_page_allocations.

    I have been wishing for several versions that Microsoft would make the DBCC SHOW_STATISTICS information available as a DMV. But it hasn’t happened yet, and I’m tired of waiting, so I decided to do something about it.

    My solution is not quite as easy to use as a DMV might be, but it allows you to get the information that DBCC SHOW_STATISTICS provides into a set of three tables that can then be saved into a more permanent location of your choice, and/or queried as desired.

    DBCC SHOW_STATISTICS returns three sets of information, with different columns in the output, so three different tables are needed. The DBCC SHOW_STATISTICS command can be called with an argument that specifies that you just want one of the three sets returned. The options are

    WITH STAT_HEADER – returns basic info such as last update date, and number of rows in the table/index. Also reports number of steps returned for HISTOGRAM section.

    WITH DENSITY_VECTOR – returns density info for each left-based subset of columns in the index. For example, an index on (lastname, firstname, city) would have a density value for (lastname), for (lastname, firstname), and for (lastname, firstname, city). Each density value is a single number representing the average number of occurrences and depends on the number of distinct values. For example, if there are only 2 possible values in the column, the density would be 0.5. Multiplying density by the number of rows in the STAT_HEADER section would give the average expected rowcount if a query was executed looking for an equality on the specified column(s).

    WITH HISTOGRAM – returns a set of ordered values from the first column of the index, creating a histogram. This histogram provides the optimizer with selectivity information for specific values or ranges of values in the first column of the index.

    To collect this info, I will use one of my favorite tricks, which is to create a table in the master database with a name starting with sp_. (I’ve written about this trick several times, including in this earlier blog post.) Once I have the table(s) created, I can access them from any database. So here are the three tables:

    USE Master;

    IF  (SELECT object_id('sp_stat_header')) IS NOT NULL
      DROP TABLE sp_statsheader;
    CREATE TABLE sp_stat_header
    (   Name sysname,
        Updated datetime,
        Rows bigint,
        Rows_sampled bigint,
        Steps smallint,
        Density numeric (10,9),
        Average_key_length smallint,
        String_index char(3),
        Filter_expression nvarchar(1000),
        Unfiltered_rows bigint);

    IF  (SELECT object_id('sp_density_vector')) IS NOT NULL
      DROP TABLE sp_density_vector;
    CREATE TABLE sp_density_vector
    (  all_density numeric(10,8),
       average_length smallint,
       columns nvarchar(2126) );

    IF  (SELECT object_id('sp_histogram')) IS NOT NULL
      DROP TABLE sp_histogram;
    CREATE TABLE sp_histogram
    (   RANGE_HI_KEY sql_variant,
        RANGE_ROWS bigint,
        EQ_ROWS bigint,
        DISTINCT_RANGE_ROWS bigint,
        AVG_RANGE_ROWS bigint);

    The second trick is to use INSERT … EXEC to execute a DBCC statement and populate the tables. I will build the DBCC  command dynamically, after capturing the schema, table and index names in variables. You of course could take this code and turn it into a stored procedure, for which the schema, table and index names are passed as parameters. I’ll use as an example a table in the AdventureWorks2008 sample database, just so you can try running the code, and I can verify that it actually works!

    I will use the table Sales.SalesOrderDetail and the index IX_SalesOrderDetail_ProductID. So the object name (@oname) is SalesOrderDetail, the schema name (@sname) is Sales, and the index name (@iname) is IX_SalesOrderDetail_ProductID.

    USE AdventureWorks2008;
    DECLARE @oname sysname,  @iname sysname, @sname sysname

    SELECT @oname = 'SalesOrderDetail',  @sname = 'Sales', @iname = 'IX_SalesOrderDetail_ProductID';
    -- Update the object name to include the schema name, because that is the format the DBCC command expects
    SELECT @oname = @sname +'.' + @oname;

    TRUNCATE TABLE sp_stat_header;
    INSERT INTO sp_stat_header
        EXEC ('DBCC SHOW_STATISTICS(['+ @oname + '],' + @iname +') WITH STAT_HEADER');

    TRUNCATE TABLE sp_density_vector;
    INSERT INTO sp_density_vector
        EXEC ('DBCC SHOW_STATISTICS(['+ @oname + '],' + @iname +') WITH DENSITY_VECTOR');

    TRUNCATE TABLE sp_histogram;
    INSERT INTO sp_histogram
        EXEC ('DBCC SHOW_STATISTICS(['+ @oname + '],' + @iname +') WITH HISTOGRAM');

    So now you can look at the values collected and filter or query in any way, or use SELECT INTO to save them into another table, so the sp_ tables can be used the next time you want to capture distribution statistics information.

    SELECT * FROM sp_stat_header;

    SELECT * FROM sp_density_vector;

    SELECT * FROM sp_histogram;


    Let me know if you find this useful, and especially if you embellish it to create a procedure or an automated process of your own!



  • Geek City: What Triggered This Post?

    I’d really like to get another post up onto my much neglected blog before the end of 2012. This will also start one of my New Year’s resolutions, which is to write at least one blog post a month. I’m going to tell you about a change in SQL Server that wasn’t announced in any “What’s New” list that I ever saw, perhaps because it was just a chance in internal behavior, and nothing that required any change in user applications.

    Do you retest what you know is true for every new version? When I update my books, I do test all the scripts, but if there isn’t a script, I don’t retest every ‘fact’ that I have known for years is true. And sometimes, things change. And sometimes my reviewers notice those unreported changes, and sometimes they don’t.

    You might be aware of the fact that SQL Server can perform UPDATE operations in two different ways. The UPDATE can be performed as a two-step process: delete the old row and then insert a whole new row, or, the UPDATE can be performed (much more efficiently) as an update-in-place.  When the two-step UPDATE is performed, it is a LOT more work. Not only does SQL Server have to log the entire old row and the entire new row, but each nonclustered index is also modified twice, and each of those index changes also has to be logged. So it’s nice when an update-in-place is done, because only the bytes changed are logged, and only indexes on the updated columns are affected.

    Prior to SQL Server 7, there were actually four different ways that UPDATE could be done. The two-step UPDATE had some variations that could make it even slower in some cases! But that was a long time ago, so I’m not going to go into the details now. But I will say that back then, in order to get an update-in-place to occur, there was a big long list of prerequisites that had to be met and if you missed just one, you’d get one of the slower UPDATE operations.

    As of SQL Server 7, update-in-place became the default. The only time it doesn’t happen is when the row can’t stay in the same location (such as when you update a clustered index key column) or when SQL Server really needs the old and new versions of the row.

    In SQL 7, one of the places that SQL needed the old and new version of the updates rows was when processing triggers. Triggers need the transaction log to get the contents for the DELETED and INSERTED pseudo-tables. And because triggers needed the entire old and new versions of the updated rows, the UPDATE was performed as a two-step operation. DELETE the old row, log the entire old row, and the INSERT the new row with the new values, and log the entire new row.

    But as of 2005, we now have the version store, primarily used for SNAPSHOT isolation, but available for other uses as well. In SNAPSHOT isolation, the version stores stores ‘old versions’ of rows that have been updated or deleted.  I knew that the version store was also used for triggers, but it only occurred to me just recently that maybe, because the old and new versions of the row were not needed from the log, perhaps UPDATEs did not always need to be performed internally as a two-step UPDATE.

    So I decided to test it out.

    -- DEMO: If there is an UPDATE trigger, are updates logged as DELETE + INSERT?
    -- First build a new database.

    USE master;
    IF (SELECT db_id('TestTrigger')) IS NOT NULL
        DROP DATABASE TestTrigger;
    CREATE DATABASE TestTrigger;
    SELECT db_id('TestTrigger');

    USE TestTrigger;

    -- Just for a warmup, look at the function fn_dblog, which works in the current database

    SELECT * FROM fn_dblog(null, null);

    -- Create a new table to work with
    IF (SELECT object_id('objects')) IS NOT NULL
        DROP TABLE objects;
    SELECT TOP 100 * INTO objects FROM sys.objects;

    -- Create a clustered index on the table
    CREATE CLUSTERED INDEX objects_clustered on objects(name);

    -- First examine an update we know is NOT done in place,
    -- i.e. updating a clustered key value

    UPDATE objects SET name = 'newrowsets' WHERE name = 'sysrowsets';

    -- Look at last 10 rows; notice a LOP_DELETE_ROWS and LOP_INSERT_ROWS
    -- The AllocUniteName column shows the object affected is the clustered index on dbo.objects
    SELECT Operation, [Transaction ID], AllocUnitName FROM fn_dblog(null, null);

    -- Now examine an update we know is  done in place,
    -- i.e. updating an unindexed column on a table with no triggers
    UPDATE objects SET parent_object_id = 1 WHERE name = 'sysfiles1';

    -- Look at last 3 rows; notice a LOP_MODIFY_ROW on the dbo.objects allocation unit
    SELECT Operation, [Transaction ID], AllocUnitName FROM fn_dblog(null, null);

    -- Create an update trigger
    -- Will the update be done with the siple LOP_MODIFY_ROW or with the LOP_DELETE_ROWS and LOP_INSERT_ROWS
    CREATE TRIGGER trg_update_objects ON objects FOR UPDATE

    -- Now perform update again
    UPDATE objects SET parent_object_id = 10 WHERE name = 'sysfiles1';

    -- Look at last 3 rows; notice a LOP_MODIFY_ROW
    SELECT * FROM fn_dblog(null, null);

    Since the database is in SIMPLE recovery model, you can issue a CHECKPOINT before each UPDATE if you want to reduce the number of rows in the log to make it easier to examine.

    So it seems that I need to update my course and some of my writings. There might also be special cases that still require that an two-step UPDATE be performed in the presence of triggers, but it seems like a two-step UPDATE is not ALWAYS required anymore. That is very good news!

    I hope you all have a wonder-filled and joyous New Year!


  • Did You Know? I’m delivering my first SQL Server 2012 class this week!

    This  is actually just a very short post to get my name back up in the list on the right side of SQLBlog, because Adam’s configuration removes people who haven’t posted in 3 months. I’ve been extremely busy trying to get my  new book finished, and then get my course updated for its first delivery this week. I’ll be teaching it again in Norway at the end of the month. 

    It’s a great class this week, and a couple of questions have inspired longer posts, which hopefully I be able to get to next week.

    Hopefully I’ll also have an update on the timeline for the new book by next week.

    My schedule is always available if you are interested in learning SQL Server 2012 Internals with me!



  • Did You Know? Turning Off Locking

    Back when the lock manager was rewritten for SQL Server 7, and row level locking was added to the product, the procedure sp_indexoption gave us the ability to turn OFF either row or page locks or both, for one particular index, or for a table and all the indexes on it. I must admit, I rarely found this option to be the solution to any sticky problems, although while doing some quick research prior to writing this post, I did find that Microsoft recommended turning off locking options on a couple of tables  when working with SAP:

    In my experience, it is MUCH more common to have people turn off row or page locks when they shouldn’t, and by inspecting the sys.indexes catalog view, you can tell if that has happened.  (Prior to SQL Server 2005,  you could use the INDEXPROPERTY function.) 

    I recently got email from a reader of my books who pointed out to me an error I had made in my SQL Server 2005 book, where I had claimed that sp_indexoption could not be used to disallow row or page locks on heaps. If you check the BOL, it clearly says that you can supply either an index name or a table name to sp_indexoption, so that was just an oversight in my writing. 

    But then along comes ALTER INDEX, and the BOL now says you should avoid using sp_indexoption, as it will be removed in a future release, and you should use ALTER INDEX instead. So the reader sent me a follow-up question, asking how can we disallow row or page locking on a heap using ALTER INDEX. 

    It turns out that we can specify the ALL option instead of an index name when using ALTER INDEX, and this has the same effect as using sp_indexoption on a heap.  You can see it in this short example.

    First create a heap with a nonclustered index:

    use tempdb;
    IF object_id('testlocks') IS NOT NULL
       DROP TABLE testlocks;

    CREATE TABLE testlocks
    ( a int);
    CREATE INDEX testlocks_index ON testlocks(a);

    Now look at the lock properties:

    SELECT allow_row_locks, allow_page_locks, type_desc FROM sys.indexes
    WHERE object_id = object_id('testlocks');


    allow_row_locks allow_page_locks type_desc
    --------------- ---------------- ---------------
    1               1                HEAP
    1               1                NONCLUSTERED

    As shown, the default is that both row and page locks are allowed.

    Now use the old sp_indexoption to disallow row locks and then check the lock properties again:

    EXEC sp_indexoption testlocks, disallowrowlocks, 1;
    SELECT allow_row_locks, allow_page_locks, type_desc FROM sys.indexes
    WHERE object_id = object_id('testlocks');


    allow_row_locks allow_page_locks type_desc
    --------------- ---------------- -------------
    0               1                HEAP
    0               1                NONCLUSTERED

    Notice that sp_indexoption turned off row locking for both the heap and the nonclustered index.

    Now use ALTER INDEX to allow row locks and inspect again:

    ALTER INDEX ALL ON testlocks
    SELECT allow_row_locks, allow_page_locks, type_desc FROM sys.indexes
    WHERE object_id = object_id('testlocks');


    allow_row_locks allow_page_locks type_desc
    --------------- ---------------- -------------
    1               1                HEAP
    1               1                NONCLUSTERED

    Keep in mind that I’m not recommending that you turn off either row or page locking. And if you decide to change the types of locks allowed, make sure you test your application thoroughly under your maximum expected load to make sure you haven’t made things worse. And remember that you can’t turn off TABLE locking, only row and page locking.

    Have fun!


  • Geek City: Build a Big Table with a Columnstore Index

    I was looking all over to find a big table to use for my columnstore examples in my SQL Server 2012 book, and the only one I found was from someone at Microsoft and was not publicly available. When I demonstrate code in my writings, I like to have the code available for anyone to use, so that was no use.

    Finally I realized I was just going to have to do it myself. I actually based the script on some work by the awesome Benjamin Nevarez (blog| twitter), but I needed to make a couple of changes to the table structure, and then the loop for adding the data took some tweeking to make sure that uniqueness was possible, in case you need to test out how the metadata changes when a columnstore index is built on a table with a unique clustered index compared to a nonunique clustered index compared to a heap.

    I have just finished the chapter on indexes, and decided to make this script available. The initial table data is based on the data in Microsoft AdventureWorksDW2012 sample database that you can download here. (Note that the DW version is the first one under “Other Available Downloads”’; it’s not the one under “Recommended Download”. )

    Here is the section of the script that populates most of the FactInternetSalesBIG table. (The original 60K rows were just copied from the original FactInternetSales table. Note that all the statements have to be run in a single batch because the local variable’s scope is the batch, and because the GO 9 applies to the single batch that precedes it. Of course, you can change the number of iterations to end up with a different size table. Mine ends up at just over 30 million rows. In addition to modifying the SalesOrderNumber value on each iteration, I also changed the value of the Revision column to indicate which pass through the insert loop was being executed.

    -- Copy the new big table into itself 9 times
    DECLARE @RevisionNumber nchar(2);
    SELECT @RevisionNumber = RevisionNumber + 1 FROM RevisionNumberValue;
    SELECT @RevisionNumber as RevisionNumber;
    INSERT INTO dbo.FactInternetSalesBig WITH (TABLOCK)
    SELECT ProductKey
    ,SalesOrderNumber + @RevisionNumber
    FROM dbo.FactInternetSalesBig;
    UPDATE RevisionNumberValue SET RevisionNumber = RevisionNumber + 1;
    GO 9

    Make sure you have enough log space. My log grew from 1GB to just over 8GB when running the script to build the 30 million row table and columnstore index, with no clustered index.

    The downloadable script has commented options to build either a clustered or nonclustered index before you build the columnstore index. I suggest building the clustered index before the columnstore index, because building the clustered index will rebuild any existing nonclustered indexes, including my columnstore index. Building the columnstore index can take a while, on my system it was about 10 minutes. (Building the original 30+ million row table took even longer.)

    After I finish the chapter on partitioning, I'll have a script to partition this table, but feel free to do that yourself!

    Have fun!


  • Follow-up Answers for my Australia Classes

    I was out of the country for the last two weeks of March, delivering classes in Brisbane and Sydney, which were organized by WardyIT.  It was a great visit and there were 24 terrific students!

    As is sometimes (perhaps often?)  the case, there were questions posed that I couldn’t answer during class, so here are a couple of follow-up answers.

    1. I brought up the fact that SQLS 2012 generates a warning message when there are ‘too many’ Virtual Log Files (VLFs) in a database. (It turns out the message will be generated when a database has more than 10,000 VLFs but the error says that the database has more than 1000 VLFs! Which is of course, technically true, but the message is a little misleading.)  A student reported hearing that the Best Practices Analyzer also had a warning about the number of VLFs, but didn’t know what that limit was, and asked if I knew. I didn’t then, but I do  now. A message will be generated by BPA when there are 1000 or more VLFs. The message is actually focused on the performance problems that can arise when there are two many VLFs. The boot message is more concerned with the impact on database restore, rollback, backup and recovery of too many VLFs.

    2. When discussing locks, I briefly mention the BU lock that is acquired by request (with a hint) during bulk load operations, for the purpose of allowing multiple processes to load into the same table concurrently. Someone asked whether any SSIS components used the BU lock, and I thought it was probably true, but I since I don’t work with SSIS, I couldn’t comment any deeper. I was able to discover, thanks to SQL Server MVPs Paul White and Arthur Zubarev that it definitely is possible for both SQL Destinations and OLE DB Destinations. The destination table must be a heap, and you can specify TABLOCK as an option for the FastLoadOptions property, as shown:


    3. I very briefly mention Extended Events on the very last day, and mention that in SQL Server 2008 one of the drawbacks is that there is no GUI option to set up an XEvents session, but that has changed in SQL Server 2012. I was asked if the 2012 Management Studio could connect to a SQL Server 2008 instance and set up an XEvents session on that instance. My friend and fellow MVP Ben Miller was able to verify that when connecting from the SQL 2012 GUI to a 2008 instance, the option for creating an XEvents session does not even show up.  You could try scripting a session created for a 2012 instance and running it on a 2008 instance, but there are no guarantees.

    So now I’ll get back to working on my new book …


This Blog


Favorite Non-technical Sites or Blogs

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement