
Adam Machanic

Adam Machanic, Boston-based SQL Server developer, shares his experiences with programming, monitoring, and performance tuning SQL Server. And the occasional battle with the query optimizer.

Controlling Stored Procedure Caching with ... Dynamic SQL?!?

Tell me if this situation sends a chill down your spine: You've written a stored procedure, tested it against a variety of inputs, and finally rolled it out in production. All is well... Or so you think. You start getting complaints from some users that it's taking forever to return. But other users are having no problem. What the..?

Veteran DBAs will know right away what's going on (even without reading the title of this post!) -- but for those of you who haven't had the pleasure of debugging these kinds of things, the answer is that cached execution plans are not always as wonderful for performance as we might like.

For any given query, there are numerous possible execution plans that the query optimizer can come up with. Some of them are optimal, some are less than optimal. But in the end, it's the job of the query optimizer to decide which one to use (hopefully, the optimal one). If a stored procedure is executed and does not already have a query plan in cache, whatever execution plan the optimizer decides to use will be cached for next time. This is usually a good thing -- it can be quite a bit of work for the optimizer to make that decision.
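(If you'd like to see for yourself what's sitting in the cache, the syscacheobjects view -- which we'll come back to later in this post -- gives a rough picture. Here's a minimal peek, written for SQL Server 2000:)

-- A rough inventory of what's in the plan cache and how often each plan was reused
SELECT cacheobjtype, objtype, usecounts, sql
FROM master..syscacheobjects
ORDER BY usecounts DESC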

But in some cases, this is where the trouble begins. One of the main inputs the optimizer relies on is index statistics, evaluated against the parameters being passed in for that particular execution. This can greatly affect what the 'correct' execution plan is -- the optimizer must decide such things as which index should be used, whether a seek or a scan should be performed, what types of joins are most efficient, etc. And as the parameters change, so can the most appropriate choices.

To illustrate this better, some sample data will be useful. Break out your numbers table and run the following script, which will create a table with three columns, around 20 million rows, and a couple of indexes...
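(If you don't already have a numbers table lying around, any table of sequential integers named Numbers, with a column called Number, will do for the script below. Here's one quick way to build one -- a sketch that starts at zero and doubles the row count until there are about a million rows:)

-- Build a Numbers table with values 0 through 1,048,575 by doubling each pass
CREATE TABLE Numbers (Number INT NOT NULL PRIMARY KEY)
INSERT Numbers (Number) VALUES (0)

DECLARE @Rows INT
SET @Rows = 1

WHILE @Rows < 1048576
BEGIN
    INSERT Numbers (Number)
    SELECT Number + @Rows
    FROM Numbers

    SET @Rows = @Rows * 2
END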

 

-- Seed DateTbl with one row per second, starting at 1900-01-01 00:00:00
SELECT
    Number,
    DATEADD(ss, Number, 0) AS TheDate
INTO DateTbl
FROM Numbers

DECLARE
    @Num INT,
    @Incr INT

SELECT
    @Num = MAX(Number) + 1,
    @Incr = MAX(Number) + 1
FROM DateTbl

-- Keep appending batches of rows until there are around 20 million
WHILE @Num < 20000000
BEGIN
    INSERT DateTbl (Number, TheDate)
    SELECT Number + @Num, DATEADD(ss, Number + @Num, 0) AS TheDate
    FROM Numbers

    SET @Num = @Num + @Incr
END

CREATE UNIQUE CLUSTERED INDEX IX_Date ON DateTbl(TheDate)

CREATE UNIQUE NONCLUSTERED INDEX IX_Number ON DateTbl(Number)

-- An unindexed character column, to force a bookmark lookup later on
ALTER TABLE DateTbl
ADD AnotherCol VARCHAR(40) NULL

UPDATE DateTbl
SET AnotherCol = CONVERT(VARCHAR, Number) + CONVERT(VARCHAR, TheDate)

Okay! Now that your hard drive's workout is done, let's take a look at what we have... DateTbl has three columns: A sequential number, a datetime column, and a character column. You should have one row for every second between January 1, 1900 and sometime around August 21, 1900, depending on how big your numbers table is. The date column and the number column are indexed (we'll be using those as predicates in the WHERE clause of the example queries), but the character column is not. That's on purpose, to force a bookmark lookup. What can I say -- this is a totally contrived example!
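(If you'd like a quick sanity check before moving on, the following should show roughly 20 million rows; the exact count and end date will vary a bit with the size of your numbers table:)

-- One row per second, starting at 1900-01-01 00:00:00
SELECT COUNT(*) AS RowCnt, MIN(TheDate) AS FirstDate, MAX(TheDate) AS LastDate
FROM DateTbl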

Put Query Analyzer into Show Execution Plan mode and check out the following:

 

SELECT
    Number,
    TheDate,
    AnotherCol
FROM DateTbl
WHERE TheDate BETWEEN '19000201 09:35:00' AND '19000201 09:36:00'
    OR Number = 10

Before I proceed, I would just like to say that anyone who comments or e-mails me saying that this query can be re-written with a UNION to get consistently better execution plans will be slapped upside the head with a trout. YES, this is a bad query, but as I said, this is a very simple example. In real life, these situations are usually much more difficult to re-write. So if you don't like my example, go write your own article!

... Now that that's taken care of ...

The execution plan you should see will have a seek on each index. Makes sense -- we're looking at a very small chunk of data in each place. But what if we change the query to use a much larger date range?

 

SELECT
    Number,
    TheDate,
    AnotherCol
FROM DateTbl
WHERE TheDate BETWEEN '19000101' AND '19000201'
    OR Number = 10

A seek on the date index no longer makes sense. A range scan is a better option. And why bother seeking on the Number column? The row with the number 10 is already found within the selected range. SQL Server agrees with me on this, and performs only a scan of the clustered date index.
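(If you'd rather compare numbers than squint at graphical plans, STATISTICS IO tells the same story. The exact read counts will depend on how large your table ended up, but the gap between the two queries should be obvious:)

SET STATISTICS IO ON

-- Narrow range: expect seeks on both indexes and a small number of reads
SELECT
    Number,
    TheDate,
    AnotherCol
FROM DateTbl
WHERE TheDate BETWEEN '19000201 09:35:00' AND '19000201 09:36:00'
    OR Number = 10

-- Wide range: expect a clustered index scan and far more reads
SELECT
    Number,
    TheDate,
    AnotherCol
FROM DateTbl
WHERE TheDate BETWEEN '19000101' AND '19000201'
    OR Number = 10

SET STATISTICS IO OFF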

But now let's see what happens when we throw this into a stored procedure. Create the following:

 

CREATE PROCEDURE GetStuff
    @StartDate DATETIME,
    @EndDate DATETIME,
    @Number INT
AS
SELECT
    Number,
    TheDate,
    AnotherCol
FROM DateTbl
WHERE TheDate BETWEEN @StartDate AND @EndDate
    OR Number = @Number

... And run the following in Query Analyzer with Show Execution Plan turned on ...

 

EXEC GetStuff '19000201 09:35:00', '19000201 09:36:00', 10

Same execution plan as before! That's great, right? Well...

 

EXEC GetStuff '19000101', '19000201', 10

... ... ...

This is taking a while ...

Enjoy the break? Good, now get back to work!

Check out the execution plan. I guess the cached one wasn't optimal for the second query. So how do we satisfy BOTH sets of arguments?
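(Incidentally, if you want to repeat the experiment in the opposite order, you'll need to toss the cached plan first. Either of the following will do it; I've left the heavier hammer commented out, because DBCC FREEPROCCACHE flushes the entire server's plan cache and belongs on a test box only:)

-- Mark just this procedure so its plan is rebuilt on the next execution
EXEC sp_recompile 'GetStuff'

-- Or, on a test server only: flush the whole procedure cache
-- DBCC FREEPROCCACHE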

One way is to force the stored procedure to recompile each time:

 

ALTER PROCEDURE GetStuff
    @StartDate DATETIME,
    @EndDate DATETIME,
    @Number INT
WITH RECOMPILE
AS
SELECT
    Number,
    TheDate,
    AnotherCol
FROM DateTbl
WHERE TheDate BETWEEN @StartDate AND @EndDate
    OR Number = @Number

You'll notice that I added WITH RECOMPILE. And while that's probably not a big deal for this example stored procedure, it isn't a good idea for the types of really complex stored procedures where these problems crop up in the real world. Recompilation can be quite intensive, and I really don't want it happening every time an active stored procedure is called.
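(There is a middle ground worth mentioning: you can request a recompile for a single call instead of baking it into the procedure definition. As I understand it, the plan built this way is used once and thrown away, so other callers keep whatever plan is already cached:)

-- Recompile for this one execution only; the existing cached plan is left alone
EXEC GetStuff '19000101', '19000201', 10 WITH RECOMPILE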

But you already knew that wasn't the solution, because in elementary school you were taught how to read, and the title of this article isn't "Controlling Stored Procedure Caching with ... WITH RECOMPILE".

No, instead the title is, "Controlling Stored Procedure Caching with ... Dynamic SQL?!?"

Yes, dynamic SQL. If you don't know about dynamic SQL, go read this article right now and come back when you're finished.

You may have heard about a system stored procedure called sp_executesql. It lets you execute dynamic SQL, and it also caches the resulting execution plan. In addition, because it accepts parameters, it makes SQL injection nearly impossible when used correctly. So it's good stuff. We could evaluate our test query using sp_executesql like this:

 

DECLARE @SQL NVARCHAR(300)

SET @SQL = '' +
    'SELECT ' +
        'Number, ' +
        'TheDate, ' +
        'AnotherCol ' +
    'FROM DateTbl ' +
    'WHERE TheDate BETWEEN @StartDate AND @EndDate ' +
        'OR Number = @Number'

EXEC sp_executesql
    @SQL,
    N'@StartDate DATETIME, @EndDate DATETIME, @Number INT',
    @StartDate = '19000201 09:35:00',
    @EndDate = '19000201 09:36:00',
    @Number = 10

... And that's just wonderful, but it gives us absolutely nothing, because if you re-run it with the other parameters you'll find that you have the same problem as the stored procedure version:

 

DECLARE @SQL NVARCHAR(300)

SET @SQL = '' +
    'SELECT ' +
        'Number, ' +
        'TheDate, ' +
        'AnotherCol ' +
    'FROM DateTbl ' +
    'WHERE TheDate BETWEEN @StartDate AND @EndDate ' +
        'OR Number = @Number'

EXEC sp_executesql
    @SQL,
    N'@StartDate DATETIME, @EndDate DATETIME, @Number INT',
    @StartDate = '19000101',
    @EndDate = '19000201',
    @Number = 10

Time for another coffee break...

But let us not lose hope yet. We're still in the article that's talking about how to control caching and recompilation, and you know that I wouldn't have written it if I didn't know the answer.

So what's really being cached here? Let's take a look:

 

SELECT sql
FROM master..syscacheobjects
WHERE sql LIKE '%datetbl%'
AND cacheobjtype = 'Executable Plan'

-----------------------------------------------------

(@StartDate DATETIME, @EndDate DATETIME, @Number INT)
SELECT
Number,
TheDate,
AnotherCol
FROM DateTbl
WHERE TheDate BETWEEN @StartDate
AND @EndDate OR Number = @Number

The plan is cached not just for the query text, but also for a parameter list -- and not just any parameter list, but the very parameter list that was passed in to sp_executesql. So how could we force SQL Server to cache a different plan for the same query?

... Change the parameter list!

The parameter list, of course, is correlated to the actual parameters passed in. But what you may not realize is that if you give a parameter a default value within the list, sp_executesql will not require a corresponding argument to be passed in. For instance, the following is perfectly valid:

 

DECLARE @SQL NVARCHAR(300)

SET @SQL = '' +
'SELECT @TheParam AS TheParam'

EXEC sp_executesql
@SQL,
N'@TheParam VARCHAR(100) = ''This is the param'''

Not only that, but it's been cached:

 

SELECT sql
FROM master..syscacheobjects
WHERE sql LIKE '%param%'
AND cacheobjtype = 'Executable Plan'

-----------------------------------------------------

(@TheParam VARCHAR(100) = 'This is the param')
SELECT @TheParam AS TheParam

So what happens if we change our parameter's value?

 

DECLARE @SQL NVARCHAR(300)

SET @SQL = '' +
'SELECT @TheParam AS TheParam'

EXEC sp_executesql
@SQL,
N'@TheParam VARCHAR(100) = ''This is the other_param'''

Same query, but...

 

SELECT sql
FROM master..syscacheobjects
WHERE sql LIKE '%other_param%'
AND cacheobjtype = 'Executable Plan'

-----------------------------------------------------

(@TheParam VARCHAR(100) = 'This is the other_param')
SELECT @TheParam AS TheParam

Yes, a second cached execution plan! Exciting, isn't it? Kind of like winning the lottery, only even better, because you don't have to worry about how to spend all of that extra cash!

So how do we put this all together? A quick recap: We know that the query requires at least two execution plans; one for big date ranges, and one for smaller date ranges. There might be more, but we haven't tested that, so I'll leave it as an exercise for the reader. We also know that sp_executesql will cache a second, third, or Nth execution plan whenever the parameter list is changed. So all we need to do is change the parameter list depending on the inputs...

 

ALTER PROC GetStuff
    @StartDate DATETIME,
    @EndDate DATETIME,
    @Number INT
AS
DECLARE @SQL NVARCHAR(300)
DECLARE @Params NVARCHAR(100)

SET @SQL = '' +
    'SELECT ' +
        'Number, ' +
        'TheDate, ' +
        'AnotherCol ' +
    'FROM DateTbl ' +
    'WHERE TheDate BETWEEN @StartDate AND @EndDate ' +
        'OR Number = @Number'

-- Same query text either way; only the @dX1 default in the parameter list
-- changes, which is enough to get a separate cached plan for each bucket
IF DATEDIFF(hh, @StartDate, @EndDate) <= 2
    SET @Params = '@StartDate DATETIME, @EndDate DATETIME, @Number INT, @dX1 INT = 1'
ELSE
    SET @Params = '@StartDate DATETIME, @EndDate DATETIME, @Number INT, @dX1 INT = 2'

EXEC sp_executesql
    @SQL,
    @Params,
    @StartDate,
    @EndDate,
    @Number

Pretending that we've actually tested for the correct thresholds (which you should do if you use this technique), you'll notice that we're forcing one execution plan when the time between the start and end dates is two hours or less (that one will use index seeks), and a different plan when it's more than two hours (that one will scan the clustered index).

Since forcing a new execution plan to be compiled is simply a matter of changing the default value of @dX1 in the parameter list, you can add as many conditions as necessary to control which cached plan is used for any given set of arguments. Two hours is almost certainly not the best choice here, but really, does it matter?
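For example, if testing revealed that medium-sized date ranges deserved a plan of their own, a third bucket is just one more branch. The thresholds below are invented purely for illustration -- a sketch, not something I've benchmarked:

ALTER PROC GetStuff
    @StartDate DATETIME,
    @EndDate DATETIME,
    @Number INT
AS
DECLARE @SQL NVARCHAR(300)
DECLARE @Params NVARCHAR(100)

SET @SQL = '' +
    'SELECT ' +
        'Number, ' +
        'TheDate, ' +
        'AnotherCol ' +
    'FROM DateTbl ' +
    'WHERE TheDate BETWEEN @StartDate AND @EndDate ' +
        'OR Number = @Number'

-- Hypothetical buckets: each distinct @dX1 default gets its own cached plan
IF DATEDIFF(hh, @StartDate, @EndDate) <= 2
    SET @Params = '@StartDate DATETIME, @EndDate DATETIME, @Number INT, @dX1 INT = 1'
ELSE IF DATEDIFF(dd, @StartDate, @EndDate) <= 7
    SET @Params = '@StartDate DATETIME, @EndDate DATETIME, @Number INT, @dX1 INT = 2'
ELSE
    SET @Params = '@StartDate DATETIME, @EndDate DATETIME, @Number INT, @dX1 INT = 3'

EXEC sp_executesql
    @SQL,
    @Params,
    @StartDate,
    @EndDate,
    @Number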

So in conclusion, blah, blah, blah... No one reads this far, you stopped after you saw the final stored procedure, didn't you? Have a nice day, and enjoy your new, more dynamic stored procedures.

Published Wednesday, July 12, 2006 10:19 PM by Adam Machanic


Comments

 

Zach said:

"So in conclusion, blah, blah, blah... No one reads this far" Not true, but nice article anyway. I think you might be on the track of what I'm trying to figure out. Currently I'm comparing two stored procedures (old and new). They do the same thing but in different ways. The first time I run them new runs in about 1100 ms and old in about 1600 ms. However, on the second run and all runs after that new runs about 400 ms and old runs at about 100 ms. What I'm trying to do is find something I can do to prevent this optimization on the second, third, etc runs. I've tried running them with recompile but that doesn't change this behavior, it only seems to increase all my times in a static fashion (by about 200 ms). Ideally, I'd like all my runs to report like the first run does, but I can't seem to find a way to prevent caching, or to clear the already created cache. Any ideas? Note: this is for testing purposes not production purposes.

December 30, 2010 10:30 AM
 

Zach said:

With a little further googling it seems the answer to my problem is using sp_recompile.

December 30, 2010 10:39 AM
 

Adam Machanic said:

Zach: I don't use sp_recompile anymore, and this article is probably somewhat out-dated as well; SQL Server 2005's OPTION(RECOMPILE) enhancement allows us to be much more precise with the whole thing. Well, most of the time at least. There are still occasions where I need to play a game or two, but it's much less common than it used to be.

December 30, 2010 12:11 PM


About Adam Machanic

Adam Machanic is a Boston-based SQL Server developer, writer, and speaker. He focuses on large-scale data warehouse performance and development, and is author of the award-winning SQL Server monitoring stored procedure, sp_WhoIsActive. Adam has written for numerous web sites and magazines, including SQLblog, Simple Talk, Search SQL Server, SQL Server Professional, CoDe, and VSJ. He has also contributed to several books on SQL Server, including "SQL Server 2008 Internals" (Microsoft Press, 2009) and "Expert SQL Server 2005 Development" (Apress, 2007). Adam regularly speaks at conferences and training events on a variety of SQL Server topics. He is a Microsoft Most Valuable Professional (MVP) for SQL Server, a Microsoft Certified IT Professional (MCITP), and an alumnus of the INETA North American Speakers Bureau.
