Tell me if this situation sends a chill down your spine: You've written
a stored procedure, tested it against a variety of inputs, and finally
rolled it out in production. All is well... Or so you think. You start
getting complaints from some users that it's taking forever to return.
But other users are having no problem. What the..?
Veteran DBAs will know right away what's going on (even without
reading the title of this post!) -- but for those of you who haven't
had the pleasure
of debugging these kinds of things, the answer is that cached execution
plans are not always as wonderful for performance as we might like.
For any given query, there are numerous possible execution
plans that the query optimizer can come up with. Some of them are
optimal, some are less than optimal. But in the end, it's the job of
the query optimizer to decide which one to use (hopefully, the optimal
one). If a stored procedure is executed and its does not have a query
plan in cache, whatever execution plan the optimizer decides to use
will be cached for next time. This is usually a good thing -- it can be
quite a bit of work for the optimizer to make that decision.
But in some cases, this is where the trouble begins. One of the
main factors the optimizer uses is index statistics vs. what parameters
are being used for the query. This can greatly affect what the
'correct' execution plan is -- the optimizer must decide such things as
which index should be used, whether a seek or a scan should be
performed, what types of joins are most efficient, etc. But as
parameters change, so can the most appropriate choices.
To illustrate this better, some sample data will be useful. Break out your numbers table and run the following script, which will create a table with three columns, around 20 million rows, and a couple of indexes...
SELECT
Number,
DATEADD(ss, Number, 0) AS TheDate
INTO DateTbl
FROM Numbers
DECLARE
@Num INT,
@Incr INT
SELECT
@Num = MAX(Number) + 1,
@Incr = MAX(Number) + 1
FROM DateTbl
WHILE @Num < 20000000
BEGIN
INSERT DateTbl (Number, TheDate)
SELECT Number + @Num, DATEADD(ss, Number + @Num, 0) AS TheDate
FROM Numbers
SET @Num = @Num + @Incr
END
CREATE UNIQUE CLUSTERED INDEX IX_Date ON DateTbl(TheDate)
CREATE UNIQUE NONCLUSTERED INDEX IX_Number ON DateTbl(Number)
ALTER TABLE DateTbl
ADD AnotherCol VARCHAR(40) NULL
UPDATE DateTbl
SET AnotherCol = CONVERT(VARCHAR, Number) + CONVERT(VARCHAR, TheDate)
Okay! Now that your hard drive's workout is done, let's take a look
at what we have... DateTbl has three columns: A sequential number, a
datetime column, and a character column. You should have one row for
every second between January 1, 1900 and sometime around August 21,
1900, depending on how big your numbers table is. The date column and
the number column are indexed (we'll be using those as predicates in
the WHERE clause of the example queries), but the character column is
not. That's on purpose, to force a bookmark lookup. What can I say --
this is a totally contrived example!
Put Query Analyzer into Show Execution Plan mode and check out the following:
SELECT
Number,
TheDate,
AnotherCol
FROM DateTbl
WHERE TheDate BETWEEN '19000201 09:35:00' AND '19000201 09:36:00'
OR Number = 10
Before I proceed, I would just like to say that anyone who comments
or e-mails me saying that this query can be re-written with a UNION to
get consistently better execution plans will be slapped upside the head
with a trout. YES, this is a bad query, but as I said, this is a very
simple example. In real life, these situations are usually much more
difficult to re-write. So if you don't like my example, go write your
own article!
... Now that that's taken care of ...
The execution plan you should see will have a seek on each
index. Makes sense -- we're looking at a very small chunk of data in
each place. But what if we change the query to use a much larger date
range?
SELECT
Number,
TheDate,
AnotherCol
FROM DateTbl
WHERE TheDate BETWEEN '19000101' AND '19000201'
OR Number = 10
A seek on the date index no longer makes sense. A range scan is a
better option. And why bother seeking on the Number column? The row
with the number 10 is already found within the selected range. SQL
Server agrees with me on this, and performs only a scan of the
clustered date index.
But now let's see what happens when we throw this into a stored procedure. Create the following:
CREATE PROCEDURE GetStuff
@StartDate DateTime,
@EndDate DateTime,
@Number INT
AS
SELECT
Number,
TheDate,
AnotherCol
FROM DateTbl
WHERE TheDate BETWEEN @StartDate AND @EndDate
OR Number = @Number
... And run the following in Query Analyzer with Show Execution Plan turned on ...
EXEC GetStuff '19000201 09:35:00', '19000201 09:36:00', 10
Same execution plan as before! That's great, right? Well...
EXEC GetStuff '19000101', '19000201', 10
... ... ...
This is taking a while ...
Enjoy the break? Good, now get back to work!
Check out the execution plan. I guess the cached one wasn't
optimal for the second query. So how do we satisfy BOTH sets of
arguments?
One way is to force the stored procedure to recompile each time:
ALTER PROCEDURE GetStuff
@StartDate DateTime,
@EndDate DateTime,
@Number INT
WITH RECOMPILE
AS
SELECT
Number,
TheDate,
AnotherCol
FROM DateTbl
WHERE TheDate BETWEEN @StartDate AND @EndDate
OR Number = @Number
You'll notice that I added WITH RECOMPILE. And while that's
probably not a big deal for this example stored procedure, it isn't a
good idea for the types of really complex stored procedures where these
problems crop up in the real world. Recompilation can be quite
intensive, and I really don't want it happening every time an active
stored procedure is called.
But you already knew that wasn't the solution, because in
elementary school you were taught how to read, and the title of this
article isn't "Controlling Stored Procedure Caching with ... WITH
RECOMPILE".
No, instead the title is, "Controlling Stored Procedure Caching with ... Dyanmic SQL?!?"
Yes, dynamic SQL. If you don't know about dynamic SQL, go read this article right now and come back when you're finished.
You may have heard about a system stored procedure called sp_executesql.
It lets you evaluate dynamic SQL, but it happens to also cache its
execution plan. In addition, due to the fact that it accepts
parameters, it makes SQL injection nearly impossible if correctly used.
So it's good stuff. We could evaluate our test query using
sp_executesql like this:
DECLARE @SQL NVARCHAR(300)
SET @SQL = '' +
'SELECT ' +
'Number, ' +
'TheDate, ' +
'AnotherCol ' +
'FROM DateTbl ' +
'WHERE TheDate BETWEEN @StartDate AND @EndDate ' +
'OR Number = @Number'
EXEC sp_executesql
@SQL,
N'@StartDate DATETIME, @EndDate DATETIME, @Number INT',
@StartDate = '19000201 09:35:00',
@EndDate = '19000201 09:36:00',
@Number = 10
... And that's just wonderful, but it gives us absolutely nothing, because if you re-run it with the other parameters you'll find that you have the same problem as the stored procedure version:
DECLARE @SQL NVARCHAR(300)
SET @SQL = '' +
'SELECT ' +
'Number, ' +
'TheDate, ' +
'AnotherCol ' +
'FROM DateTbl ' +
'WHERE TheDate BETWEEN @StartDate AND @EndDate ' +
'OR Number = @Number'
EXEC sp_executesql
@SQL,
N'@StartDate DATETIME, @EndDate DATETIME, @Number INT',
@StartDate = '19000101',
@EndDate = '19000201',
@Number = 10
Time for another coffee break...
But let us not lose hope yet, because we're still in the article
that's talking about how to control caching and recompilation and you
know that I wouldn't have written this article if I didn't know the
answer.
So what's really being cached here? Let's take a look:
SELECT sql
FROM master..syscacheobjects
WHERE sql LIKE '%datetbl%'
AND cacheobjtype = 'Executable Plan'
-----------------------------------------------------
(@StartDate DATETIME, @EndDate DATETIME, @Number INT)
SELECT
Number,
TheDate,
AnotherCol
FROM DateTbl
WHERE TheDate BETWEEN @StartDate
AND @EndDate OR Number = @Number
The cached plan is cached for not just the query, but also a
parameter list -- and not just any parameter list, but the very
parameter list that was passed in to sp_executesql. So how could we
force SQL Server to cache a different plan for the same query?
... Change the parameter list!
The parameter list, of course, is correlated to the actual
parameters passed in. But what you may not realize is that if you
satisfy a parameter within the list, sp_executesql will not expect a
correlated parameter to be passed in. For instance, the following is
perfectly valid:
DECLARE @SQL NVARCHAR(300)
SET @SQL = '' +
'SELECT @TheParam AS TheParam'
EXEC sp_executesql
@SQL,
N'@TheParam VARCHAR(100) = ''This is the param'''
Not only that, but it's been cached:
SELECT sql
FROM master..syscacheobjects
WHERE sql LIKE '%param%'
AND cacheobjtype = 'Executable Plan'
-----------------------------------------------------
(@TheParam VARCHAR(100) = 'This is the param')
SELECT @TheParam AS TheParam
So what happens if we change our parameter's value?
DECLARE @SQL NVARCHAR(300)
SET @SQL = '' +
'SELECT @TheParam AS TheParam'
EXEC sp_executesql
@SQL,
N'@TheParam VARCHAR(100) = ''This is the other_param'''
Same query, but...
SELECT sql
FROM master..syscacheobjects
WHERE sql LIKE '%other_param%'
AND cacheobjtype = 'Executable Plan'
-----------------------------------------------------
(@TheParam VARCHAR(100) = 'This is the other_param')
SELECT @TheParam AS TheParam
Yes, a second cached execution plan! Exciting, isn't it? Kind of
like winning the lottery, only even better, because you don't have to
worry about how to spend all of that extra cash!
So how do we put this all together? A quick recap: We know that
the query requires at least two execution plans; one for big date
ranges, and one for smaller date ranges. There might be more, but we
haven't tested that, so I'll leave it as an exercise for the reader. We
also know that sp_executesql will cache a second, third, or Nth
execution plan whenever the parameter list is changed. So all we need
to do is change the parameter list depending on the inputs...
ALTER PROC GetStuff
@StartDate DATETIME,
@EndDate DATETIME,
@Number INT
AS
DECLARE @SQL NVARCHAR(300)
DECLARE @Params NVARCHAR(100)
SET @SQL = '' +
'SELECT ' +
'Number, ' +
'TheDate, ' +
'AnotherCol ' +
'FROM DateTbl ' +
'WHERE TheDate BETWEEN @StartDate AND @EndDate ' +
'OR Number = @Number'
IF DATEDIFF(hh, @StartDate, @EndDate) <= 2
SET @Params = '@StartDate DATETIME, @EndDate DATETIME, @Number INT, @dX1 INT = 1'
ELSE
SET @Params = '@StartDate DATETIME, @EndDate DATETIME, @Number INT, @dX1 INT = 2'
EXEC sp_executesql
@SQL,
@Params,
@StartDate,
@EndDate,
@Number
Pretending that we've actually tested for the correct thresholds
(which you should do if you use this technique), you'll notice that
we're forcing a different execution plan if the time between start date
and end date is less than or equal to two hours (that will be an index
seek) or more than two hours (that will be an index scan).
Since forcing evaluation of a new execution plan in this case
is simply a matter of changing the value of @dX1, you can add as many
conditions as necessary to control which cached plan is used for any
given set of arguments. Two hours is almost certainly not the best
choice here, but really, does it matter?
So in conclusion, blah, blah, blah... No one reads this far,
you stopped after you saw the final stored procedure, didn't you? Have
a nice day, and enjoy your new, more dynamic stored procedures.