A few days ago Aaron posted yet another fantastic entry in his Bad Habits series, this one discussing mishandling of date ranges in queries. This is a topic near and dear to me, having had to clean up a lot of poorly thought out code in the past few years. Aaron's post includes many examples, all of which boil down to the proper way to do the job, a query like the following (which I've stolenborrowed from the post):
SELECT
COUNT(*)
FROM dbo.SomeLogTable
WHERE
DateColumn >= '20091011'
AND DateColumn < '20091012';
This query asks a simple question: How many rows do we have for October 11, 2009? And assuming the presence of an index on DateColumn, the work will be done much more efficiently than the most common anti-pattern I see, converting both the input date and the date column to CHAR(8) using CONVERT format 112 (YYYYMMDD) and comparing the strings. Doing that will cause a table scan, which is definitely a bad habit worth kicking.
But what I want to show in this post is that contrary to what you might understand from working with previous versions of SQL Server, in SQL Server 2008 conversion is not always a bad thing. In conjunction with the addition of the DATE type, the query optimizer received a minor upgrade. Converting both sides of the predicate to DATE will cause the optimizer to do exactly what you probably wanted to begin with: Return all of the events that occurred today (or on whatever input date you specified), by seeking into the index rather than scanning the table.
To see this optimization, start with the following test data:
CREATE TABLE #dates
(
the_date DATETIME NOT NULL PRIMARY KEY
);
GO
WITH
numbers AS
(
SELECT number
FROM master..spt_values
WHERE
type = 'P'
AND number > 0
)
INSERT #dates
SELECT DATEADD(hh, -number, GETDATE())
FROM numbers
UNION ALL
SELECT DATEADD(hh, number, GETDATE())
FROM numbers;
GO
... And now ask a question. What happened today? First we'll ask the wrong way:
SET SHOWPLAN_TEXT ON;
GO
SELECT
COUNT(*)
FROM #dates
WHERE
CONVERT(CHAR(8), the_date, 112) = CONVERT(CHAR(8), GETDATE(), 112);
GO
SET SHOWPLAN_TEXT OFF;
GO
---
|--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1006],0)))
|--Stream Aggregate(DEFINE:([Expr1006]=Count(*)))
|--Clustered Index Scan(OBJECT:([tempdb].[dbo].[#dates]), WHERE:(CONVERT(char(8),[tempdb].[dbo].[#dates].[the_date],112)=CONVERT(char(8),getdate(),112)))
Oops! An index scan. That won't do. But if you like writing code like this, all is not lost--you just need to do a different conversion:
SET SHOWPLAN_TEXT ON;
GO
SELECT
COUNT(*)
FROM #dates
WHERE
CONVERT(DATE, the_date) = CONVERT(DATE, GETDATE());
GO
SET SHOWPLAN_TEXT OFF;
GO
---
|--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1009],0)))
|--Stream Aggregate(DEFINE:([Expr1009]=Count(*)))
|--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1007], [Expr1008], [Expr1006]))
|--Compute Scalar(DEFINE:(([Expr1007],[Expr1008],[Expr1006])=GetRangeThroughConvert(CONVERT(date,getdate(),0),CONVERT(date,getdate(),0),(62))))
| |--Constant Scan
|--Clustered Index Seek(OBJECT:([tempdb].[dbo].[#dates]), SEEK:([tempdb].[dbo].[#dates].[the_date] > [Expr1007] AND [tempdb].[dbo].[#dates].[the_date] < [Expr1008]), WHERE:(CONVERT(date,[tempdb].[dbo].[#dates].[the_date],0)=CONVERT(date,getdate(),0)) ORDERED FORWARD)
This plan is quite a bit more complex, but the important things to note are:
- An index seek is used, rather than an index scan, meaning that this query will return the results in a much more efficient manner
- The new query is logically equivalent to the first query
Need more than one day? Use either IN or BETWEEN, both of which produce query plans similar to the above.
SELECT
COUNT(*)
FROM #dates
WHERE
CONVERT(DATE, the_date) IN (CONVERT(DATE, GETDATE()-1), CONVERT(DATE, GETDATE()));
GO
SELECT
COUNT(*)
FROM #dates
WHERE
CONVERT(DATE, the_date) BETWEEN CONVERT(DATE, GETDATE()-1) AND CONVERT(DATE, GETDATE());
GO
Please note that I'm not suggesting that this methodology is any better or worse than what Aaron suggested in his post. I simply want you to know your options and--more importantly--understand that the knee-jerk "we must never use functions in the WHERE clause" approach is not always advantageous. The query optimizer has come a long way in recent versions of SQL Server and I suspect we'll see a lot more in these areas in the coming releases. It will be interesting to see how many of today's truths cease to be an issue as the optimizer becomes smarter and more adept at fixing user error.
Enjoy!