<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://www2.sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Adam Machanic : T-SQL, Performance</title><link>http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/T-SQL/Performance/default.aspx</link><description>Tags: T-SQL, Performance</description><dc:language>en</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>What Happened Today? DATE and Date Ranges Over DATETIME</title><link>http://www2.sqlblog.com/blogs/adam_machanic/archive/2009/10/20/what-happened-today-date-and-date-ranges-over-datetime.aspx</link><pubDate>Tue, 20 Oct 2009 18:21:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:18023</guid><dc:creator>Adam Machanic</dc:creator><slash:comments>14</slash:comments><comments>http://www2.sqlblog.com/blogs/adam_machanic/comments/18023.aspx</comments><wfw:commentRss>http://www2.sqlblog.com/blogs/adam_machanic/commentrss.aspx?PostID=18023</wfw:commentRss><description>A few days ago Aaron posted yet another fantastic entry in his Bad Habits series, this one discussing mishandling of date ranges in queries . This is a topic near and dear to me, having had to clean up a lot of poorly thought out code in the past few...(&lt;a href="http://www2.sqlblog.com/blogs/adam_machanic/archive/2009/10/20/what-happened-today-date-and-date-ranges-over-datetime.aspx"&gt;read more&lt;/a&gt;)&lt;img src="http://www2.sqlblog.com/aggbug.aspx?PostID=18023" width="1" height="1"&gt;</description><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/date/default.aspx">date</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/datetime/default.aspx">datetime</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Optimization/default.aspx">Optimization</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Performance/default.aspx">Performance</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/T-SQL/default.aspx">T-SQL</category></item><item><title>Who's On First? Solving the Top per Group Problem (Part 1: Technique)</title><link>http://www2.sqlblog.com/blogs/adam_machanic/archive/2008/02/08/who-s-on-first-solving-the-top-per-group-problem-part-1-technique.aspx</link><pubDate>Fri, 08 Feb 2008 23:09:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:4992</guid><dc:creator>Adam Machanic</dc:creator><slash:comments>18</slash:comments><comments>http://www2.sqlblog.com/blogs/adam_machanic/comments/4992.aspx</comments><wfw:commentRss>http://www2.sqlblog.com/blogs/adam_machanic/commentrss.aspx?PostID=4992</wfw:commentRss><description>&lt;p&gt;Relative comparison is a simple matter of human nature. From early childhood we compare and contrast what we see in the world around us, building a means by which to rate what we experience. And as it turns out, this desire to discover top and bottom, rightmost and leftmost, or best and worst happens to extend quite naturally into business scenarios. Which product is the top seller? How about the one that's simply not moving off the shelves? Which of our customers has placed the most expensive order? What are the most recent orders placed at each of our outlets?&lt;/p&gt;&lt;p&gt;In the world of common business questions, the edge cases are generally of most interest. What's in the middle is unimportant; it's often too difficult for the mind to compare and comprehend when there are hundreds, thousands, or even millions of items, transactions, or facts that are all within a similar range. Instead, we focus on those that stick out in some extraordinary way.&lt;/p&gt;&lt;p&gt;Those of us who work with SQL products on a regular basis are faced with solving this same problem time and again as we work through various business requirements. Over time, I have noticed four basic query patterns that can be used to solve the problem; each are logically equivalent (within certain restrictions -- more on that later), but can have surprisingly different performance characteristics depending on the data being queried. In this first post, I will outline the available patterns/methods. In the following posts, I will show the results of testing each pattern against a variety of scenarios in an attempt to discover where and when each should be used.&lt;/p&gt;&lt;p&gt;The four basic patterns are outlined below. Each of the methods is illustrated using a query to show all customers' names, plus their most recent order date, and the amount of that order. I've included notes that indicate where logic differences can arise among the various methods.&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;&lt;b&gt;Method 1: Join to full group and use correlated subquery with a MIN/MAX aggregate to filter&lt;/b&gt;&lt;/p&gt;&lt;p&gt;In this method we use an inner join to get all required columns, then filter the resultant set using a correlated subquery in the WHERE clause. &lt;br&gt;&lt;/p&gt;&lt;blockquote&gt;SELECT&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; c.FirstName,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; c.LastName,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderDate,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderAmount&lt;br&gt;FROM Customers c&lt;br&gt;JOIN Orders o ON o.CustomerId = c.CustomerId&lt;br&gt;WHERE &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderDate&amp;nbsp; =&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; (&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; SELECT MAX(o1.OrderDate)&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; FROM Orders o1&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; WHERE &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o1.CustomerId = o.CustomerId&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; )&lt;br&gt;&lt;/blockquote&gt;&lt;p&gt;Logic notes: With this method ties are automatically included in the output, unless a tiebreaker is specified (which can be tricky given that you only have one column to work with). This method does not allow you to pull back an arbitrary number of rows, such as top 10 per customer; you are limited to the edge and any ties that might exist. &lt;b&gt;&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;&lt;b&gt;Method 1a: Join to full group and use correlated subquery with TOP(n) and ORDER BY to filter&lt;/b&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;This method is almost identical to Method 1 (which is why it is classified here as 1a), but the TOP and ORDER BY allow for a bit more flexibility than the aggregates.&lt;br&gt;&lt;/p&gt;&lt;blockquote&gt;SELECT&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; c.FirstName,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; c.LastName,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderDate,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderAmount&lt;br&gt;FROM Customers c&lt;br&gt;JOIN Orders o ON o.CustomerId = c.CustomerId&lt;br&gt;WHERE &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderDate&amp;nbsp; =&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; (&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; SELECT TOP(1)&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o1.OrderDate&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; FROM Orders o1&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; WHERE &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o1.CustomerId = o.CustomerId&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ORDER BY&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o1.OrderDate DESC&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; )&lt;/blockquote&gt;&lt;p&gt;Logic notes: With this method you can more easily integrate a tiebreaker than with Method 1; the comparison column can be anything, including a primary key, and you can still order on whatever column makes most sense. In addition, you can take more rows than with Method 1 by using IN instead of = in the WHERE clause, and increasing the argument value to TOP.&lt;br&gt;&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;&lt;b&gt;Method 2: CROSS APPLY to ordered TOP(n)&lt;/b&gt;&lt;/p&gt;&lt;p&gt;In this method, SQL Server 2005's CROSS APPLY operator is used. This operator allows us to essentially create a table-valued correlated subquery -- something that impossible in previous versions of SQL Server. By using TOP in conjunction with ORDER BY we can get as many rows per group as needed.&lt;/p&gt;&lt;blockquote&gt;SELECT&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; c.FirstName,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; c.LastName,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; x.OrderDate,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; x.OrderAmount&lt;br&gt;FROM Customers c&lt;br&gt;CROSS APPLY&lt;br&gt;(&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; SELECT TOP(1)&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderDate,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderAmount&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; FROM Orders o&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; WHERE&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.CustomerId = c.CustomerId&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ORDER BY&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderDate DESC&lt;br&gt;) x&lt;br&gt;&lt;/blockquote&gt;&lt;p&gt;Logic notes: This method is almost identical, from a logic point of view, with Method 1a modified to use IN on a primary key column. With both methods WITH TIES can be added to the TOP in order to get ties.&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;&lt;b&gt;Method 3: Join to derived table that uses a partitioned, ordered windowing function, and filter in the outer query based on the row number&lt;/b&gt;&lt;/p&gt;&lt;p&gt;In this method a derived table or CTE is used, in conjunction with a windowing function partitioned based on the required grain of the final query. So for the "most recent order per customer" query, the row number is partitioned based on the customer. This gives us a count starting at 1 for each customer, which can be filtered in the outer query.&lt;br&gt;&lt;/p&gt;&lt;blockquote&gt;SELECT&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; c.FirstName,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; c.LastName,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; x.OrderDate,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; x.OrderAmount&lt;br&gt;FROM Customers c&lt;br&gt;INNER JOIN&lt;br&gt;(&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; SELECT &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderDate,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderAmount,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.CustomerId,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ROW_NUMBER() OVER&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; (&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; PARTITION BY o.CustomerId&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ORDER BY o.OrderDate DESC&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ) AS r&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; FROM Orders o&lt;br&gt;) x ON&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; x.CustomerId = c.CustomerId&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; AND x.r = 1&lt;br&gt;&lt;/blockquote&gt;&lt;p&gt;Logic notes: If ties are important, use DENSE_RANK instead of ROW_NUMBER. ROW_NUMBER is good for arbitrary TOP(n), similar to Method 2. Unlike the previously described methods, in conjunction with DENSE_RANK this method can return an arbitrary TOP(n) rows, all of which can include ties. So if you would like to see the three most recent order dates and each happens to have multiple orders, this method will be able to return them all by simply filtering on x.r = 3. This would not be directly possible with any of the other methods described here.&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;&lt;b&gt;Method 4: "Carry-along sort"&lt;/b&gt;&lt;/p&gt;&lt;p&gt;This is the only "tricky" method, and not one that I recommend using, except as a last resort. I'm including it here only for completeness and comparison because it happens to be a very high performance method in some cases. This method involves converting each of the required inner columns into a string, concatenating them, then applying an aggregate to the string as a whole. By putting the "sort" column first, the other data is "carried along" -- thus the name for the method. The concatenated data is then "unpacked" in the outer query. This can be surprisingly efficient from an I/O standpoint, but the resultant code is a maintenance nightmare and it is quite easy to introduce errors. In addition, this method can only return the top 1 per group -- no ties or multiple return items are supported.&lt;br&gt;&lt;/p&gt;&lt;blockquote&gt;SELECT&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; c.FirstName,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; c.LastName,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; CONVERT(DATETIME, SUBSTRING(x.OrderInfo, 1, 8)) AS OrderDate,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; CONVERT(MONEY, SUBSTRING(x.OrderInfo, 9, 15)) AS OrderAmount&lt;br&gt;FROM Customers c&lt;br&gt;INNER JOIN&lt;br&gt;(&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; SELECT &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.CustomerId,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; MAX&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; (&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; CONVERT(CHAR(8), OrderDate, 112) +&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; CONVERT(CHAR(15), SubTotal)&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ) OrderInfo&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; FROM Orders o&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; GROUP BY&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.CustomerId&lt;br&gt;) x ON&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; x.CustomerId = c.CustomerId&lt;br&gt;&lt;/blockquote&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;This post is just the beginning; watch this space in the coming days for a series of performance tests and analysis of these methods, and some results that I personally found to be quite surprising.&lt;br&gt;&amp;nbsp;&lt;/p&gt;&lt;img src="http://www2.sqlblog.com/aggbug.aspx?PostID=4992" width="1" height="1"&gt;</description><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/logic/default.aspx">logic</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Performance/default.aspx">Performance</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Query+Tuning/default.aspx">Query Tuning</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/T-SQL/default.aspx">T-SQL</category></item><item><title>Medians, ROW_NUMBERs, and performance</title><link>http://www2.sqlblog.com/blogs/adam_machanic/archive/2006/12/18/medians-row-numbers-and-performance.aspx</link><pubDate>Mon, 18 Dec 2006 19:52:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:437</guid><dc:creator>Adam Machanic</dc:creator><slash:comments>6</slash:comments><comments>http://www2.sqlblog.com/blogs/adam_machanic/comments/437.aspx</comments><wfw:commentRss>http://www2.sqlblog.com/blogs/adam_machanic/commentrss.aspx?PostID=437</wfw:commentRss><description>A couple of days ago, Aaron Bertrand posted about &lt;a href="http://sqlblog.com/blogs/aaron_bertrand/archive/2006/12/15/428.aspx"&gt;a method for calculating medians in SQL Server 2005&lt;/a&gt; using the ROW_NUMBER function in conjunction with the COUNT aggregate. This method (credited to Itzik Ben-Gan) is interesting, but I discovered an even better way to attack the problem in &lt;a href="http://www.amazon.com/Celkos-Analytics-Kaufmann-Management-Systems/dp/0123695120/sr=8-1/qid=1166482464/ref=sr_1_1/105-6595410-7450029?ie=UTF8&amp;amp;s=books"&gt;Joe Celko's Analytics and OLAP in SQL&lt;/a&gt;.&lt;br&gt;&lt;br&gt;Rather than using a COUNT aggregate in conjunction with the ROW_NUMBER function, Celko's method uses ROW_NUMBER twice: Once with an ascending sort, and again with a descending sort. The output rows can then be matched based on the ascending row number being within +/- 1 of the descending row number.&amp;nbsp; This becomes clearer with a couple of small examples:&lt;br&gt;&lt;br&gt;

&lt;table class="MsoTableGrid" style="border:medium none;border-collapse:collapse;" cellpadding="0" cellspacing="0"&gt;
 &lt;tr&gt;
  &lt;td style="border:1pt solid windowtext;padding:0in 5.4pt;width:23.4pt;"&gt;
  &lt;p class="MsoNormal"&gt;A&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:solid solid solid none;border-color:windowtext windowtext windowtext -moz-use-text-color;border-width:1pt 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;1&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:solid solid solid none;border-color:windowtext windowtext windowtext -moz-use-text-color;border-width:1pt 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;4&lt;/p&gt;
  &lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
  &lt;td style="border-style:none solid solid;border-color:-moz-use-text-color windowtext windowtext;border-width:medium 1pt 1pt;padding:0in 5.4pt;width:23.4pt;"&gt;
  &lt;p class="MsoNormal"&gt;B&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;2&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;3&lt;/p&gt;
  &lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
  &lt;td style="border-style:none solid solid;border-color:-moz-use-text-color windowtext windowtext;border-width:medium 1pt 1pt;padding:0in 5.4pt;width:23.4pt;"&gt;
  &lt;p class="MsoNormal"&gt;C&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;3&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;2&lt;/p&gt;
  &lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
  &lt;td style="border-style:none solid solid;border-color:-moz-use-text-color windowtext windowtext;border-width:medium 1pt 1pt;padding:0in 5.4pt;width:23.4pt;"&gt;
  &lt;p class="MsoNormal"&gt;D&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;4&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;1&lt;/p&gt;
  &lt;/td&gt;
 &lt;/tr&gt;
&lt;/table&gt;

&lt;p class="MsoNormal"&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/p&gt;

&lt;table class="MsoTableGrid" style="border:medium none;border-collapse:collapse;" cellpadding="0" cellspacing="0"&gt;
 &lt;tr&gt;
  &lt;td style="border:1pt solid windowtext;padding:0in 5.4pt;width:23.4pt;"&gt;
  &lt;p class="MsoNormal"&gt;A&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:solid solid solid none;border-color:windowtext windowtext windowtext -moz-use-text-color;border-width:1pt 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;1&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:solid solid solid none;border-color:windowtext windowtext windowtext -moz-use-text-color;border-width:1pt 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;5&lt;/p&gt;
  &lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
  &lt;td style="border-style:none solid solid;border-color:-moz-use-text-color windowtext windowtext;border-width:medium 1pt 1pt;padding:0in 5.4pt;width:23.4pt;"&gt;
  &lt;p class="MsoNormal"&gt;B&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;2&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;4&lt;/p&gt;
  &lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
  &lt;td style="border-style:none solid solid;border-color:-moz-use-text-color windowtext windowtext;border-width:medium 1pt 1pt;padding:0in 5.4pt;width:23.4pt;"&gt;
  &lt;p class="MsoNormal"&gt;C&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;3&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;3&lt;/p&gt;
  &lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
  &lt;td style="border-style:none solid solid;border-color:-moz-use-text-color windowtext windowtext;border-width:medium 1pt 1pt;padding:0in 5.4pt;width:23.4pt;"&gt;
  &lt;p class="MsoNormal"&gt;D&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;4&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;2&lt;/p&gt;
  &lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr style="height:14.35pt;"&gt;
  &lt;td style="border-style:none solid solid;border-color:-moz-use-text-color windowtext windowtext;border-width:medium 1pt 1pt;padding:0in 5.4pt;width:23.4pt;height:14.35pt;"&gt;
  &lt;p class="MsoNormal"&gt;E&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;height:14.35pt;"&gt;
  &lt;p class="MsoNormal"&gt;5&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;height:14.35pt;"&gt;
  &lt;p class="MsoNormal"&gt;1&lt;/p&gt;
  &lt;/td&gt;
 &lt;/tr&gt;
&lt;/table&gt;

&lt;p class="MsoNormal"&gt;&lt;o:p&gt;&lt;br&gt;&lt;/o:p&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;o:p&gt;In the first table (even number of rows), the median rows are B and C. These can be matched based on [Ascending Column] IN ([Descending Column] + 1, [Descending Column] - 1). In the second table (odd number of rows), the median row is C, which is matched where [Ascending Column] = [Descending Column]. Note that in the second table, the match criteria &lt;/o:p&gt;for the first table does not apply -- so the generic expression to match either case is the combination of the two:&amp;nbsp; [Ascending Column] IN ([Descending Column], [Descending Column] + 1, [Descending Column] - 1).&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;br&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;We can apply this logic within the AdventureWorks database to find the median of the "TotalDue" amount in the Sales.SalesOrderHeader table, for each customer:&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;br&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;/p&gt;&lt;div class="code"&gt;SELECT&lt;br&gt;&amp;nbsp;&amp;nbsp; CustomerId,&lt;br&gt;&amp;nbsp;&amp;nbsp; AVG(TotalDue)&lt;br&gt;FROM&lt;br&gt;(&lt;br&gt;&amp;nbsp;&amp;nbsp; SELECT&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; CustomerId,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; TotalDue,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ROW_NUMBER() OVER (&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; PARTITION BY CustomerId &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ORDER BY TotalDue ASC, SalesOrderId ASC) AS RowAsc,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ROW_NUMBER() OVER (&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; PARTITION BY CustomerId &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ORDER BY TotalDue DESC, SalesOrderId DESC) AS RowDesc&lt;br&gt;&amp;nbsp;&amp;nbsp; FROM Sales.SalesOrderHeader SOH&lt;br&gt;) x&lt;br&gt;WHERE &lt;br&gt;&amp;nbsp;&amp;nbsp; RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)&lt;br&gt;GROUP BY CustomerId&lt;br&gt;ORDER BY CustomerId;&lt;/div&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;br&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;The equivalent logic using Itzik Ben-Gan's method follows:&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;br&gt;&lt;/p&gt;&lt;div class="code"&gt;&lt;br&gt;SELECT&lt;br&gt;&amp;nbsp;&amp;nbsp; CustomerId,&lt;br&gt;&amp;nbsp;&amp;nbsp; AVG(TotalDue)&lt;br&gt;FROM&lt;br&gt;(&lt;br&gt;&amp;nbsp;&amp;nbsp; SELECT&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; CustomerId,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; TotalDue,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ROW_NUMBER() OVER (&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; PARTITION BY CustomerId&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ORDER BY TotalDue) AS RowNum,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; COUNT(*) OVER (&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; PARTITION BY CustomerId) AS RowCnt&lt;br&gt;&amp;nbsp;&amp;nbsp; FROM Sales.SalesOrderHeader&lt;br&gt;) x&lt;br&gt;WHERE&lt;br&gt;&amp;nbsp;&amp;nbsp; RowNum IN ((RowCnt + 1) / 2, (RowCnt + 2) / 2)&lt;br&gt;GROUP BY CustomerId&lt;br&gt;ORDER BY CustomerId;&lt;/div&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;br&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;Taking a look at the estimated execution plans for these two queries, we might believe that Ben-Gan's method is superior: Celko's algorithm requires an expensive intermediate sort operation and has an estimated cost of 4.96, compared to 3.96 for Ben-Gan's. &lt;br&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;br&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;Remember that these are merely estimates. And as it turns out, this is one of those times that the Query Optimizer's cost estimates are are totally out of line with the reality of what
happens when you actually run the queries. Although the performance
difference is not especially noticeable on a set of data as small as
that in Sales.SalesOrderHeader, check out the STATISTICS IO output. Celko's version does 703 logical reads; Ben-Gan's does an astonishing 140110!&lt;br&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;br&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;There is a good lesson to be learned from this: &lt;i&gt;Cost-based optimization is far from perfect!&lt;/i&gt; Never completely trust what estimates tell you; they've come a long way, but clearly there is still some work to do in this area. The only way to actually determine that one query is better than another is to run it against a realistic set of data and look at how much IO and CPU time is actually used.&lt;br&gt;&lt;/p&gt;&lt;br&gt;In this case, Ben-Gan's query probably should perform better than Celko's. It seems odd that the Query Processor can't collect the row counts at the same time it processes the row numbers. Regardless, as of today this is the best way to solve this problem... Not that I've ever needed a median in any production application I've worked on. But I suppose that's beside the point!&lt;br&gt;&lt;img src="http://www2.sqlblog.com/aggbug.aspx?PostID=437" width="1" height="1"&gt;</description><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Performance/default.aspx">Performance</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Query+Tuning/default.aspx">Query Tuning</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/T-SQL/default.aspx">T-SQL</category></item><item><title>Scalar functions, inlining, and performance: An entertaining title for a boring post</title><link>http://www2.sqlblog.com/blogs/adam_machanic/archive/2006/08/04/scalar-functions-inlining-and-performance-an-entertaining-title-for-a-boring-post.aspx</link><pubDate>Fri, 04 Aug 2006 03:07:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:146</guid><dc:creator>Adam Machanic</dc:creator><slash:comments>29</slash:comments><comments>http://www2.sqlblog.com/blogs/adam_machanic/comments/146.aspx</comments><wfw:commentRss>http://www2.sqlblog.com/blogs/adam_machanic/commentrss.aspx?PostID=146</wfw:commentRss><description>Scalar.&amp;nbsp; Function.&lt;br&gt;&lt;br&gt;Wow.&lt;br&gt;&lt;br&gt;Could any other combination of words evoke the same feeling of encapsulation, information hiding, and simplification of client code?&amp;nbsp; After years spent developing software in the procedural and OO worlds, it can be difficult--perhaps, even impossible--to migrate over to working with SQL Server and not consider how to architect your data access logic using some of the same techniques you'd use in the application tier.&lt;br&gt;&lt;br&gt;In short: Why would you &lt;i&gt;ever&lt;/i&gt; write the same piece of logic more than once?&amp;nbsp; Answer: &lt;i&gt;You wouldn't (damn it!)&lt;/i&gt;.&amp;nbsp; And so Microsoft bestowed upon the SQL Server community, in SQL Server 2000, the ability to write scalar user-defined functions.&amp;nbsp; And they could have been such beautiful things...&lt;br&gt;&lt;br&gt;But alas, reality can be painful, and as developers tried these new tools they were struck with a strange feeling of sadness as their applications buckled under the weight of what otherwise would have been a wonderful idea. As it turned out, putting all but the simplest of logic into these scalar functions was a recipe for disaster. Why?&amp;nbsp; Because they're essentially cursors waiting to happen (but they don't &lt;i&gt;look &lt;/i&gt;like cursors, so you may not know... until it's too late.)&lt;br&gt;&lt;br&gt;The central problem is that when you wrap logic in a multistatement UDF, the query optimizer just can't unwrap it too easily. And so there's really only one way to evaluate a scalar UDF: call it once per row. And that is really nothing more than a cursor.&lt;br&gt;&lt;br&gt;Seeing this behavior in action is easy enough; consider the following scalar function that some poor sap DBA working for AdventureWorks might be compelled to create:&lt;br&gt;&lt;br&gt;
&lt;pre class="code"&gt;CREATE FUNCTION GetMaxProductQty_Scalar&lt;br&gt;(&lt;br&gt;    @ProductId INT&lt;br&gt;)&lt;br&gt;RETURNS INT&lt;br&gt;AS&lt;br&gt;BEGIN&lt;br&gt;    DECLARE @maxQty INT&lt;br&gt;&lt;br&gt;    SELECT @maxQty = MAX(sod.OrderQty)&lt;br&gt;    FROM Sales.SalesOrderDetail sod&lt;br&gt;    WHERE sod.ProductId = @ProductId&lt;br&gt;&lt;br&gt;    RETURN (@maxQty)&lt;br&gt;END&lt;br&gt;&lt;/pre&gt;
&lt;br&gt;Simple enough, right?&amp;nbsp; Let's pretend that AdventureWorks has a bunch of reports, each of which requires maximum quantity sold per product.&amp;nbsp; So the DBA, thinking he can save himself some time and keep everything centralized (and that is a good idea), puts all of the logic into a scalar UDF.&amp;nbsp; Now, when he needs this logic, he can just call the UDF.&amp;nbsp; And if the logic has a bug, or needs to be changed, he can change it in exactly &lt;i&gt;one&lt;/i&gt; place.&amp;nbsp; And so life is great... Or is it?&lt;br&gt;&lt;br&gt;Let's take a look at a sample query:&lt;br&gt;&lt;br&gt;
&lt;pre class="code"&gt;SELECT&lt;br&gt;    ProductId,&lt;br&gt;    dbo.GetMaxProductQty_Scalar(ProductId)&lt;br&gt;FROM Production.Product&lt;br&gt;ORDER BY ProductId&lt;br&gt;&lt;/pre&gt;
&lt;br&gt;This query does nothing more than get the max quantity sold for each product in the Productin.Product table. And a look at the execution plan or the STATISTICS IO output might indicate that there's nothing too interesting going on here: The execution plan shows an index scan (to be expected, with no WHERE clause), followed by a compute scalar operation (the call to the UDF). And STATISTICS IO shows a mere 16 reads.&lt;br&gt;&lt;br&gt;So why is this query so problematic? Because the real issue is hiding just beneath the surface.&amp;nbsp;&lt;i&gt; The execution plan and STATISTICS IO didn't consider any of the code evaluated within the UDF!&lt;/i&gt; To see what's &lt;i&gt;really&lt;/i&gt; going on, fire up SQL Server Profiler, turn on the SQL:BatchCompleted event, and make sure you're showing the Reads column. Now run the query again and you'll see that this seemingly-innocent block of T-SQL is, in fact, using 365,247 logical reads. Quite a difference!&lt;br&gt;&lt;br&gt;Each of those "compute scalar" operations is really a call to the UDF, and each of the calls to the UDF is really a new query.&amp;nbsp; And all of those queries (all 504 of them -- the number of products in the Product table) add up to massive I/O.&amp;nbsp; Clearly not a good idea in a production environment.&lt;br&gt;&lt;br&gt;But luckily, we're not done here yet (or this would be a very boring post). Because while the performance penalty is a major turnoff, I really do love the encapsulation afforded by scalar UDFs.&amp;nbsp; I want them (or a similar tool) in my toolbox... And so I got to thinking.&lt;br&gt;&lt;br&gt;The answer to my dilemma, as it turns out, is to not use scalar UDFs at all, but rather to use &lt;i&gt;inline table-valued&lt;/i&gt; UDFs and treat them like scalars. This means that queries get slightly more complex than with scalar UDFs, but because the funtions are inlined (treated like macros) they're optimized along with the rest of the query. Which means, no more under-the-cover cursors.&lt;br&gt;&lt;br&gt;Following is a modified version of the scalar UDF posted above:&lt;br&gt;&lt;br&gt;
&lt;pre class="code"&gt;CREATE FUNCTION GetMaxProductQty_Inline&lt;br&gt;(&lt;br&gt;    @ProductId INT&lt;br&gt;)&lt;br&gt;RETURNS TABLE&lt;br&gt;AS&lt;br&gt;    RETURN&lt;br&gt;    (&lt;br&gt;        SELECT MAX(sod.OrderQty) AS maxqty&lt;br&gt;        FROM Sales.SalesOrderDetail sod&lt;br&gt;        WHERE sod.ProductId = @ProductId&lt;br&gt;    )&lt;br&gt;&lt;/pre&gt;&lt;br&gt;This function is no longer actually scalar--in fact, it now returns a table. It just so happens that the table has exactly one column and exactly one row, and uses the same logic as the scalar UDF shown above. So it's still scalar enough for my purposes.&lt;br&gt;&lt;br&gt;The query shown above, used to retrieve the maximum quantity sold for each product, will not quite work with this UDF as-is. Trying to substitute in the new UDF will result in nothing more than a variant on an "object not found" error.&amp;nbsp; Instead, you need actually treat this function like&amp;nbsp; it returns a table (due to the fact that it does).&amp;nbsp; And that means, in this case, a &lt;i&gt;subquery&lt;/i&gt;:&lt;br&gt;&lt;br&gt;
&lt;pre class="code"&gt;SELECT&lt;br&gt;    ProductId,&lt;br&gt;    (&lt;br&gt;        SELECT MaxQty&lt;br&gt;        FROM dbo.GetMaxProductQty_Inline(ProductId)&lt;br&gt;    ) MaxQty&lt;br&gt;FROM Production.Product&lt;br&gt;ORDER BY ProductId&lt;br&gt;&lt;/pre&gt;&lt;br&gt;So there it is. We're now treating the table-valued UDF more or less just like a scalar UDF.&amp;nbsp; And the difference in I/O results is really quite astounding: 1267 logical reads in this case. Meaning that the scalar UDF solution is around 288 times more I/O intensive!&lt;br&gt;&lt;br&gt;The question being, of course, was it worth it? The whole thing could have been written as one query, without the need for any UDFs at all. And the final query in this case is quite a bit more complex than the previous version, in addition to the fact that the encapsulation breaks down to some degree by forcing the caller to have some knowledge of how the UDF actually works. But I do feel that this sacrifice is warranted in some cases. Although the "greatest quantity sold" example shown here is simplistic, imagine other situations in which the same code fragments or logic are used over and over, due to lack of a good way of standardizing and centralizing them.&amp;nbsp; I know I've seen that a lot in my work, and some examples I can think of have included complex logic that might very well have been easier to maintain in a UDF.&lt;br&gt;&lt;br&gt;This technique may not be perfect for every case, and it certainly has its tradeoffs. But it may be a useful trick to keep in the back of your mind for a rainy day in the data center when someone's scalar UDF solution starts breaking down and you need a fix that doesn't require a massive code rewrite.&lt;br&gt;&lt;br&gt;&lt;img src="http://www2.sqlblog.com/aggbug.aspx?PostID=146" width="1" height="1"&gt;</description><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Performance/default.aspx">Performance</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Query+Tuning/default.aspx">Query Tuning</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/T-SQL/default.aspx">T-SQL</category></item><item><title>Running sums, redux</title><link>http://www2.sqlblog.com/blogs/adam_machanic/archive/2006/07/12/running-sums-redux.aspx</link><pubDate>Thu, 13 Jul 2006 01:51:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:106</guid><dc:creator>Adam Machanic</dc:creator><slash:comments>9</slash:comments><comments>http://www2.sqlblog.com/blogs/adam_machanic/comments/106.aspx</comments><wfw:commentRss>http://www2.sqlblog.com/blogs/adam_machanic/commentrss.aspx?PostID=106</wfw:commentRss><description>Siddhartha Gautama, the Buddha, taught us to understand that the key to
enlightenment is following the Middle Path.&amp;nbsp; And today I learned a
valuable lesson in extremes.&amp;nbsp; You can file this one in the "Doh!&amp;nbsp; Wrong again!" category...&lt;br&gt;
&lt;br&gt;
A fairly common question on SQL Server forums is, "how can I get the
running sum of the data in this column?"&amp;nbsp; Being the fan of set-based
queries that I am, I always answer the exact same way.&amp;nbsp; I show the
person asking the question how to do a self-join on the grouped column,
getting all of the "previous" values to create a running sum.&amp;nbsp; The
following example shows how you might do this against the
AdventureWorks Production.TransactionHistory table:&lt;br&gt;
&lt;br&gt;
&lt;blockquote&gt;&lt;font face="Courier New"&gt;SELECT&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; TH1.TransactionID,&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; TH1.ActualCost,&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; SUM(TH2.ActualCost) AS RunningTotal&lt;br&gt;
FROM Production.TransactionHistory TH1&lt;br&gt;
JOIN Production.TransactionHistory TH2 ON TH2.TransactionID &amp;lt;= TH1.TransactionID&lt;br&gt;
GROUP BY TH1.TransactionID, TH1.ActualCost&lt;br&gt;
ORDER BY TH1.TransactionID&lt;br&gt;
  &lt;/font&gt;&lt;br&gt;
&lt;/blockquote&gt;
Pretty simple query.&amp;nbsp; For each row of the "TH1" alias, every row with a
lesser-or-equal TransactionID will be summed.&amp;nbsp; Thereby creating a
running total for every row of the table.&amp;nbsp; Note, I've used the IDENTITY
column instead of the date column.&amp;nbsp; I'd generally suggest not doing so
because, e.g., you might need to insert some post-dated rows at some
point and relying on the IDENTITY for a time sequence will thereby not
work.&amp;nbsp; But in this case it's a lazy solution because the
TransactionDate column isn't indexed, and it's also not unique.&amp;nbsp; I want
to test a lot of rows (TransactionHistory has around 113,000), but I
don't want to skew the test by forcing a table scan on every iteration!&lt;br&gt;
&lt;br&gt;
But I digress.&amp;nbsp; The point is, I've given this answer more than a few
times and, well, I'd like to apologize.&amp;nbsp; Just now I went ahead and ran
this query on my powerful test server--err, my laptop.&amp;nbsp; &lt;br&gt;
&lt;br&gt;
As you might guess, since I'm performance-minded I also happen to be
extremely impatient--so I went ahead and killed the query at the
five-minute mark.&amp;nbsp; SSMS's result grid showed the first 26,568 rows, so
obviously there was a long way to go to hit that 113,000 mark.&amp;nbsp; And
with an estimated cost of 38,086 for the query, I can't say I'm
surprised.&lt;br&gt;
&lt;br&gt;
A few moments of head scratching and the following re-write was issued:&lt;br&gt;
&lt;br&gt;
&lt;blockquote&gt;&lt;font face="Courier New"&gt;SELECT &lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; TH1.TransactionID,&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; TH1.ActualCost,&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; (&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; SELECT SUM(TH2.ActualCost) &lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; FROM Production.TransactionHistory TH2 &lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; WHERE TH2.TransactionID &amp;lt;= TH1.TransactionID&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; ) AS RunningTotal&lt;br&gt;
FROM Production.TransactionHistory TH1&lt;br&gt;
ORDER BY TH1.TransactionID&lt;br&gt;
  &lt;br&gt;
  &lt;/font&gt;&lt;/blockquote&gt;
With an estimated cost of only 6,630, I had high hopes for this one.&amp;nbsp;
Alas, once again I was forced to cancel the query at the five-minute
mark.&amp;nbsp; 27,683 rows.&amp;nbsp; Not much better, I'm afraid.&amp;nbsp; And, as an aside,
I'm starting to wonder about these estimated costs.&amp;nbsp; But that's another
post for another day.&lt;br&gt;
&lt;br&gt;
So where am I going with all of this?&amp;nbsp; Well, there's a reason I haven't
given any indication up until this point in the post.&amp;nbsp; You see, it's &lt;b&gt;utterly painful&lt;/b&gt; to write this, but...&lt;br&gt;
&lt;br&gt;
&lt;i&gt;In this case, a cursor is faster.&lt;/i&gt;&lt;br&gt;
&lt;br&gt;
Yes, I said it.&amp;nbsp; That evil construct which we as database developers despise, the cursor.&amp;nbsp; Thanks to &lt;a href="http://www.sqlserverbible.com/"&gt;Paul Nielsen&lt;/a&gt;,
who revealed this ugly fact to me in a conversation today, I was forced
to test this for myself (hoping to prove him wrong, of course).&amp;nbsp; Which
is why I started playing around with the solution that I've given so
many times on forums.&amp;nbsp; Unfortunately, he is correct.&lt;br&gt;
&lt;br&gt;
My next test query, using the first cursor I've written in several years:&lt;br&gt;
&lt;br&gt;
&lt;blockquote&gt;&lt;font face="Courier New"&gt;DECLARE RunningTotalCursor&lt;br&gt;
CURSOR LOCAL FAST_FORWARD FOR&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; SELECT TransactionID, ActualCost&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; FROM Production.TransactionHistory&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; ORDER BY TransactionID&lt;br&gt;
  &lt;br&gt;
OPEN RunningTotalCursor&lt;br&gt;
  &lt;br&gt;
DECLARE @TransactionID INT&lt;br&gt;
DECLARE @ActualCost MONEY&lt;br&gt;
  &lt;br&gt;
DECLARE @RunningTotal MONEY&lt;br&gt;
SET @RunningTotal = 0&lt;br&gt;
  &lt;br&gt;
DECLARE @Results TABLE&lt;br&gt;
(&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; TransactionID INT NOT NULL PRIMARY KEY,&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; ActualCost MONEY,&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; RunningTotal MONEY&lt;br&gt;
)&lt;br&gt;
  &lt;br&gt;
FETCH NEXT FROM RunningTotalCursor&lt;br&gt;
INTO @TransactionID, @ActualCost&lt;br&gt;
  &lt;br&gt;
WHILE @@FETCH_STATUS = 0&lt;br&gt;
BEGIN&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; SET @RunningTotal = @RunningTotal + @ActualCost&lt;br&gt;
  &lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; INSERT @Results&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; VALUES (@TransactionID, @ActualCost, @RunningTotal)&lt;br&gt;
  &lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; FETCH NEXT FROM RunningTotalCursor&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; INTO @TransactionID, @ActualCost&lt;br&gt;
END&lt;br&gt;
  &lt;br&gt;
CLOSE RunningTotalCursor&lt;br&gt;
  &lt;br&gt;
DEALLOCATE RunningTotalCursor&lt;br&gt;
  &lt;br&gt;
SELECT *&lt;br&gt;
FROM @Results&lt;br&gt;
ORDER BY TransactionID&lt;br&gt;
  &lt;br&gt;
  &lt;/font&gt;&lt;/blockquote&gt;
What's really unfortunate about the cursor approach is that you need to
use a temporary table if you want to return a single result set to the
client. I figured the additional I/O due to the temp table would
balance any improvement gains from the cursor approach, thereby
rendering my forum responses correct, and Paul wrong.&amp;nbsp; Well, 14 seconds
and 113,443 rows later, SSMS and my laptop declared Paul the undisputed
Champion of the Cursor.&lt;br&gt;
&lt;br&gt;
This cursor makes a lot of sense in this case.&amp;nbsp; The set-based query
works by looping over each row of the table, taking a sum of every
previous row.&amp;nbsp; So for the 10th row, 10 previous rows also need to be
visited.&amp;nbsp; For the 1000th row, 1000 previous rows need to be visited.&amp;nbsp;
And so on.&amp;nbsp; The larger the set gets, the worse performance will be--and
that's not going to be a merely linear decrease in performance.&amp;nbsp; Think
about this:&amp;nbsp; Using the set-based method to find the running sum over a
set of 100 rows, 5050 total rows need to be visited.&amp;nbsp; For a set of 200
rows, the query processor needs to visit 20100 total rows -- a
four-fold increase in the amount of work that must be done to satisfy
the query (O((N^2)/2), for those who are a bit more algorithmically
minded.)&lt;br&gt;
&lt;br&gt;
The cursor, on the other hand, needs to visit each row exactly once
(O(N)). By maintaining the running count in a variable, there is no
need to re-visit previous rows.&amp;nbsp; And as my laptop was so happy to show
me, the I/O cost due to the temp table does not overshadow the
performance improvement of having to visit so many less rows.&lt;br&gt;
&lt;br&gt;
So what have we learned today?&amp;nbsp; In my set-based singlemindedness I
failed to realize that the cursor does, indeed, have utility.&amp;nbsp; &lt;b&gt;Everything in moderation.&lt;br&gt;
&lt;br&gt;
&lt;/b&gt;Next steps?&amp;nbsp; I get the feeling that this can be made even faster by
employing a CLR routine.&amp;nbsp; Pull the data into a DataReader and loop over
that instead, which will completely eliminate the need for a temporary
table.&amp;nbsp; Watch for that experiment coming to this space soon.&lt;br&gt;
&lt;br&gt;
And next time you hear someone mention how horrible cursors are, remind
that person that there is a time and place for everything (&lt;a href="http://www.planearium2.de/scripts-204.htm"&gt;and it's called college&lt;/a&gt;).&lt;br&gt;
&lt;br&gt;&lt;br&gt;&lt;img src="http://www2.sqlblog.com/aggbug.aspx?PostID=106" width="1" height="1"&gt;</description><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Performance/default.aspx">Performance</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/T-SQL/default.aspx">T-SQL</category></item><item><title>Controlling Stored Procedure Caching with ... Dyanmic SQL?!?</title><link>http://www2.sqlblog.com/blogs/adam_machanic/archive/2006/07/12/controlling-stored-procedure-caching-with-dyanmic-sql.aspx</link><pubDate>Thu, 13 Jul 2006 01:19:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:80</guid><dc:creator>Adam Machanic</dc:creator><slash:comments>3</slash:comments><comments>http://www2.sqlblog.com/blogs/adam_machanic/comments/80.aspx</comments><wfw:commentRss>http://www2.sqlblog.com/blogs/adam_machanic/commentrss.aspx?PostID=80</wfw:commentRss><description>&lt;p&gt;Tell me if this situation sends a chill down your spine: You've written
a stored procedure, tested it against a variety of inputs, and finally
rolled it out in production. All is well... Or so you think. You start
getting complaints from some users that it's taking forever to return.
But other users are having no problem. What the..?
&lt;/p&gt;&lt;p&gt;Veteran DBAs will know right away what's going on (even without
reading the title of this post!) -- but for those of you who haven't
had the &lt;i&gt;pleasure&lt;/i&gt;
of debugging these kinds of things, the answer is that cached execution
plans are not always as wonderful for performance as we might like.
&lt;/p&gt;&lt;p&gt;For any given query, there are numerous possible execution
plans that the query optimizer can come up with. Some of them are
optimal, some are less than optimal. But in the end, it's the job of
the query optimizer to decide which one to use (hopefully, the optimal
one). If a stored procedure is executed and its does not have a query
plan in cache, whatever execution plan the optimizer decides to use
will be cached for next time. This is usually a good thing -- it can be
quite a bit of work for the optimizer to make that decision.
&lt;/p&gt;&lt;p&gt;But in some cases, this is where the trouble begins. One of the
main factors the optimizer uses is index statistics vs. what parameters
are being used for the query. This can greatly affect what the
'correct' execution plan is -- the optimizer must decide such things as
which index should be used, whether a seek or a scan should be
performed, what types of joins are most efficient, etc. But as
parameters change, so can the most appropriate choices.
&lt;/p&gt;&lt;p&gt;
To illustrate this better, some sample data will be useful.  Break out your &lt;a href="http://sqljunkies.com/WebLog/amachanic/articles/NumbersTable.aspx" target="#"&gt;numbers table&lt;/a&gt; and run the following script, which will create a table with three columns, around 20 million rows, and a couple of indexes...
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;SELECT &lt;br&gt;	Number, &lt;br&gt;	DATEADD(ss, Number, 0) AS TheDate&lt;br&gt;INTO DateTbl&lt;br&gt;FROM Numbers&lt;br&gt;&lt;br&gt;DECLARE &lt;br&gt;	@Num INT,&lt;br&gt;	@Incr INT&lt;br&gt;&lt;br&gt;SELECT &lt;br&gt;	@Num = MAX(Number) + 1,&lt;br&gt;	@Incr = MAX(Number) + 1&lt;br&gt;FROM DateTbl&lt;br&gt;&lt;br&gt;WHILE @Num &amp;lt; 20000000&lt;br&gt;BEGIN&lt;br&gt;	INSERT DateTbl (Number, TheDate)&lt;br&gt;	SELECT Number + @Num, DATEADD(ss, Number + @Num, 0) AS TheDate&lt;br&gt;	FROM Numbers&lt;br&gt;&lt;br&gt;	SET @Num = @Num + @Incr&lt;br&gt;END&lt;br&gt;&lt;br&gt;&lt;br&gt;CREATE UNIQUE CLUSTERED INDEX IX_Date ON DateTbl(TheDate)&lt;br&gt;&lt;br&gt;CREATE UNIQUE NONCLUSTERED INDEX IX_Number ON DateTbl(Number)&lt;br&gt;&lt;br&gt;ALTER TABLE DateTbl &lt;br&gt;ADD AnotherCol VARCHAR(40) NULL&lt;br&gt;&lt;br&gt;UPDATE DateTbl&lt;br&gt;SET AnotherCol = CONVERT(VARCHAR, Number) + CONVERT(VARCHAR, TheDate)&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;Okay! Now that your hard drive's workout is done, let's take a look
at what we have... DateTbl has three columns: A sequential number, a
datetime column, and a character column. You should have one row for
every second between January 1, 1900 and sometime around August 21,
1900, depending on how big your numbers table is. The date column and
the number column are indexed (we'll be using those as predicates in
the WHERE clause of the example queries), but the character column is
not. That's on purpose, to force a bookmark lookup. What can I say --
this is a totally contrived example!
&lt;/p&gt;&lt;p&gt;
Put Query Analyzer into Show Execution Plan mode and check out the following:
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;SELECT &lt;br&gt;	Number,&lt;br&gt;	TheDate,&lt;br&gt;	AnotherCol&lt;br&gt;FROM DateTbl&lt;br&gt;WHERE TheDate BETWEEN '19000201 09:35:00' AND '19000201 09:36:00'&lt;br&gt;	OR Number = 10&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;Before I proceed, I would just like to say that anyone who comments
or e-mails me saying that this query can be re-written with a UNION to
get consistently better execution plans will be slapped upside the head
with a trout. YES, this is a bad query, but as I said, this is a very
simple example. In real life, these situations are usually much more
difficult to re-write. So if you don't like my example, go write your
own article!
&lt;/p&gt;&lt;p&gt;
... Now that that's taken care of ...
&lt;/p&gt;&lt;p&gt;The execution plan you should see will have a seek on each
index. Makes sense -- we're looking at a very small chunk of data in
each place. But what if we change the query to use a much larger date
range?
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;SELECT &lt;br&gt;	Number,&lt;br&gt;	TheDate,&lt;br&gt;	AnotherCol&lt;br&gt;FROM DateTbl&lt;br&gt;WHERE TheDate BETWEEN '19000101' AND '19000201'&lt;br&gt;	OR Number = 10&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;A seek on the date index no longer makes sense. A range scan is a
better option. And why bother seeking on the Number column? The row
with the number 10 is already found within the selected range. SQL
Server agrees with me on this, and performs only a scan of the
clustered date index.
&lt;/p&gt;&lt;p&gt;
But now let's see what happens when we throw this into a stored procedure.  Create the following:
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;CREATE PROCEDURE GetStuff&lt;br&gt;	@StartDate DateTime,&lt;br&gt;	@EndDate DateTime,&lt;br&gt;	@Number INT&lt;br&gt;AS&lt;br&gt;	SELECT &lt;br&gt;		Number,&lt;br&gt;		TheDate,&lt;br&gt;		AnotherCol&lt;br&gt;	FROM DateTbl&lt;br&gt;	WHERE TheDate BETWEEN @StartDate AND @EndDate&lt;br&gt;		OR Number = @Number&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;
... And run the following in Query Analyzer with Show Execution Plan turned on ...
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;EXEC GetStuff '19000201 09:35:00', '19000201 09:36:00', 10&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;
Same execution plan as before!  That's great, right?  Well...
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;EXEC GetStuff '19000101', '19000201', 10&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;
... ... ...
&lt;/p&gt;&lt;p&gt;
This is taking a while ...
&lt;/p&gt;&lt;p&gt;
Enjoy the break?  Good, now get back to work!  
&lt;/p&gt;&lt;p&gt;Check out the execution plan. I guess the cached one wasn't
optimal for the second query. So how do we satisfy BOTH sets of
arguments?
&lt;/p&gt;&lt;p&gt;
One way is to force the stored procedure to recompile each time:
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;ALTER PROCEDURE GetStuff&lt;br&gt;	@StartDate DateTime,&lt;br&gt;	@EndDate DateTime,&lt;br&gt;	@Number INT&lt;br&gt;WITH RECOMPILE&lt;br&gt;AS&lt;br&gt;	SELECT &lt;br&gt;		Number,&lt;br&gt;		TheDate,&lt;br&gt;		AnotherCol&lt;br&gt;	FROM DateTbl&lt;br&gt;	WHERE TheDate BETWEEN @StartDate AND @EndDate&lt;br&gt;		OR Number = @Number&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;
You'll notice that I added &lt;b&gt;WITH RECOMPILE&lt;/b&gt;. And while that's
probably not a big deal for this example stored procedure, it isn't a
good idea for the types of really complex stored procedures where these
problems crop up in the real world. Recompilation can be quite
intensive, and I really don't want it happening every time an active
stored procedure is called.
&lt;/p&gt;&lt;p&gt;But you already knew that wasn't the solution, because in
elementary school you were taught how to read, and the title of this
article isn't "Controlling Stored Procedure Caching with ... WITH
RECOMPILE".
&lt;/p&gt;&lt;p&gt;
No, instead the title is, "Controlling Stored Procedure Caching with ... Dyanmic SQL?!?"
&lt;/p&gt;&lt;p&gt;
Yes, dynamic SQL.  If you don't know about dynamic SQL, go read &lt;a href="http://www.sommarskog.se/dynamic_sql.html" target="#"&gt;this article&lt;/a&gt; right now and come back when you're finished.
&lt;/p&gt;&lt;p&gt;
You may have heard about a system stored procedure called &lt;b&gt;sp_executesql&lt;/b&gt;.
It lets you evaluate dynamic SQL, but it happens to also cache its
execution plan. In addition, due to the fact that it accepts
parameters, it makes SQL injection nearly impossible if correctly used.
So it's good stuff. We could evaluate our test query using
sp_executesql like this:
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;DECLARE @SQL NVARCHAR(300)&lt;br&gt;&lt;br&gt;SET @SQL = '' +&lt;br&gt;	'SELECT ' +&lt;br&gt;		'Number, ' +&lt;br&gt;		'TheDate, ' +&lt;br&gt;		'AnotherCol ' +&lt;br&gt;	'FROM DateTbl ' +&lt;br&gt;	'WHERE TheDate BETWEEN @StartDate AND @EndDate ' +&lt;br&gt;		'OR Number = @Number'&lt;br&gt;&lt;br&gt;EXEC sp_executesql &lt;br&gt;	@SQL, &lt;br&gt;	N'@StartDate DATETIME, @EndDate DATETIME, @Number INT', &lt;br&gt;	@StartDate = '19000201 09:35:00',&lt;br&gt;	@EndDate = '19000201 09:36:00', &lt;br&gt;	@Number = 10&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;
... And that's just wonderful, but it gives us &lt;i&gt;absolutely nothing&lt;/i&gt;, because if you re-run it with the other parameters you'll find that you have the same problem as the stored procedure version:
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;DECLARE @SQL NVARCHAR(300)&lt;br&gt;&lt;br&gt;SET @SQL = '' +&lt;br&gt;	'SELECT ' +&lt;br&gt;		'Number, ' +&lt;br&gt;		'TheDate, ' +&lt;br&gt;		'AnotherCol ' +&lt;br&gt;	'FROM DateTbl ' +&lt;br&gt;	'WHERE TheDate BETWEEN @StartDate AND @EndDate ' +&lt;br&gt;		'OR Number = @Number'&lt;br&gt;&lt;br&gt;EXEC sp_executesql &lt;br&gt;	@SQL, &lt;br&gt;	N'@StartDate DATETIME, @EndDate DATETIME, @Number INT', &lt;br&gt;	@StartDate = '19000101',&lt;br&gt;	@EndDate = '19000201', &lt;br&gt;	@Number = 10&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;
Time for another coffee break...
&lt;/p&gt;&lt;p&gt;But let us not lose hope yet, because we're still in the article
that's talking about how to control caching and recompilation and you
know that I wouldn't have written this article if I didn't know the
answer.
&lt;/p&gt;&lt;p&gt;
So what's &lt;i&gt;really&lt;/i&gt; being cached here?  Let's take a look:
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;SELECT sql&lt;br&gt;FROM master..syscacheobjects&lt;br&gt;WHERE sql LIKE '%datetbl%'&lt;br&gt;	AND cacheobjtype = 'Executable Plan'&lt;br&gt;&lt;br&gt;-----------------------------------------------------&lt;br&gt;&lt;br&gt;(@StartDate DATETIME, @EndDate DATETIME, @Number INT)&lt;br&gt;SELECT &lt;br&gt;	Number, &lt;br&gt;	TheDate, &lt;br&gt;	AnotherCol &lt;br&gt;FROM DateTbl &lt;br&gt;WHERE TheDate BETWEEN @StartDate &lt;br&gt;	AND @EndDate OR Number = @Number&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;The cached plan is cached for not just the query, but also a
parameter list -- and not just any parameter list, but the very
parameter list that was passed in to sp_executesql. So how could we
force SQL Server to cache a different plan for the same query?
&lt;/p&gt;&lt;p&gt;
... Change the parameter list!
&lt;/p&gt;&lt;p&gt;The parameter list, of course, is correlated to the actual
parameters passed in. But what you may not realize is that if you
satisfy a parameter within the list, sp_executesql will not expect a
correlated parameter to be passed in. For instance, the following is
perfectly valid:
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;DECLARE @SQL NVARCHAR(300)&lt;br&gt;&lt;br&gt;SET @SQL = '' +&lt;br&gt;	'SELECT @TheParam AS TheParam'&lt;br&gt;&lt;br&gt;EXEC sp_executesql &lt;br&gt;	@SQL, &lt;br&gt;	N'@TheParam VARCHAR(100) = ''This is the param'''&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;
Not only that, but it's been cached:
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;SELECT sql&lt;br&gt;FROM master..syscacheobjects&lt;br&gt;WHERE sql LIKE '%param%'&lt;br&gt;	AND cacheobjtype = 'Executable Plan'&lt;br&gt;&lt;br&gt;-----------------------------------------------------&lt;br&gt;&lt;br&gt;(@TheParam VARCHAR(100) = 'This is the param')&lt;br&gt;SELECT @TheParam AS TheParam&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;
So what happens if we change our parameter's value?
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;DECLARE @SQL NVARCHAR(300)&lt;br&gt;&lt;br&gt;SET @SQL = '' +&lt;br&gt;	'SELECT @TheParam AS TheParam'&lt;br&gt;&lt;br&gt;EXEC sp_executesql &lt;br&gt;	@SQL, &lt;br&gt;	N'@TheParam VARCHAR(100) = ''This is the other_param'''&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;
Same query, but...
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;SELECT sql&lt;br&gt;FROM master..syscacheobjects&lt;br&gt;WHERE sql LIKE '%other_param%'&lt;br&gt;	AND cacheobjtype = 'Executable Plan'&lt;br&gt;&lt;br&gt;-----------------------------------------------------&lt;br&gt;&lt;br&gt;(@TheParam VARCHAR(100) = 'This is the other_param')&lt;br&gt;SELECT @TheParam AS TheParam&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;Yes, a second cached execution plan! Exciting, isn't it? Kind of
like winning the lottery, only even better, because you don't have to
worry about how to spend all of that extra cash!
&lt;/p&gt;&lt;p&gt;So how do we put this all together? A quick recap: We know that
the query requires at least two execution plans; one for big date
ranges, and one for smaller date ranges. There might be more, but we
haven't tested that, so I'll leave it as an exercise for the reader. We
also know that sp_executesql will cache a second, third, or &lt;i&gt;Nth&lt;/i&gt;
execution plan whenever the parameter list is changed. So all we need
to do is change the parameter list depending on the inputs...
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;ALTER PROC GetStuff&lt;br&gt;	@StartDate DATETIME,&lt;br&gt;	@EndDate DATETIME,&lt;br&gt;	@Number INT&lt;br&gt;AS&lt;br&gt;	DECLARE @SQL NVARCHAR(300)&lt;br&gt;	DECLARE @Params NVARCHAR(100)&lt;br&gt;&lt;br&gt;	SET @SQL = '' +&lt;br&gt;		'SELECT ' +&lt;br&gt;			'Number, ' +&lt;br&gt;			'TheDate, ' +&lt;br&gt;			'AnotherCol ' +&lt;br&gt;		'FROM DateTbl ' +&lt;br&gt;		'WHERE TheDate BETWEEN @StartDate AND @EndDate ' +&lt;br&gt;			'OR Number = @Number'&lt;br&gt;&lt;br&gt;	IF DATEDIFF(hh, @StartDate, @EndDate) &amp;lt;= 2&lt;br&gt;		SET @Params = '@StartDate DATETIME, @EndDate DATETIME, @Number INT, @dX1 INT = 1'&lt;br&gt;	ELSE&lt;br&gt;		SET @Params = '@StartDate DATETIME, @EndDate DATETIME, @Number INT, @dX1 INT = 2'&lt;br&gt;&lt;br&gt;	EXEC sp_executesql &lt;br&gt;		@SQL, &lt;br&gt;		@Params, &lt;br&gt;		@StartDate, &lt;br&gt;		@EndDate, &lt;br&gt;		@Number&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;Pretending that we've actually tested for the correct thresholds
(which you should do if you use this technique), you'll notice that
we're forcing a different execution plan if the time between start date
and end date is less than or equal to two hours (that will be an index
seek) or more than two hours (that will be an index scan).
&lt;/p&gt;&lt;p&gt;Since forcing evaluation of a new execution plan in this case
is simply a matter of changing the value of @dX1, you can add as many
conditions as necessary to control which cached plan is used for any
given set of arguments. Two hours is almost certainly not the best
choice here, but really, does it matter?
&lt;/p&gt;&lt;p&gt;So in conclusion, blah, blah, blah... No one reads this far,
you stopped after you saw the final stored procedure, didn't you? Have
a nice day, and enjoy your new, more dynamic stored procedures.&lt;/p&gt;&lt;img src="http://www2.sqlblog.com/aggbug.aspx?PostID=80" width="1" height="1"&gt;</description><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Optimization/default.aspx">Optimization</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Performance/default.aspx">Performance</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/T-SQL/default.aspx">T-SQL</category></item><item><title>Performance: ISNULL vs. COALESCE</title><link>http://www2.sqlblog.com/blogs/adam_machanic/archive/2006/07/12/performance-isnull-vs-coalesce.aspx</link><pubDate>Thu, 13 Jul 2006 01:16:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:78</guid><dc:creator>Adam Machanic</dc:creator><slash:comments>17</slash:comments><comments>http://www2.sqlblog.com/blogs/adam_machanic/comments/78.aspx</comments><wfw:commentRss>http://www2.sqlblog.com/blogs/adam_machanic/commentrss.aspx?PostID=78</wfw:commentRss><description>&lt;p&gt;&lt;a href="http://weblogs.sqlteam.com/mladenp" target="#"&gt;Mladen aka spirit1&lt;/a&gt; posted a &lt;a href="http://weblogs.sqlteam.com/mladenp/articles/2937.aspx" target="#"&gt;speed test&lt;/a&gt; of COALESCE vs. ISNULL.  Reported result:  COALESCE is faster.
&lt;/p&gt;&lt;p&gt;
But leave it to &lt;a href="http://blogs.x2line.com/al/" target="#"&gt;Anatoly Lubarsky&lt;/a&gt; to argue with what was posted.  He &lt;a href="http://blogs.x2line.com/al/archive/2004/03/01/189.aspx" target="#"&gt;posted&lt;/a&gt; his own speed test, showing that ISNULL is faster.
&lt;/p&gt;&lt;p&gt;
Anatoly's results showed a miniscule difference, "52 seconds" vs.
"52-53 seconds". Mlanden's tests show a larger difference, around 15%.
But I don't trust either of these results. &lt;/p&gt;&lt;p&gt;
One thing in common with both of the tests I linked to, and which makes
them both flawed, is that they return data to the client. This factors
greatly into testing time. What if there was a network hiccup, or what
if the client UI did something different when rendering the results?
We're not testing the network's ability to send data or the client's
ability to render it. What's being tested is very specific: Speed of
COALESCE vs. ISNULL.
&lt;/p&gt;&lt;p&gt;
So this leads me to present &lt;b&gt;Adam's Number 1 Rule of Performance Testing:&lt;/b&gt; When performance testing a specific feature, do everything in your power to test &lt;i&gt;only&lt;/i&gt;
that feature itself. Isolate your test as much as possible so that
there is no way network traffic or unrelated UI code will get in the
way. If you aren't careful about this, you will end up testing these
other resources instead of your goal. And when testing against tables
in SQL Server, it's especially important to be careful given SQL
Server's caching mechanisms. So when testing using tables, I'll always
throw out the first few test runs, or even restart the server between
tests, in order to control the cache in whever way is logical for the
feature being tested.
&lt;/p&gt;&lt;p&gt;Before getting to my own tests, I'd like to jump off on a quick
tanget. COALESCE vs. ISNULL? Who cares! This isn't a performance
question, this is a question of standards-conformant vs. proprietary
code. ISNULL is non-standard and provides less functionality than
COALESCE. Yet a lot of SQL Server developers love to use it, I suspect
because it's a lot easier to remember (and spell). So learn a new word
and type two extra characters and you'll end up with more maintainable,
more functional code. Sounds good to me -- which is why I am a big fan
of COALESCE.
&lt;/p&gt;&lt;p&gt;
But I am still curious... Which is faster?
&lt;/p&gt;&lt;p&gt;In this case, no test data is needed. We're testing performance
of the COALESCE and ISNULL functions themselves, not using them to
access data from a table. So the most effective test, in my opinion, is
to run COALESCE and ISNULL a bunch of times each (one million) and see
which runs faster:
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;DECLARE @i INT SET @i = 1&lt;br&gt;DECLARE @CPU INT SET @CPU = @@CPU_BUSY&lt;br&gt;DECLARE @StartDate DATETIME SET @StartDate = GETDATE()&lt;br&gt;&lt;br&gt;WHILE @i &amp;lt;= 1000000&lt;br&gt;BEGIN&lt;br&gt;	IF COALESCE('abc', 'def') = 'def'&lt;br&gt;		PRINT 1&lt;br&gt;	SET @i = @i + 1&lt;br&gt;END&lt;br&gt;&lt;br&gt;PRINT 'COALESCE, both non-null'&lt;br&gt;PRINT 'Total CPU time: ' + CONVERT(varchar, @@CPU_BUSY - @CPU)&lt;br&gt;PRINT 'Total milliseconds: ' + CONVERT(varchar, DATEDIFF(ms, @StartDate, GETDATE()))&lt;br&gt;PRINT ''&lt;br&gt;GO&lt;br&gt;&lt;br&gt;DECLARE @i INT SET @i = 1&lt;br&gt;DECLARE @CPU INT SET @CPU = @@CPU_BUSY&lt;br&gt;DECLARE @StartDate DATETIME SET @StartDate = GETDATE()&lt;br&gt;&lt;br&gt;WHILE @i &amp;lt;= 1000000&lt;br&gt;BEGIN&lt;br&gt;	IF ISNULL('abc', 'def') = 'def'&lt;br&gt;		PRINT 1&lt;br&gt;	SET @i = @i + 1&lt;br&gt;END&lt;br&gt;&lt;br&gt;PRINT 'ISNULL, both non-null'&lt;br&gt;PRINT 'Total CPU time: ' + CONVERT(varchar, @@CPU_BUSY - @CPU)&lt;br&gt;PRINT 'Total milliseconds: ' + CONVERT(varchar, DATEDIFF(ms, @StartDate, GETDATE()))&lt;br&gt;PRINT ''&lt;br&gt;GO&lt;br&gt;&lt;br&gt;DECLARE @i INT SET @i = 1&lt;br&gt;DECLARE @CPU INT SET @CPU = @@CPU_BUSY&lt;br&gt;DECLARE @StartDate DATETIME SET @StartDate = GETDATE()&lt;br&gt;&lt;br&gt;WHILE @i &amp;lt;= 1000000&lt;br&gt;BEGIN&lt;br&gt;	IF COALESCE(null, 'abc') = 'def'&lt;br&gt;		PRINT 1&lt;br&gt;	SET @i = @i + 1&lt;br&gt;END&lt;br&gt;&lt;br&gt;PRINT 'COALESCE, first column null'&lt;br&gt;PRINT 'Total CPU time: ' + CONVERT(varchar, @@CPU_BUSY - @CPU)&lt;br&gt;PRINT 'Total milliseconds: ' + CONVERT(varchar, DATEDIFF(ms, @StartDate, GETDATE()))&lt;br&gt;PRINT ''&lt;br&gt;GO&lt;br&gt;&lt;br&gt;DECLARE @i INT SET @i = 1&lt;br&gt;DECLARE @CPU INT SET @CPU = @@CPU_BUSY&lt;br&gt;DECLARE @StartDate DATETIME SET @StartDate = GETDATE()&lt;br&gt;&lt;br&gt;WHILE @i &amp;lt;= 1000000&lt;br&gt;BEGIN&lt;br&gt;	IF COALESCE(null, 'abc') = 'def'&lt;br&gt;		PRINT 1&lt;br&gt;	SET @i = @i + 1&lt;br&gt;END&lt;br&gt;&lt;br&gt;PRINT 'ISNULL, first column null'&lt;br&gt;PRINT 'Total CPU time: ' + CONVERT(varchar, @@CPU_BUSY - @CPU)&lt;br&gt;PRINT 'Total milliseconds: ' + CONVERT(varchar, DATEDIFF(ms, @StartDate, GETDATE()))&lt;br&gt;PRINT ''&lt;br&gt;GO&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;You'll notice that I'm not using STATISTICS TIME to get the CPU and
run time. Unfortunately, STATSTICS TIME returns once per statement, so
it is not usable for this test -- we would wind up with one million 0
millisecond results. If you're running on a quiet server (and you
should always run targeted performance tests on a quiet server; that
may have to become Adam's Number 2 Rule if I can't think of something
better) @@CPU_BUSY will give a close enough approximation of how much
CPU time the test is using. And DATEDIFF will give us a good enough
time reading. Note that the predicate in the IF statement will never
return true, so we know that we're not testing our network or client.
&lt;/p&gt;&lt;p&gt;I ran these tests several times on a few different servers, and
ISNULL appears to pretty consistently out-perform COALESCE by an
average of 10 or 12 percent. But that's the difference between 6
seconds and 5.3 seconds (the approximate average runtimes per test on
my servers), over the course of a million exections. Hardly worth the
functionality and standards compliance sacrifice, at least in the
scenarios I use these functions for.&lt;/p&gt;&lt;br&gt;&lt;img src="http://www2.sqlblog.com/aggbug.aspx?PostID=78" width="1" height="1"&gt;</description><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Performance/default.aspx">Performance</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/T-SQL/default.aspx">T-SQL</category></item><item><title>Rowset string concatenation: Which method is best?</title><link>http://www2.sqlblog.com/blogs/adam_machanic/archive/2006/07/12/rowset-string-concatenation-which-method-is-best.aspx</link><pubDate>Thu, 13 Jul 2006 01:15:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:77</guid><dc:creator>Adam Machanic</dc:creator><slash:comments>25</slash:comments><comments>http://www2.sqlblog.com/blogs/adam_machanic/comments/77.aspx</comments><wfw:commentRss>http://www2.sqlblog.com/blogs/adam_machanic/commentrss.aspx?PostID=77</wfw:commentRss><description>&lt;p&gt;Yeah, yeah, yeah, let's get this out of the way right from the start:
Don't concatenate rows into delimited strings in SQL Server. Do it
client side.
&lt;/p&gt;&lt;p&gt;
Except if you really have to create delimited strings in SQL Server.  In which case you should read on.
&lt;/p&gt;&lt;p&gt;
There was a little &lt;a href="http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=42415" target="#"&gt;discussion&lt;/a&gt; on &lt;a href="http://www.sqlteam.com/" target="#"&gt;SQLTeam&lt;/a&gt;
about the best way to concatenate. I recommended a scalar UDF solution,
whereas Rob Volk recommended a solution involving a temp table.
&lt;/p&gt;&lt;p&gt;I mentioned my dislike for the temp table solution for a couple
of reasons. First of all, it relies on a clustered index for ordering.
That will probably work in this example, but is not guaranteed to
always work and relying on indexes rather than ORDER BY for ordering is
definitely not a habit I want anyone to get into. The clustered index
as it was described in Rob's example also has another problem that I
didn't even notice until I was writing this entry. But I'll get to that
in a moment. The second reason I dislike the temp table is that I felt
it would be less efficient than the scalar UDF.
&lt;/p&gt;&lt;p&gt;
Rob didn't agree about the efficiency.  And so I set out to prove him wrong...
&lt;/p&gt;&lt;p&gt;
We'll use the Authors table in Pubs.  I want a comma-delimited list, per state, of the last name of each author who lives there.
&lt;/p&gt;&lt;p&gt;
First, the scalar UDF:
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;USE pubs&lt;br&gt;GO&lt;br&gt;&lt;br&gt;CREATE FUNCTION dbo.ConcatAuthors(@State CHAR(2))&lt;br&gt;RETURNS VARCHAR(8000)&lt;br&gt;AS&lt;br&gt;BEGIN&lt;br&gt;	DECLARE @Output VARCHAR(8000)&lt;br&gt;	SET @Output = ''&lt;br&gt;&lt;br&gt;	SELECT @Output =	CASE @Output &lt;br&gt;				WHEN '' THEN au_lname &lt;br&gt;				ELSE @Output + ', ' + au_lname &lt;br&gt;				END&lt;br&gt;	FROM Authors&lt;br&gt;	WHERE State = @State&lt;br&gt;	ORDER BY au_lname&lt;br&gt;&lt;br&gt;	RETURN @Output&lt;br&gt;END&lt;br&gt;GO&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;
To find the list I want:
&lt;/p&gt;&lt;pre class="code"&gt;SELECT DISTINCT State, dbo.ConcatAuthors(State)&lt;br&gt;FROM Authors&lt;br&gt;ORDER BY State&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;... And the adaptation of Rob's temp table method... I did change
two things due to problems I discovered during testing. One, I've
altered the au_lname column to VARCHAR(8000); the column in the Authors
table is VARCHAR(40), not large enough for all of the California
authors. What if we were dealing with a much larger dataset? Second, I
added an IDENTITY column, and I'm clustering on that instead of the
actual data to get the ordering. I'm doing so because of the
VARCHAR(8000). Index rows can be a maximum of 900 bytes, so if we had
enough data to exceed that length, this method would fail.
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;CREATE TABLE #AuthorConcat&lt;br&gt;(&lt;br&gt;	State CHAR(2) NOT NULL,&lt;br&gt;	au_lname VARCHAR(8000) NOT NULL,&lt;br&gt;	Ident INT IDENTITY(1,1) NOT NULL PRIMARY KEY&lt;br&gt;)&lt;br&gt;&lt;br&gt;INSERT #AuthorConcat &lt;br&gt;(&lt;br&gt;	State,&lt;br&gt;	au_lname&lt;br&gt;)&lt;br&gt;SELECT&lt;br&gt;	State, &lt;br&gt;	au_lname&lt;br&gt;FROM Authors&lt;br&gt;ORDER BY &lt;br&gt;	State, &lt;br&gt;	au_lname&lt;br&gt;&lt;br&gt;DECLARE @Authors VARCHAR(8000)&lt;br&gt;SET @Authors = ''&lt;br&gt;DECLARE @State CHAR(2)&lt;br&gt;SET @State = ''&lt;br&gt;&lt;br&gt;UPDATE #AuthorConcat&lt;br&gt;SET @Authors = au_lname =	CASE &lt;br&gt;				WHEN @State = State THEN @Authors + ', ' + au_lname &lt;br&gt;				ELSE au_lname END,&lt;br&gt;	@State = State&lt;br&gt;&lt;br&gt;SELECT State, MAX(au_lname) &lt;br&gt;FROM #AuthorConcat&lt;br&gt;GROUP BY State&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;Clever, but more complex and harder to read than the scalar UDF
version. Output is identical, but that's not why we're here. Which one
is more efficient?
&lt;/p&gt;&lt;p&gt;
Drumroll, please...
&lt;/p&gt;&lt;p&gt;Results were tabulated using STATISTICS IO, STATISTICS TIME, and
Query Analyzer's Show Execution Plan. DBCC DROPCLEANBUFFERS and DBCC
FREEPROCCACHE were run before each test.
&lt;/p&gt;&lt;p&gt;
&lt;b&gt;Scalar UDF Method&lt;/b&gt;
&lt;br&gt;
Total cost: 0.0492
&lt;br&gt;
Total Scan count: 1
&lt;br&gt;
Total Logical reads: 2
&lt;br&gt;
Total Physical reads: 2
&lt;br&gt;
Total time: 25 ms
&lt;/p&gt;&lt;p&gt;
&lt;b&gt;Temp Table Method&lt;/b&gt;
&lt;br&gt;
Total cost: 0.2131
&lt;br&gt;
Total Scan count: 4
&lt;br&gt;
Total Logical reads: 9
&lt;br&gt;
Total Physical reads: 2
&lt;br&gt;
Total time: 88 ms
&lt;/p&gt;&lt;p&gt;So in conclusion, neither method is incredibly taxing with the
tiny Pubs dataset, but I think I have proven that the UDF is far more
efficient.
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;hr&gt;Update, February 28, 2005: Modified the adapation of Rob Volk's
method to use a CREATE TABLE instead of SELECT INTO, as the latter is
not necessarily guaranteed to insert rows in the right order for the
sake of this example. Thanks to "PW" on SQLServerCentral for pointing
this problem out. Note that this changed the total costs very slightly
-- for the better -- but the UDF still performs better by quite a large
margin.&lt;br&gt;&lt;img src="http://www2.sqlblog.com/aggbug.aspx?PostID=77" width="1" height="1"&gt;</description><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Architecture/default.aspx">Architecture</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Performance/default.aspx">Performance</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/T-SQL/default.aspx">T-SQL</category></item><item><title>Is PATINDEX faster than LIKE?</title><link>http://www2.sqlblog.com/blogs/adam_machanic/archive/2006/07/12/is-patindex-faster-than-like.aspx</link><pubDate>Thu, 13 Jul 2006 01:13:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:76</guid><dc:creator>Adam Machanic</dc:creator><slash:comments>5</slash:comments><comments>http://www2.sqlblog.com/blogs/adam_machanic/comments/76.aspx</comments><wfw:commentRss>http://www2.sqlblog.com/blogs/adam_machanic/commentrss.aspx?PostID=76</wfw:commentRss><description>I keep seeing the same suggestion on various "tips and tricks"
websites: For situations in which you might want to use LIKE in the
WHERE clause, but for which indexes cannot be used, PATINDEX will
perform faster.
&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;
So, &lt;i&gt;according to these sources&lt;/i&gt;, this:
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;SELECT *&lt;br&gt;FROM tbl&lt;br&gt;WHERE PATINDEX('%abc%', col) &amp;gt; 0&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;
is faster than this:
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;SELECT *&lt;br&gt;FROM tbl&lt;br&gt;WHERE col LIKE '%abc%'&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;The thing is, I'm not one to just take this kind of stuff at face
value, so I've tested this assertion several times. Every time I see
this tip, I think, "I must be missing something," and I test again. And
every single time I test again, with different data, different
patterns, etc, I arrive at the same conclusion: They perform &lt;i&gt;exactly the same&lt;/i&gt;.
&lt;/p&gt;&lt;p&gt;
Which brings us to today.  In the SQL Server Central forums, William O'Malley &lt;a href="http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=65&amp;amp;messageid=144992" target="#"&gt;told me&lt;/a&gt; that in his tests, on his data that he can't post due to his industry (?), PATINDEX does outperform LIKE.
&lt;/p&gt;&lt;p&gt;
So I decided to test yet again. And I'm still coming up with the exact
same numbers. I'm hoping that some reader will be able to tell me this
mysterious circumstance in which PATINDEX really does outperform LIKE.
Or maybe explain why my test is totally incorrect. &lt;/p&gt;&lt;p&gt;
Anyway, here's what I did today... First, I created a big table of test
data (83 million rows) with the following (which you may notice that I
lifted from a previous post):
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;SELECT DISTINCT A.Name + B.Name + C.Name AS SomewhatLargeString&lt;br&gt;INTO #BigTableOfStrings&lt;br&gt;FROM	master..spt_values A,&lt;br&gt;	master..spt_values B,&lt;br&gt;	master..spt_values C&lt;br&gt;WHERE	a.TYPE NOT IN ('P', 'R', 'F', 'F_U')&lt;br&gt;	AND b.TYPE NOT IN  ('P', 'R', 'F', 'F_U')&lt;br&gt;&lt;br&gt;&lt;br&gt;CREATE CLUSTERED INDEX CI_LargeString ON #BigTableOfStrings(SomewhatLargeString)&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;
I decided to test against the pattern '%ossDbOwnChainRefere%', for which there are 1752 rows in the test table.
&lt;/p&gt;&lt;p&gt;
First, I ran the LIKE query:
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;SELECT COUNT(*)&lt;br&gt;FROM #BigTableOfStrings&lt;br&gt;WHERE SomewhatLargeString LIKE '%ossDbOwnChainRefere%'&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;
Runtime: 9:55. 
&lt;/p&gt;&lt;p&gt;
Then I tried PATINDEX:
&lt;/p&gt;&lt;pre class="code"&gt;SELECT COUNT(*)&lt;br&gt;FROM #BigTableOfStrings&lt;br&gt;WHERE PATINDEX('%ossDbOwnChainRefere%', SomewhatLargeString) &amp;gt; 0&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;
Runtime: 9:56 (yes, worse)
&lt;/p&gt;&lt;p&gt;Then I ran LIKE again with a runtime of 9:47, then PATINDEX
again with a runtime of 9:50, and now I'm not patient enough to run
either of them again.
&lt;/p&gt;&lt;p&gt;
So am I correct?  Is this claim bogus?  Or have I gone completely off-base?&lt;/p&gt;&lt;br&gt;&lt;img src="http://www2.sqlblog.com/aggbug.aspx?PostID=76" width="1" height="1"&gt;</description><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Performance/default.aspx">Performance</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/T-SQL/default.aspx">T-SQL</category></item><item><title>Paging in SQL Server 2005</title><link>http://www2.sqlblog.com/blogs/adam_machanic/archive/2006/07/12/paging-in-sql-server-2005.aspx</link><pubDate>Thu, 13 Jul 2006 01:12:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:75</guid><dc:creator>Adam Machanic</dc:creator><slash:comments>3</slash:comments><comments>http://www2.sqlblog.com/blogs/adam_machanic/comments/75.aspx</comments><wfw:commentRss>http://www2.sqlblog.com/blogs/adam_machanic/commentrss.aspx?PostID=75</wfw:commentRss><description>I keep seeing questions on newsgroups about paging in stored
procedures, and whether there will be a better way in SQL Server 2005.
However, aside from a few answers in newsgroups, I haven't seen any
content on how to do it. So I'd like to spend a few minutes and share
with you the new features that will make paging stored procedures both
easier to build and a lot more performant...
&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;
But first, since this is a performance-related blog, let's generate a bunch of big test data!
&lt;/p&gt;&lt;p&gt;Since I don't have AdventureWorks installed on my test SQL
Server 2005 server at the moment and am too lazy to track it down, the
data is somewhat ugly... Anyway, start this up and let it run for a
while (takes around 35 minutes on my test box):
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;SELECT DISTINCT A.Name + B.Name + C.Name AS SomewhatLargeString&lt;br&gt;INTO #BigTableOfStrings&lt;br&gt;FROM	master..spt_values A,&lt;br&gt;	master..spt_values B,&lt;br&gt;	master..spt_values C&lt;br&gt;WHERE	a.TYPE NOT IN ('P', 'R', 'F', 'F_U')&lt;br&gt;	AND b.TYPE NOT IN  ('P', 'R', 'F', 'F_U')&lt;br&gt;&lt;br&gt;CREATE CLUSTERED INDEX CI_LargeString ON #BigTableOfStrings(SomewhatLargeString)&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;
Coffee (or preferably, beer) time!  See you in 35 minutes.
&lt;/p&gt;&lt;p&gt;
... All set?
&lt;/p&gt;&lt;p&gt;Okay, now let's pretend we're in a web app with a SQL Server
2000 backend and we want the second page of data... Rows 11-20, ordered
by SomewhatLarge. How might we do this in SQL Server 2000..? Here's one
way:
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;SELECT A.SomewhatLargeString&lt;br&gt;FROM #BigTableOfStrings A&lt;br&gt;JOIN #BigTableOfStrings B ON B.SomewhatLargeString &amp;lt;= A.SomewhatLargeString&lt;br&gt;GROUP BY A.SomewhatLargeString&lt;br&gt;HAVING COUNT(*) BETWEEN 11 AND 20&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;... Still waiting for that to return? Keep waiting. Maybe get
another coffee, or go home for the night. On my system, that query has
quite possibly the biggest estimated cost I've ever seen, a staggering
1,523,110,700. Yeah, that sucks. So let's change it around a bit:
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;SELECT x.SomewhatLargeString&lt;br&gt;FROM (&lt;br&gt;	SELECT TOP 20 A.SomewhatLargeString, COUNT(*) AS TheCount&lt;br&gt;	FROM #BigTableOfStrings A&lt;br&gt;	JOIN #BigTableOfStrings B ON B.SomewhatLargeString &amp;lt;= A.SomewhatLargeString&lt;br&gt;	GROUP BY A.SomewhatLargeString&lt;br&gt;	ORDER BY A.SomewhatLargeString ) x&lt;br&gt;WHERE x.TheCount BETWEEN 11 AND 20&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;Much nicer than before, with an estimated cost on my system of 23.1,
and a virtually instant return time. But wait, we have an over-zealous
user who wants rows 20,001 - 20,010!
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;SELECT x.SomewhatLargeString&lt;br&gt;FROM (&lt;br&gt;	SELECT TOP 20010 A.SomewhatLargeString, COUNT(*) AS TheCount&lt;br&gt;	FROM #BigTableOfStrings A&lt;br&gt;	JOIN #BigTableOfStrings B ON B.SomewhatLargeString &amp;lt;= A.SomewhatLargeString&lt;br&gt;	GROUP BY A.SomewhatLargeString&lt;br&gt;	ORDER BY A.SomewhatLargeString ) x&lt;br&gt;WHERE x.TheCount BETWEEN 20001 AND 20010&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;Oops, cost skyrocketed to 50516, with a return time of 1:30... Can
you feel your users abandoning your sinking ship of an app and heading
towards the competitors? Probably not, since they won't actually click
through 20,000 rows, but it makes a really good contrived example, so
let's roll with it!
&lt;/p&gt;&lt;p&gt;So how to solve this problem? In SQL Server 2000, the answer
is, find some other paging mechanism, probably using a middle tier. But
in SQL Server 2005, we have new and better toys to play with. Allow me
to introduce your new paging best friend, the ROW_NUMBER() function.
For those readers who are slow on the uptake, this function does
exactly what its name implies; it creates a surrogate row number for
each row in a result set. So now, instead of the very painfully
inefficient COUNT(*) methods, we can let SQL Server do all the work as
it builds the result set we actually want...
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;SELECT x.SomewhatLargeString&lt;br&gt;FROM (&lt;br&gt;	SELECT TOP 20010 A.SomewhatLargeString, ROW_NUMBER() OVER(ORDER BY A.SomewhatLargeString) AS TheCount&lt;br&gt;	FROM #BigTableOfStrings A&lt;br&gt;	ORDER BY A.SomewhatLargeString) x&lt;br&gt;WHERE x.TheCount BETWEEN 20000 AND 20010&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;
I guess we can call that a tiny improvement.  Total estimated cost: 0.202.  That's only about a &lt;i&gt;25 MILLION PERCENT&lt;/i&gt; difference.
&lt;/p&gt;&lt;p&gt;So why is ROW_NUMBER so much more efficient? It's a combination
of the COUNT(*) method itself being inefficient and the query optimizer
probably not handling it as well as it should. The COUNT(*) method
requires an ordered-forward clustered index scan of the table, followed
by correlation of each of the top 20010 rows that we asked for to all
of the rows less than or equal to that row in the same table, for the
count. Alas, the optimizer chooses a nested loop for that, which causes
the cost to shoot up as the operation is repeated over and over.
&lt;/p&gt;&lt;p&gt;The ROW_NUMBER method, on the other hand, requires only the
scan of the TOP 20010 rows, followed by computation of the row number
itself, which is nothing more than a scalar calculation. Then, just
filter the rows. Simple, easy on the optimizer, easy for the system,
and good for your customers. Definitely a great feature that I'm
looking forward to using!&lt;/p&gt;&lt;br&gt;&lt;img src="http://www2.sqlblog.com/aggbug.aspx?PostID=75" width="1" height="1"&gt;</description><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Performance/default.aspx">Performance</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Query+Tuning/default.aspx">Query Tuning</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/T-SQL/default.aspx">T-SQL</category></item></channel></rss>