<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://www2.sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Adam Machanic : Performance, Query Tuning</title><link>http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Performance/Query+Tuning/default.aspx</link><description>Tags: Performance, Query Tuning</description><dc:language>en</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>Query Tuning Mastery at PASS Summit 2012: The Video</title><link>http://www2.sqlblog.com/blogs/adam_machanic/archive/2012/11/13/query-tuning-mastery-at-pass-summit-2012-the-video.aspx</link><pubDate>Tue, 13 Nov 2012 14:49:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:46135</guid><dc:creator>Adam Machanic</dc:creator><slash:comments>4</slash:comments><comments>http://www2.sqlblog.com/blogs/adam_machanic/comments/46135.aspx</comments><wfw:commentRss>http://www2.sqlblog.com/blogs/adam_machanic/commentrss.aspx?PostID=46135</wfw:commentRss><description>An especially clever community member was kind enough to reverse-engineer the video stream for me, and came up with a direct link to the PASS TV video stream for my Query Tuning Mastery: The Art and Science of Manhandling Parallelism talk, delivered at...(&lt;a href="http://www2.sqlblog.com/blogs/adam_machanic/archive/2012/11/13/query-tuning-mastery-at-pass-summit-2012-the-video.aspx"&gt;read more&lt;/a&gt;)&lt;img src="http://www2.sqlblog.com/aggbug.aspx?PostID=46135" width="1" height="1"&gt;</description><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/parallel+processing/default.aspx">parallel processing</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/PASS/default.aspx">PASS</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Performance/default.aspx">Performance</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Query+Tuning/default.aspx">Query Tuning</category></item><item><title>Query Tuning Mastery at PASS Summit 2012: The Demos</title><link>http://www2.sqlblog.com/blogs/adam_machanic/archive/2012/11/12/query-tuning-mastery-at-pass-summit-2012-the-demos.aspx</link><pubDate>Mon, 12 Nov 2012 04:00:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:46095</guid><dc:creator>Adam Machanic</dc:creator><slash:comments>16</slash:comments><comments>http://www2.sqlblog.com/blogs/adam_machanic/comments/46095.aspx</comments><wfw:commentRss>http://www2.sqlblog.com/blogs/adam_machanic/commentrss.aspx?PostID=46095</wfw:commentRss><description>For the second year in a row, I was asked to deliver a 500-level "Query Tuning Mastery" talk in room 6E of the Washington State Convention Center, for the PASS Summit. ( Here's some information about last year's talk, on workspace memory. ) And for the...(&lt;a href="http://www2.sqlblog.com/blogs/adam_machanic/archive/2012/11/12/query-tuning-mastery-at-pass-summit-2012-the-demos.aspx"&gt;read more&lt;/a&gt;)&lt;img src="http://www2.sqlblog.com/aggbug.aspx?PostID=46095" width="1" height="1"&gt;</description><enclosure url="http://www2.sqlblog.com/blogs/adam_machanic/attachment/46095.ashx" length="24551" type="application/x-zip-compressed" /><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/parallel+processing/default.aspx">parallel processing</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/PASS/default.aspx">PASS</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Performance/default.aspx">Performance</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Query+Tuning/default.aspx">Query Tuning</category></item><item><title>PASS Summit 2011 - Zen and the Art of Workspace Memory - Demos</title><link>http://www2.sqlblog.com/blogs/adam_machanic/archive/2011/10/16/pass-summit-2011-zen-and-the-art-of-workspace-memory-demos.aspx</link><pubDate>Mon, 17 Oct 2011 00:23:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:39085</guid><dc:creator>Adam Machanic</dc:creator><slash:comments>18</slash:comments><comments>http://www2.sqlblog.com/blogs/adam_machanic/comments/39085.aspx</comments><wfw:commentRss>http://www2.sqlblog.com/blogs/adam_machanic/commentrss.aspx?PostID=39085</wfw:commentRss><description>What a rush . Standing on the stage in an almost-full 1,000-person room, I (very) momentarily wondered what I'd been thinking when I submitted a 500-level talk for the biggest SQL Server conference in the world. But despite a rough start--my laptop crashed...(&lt;a href="http://www2.sqlblog.com/blogs/adam_machanic/archive/2011/10/16/pass-summit-2011-zen-and-the-art-of-workspace-memory-demos.aspx"&gt;read more&lt;/a&gt;)&lt;img src="http://www2.sqlblog.com/aggbug.aspx?PostID=39085" width="1" height="1"&gt;</description><enclosure url="http://www2.sqlblog.com/blogs/adam_machanic/attachment/39085.ashx" length="66600" type="application/x-zip-compressed" /><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/memory/default.aspx">memory</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Optimization/default.aspx">Optimization</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/PASS/default.aspx">PASS</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Performance/default.aspx">Performance</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Query+Tuning/default.aspx">Query Tuning</category></item><item><title>SQL Server Query Processing Puzzle: LIKE vs ?</title><link>http://www2.sqlblog.com/blogs/adam_machanic/archive/2008/04/22/sql-server-query-processing-puzzle-like-vs.aspx</link><pubDate>Tue, 22 Apr 2008 14:18:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:6344</guid><dc:creator>Adam Machanic</dc:creator><slash:comments>33</slash:comments><comments>http://www2.sqlblog.com/blogs/adam_machanic/comments/6344.aspx</comments><wfw:commentRss>http://www2.sqlblog.com/blogs/adam_machanic/commentrss.aspx?PostID=6344</wfw:commentRss><description>&lt;p&gt;How creative are you with manipulating your queries to produce more efficient plans? Try the following puzzle and e-mail your solution to me at [&amp;lt;do_not_mail&amp;gt; @ do_not_mail.com].  Make sure to include an explanation of why it works, as well as your mailing address. The best two solutions/explanations win a free copy of &lt;a href="http://www.amazon.com/dp/159059729X"&gt;Expert SQL Server 2005 Development&lt;/a&gt;, a wonderful feeling of accomplishment, plus eternal fame and glory when I reveal your solutions here on the blog.

&lt;/p&gt;&lt;p&gt;Run the following T-SQL to create two tables in TempDB:
&lt;/p&gt;&lt;pre style="margin-left:40px;"&gt;USE TempDB&lt;br&gt;GO&lt;br&gt;&lt;br&gt;CREATE TABLE b1 (blat1 CHAR(5) NOT NULL)&lt;br&gt;CREATE TABLE b2 (blat2 VARCHAR(200) NOT NULL)&lt;br&gt;GO&lt;br&gt;&lt;br&gt;INSERT b1&lt;br&gt;SELECT LEFT(AddressLine1, 5) AS blat1&lt;br&gt;FROM AdventureWorks.Person.Address&lt;br&gt;&lt;br&gt;INSERT b2&lt;br&gt;SELECT AddressLine1 AS blat2&lt;br&gt;FROM AdventureWorks.Person.Address&lt;br&gt;GO&lt;/pre&gt;
Now consider the following query:
&lt;pre style="margin-left:40px;"&gt;SELECT *&lt;br&gt;FROM b1&lt;br&gt;JOIN b2 ON&lt;br&gt;    b2.blat2 LIKE b1.blat1 + '%'&lt;/pre&gt;&lt;p&gt;
This query takes around three minutes to run on my notebook, and does over 1.8 million logical reads.  Can you figure out a way to re-write it so that it performs better?  No modification of the base tables or addition of any other objects is allowed (sorry, no indexed views!) -- the challenge is to tune this by doing nothing more than re-writing the query.

&lt;/p&gt;&lt;p&gt;Good luck!  I'll leave the contest open for submissions until May 1.&lt;/p&gt;&lt;img src="http://www2.sqlblog.com/aggbug.aspx?PostID=6344" width="1" height="1"&gt;</description><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Performance/default.aspx">Performance</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/puzzle/default.aspx">puzzle</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Query+Tuning/default.aspx">Query Tuning</category></item><item><title>Who's On First? Solving the Top per Group Problem (Part 1: Technique)</title><link>http://www2.sqlblog.com/blogs/adam_machanic/archive/2008/02/08/who-s-on-first-solving-the-top-per-group-problem-part-1-technique.aspx</link><pubDate>Fri, 08 Feb 2008 23:09:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:4992</guid><dc:creator>Adam Machanic</dc:creator><slash:comments>18</slash:comments><comments>http://www2.sqlblog.com/blogs/adam_machanic/comments/4992.aspx</comments><wfw:commentRss>http://www2.sqlblog.com/blogs/adam_machanic/commentrss.aspx?PostID=4992</wfw:commentRss><description>&lt;p&gt;Relative comparison is a simple matter of human nature. From early childhood we compare and contrast what we see in the world around us, building a means by which to rate what we experience. And as it turns out, this desire to discover top and bottom, rightmost and leftmost, or best and worst happens to extend quite naturally into business scenarios. Which product is the top seller? How about the one that's simply not moving off the shelves? Which of our customers has placed the most expensive order? What are the most recent orders placed at each of our outlets?&lt;/p&gt;&lt;p&gt;In the world of common business questions, the edge cases are generally of most interest. What's in the middle is unimportant; it's often too difficult for the mind to compare and comprehend when there are hundreds, thousands, or even millions of items, transactions, or facts that are all within a similar range. Instead, we focus on those that stick out in some extraordinary way.&lt;/p&gt;&lt;p&gt;Those of us who work with SQL products on a regular basis are faced with solving this same problem time and again as we work through various business requirements. Over time, I have noticed four basic query patterns that can be used to solve the problem; each are logically equivalent (within certain restrictions -- more on that later), but can have surprisingly different performance characteristics depending on the data being queried. In this first post, I will outline the available patterns/methods. In the following posts, I will show the results of testing each pattern against a variety of scenarios in an attempt to discover where and when each should be used.&lt;/p&gt;&lt;p&gt;The four basic patterns are outlined below. Each of the methods is illustrated using a query to show all customers' names, plus their most recent order date, and the amount of that order. I've included notes that indicate where logic differences can arise among the various methods.&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;&lt;b&gt;Method 1: Join to full group and use correlated subquery with a MIN/MAX aggregate to filter&lt;/b&gt;&lt;/p&gt;&lt;p&gt;In this method we use an inner join to get all required columns, then filter the resultant set using a correlated subquery in the WHERE clause. &lt;br&gt;&lt;/p&gt;&lt;blockquote&gt;SELECT&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; c.FirstName,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; c.LastName,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderDate,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderAmount&lt;br&gt;FROM Customers c&lt;br&gt;JOIN Orders o ON o.CustomerId = c.CustomerId&lt;br&gt;WHERE &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderDate&amp;nbsp; =&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; (&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; SELECT MAX(o1.OrderDate)&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; FROM Orders o1&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; WHERE &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o1.CustomerId = o.CustomerId&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; )&lt;br&gt;&lt;/blockquote&gt;&lt;p&gt;Logic notes: With this method ties are automatically included in the output, unless a tiebreaker is specified (which can be tricky given that you only have one column to work with). This method does not allow you to pull back an arbitrary number of rows, such as top 10 per customer; you are limited to the edge and any ties that might exist. &lt;b&gt;&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;&lt;b&gt;Method 1a: Join to full group and use correlated subquery with TOP(n) and ORDER BY to filter&lt;/b&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;This method is almost identical to Method 1 (which is why it is classified here as 1a), but the TOP and ORDER BY allow for a bit more flexibility than the aggregates.&lt;br&gt;&lt;/p&gt;&lt;blockquote&gt;SELECT&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; c.FirstName,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; c.LastName,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderDate,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderAmount&lt;br&gt;FROM Customers c&lt;br&gt;JOIN Orders o ON o.CustomerId = c.CustomerId&lt;br&gt;WHERE &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderDate&amp;nbsp; =&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; (&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; SELECT TOP(1)&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o1.OrderDate&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; FROM Orders o1&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; WHERE &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o1.CustomerId = o.CustomerId&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ORDER BY&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o1.OrderDate DESC&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; )&lt;/blockquote&gt;&lt;p&gt;Logic notes: With this method you can more easily integrate a tiebreaker than with Method 1; the comparison column can be anything, including a primary key, and you can still order on whatever column makes most sense. In addition, you can take more rows than with Method 1 by using IN instead of = in the WHERE clause, and increasing the argument value to TOP.&lt;br&gt;&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;&lt;b&gt;Method 2: CROSS APPLY to ordered TOP(n)&lt;/b&gt;&lt;/p&gt;&lt;p&gt;In this method, SQL Server 2005's CROSS APPLY operator is used. This operator allows us to essentially create a table-valued correlated subquery -- something that impossible in previous versions of SQL Server. By using TOP in conjunction with ORDER BY we can get as many rows per group as needed.&lt;/p&gt;&lt;blockquote&gt;SELECT&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; c.FirstName,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; c.LastName,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; x.OrderDate,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; x.OrderAmount&lt;br&gt;FROM Customers c&lt;br&gt;CROSS APPLY&lt;br&gt;(&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; SELECT TOP(1)&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderDate,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderAmount&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; FROM Orders o&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; WHERE&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.CustomerId = c.CustomerId&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ORDER BY&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderDate DESC&lt;br&gt;) x&lt;br&gt;&lt;/blockquote&gt;&lt;p&gt;Logic notes: This method is almost identical, from a logic point of view, with Method 1a modified to use IN on a primary key column. With both methods WITH TIES can be added to the TOP in order to get ties.&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;&lt;b&gt;Method 3: Join to derived table that uses a partitioned, ordered windowing function, and filter in the outer query based on the row number&lt;/b&gt;&lt;/p&gt;&lt;p&gt;In this method a derived table or CTE is used, in conjunction with a windowing function partitioned based on the required grain of the final query. So for the "most recent order per customer" query, the row number is partitioned based on the customer. This gives us a count starting at 1 for each customer, which can be filtered in the outer query.&lt;br&gt;&lt;/p&gt;&lt;blockquote&gt;SELECT&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; c.FirstName,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; c.LastName,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; x.OrderDate,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; x.OrderAmount&lt;br&gt;FROM Customers c&lt;br&gt;INNER JOIN&lt;br&gt;(&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; SELECT &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderDate,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderAmount,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.CustomerId,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ROW_NUMBER() OVER&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; (&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; PARTITION BY o.CustomerId&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ORDER BY o.OrderDate DESC&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ) AS r&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; FROM Orders o&lt;br&gt;) x ON&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; x.CustomerId = c.CustomerId&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; AND x.r = 1&lt;br&gt;&lt;/blockquote&gt;&lt;p&gt;Logic notes: If ties are important, use DENSE_RANK instead of ROW_NUMBER. ROW_NUMBER is good for arbitrary TOP(n), similar to Method 2. Unlike the previously described methods, in conjunction with DENSE_RANK this method can return an arbitrary TOP(n) rows, all of which can include ties. So if you would like to see the three most recent order dates and each happens to have multiple orders, this method will be able to return them all by simply filtering on x.r = 3. This would not be directly possible with any of the other methods described here.&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;&lt;b&gt;Method 4: "Carry-along sort"&lt;/b&gt;&lt;/p&gt;&lt;p&gt;This is the only "tricky" method, and not one that I recommend using, except as a last resort. I'm including it here only for completeness and comparison because it happens to be a very high performance method in some cases. This method involves converting each of the required inner columns into a string, concatenating them, then applying an aggregate to the string as a whole. By putting the "sort" column first, the other data is "carried along" -- thus the name for the method. The concatenated data is then "unpacked" in the outer query. This can be surprisingly efficient from an I/O standpoint, but the resultant code is a maintenance nightmare and it is quite easy to introduce errors. In addition, this method can only return the top 1 per group -- no ties or multiple return items are supported.&lt;br&gt;&lt;/p&gt;&lt;blockquote&gt;SELECT&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; c.FirstName,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; c.LastName,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; CONVERT(DATETIME, SUBSTRING(x.OrderInfo, 1, 8)) AS OrderDate,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; CONVERT(MONEY, SUBSTRING(x.OrderInfo, 9, 15)) AS OrderAmount&lt;br&gt;FROM Customers c&lt;br&gt;INNER JOIN&lt;br&gt;(&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; SELECT &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.CustomerId,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; MAX&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; (&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; CONVERT(CHAR(8), OrderDate, 112) +&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; CONVERT(CHAR(15), SubTotal)&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ) OrderInfo&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; FROM Orders o&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; GROUP BY&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.CustomerId&lt;br&gt;) x ON&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; x.CustomerId = c.CustomerId&lt;br&gt;&lt;/blockquote&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;This post is just the beginning; watch this space in the coming days for a series of performance tests and analysis of these methods, and some results that I personally found to be quite surprising.&lt;br&gt;&amp;nbsp;&lt;/p&gt;&lt;img src="http://www2.sqlblog.com/aggbug.aspx?PostID=4992" width="1" height="1"&gt;</description><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/logic/default.aspx">logic</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Performance/default.aspx">Performance</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Query+Tuning/default.aspx">Query Tuning</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/T-SQL/default.aspx">T-SQL</category></item><item><title>Medians, ROW_NUMBERs, and performance</title><link>http://www2.sqlblog.com/blogs/adam_machanic/archive/2006/12/18/medians-row-numbers-and-performance.aspx</link><pubDate>Mon, 18 Dec 2006 19:52:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:437</guid><dc:creator>Adam Machanic</dc:creator><slash:comments>6</slash:comments><comments>http://www2.sqlblog.com/blogs/adam_machanic/comments/437.aspx</comments><wfw:commentRss>http://www2.sqlblog.com/blogs/adam_machanic/commentrss.aspx?PostID=437</wfw:commentRss><description>A couple of days ago, Aaron Bertrand posted about &lt;a href="http://sqlblog.com/blogs/aaron_bertrand/archive/2006/12/15/428.aspx"&gt;a method for calculating medians in SQL Server 2005&lt;/a&gt; using the ROW_NUMBER function in conjunction with the COUNT aggregate. This method (credited to Itzik Ben-Gan) is interesting, but I discovered an even better way to attack the problem in &lt;a href="http://www.amazon.com/Celkos-Analytics-Kaufmann-Management-Systems/dp/0123695120/sr=8-1/qid=1166482464/ref=sr_1_1/105-6595410-7450029?ie=UTF8&amp;amp;s=books"&gt;Joe Celko's Analytics and OLAP in SQL&lt;/a&gt;.&lt;br&gt;&lt;br&gt;Rather than using a COUNT aggregate in conjunction with the ROW_NUMBER function, Celko's method uses ROW_NUMBER twice: Once with an ascending sort, and again with a descending sort. The output rows can then be matched based on the ascending row number being within +/- 1 of the descending row number.&amp;nbsp; This becomes clearer with a couple of small examples:&lt;br&gt;&lt;br&gt;

&lt;table class="MsoTableGrid" style="border:medium none;border-collapse:collapse;" cellpadding="0" cellspacing="0"&gt;
 &lt;tr&gt;
  &lt;td style="border:1pt solid windowtext;padding:0in 5.4pt;width:23.4pt;"&gt;
  &lt;p class="MsoNormal"&gt;A&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:solid solid solid none;border-color:windowtext windowtext windowtext -moz-use-text-color;border-width:1pt 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;1&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:solid solid solid none;border-color:windowtext windowtext windowtext -moz-use-text-color;border-width:1pt 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;4&lt;/p&gt;
  &lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
  &lt;td style="border-style:none solid solid;border-color:-moz-use-text-color windowtext windowtext;border-width:medium 1pt 1pt;padding:0in 5.4pt;width:23.4pt;"&gt;
  &lt;p class="MsoNormal"&gt;B&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;2&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;3&lt;/p&gt;
  &lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
  &lt;td style="border-style:none solid solid;border-color:-moz-use-text-color windowtext windowtext;border-width:medium 1pt 1pt;padding:0in 5.4pt;width:23.4pt;"&gt;
  &lt;p class="MsoNormal"&gt;C&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;3&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;2&lt;/p&gt;
  &lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
  &lt;td style="border-style:none solid solid;border-color:-moz-use-text-color windowtext windowtext;border-width:medium 1pt 1pt;padding:0in 5.4pt;width:23.4pt;"&gt;
  &lt;p class="MsoNormal"&gt;D&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;4&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;1&lt;/p&gt;
  &lt;/td&gt;
 &lt;/tr&gt;
&lt;/table&gt;

&lt;p class="MsoNormal"&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/p&gt;

&lt;table class="MsoTableGrid" style="border:medium none;border-collapse:collapse;" cellpadding="0" cellspacing="0"&gt;
 &lt;tr&gt;
  &lt;td style="border:1pt solid windowtext;padding:0in 5.4pt;width:23.4pt;"&gt;
  &lt;p class="MsoNormal"&gt;A&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:solid solid solid none;border-color:windowtext windowtext windowtext -moz-use-text-color;border-width:1pt 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;1&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:solid solid solid none;border-color:windowtext windowtext windowtext -moz-use-text-color;border-width:1pt 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;5&lt;/p&gt;
  &lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
  &lt;td style="border-style:none solid solid;border-color:-moz-use-text-color windowtext windowtext;border-width:medium 1pt 1pt;padding:0in 5.4pt;width:23.4pt;"&gt;
  &lt;p class="MsoNormal"&gt;B&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;2&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;4&lt;/p&gt;
  &lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
  &lt;td style="border-style:none solid solid;border-color:-moz-use-text-color windowtext windowtext;border-width:medium 1pt 1pt;padding:0in 5.4pt;width:23.4pt;"&gt;
  &lt;p class="MsoNormal"&gt;C&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;3&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;3&lt;/p&gt;
  &lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
  &lt;td style="border-style:none solid solid;border-color:-moz-use-text-color windowtext windowtext;border-width:medium 1pt 1pt;padding:0in 5.4pt;width:23.4pt;"&gt;
  &lt;p class="MsoNormal"&gt;D&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;4&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;"&gt;
  &lt;p class="MsoNormal"&gt;2&lt;/p&gt;
  &lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr style="height:14.35pt;"&gt;
  &lt;td style="border-style:none solid solid;border-color:-moz-use-text-color windowtext windowtext;border-width:medium 1pt 1pt;padding:0in 5.4pt;width:23.4pt;height:14.35pt;"&gt;
  &lt;p class="MsoNormal"&gt;E&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;height:14.35pt;"&gt;
  &lt;p class="MsoNormal"&gt;5&lt;/p&gt;
  &lt;/td&gt;
  &lt;td style="border-style:none solid solid none;border-color:-moz-use-text-color windowtext windowtext -moz-use-text-color;border-width:medium 1pt 1pt medium;padding:0in 5.4pt;width:27pt;height:14.35pt;"&gt;
  &lt;p class="MsoNormal"&gt;1&lt;/p&gt;
  &lt;/td&gt;
 &lt;/tr&gt;
&lt;/table&gt;

&lt;p class="MsoNormal"&gt;&lt;o:p&gt;&lt;br&gt;&lt;/o:p&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;o:p&gt;In the first table (even number of rows), the median rows are B and C. These can be matched based on [Ascending Column] IN ([Descending Column] + 1, [Descending Column] - 1). In the second table (odd number of rows), the median row is C, which is matched where [Ascending Column] = [Descending Column]. Note that in the second table, the match criteria &lt;/o:p&gt;for the first table does not apply -- so the generic expression to match either case is the combination of the two:&amp;nbsp; [Ascending Column] IN ([Descending Column], [Descending Column] + 1, [Descending Column] - 1).&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;br&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;We can apply this logic within the AdventureWorks database to find the median of the "TotalDue" amount in the Sales.SalesOrderHeader table, for each customer:&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;br&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;/p&gt;&lt;div class="code"&gt;SELECT&lt;br&gt;&amp;nbsp;&amp;nbsp; CustomerId,&lt;br&gt;&amp;nbsp;&amp;nbsp; AVG(TotalDue)&lt;br&gt;FROM&lt;br&gt;(&lt;br&gt;&amp;nbsp;&amp;nbsp; SELECT&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; CustomerId,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; TotalDue,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ROW_NUMBER() OVER (&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; PARTITION BY CustomerId &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ORDER BY TotalDue ASC, SalesOrderId ASC) AS RowAsc,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ROW_NUMBER() OVER (&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; PARTITION BY CustomerId &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ORDER BY TotalDue DESC, SalesOrderId DESC) AS RowDesc&lt;br&gt;&amp;nbsp;&amp;nbsp; FROM Sales.SalesOrderHeader SOH&lt;br&gt;) x&lt;br&gt;WHERE &lt;br&gt;&amp;nbsp;&amp;nbsp; RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)&lt;br&gt;GROUP BY CustomerId&lt;br&gt;ORDER BY CustomerId;&lt;/div&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;br&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;The equivalent logic using Itzik Ben-Gan's method follows:&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;br&gt;&lt;/p&gt;&lt;div class="code"&gt;&lt;br&gt;SELECT&lt;br&gt;&amp;nbsp;&amp;nbsp; CustomerId,&lt;br&gt;&amp;nbsp;&amp;nbsp; AVG(TotalDue)&lt;br&gt;FROM&lt;br&gt;(&lt;br&gt;&amp;nbsp;&amp;nbsp; SELECT&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; CustomerId,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; TotalDue,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ROW_NUMBER() OVER (&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; PARTITION BY CustomerId&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ORDER BY TotalDue) AS RowNum,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; COUNT(*) OVER (&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; PARTITION BY CustomerId) AS RowCnt&lt;br&gt;&amp;nbsp;&amp;nbsp; FROM Sales.SalesOrderHeader&lt;br&gt;) x&lt;br&gt;WHERE&lt;br&gt;&amp;nbsp;&amp;nbsp; RowNum IN ((RowCnt + 1) / 2, (RowCnt + 2) / 2)&lt;br&gt;GROUP BY CustomerId&lt;br&gt;ORDER BY CustomerId;&lt;/div&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;br&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;Taking a look at the estimated execution plans for these two queries, we might believe that Ben-Gan's method is superior: Celko's algorithm requires an expensive intermediate sort operation and has an estimated cost of 4.96, compared to 3.96 for Ben-Gan's. &lt;br&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;br&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;Remember that these are merely estimates. And as it turns out, this is one of those times that the Query Optimizer's cost estimates are are totally out of line with the reality of what
happens when you actually run the queries. Although the performance
difference is not especially noticeable on a set of data as small as
that in Sales.SalesOrderHeader, check out the STATISTICS IO output. Celko's version does 703 logical reads; Ben-Gan's does an astonishing 140110!&lt;br&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;br&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;There is a good lesson to be learned from this: &lt;i&gt;Cost-based optimization is far from perfect!&lt;/i&gt; Never completely trust what estimates tell you; they've come a long way, but clearly there is still some work to do in this area. The only way to actually determine that one query is better than another is to run it against a realistic set of data and look at how much IO and CPU time is actually used.&lt;br&gt;&lt;/p&gt;&lt;br&gt;In this case, Ben-Gan's query probably should perform better than Celko's. It seems odd that the Query Processor can't collect the row counts at the same time it processes the row numbers. Regardless, as of today this is the best way to solve this problem... Not that I've ever needed a median in any production application I've worked on. But I suppose that's beside the point!&lt;br&gt;&lt;img src="http://www2.sqlblog.com/aggbug.aspx?PostID=437" width="1" height="1"&gt;</description><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Performance/default.aspx">Performance</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Query+Tuning/default.aspx">Query Tuning</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/T-SQL/default.aspx">T-SQL</category></item><item><title>Announcing SQLQueryStress: A simple query load tool</title><link>http://www2.sqlblog.com/blogs/adam_machanic/archive/2006/10/21/announcing-sqlquerystress-a-simple-query-load-tool.aspx</link><pubDate>Sat, 21 Oct 2006 04:29:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:327</guid><dc:creator>Adam Machanic</dc:creator><slash:comments>1</slash:comments><comments>http://www2.sqlblog.com/blogs/adam_machanic/comments/327.aspx</comments><wfw:commentRss>http://www2.sqlblog.com/blogs/adam_machanic/commentrss.aspx?PostID=327</wfw:commentRss><description>On October 20, 2004 -- two years ago -- &lt;a href="http://www.sqljunkies.com/WebLog/amachanic/archive/2004/10/20/4699.aspx"&gt;I announced that I was considering writing my own query load testing tool&lt;/a&gt;.&lt;br&gt;&lt;br&gt;Clearly,
my follow through on this promise has been incredibly delayed.&amp;nbsp; But all
is not lost, and I'm happy to report that I am true to my word.&amp;nbsp; As of
today, &lt;a href="http://www.datamanipulation.net/SQLQueryStress/"&gt;beta-1 of my new SQLQueryStress tool is available for download&lt;/a&gt;.&lt;br&gt;&lt;br&gt;SQLQueryStress
is a free tool that provides a simple means of load testing individual
queries.&amp;nbsp; It provides some support for query parameterization and
options for collection of I/O and CPU metrics.&amp;nbsp; It is not intended to
replace tools such as Visual Studio Team System's load tests, but
rather to be a simple and easy-to-use tool in the DBA or database
developer's kit. I think it lives up to that goal, and initial feedback
I've gotten from a few people I've shown it to has been positive.&amp;nbsp; I've
been working on this project on and off for around the last eight
months, so I'm quite excited to finally release it to the public!&lt;br&gt;&lt;br&gt;Please &lt;a href="http://www.datamanipulation.net/SQLQueryStress/"&gt;check it out&lt;/a&gt;
and provide me with any feedback you have: What works?&amp;nbsp; What doesn't
work?&amp;nbsp; What other features would you like to see?&amp;nbsp; I'm planning to do
quite a bit more with this tool, so I'm definitely interested in what
users think would be useful.&amp;nbsp; And of course, please let me know if you
find any bugs (but given the rigor with which I've tested this project,
there couldn't &lt;i&gt;possibly &lt;/i&gt;be any!)&lt;br&gt;&lt;br&gt;Thanks, and enjoy!&lt;br&gt;&lt;img src="http://www2.sqlblog.com/aggbug.aspx?PostID=327" width="1" height="1"&gt;</description><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Performance/default.aspx">Performance</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Query+Tuning/default.aspx">Query Tuning</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Testing/default.aspx">Testing</category></item><item><title>Exploring the secrets of intermediate materialization</title><link>http://www2.sqlblog.com/blogs/adam_machanic/archive/2006/10/03/exploring-the-secrets-of-intermediate-materialization.aspx</link><pubDate>Wed, 04 Oct 2006 02:18:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:267</guid><dc:creator>Adam Machanic</dc:creator><slash:comments>24</slash:comments><comments>http://www2.sqlblog.com/blogs/adam_machanic/comments/267.aspx</comments><wfw:commentRss>http://www2.sqlblog.com/blogs/adam_machanic/commentrss.aspx?PostID=267</wfw:commentRss><description>When working with SQL Server 2000, I used to have this little trick I'd pull out after exhausting all other ideas for tuning a query.&amp;nbsp; And I thought that my little trick was dead in SQL Server 2005, but thanks to fellow SQL Server MVP &lt;a href="http://robfarley.blogspot.com/"&gt;Rob Farley&lt;/a&gt;, I am officially reviving my trick from the dead here and now, in this blog post.&lt;br&gt;&lt;br&gt;... But first, let's start with an example query.&amp;nbsp; Here's the scenario: You work for AdventureWorks, and management has asked you to create a report to find out how many peers each employee in the company has. You see, AdventureWorks management seems to believe that if two employees are
managed by the same person, they must have exactly the same job
function, and they can do each others' jobs equally well.&amp;nbsp; So what they want to do is find out which employees have too many peers (might as well downsize some of that extraneous fluff), and at the same time find out which employees, should they be hit by a bus tomorrow, could be immediately substituted for by a colleague. Now, whether or not management's belief is utterly moronic or not is beyond the scope of this post, so dash any such thoughts from your head until you've read to the end, and then resume pondering along those lines, which I'm sure will end up putting a smile on your face.&amp;nbsp; &lt;br&gt;&lt;br&gt;But smiles are for later.&amp;nbsp; At this point I've managed to go off on a horribly involved tangent, so let's get back to the query at hand:&lt;br&gt;&lt;br&gt;&lt;pre class="code"&gt;SELECT &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; x.EmployeeId,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; COUNT(*) AS TheCount&lt;br&gt;FROM HumanResources.Employee x&lt;br&gt;JOIN HumanResources.Employee y on y.ManagerId = x.ManagerId&lt;br&gt;GROUP BY x.EmployeeId&lt;/pre&gt;&lt;br&gt;&lt;br&gt;What we're doing here is finding all employees managed by the same manager, and then taking a count of those employees.&amp;nbsp; Yes, it would have made more sense to simply find out how many employees are managed by each manager, but that's not what management asked for, and management clearly thinks better than you do. So go run the report!&lt;br&gt;&lt;br&gt;But what does any of this have to do with query tuning tricks, you ask (while tidying up your resume a bit)?&amp;nbsp; To answer that question, let's take a quick peek at the I/Os our query is using, in addition to the query plan.&amp;nbsp; First, a baseline for the I/Os:&lt;br&gt;&lt;br&gt;&lt;pre class="code"&gt;SELECT *&lt;br&gt;FROM HumanResources.Employee&lt;/pre&gt;&lt;br&gt;&lt;br&gt;Result, obtained via Profiler: 20 logical reads.&amp;nbsp; OK, and now the test query in question: 827 logical reads. Quite a jump for a query that only uses the one table -- we're clearly wasting a lot of resources.&amp;nbsp; And looking at the query plan, it's obvious we can do better.&amp;nbsp; An outer table scan, looped to find the count for each employee -- that's a lot of index operations.&lt;br&gt;&lt;br&gt;A common way to start tuning this kind of query is to move the aggregation into a derived table. After peering at this query for a while, one might come to the conclusion that there's no reason to aggregate on ManagerId more than once per manager. Why do it once per employee?&lt;br&gt;&lt;br&gt;&lt;pre class="code"&gt;SELECT&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; x.EmployeeId,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; y.theCount&lt;br&gt;FROM HumanResources.Employee x&lt;br&gt;JOIN&lt;br&gt;(&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; SELECT&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; p.ManagerId, &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; COUNT(*)&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; FROM HumanResources.Employee p&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; GROUP BY p.ManagerId&lt;br&gt;) y (ManagerId, theCount) ON y.ManagerId = x.ManagerId&lt;/pre&gt;&lt;br&gt;&lt;br&gt;Seems better, but upon running it, we see that it produces 819 logical reads and almost exactly the same query plan. Not much of an improvement.&amp;nbsp; And alas, there's not much more we can do here.&amp;nbsp; There just aren't too many ways to skin this query, and each of them requires some kind of loop to get the count, either implied or otherwise... Right?&lt;br&gt;&lt;br&gt;And now we're almost to "dirty trick" territory.&amp;nbsp; But let's first try a not-so-dirty trick. A temp table might eliminate some of the overhead, right?&amp;nbsp; Then we'll only have to query the base tables once...&lt;br&gt;&lt;br&gt;&lt;pre class="code"&gt;SELECT&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; p.ManagerId, &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; COUNT(*) AS theCount&lt;br&gt;INTO #y&lt;br&gt;FROM HumanResources.Employee p&lt;br&gt;GROUP BY p.ManagerId&lt;br&gt;&lt;br&gt;SELECT&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; x.EmployeeId,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; y.theCount&lt;br&gt;FROM HumanResources.Employee x&lt;br&gt;JOIN #y y ON y.ManagerId = x.ManagerId&lt;/pre&gt;&lt;br&gt;&lt;br&gt;207 logical reads.&amp;nbsp; Quite an improvement!&amp;nbsp; But the temp table is still using a nested loop, and a merge would be so much nicer, wouldn't it?&amp;nbsp; A MERGE JOIN hint drops the number of reads to 115, but I still feel that we can do even &lt;i&gt;better&lt;/i&gt;. &lt;br&gt;&lt;br&gt;Now in SQL Server 2000 at about this point in my query tuning excercise, I might try forcing &lt;i&gt;intermediate materialization&lt;/i&gt; of the derived table, sans the temp table, by using TOP 100 PERCENT in conjunction with ORDER BY. Unfortunately, the SQL Server query optimizer team &lt;a href="http://blogs.msdn.com/queryoptteam/archive/2006/03/24/560396.aspx"&gt;decided that this wasn't a good idea&lt;/a&gt;, and the optimizer now ignores such attempts.&lt;br&gt;&lt;br&gt;And until earlier today, I thought the game was over. Until I was reminded by Rob that TOP takes a number of rows in addition to a percent. The trick, then?&amp;nbsp; Use a bigger number of rows than you'll ever actually get back... Say, the maximum value for SQL Server's INTEGER type (2147483647)?&lt;br&gt;&lt;br&gt;By applying TOP and ORDER BY within the derived table, we can force SQL Server to perform intermediate materialization of the results.&amp;nbsp; And by playing with the ORDER BY properly, we can even prompt the optimizer to choose a merge...&lt;br&gt;&lt;br&gt;&lt;pre class="code"&gt;SELECT&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; x.EmployeeId,&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; y.theCount&lt;br&gt;FROM HumanResources.Employee x&lt;br&gt;JOIN&lt;br&gt;(&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; SELECT TOP (2147483647)&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; p.ManagerId, &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; COUNT(*)&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; FROM HumanResources.Employee p&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; GROUP BY p.ManagerId&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ORDER BY p.ManagerId&lt;br&gt;) y (ManagerId, theCount) on y.ManagerId = x.ManagerId&lt;/pre&gt;&lt;br&gt;&lt;br&gt;And the result of all of this hard labor?&amp;nbsp; &lt;i&gt;10 logical reads&lt;/i&gt; (1000% improvement over the next best method), a merge operation, and if I do say so myself, a very good topic for a blog post.&lt;br&gt;&lt;br&gt;The usual caveats apply.&amp;nbsp; Do not try this at home.&amp;nbsp; Do not rely on this undocumented behavior.&amp;nbsp; Do not pass Go.&amp;nbsp; Do not fail to hire me to tune your databases if this trick doesn't fix all of your problems.&amp;nbsp; And, lest I forget, do not waste time reading this blog when management needs that report &lt;i&gt;yesterday&lt;/i&gt;!&lt;br&gt;&lt;br&gt;Anyway, until next time, enjoy!&lt;br&gt;&lt;br&gt;&lt;img src="http://www2.sqlblog.com/aggbug.aspx?PostID=267" width="1" height="1"&gt;</description><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Optimization/default.aspx">Optimization</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Performance/default.aspx">Performance</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Query+Tuning/default.aspx">Query Tuning</category></item><item><title>Scalar functions, inlining, and performance: An entertaining title for a boring post</title><link>http://www2.sqlblog.com/blogs/adam_machanic/archive/2006/08/04/scalar-functions-inlining-and-performance-an-entertaining-title-for-a-boring-post.aspx</link><pubDate>Fri, 04 Aug 2006 03:07:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:146</guid><dc:creator>Adam Machanic</dc:creator><slash:comments>30</slash:comments><comments>http://www2.sqlblog.com/blogs/adam_machanic/comments/146.aspx</comments><wfw:commentRss>http://www2.sqlblog.com/blogs/adam_machanic/commentrss.aspx?PostID=146</wfw:commentRss><description>Scalar.&amp;nbsp; Function.&lt;br&gt;&lt;br&gt;Wow.&lt;br&gt;&lt;br&gt;Could any other combination of words evoke the same feeling of encapsulation, information hiding, and simplification of client code?&amp;nbsp; After years spent developing software in the procedural and OO worlds, it can be difficult--perhaps, even impossible--to migrate over to working with SQL Server and not consider how to architect your data access logic using some of the same techniques you'd use in the application tier.&lt;br&gt;&lt;br&gt;In short: Why would you &lt;i&gt;ever&lt;/i&gt; write the same piece of logic more than once?&amp;nbsp; Answer: &lt;i&gt;You wouldn't (damn it!)&lt;/i&gt;.&amp;nbsp; And so Microsoft bestowed upon the SQL Server community, in SQL Server 2000, the ability to write scalar user-defined functions.&amp;nbsp; And they could have been such beautiful things...&lt;br&gt;&lt;br&gt;But alas, reality can be painful, and as developers tried these new tools they were struck with a strange feeling of sadness as their applications buckled under the weight of what otherwise would have been a wonderful idea. As it turned out, putting all but the simplest of logic into these scalar functions was a recipe for disaster. Why?&amp;nbsp; Because they're essentially cursors waiting to happen (but they don't &lt;i&gt;look &lt;/i&gt;like cursors, so you may not know... until it's too late.)&lt;br&gt;&lt;br&gt;The central problem is that when you wrap logic in a multistatement UDF, the query optimizer just can't unwrap it too easily. And so there's really only one way to evaluate a scalar UDF: call it once per row. And that is really nothing more than a cursor.&lt;br&gt;&lt;br&gt;Seeing this behavior in action is easy enough; consider the following scalar function that some poor sap DBA working for AdventureWorks might be compelled to create:&lt;br&gt;&lt;br&gt;
&lt;pre class="code"&gt;CREATE FUNCTION GetMaxProductQty_Scalar&lt;br&gt;(&lt;br&gt;    @ProductId INT&lt;br&gt;)&lt;br&gt;RETURNS INT&lt;br&gt;AS&lt;br&gt;BEGIN&lt;br&gt;    DECLARE @maxQty INT&lt;br&gt;&lt;br&gt;    SELECT @maxQty = MAX(sod.OrderQty)&lt;br&gt;    FROM Sales.SalesOrderDetail sod&lt;br&gt;    WHERE sod.ProductId = @ProductId&lt;br&gt;&lt;br&gt;    RETURN (@maxQty)&lt;br&gt;END&lt;br&gt;&lt;/pre&gt;
&lt;br&gt;Simple enough, right?&amp;nbsp; Let's pretend that AdventureWorks has a bunch of reports, each of which requires maximum quantity sold per product.&amp;nbsp; So the DBA, thinking he can save himself some time and keep everything centralized (and that is a good idea), puts all of the logic into a scalar UDF.&amp;nbsp; Now, when he needs this logic, he can just call the UDF.&amp;nbsp; And if the logic has a bug, or needs to be changed, he can change it in exactly &lt;i&gt;one&lt;/i&gt; place.&amp;nbsp; And so life is great... Or is it?&lt;br&gt;&lt;br&gt;Let's take a look at a sample query:&lt;br&gt;&lt;br&gt;
&lt;pre class="code"&gt;SELECT&lt;br&gt;    ProductId,&lt;br&gt;    dbo.GetMaxProductQty_Scalar(ProductId)&lt;br&gt;FROM Production.Product&lt;br&gt;ORDER BY ProductId&lt;br&gt;&lt;/pre&gt;
&lt;br&gt;This query does nothing more than get the max quantity sold for each product in the Productin.Product table. And a look at the execution plan or the STATISTICS IO output might indicate that there's nothing too interesting going on here: The execution plan shows an index scan (to be expected, with no WHERE clause), followed by a compute scalar operation (the call to the UDF). And STATISTICS IO shows a mere 16 reads.&lt;br&gt;&lt;br&gt;So why is this query so problematic? Because the real issue is hiding just beneath the surface.&amp;nbsp;&lt;i&gt; The execution plan and STATISTICS IO didn't consider any of the code evaluated within the UDF!&lt;/i&gt; To see what's &lt;i&gt;really&lt;/i&gt; going on, fire up SQL Server Profiler, turn on the SQL:BatchCompleted event, and make sure you're showing the Reads column. Now run the query again and you'll see that this seemingly-innocent block of T-SQL is, in fact, using 365,247 logical reads. Quite a difference!&lt;br&gt;&lt;br&gt;Each of those "compute scalar" operations is really a call to the UDF, and each of the calls to the UDF is really a new query.&amp;nbsp; And all of those queries (all 504 of them -- the number of products in the Product table) add up to massive I/O.&amp;nbsp; Clearly not a good idea in a production environment.&lt;br&gt;&lt;br&gt;But luckily, we're not done here yet (or this would be a very boring post). Because while the performance penalty is a major turnoff, I really do love the encapsulation afforded by scalar UDFs.&amp;nbsp; I want them (or a similar tool) in my toolbox... And so I got to thinking.&lt;br&gt;&lt;br&gt;The answer to my dilemma, as it turns out, is to not use scalar UDFs at all, but rather to use &lt;i&gt;inline table-valued&lt;/i&gt; UDFs and treat them like scalars. This means that queries get slightly more complex than with scalar UDFs, but because the funtions are inlined (treated like macros) they're optimized along with the rest of the query. Which means, no more under-the-cover cursors.&lt;br&gt;&lt;br&gt;Following is a modified version of the scalar UDF posted above:&lt;br&gt;&lt;br&gt;
&lt;pre class="code"&gt;CREATE FUNCTION GetMaxProductQty_Inline&lt;br&gt;(&lt;br&gt;    @ProductId INT&lt;br&gt;)&lt;br&gt;RETURNS TABLE&lt;br&gt;AS&lt;br&gt;    RETURN&lt;br&gt;    (&lt;br&gt;        SELECT MAX(sod.OrderQty) AS maxqty&lt;br&gt;        FROM Sales.SalesOrderDetail sod&lt;br&gt;        WHERE sod.ProductId = @ProductId&lt;br&gt;    )&lt;br&gt;&lt;/pre&gt;&lt;br&gt;This function is no longer actually scalar--in fact, it now returns a table. It just so happens that the table has exactly one column and exactly one row, and uses the same logic as the scalar UDF shown above. So it's still scalar enough for my purposes.&lt;br&gt;&lt;br&gt;The query shown above, used to retrieve the maximum quantity sold for each product, will not quite work with this UDF as-is. Trying to substitute in the new UDF will result in nothing more than a variant on an "object not found" error.&amp;nbsp; Instead, you need actually treat this function like&amp;nbsp; it returns a table (due to the fact that it does).&amp;nbsp; And that means, in this case, a &lt;i&gt;subquery&lt;/i&gt;:&lt;br&gt;&lt;br&gt;
&lt;pre class="code"&gt;SELECT&lt;br&gt;    ProductId,&lt;br&gt;    (&lt;br&gt;        SELECT MaxQty&lt;br&gt;        FROM dbo.GetMaxProductQty_Inline(ProductId)&lt;br&gt;    ) MaxQty&lt;br&gt;FROM Production.Product&lt;br&gt;ORDER BY ProductId&lt;br&gt;&lt;/pre&gt;&lt;br&gt;So there it is. We're now treating the table-valued UDF more or less just like a scalar UDF.&amp;nbsp; And the difference in I/O results is really quite astounding: 1267 logical reads in this case. Meaning that the scalar UDF solution is around 288 times more I/O intensive!&lt;br&gt;&lt;br&gt;The question being, of course, was it worth it? The whole thing could have been written as one query, without the need for any UDFs at all. And the final query in this case is quite a bit more complex than the previous version, in addition to the fact that the encapsulation breaks down to some degree by forcing the caller to have some knowledge of how the UDF actually works. But I do feel that this sacrifice is warranted in some cases. Although the "greatest quantity sold" example shown here is simplistic, imagine other situations in which the same code fragments or logic are used over and over, due to lack of a good way of standardizing and centralizing them.&amp;nbsp; I know I've seen that a lot in my work, and some examples I can think of have included complex logic that might very well have been easier to maintain in a UDF.&lt;br&gt;&lt;br&gt;This technique may not be perfect for every case, and it certainly has its tradeoffs. But it may be a useful trick to keep in the back of your mind for a rainy day in the data center when someone's scalar UDF solution starts breaking down and you need a fix that doesn't require a massive code rewrite.&lt;br&gt;&lt;br&gt;&lt;img src="http://www2.sqlblog.com/aggbug.aspx?PostID=146" width="1" height="1"&gt;</description><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Performance/default.aspx">Performance</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Query+Tuning/default.aspx">Query Tuning</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/T-SQL/default.aspx">T-SQL</category></item><item><title>Paging in SQL Server 2005</title><link>http://www2.sqlblog.com/blogs/adam_machanic/archive/2006/07/12/paging-in-sql-server-2005.aspx</link><pubDate>Thu, 13 Jul 2006 01:12:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:75</guid><dc:creator>Adam Machanic</dc:creator><slash:comments>3</slash:comments><comments>http://www2.sqlblog.com/blogs/adam_machanic/comments/75.aspx</comments><wfw:commentRss>http://www2.sqlblog.com/blogs/adam_machanic/commentrss.aspx?PostID=75</wfw:commentRss><description>I keep seeing questions on newsgroups about paging in stored
procedures, and whether there will be a better way in SQL Server 2005.
However, aside from a few answers in newsgroups, I haven't seen any
content on how to do it. So I'd like to spend a few minutes and share
with you the new features that will make paging stored procedures both
easier to build and a lot more performant...
&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;
But first, since this is a performance-related blog, let's generate a bunch of big test data!
&lt;/p&gt;&lt;p&gt;Since I don't have AdventureWorks installed on my test SQL
Server 2005 server at the moment and am too lazy to track it down, the
data is somewhat ugly... Anyway, start this up and let it run for a
while (takes around 35 minutes on my test box):
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;SELECT DISTINCT A.Name + B.Name + C.Name AS SomewhatLargeString&lt;br&gt;INTO #BigTableOfStrings&lt;br&gt;FROM	master..spt_values A,&lt;br&gt;	master..spt_values B,&lt;br&gt;	master..spt_values C&lt;br&gt;WHERE	a.TYPE NOT IN ('P', 'R', 'F', 'F_U')&lt;br&gt;	AND b.TYPE NOT IN  ('P', 'R', 'F', 'F_U')&lt;br&gt;&lt;br&gt;CREATE CLUSTERED INDEX CI_LargeString ON #BigTableOfStrings(SomewhatLargeString)&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;
Coffee (or preferably, beer) time!  See you in 35 minutes.
&lt;/p&gt;&lt;p&gt;
... All set?
&lt;/p&gt;&lt;p&gt;Okay, now let's pretend we're in a web app with a SQL Server
2000 backend and we want the second page of data... Rows 11-20, ordered
by SomewhatLarge. How might we do this in SQL Server 2000..? Here's one
way:
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;SELECT A.SomewhatLargeString&lt;br&gt;FROM #BigTableOfStrings A&lt;br&gt;JOIN #BigTableOfStrings B ON B.SomewhatLargeString &amp;lt;= A.SomewhatLargeString&lt;br&gt;GROUP BY A.SomewhatLargeString&lt;br&gt;HAVING COUNT(*) BETWEEN 11 AND 20&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;... Still waiting for that to return? Keep waiting. Maybe get
another coffee, or go home for the night. On my system, that query has
quite possibly the biggest estimated cost I've ever seen, a staggering
1,523,110,700. Yeah, that sucks. So let's change it around a bit:
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;SELECT x.SomewhatLargeString&lt;br&gt;FROM (&lt;br&gt;	SELECT TOP 20 A.SomewhatLargeString, COUNT(*) AS TheCount&lt;br&gt;	FROM #BigTableOfStrings A&lt;br&gt;	JOIN #BigTableOfStrings B ON B.SomewhatLargeString &amp;lt;= A.SomewhatLargeString&lt;br&gt;	GROUP BY A.SomewhatLargeString&lt;br&gt;	ORDER BY A.SomewhatLargeString ) x&lt;br&gt;WHERE x.TheCount BETWEEN 11 AND 20&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;Much nicer than before, with an estimated cost on my system of 23.1,
and a virtually instant return time. But wait, we have an over-zealous
user who wants rows 20,001 - 20,010!
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;SELECT x.SomewhatLargeString&lt;br&gt;FROM (&lt;br&gt;	SELECT TOP 20010 A.SomewhatLargeString, COUNT(*) AS TheCount&lt;br&gt;	FROM #BigTableOfStrings A&lt;br&gt;	JOIN #BigTableOfStrings B ON B.SomewhatLargeString &amp;lt;= A.SomewhatLargeString&lt;br&gt;	GROUP BY A.SomewhatLargeString&lt;br&gt;	ORDER BY A.SomewhatLargeString ) x&lt;br&gt;WHERE x.TheCount BETWEEN 20001 AND 20010&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;Oops, cost skyrocketed to 50516, with a return time of 1:30... Can
you feel your users abandoning your sinking ship of an app and heading
towards the competitors? Probably not, since they won't actually click
through 20,000 rows, but it makes a really good contrived example, so
let's roll with it!
&lt;/p&gt;&lt;p&gt;So how to solve this problem? In SQL Server 2000, the answer
is, find some other paging mechanism, probably using a middle tier. But
in SQL Server 2005, we have new and better toys to play with. Allow me
to introduce your new paging best friend, the ROW_NUMBER() function.
For those readers who are slow on the uptake, this function does
exactly what its name implies; it creates a surrogate row number for
each row in a result set. So now, instead of the very painfully
inefficient COUNT(*) methods, we can let SQL Server do all the work as
it builds the result set we actually want...
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;SELECT x.SomewhatLargeString&lt;br&gt;FROM (&lt;br&gt;	SELECT TOP 20010 A.SomewhatLargeString, ROW_NUMBER() OVER(ORDER BY A.SomewhatLargeString) AS TheCount&lt;br&gt;	FROM #BigTableOfStrings A&lt;br&gt;	ORDER BY A.SomewhatLargeString) x&lt;br&gt;WHERE x.TheCount BETWEEN 20000 AND 20010&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;
I guess we can call that a tiny improvement.  Total estimated cost: 0.202.  That's only about a &lt;i&gt;25 MILLION PERCENT&lt;/i&gt; difference.
&lt;/p&gt;&lt;p&gt;So why is ROW_NUMBER so much more efficient? It's a combination
of the COUNT(*) method itself being inefficient and the query optimizer
probably not handling it as well as it should. The COUNT(*) method
requires an ordered-forward clustered index scan of the table, followed
by correlation of each of the top 20010 rows that we asked for to all
of the rows less than or equal to that row in the same table, for the
count. Alas, the optimizer chooses a nested loop for that, which causes
the cost to shoot up as the operation is repeated over and over.
&lt;/p&gt;&lt;p&gt;The ROW_NUMBER method, on the other hand, requires only the
scan of the TOP 20010 rows, followed by computation of the row number
itself, which is nothing more than a scalar calculation. Then, just
filter the rows. Simple, easy on the optimizer, easy for the system,
and good for your customers. Definitely a great feature that I'm
looking forward to using!&lt;/p&gt;&lt;br&gt;&lt;img src="http://www2.sqlblog.com/aggbug.aspx?PostID=75" width="1" height="1"&gt;</description><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Performance/default.aspx">Performance</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Query+Tuning/default.aspx">Query Tuning</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/T-SQL/default.aspx">T-SQL</category></item></channel></rss>