Welcome to SQLblog.com - The SQL Server blog spot on the web

Paul White: Page Free Space

A technical SQL Server blog from New Zealand.

August 2010 - Posts

  • Sorting, Row Goals, and the TOP 100 Problem

    When you write a query to return the first few rows from a potential result set, you’ll often use the TOP clause. To give a precise meaning to the TOP operation, it will normally be accompanied by an ORDER BY clause. Together, the TOP…ORDER BY construction can be used to precisely identify which top ‘n’ rows should be returned.

    The ‘Top N’ Sort

    Thinking about how this requirement might be implemented in an executable query plan, we might expect to see a Sort iterator followed by a Top. In reality, the query optimizer can often collapse these two related operations into a single iterator: a Sort iterator running in Top N Sort mode. That’s an idea you might find familiar if you read my previous post on Row Goals and Grouping. In that entry, we saw how a Sort followed by a Stream Aggregate can sometimes be collapsed into a Sort iterator running in Sort Distinct mode.

    The General Sorting Algorithm

    SQL Server’s normal sorting algorithms are suited to a very wide range of ordering requirements. They work extremely well regardless of the data types involved, the size of data to be sorted, or the number of sort keys specified. They also make good use of available memory resources, and can spill to tempdb if required. It is a common misconception that SQL Server will try to perform a sort entirely in memory if it can. In fact, the algorithms used are much more complex: they aim to balance memory usage against average response time, while maintaining high levels of resource concurrency. Memory is a precious resource in the server, so SQL Server may spill a sort to tempdb even if sufficient main memory is available. Read More...
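As a concrete illustration of the TOP…ORDER BY construction, here is a minimal sketch (the table and column names come from the AdventureWorks sample database, not from the post; the exact query the post uses is not shown in this excerpt). A query of this shape is a candidate for a single Sort iterator running in Top N Sort mode:

```sql
-- Hypothetical example: TOP with a matching ORDER BY, which the
-- optimizer can often implement as one Top N Sort iterator rather
-- than a full Sort followed by a separate Top.
SELECT TOP (100)
    P.ProductID,
    P.Name
FROM Production.Product AS P
ORDER BY
    P.Name;
```

Checking the actual execution plan for such a query is the easiest way to see whether the Sort iterator reports Top N Sort mode.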
  • Row Goals and Grouping

    You might recall (from my last post) that query plans containing a row goal tend to favour nested loops or merge join over hashing. This is because a hash join has to fully process its build input (to populate its hash table) before it can start probing for matches from its second input. Hash join therefore has a high start-up cost, which is balanced by a lower per-row cost once probing begins. In this post, I’ll take a look at how row goals affect grouping operations. Grouping Strategies While the start-up cost of hash join often makes it unsuitable for plans with a row goal, there are times when hashing operations may feature in such plans, since the Hash Match iterator also supports a streaming mode. As an example, say we are asked to list one hundred unique first names from the AdventureWorks Contacts table: Read More...
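The “one hundred unique first names” requirement mentioned above might be written as follows (reconstructed from the description; the post’s exact query may differ). Person.Contact is the AdventureWorks table in question:

```sql
-- One hundred distinct first names from the AdventureWorks
-- Contacts table; with a row goal of 100, a streaming (flow
-- distinct) strategy can beat a fully blocking one.
SELECT DISTINCT TOP (100)
    C.FirstName
FROM Person.Contact AS C;
```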
  • Inside the Optimizer: Row Goals In Depth

    Background

    One of the core assumptions made by the SQL Server query optimiser’s model is that clients will consume all of the rows produced by a query. This results in plans optimised for the lowest overall execution cost, even though such a plan may take longer to begin producing rows. Let’s look at an example: the optimiser chooses to perform the logical join using a Hash Match physical iterator, resulting in a plan with a total estimated cost of around 1.4 units. By forcing alternative physical joins using a query hint, we see that a plan based on Sort-Merge would have an estimated cost of just under 10, and using Nested Loops would cost over 18 units. All these cost estimates are based on the assumption that all rows are required.

    Hash Match

    As detailed in a previous post, the Hash Match iterator starts by consuming all rows produced by its build input (the Product table) in order to build a hash table. This makes Hash Match a semi-blocking iterator: it can only start producing output rows once the build phase is complete. If we need the first few rows from the query quickly, this join type may not be optimal. Read More...
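The “forcing alternative physical joins using a query hint” step can be sketched like this (the join itself is assumed for illustration; the excerpt does not show the post’s actual query beyond naming the Product table as the build input):

```sql
-- Force each physical join in turn and compare the estimated
-- plan costs. Run once per hint.
SELECT P.Name, TH.TransactionDate
FROM Production.Product AS P
JOIN Production.TransactionHistory AS TH
    ON TH.ProductID = P.ProductID
OPTION (HASH JOIN);
-- also try: OPTION (MERGE JOIN) and OPTION (LOOP JOIN)
```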
  • Viewing Another Session’s Temporary Table

    Is it possible to view the contents of a local temporary table, created on a different connection? Say we run this code on “connection 1”:

    CREATE TABLE #Private (data NVARCHAR(30) NOT NULL);
    GO
    INSERT #Private (data) VALUES (N'Only I can see this');

    Is there a way to see the contents of the #Private table from “connection 2”? It isn’t particularly easy, but a user with sysadmin permissions, a bit of internal knowledge (and some patience) can do it. Read More...
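One possible line of attack, sketched here only in outline (sysadmin required; these are undocumented commands, and the post’s actual method may differ): locate the temporary table’s pages in tempdb via the internal allocation-unit views, then dump a page with DBCC PAGE.

```sql
-- Find candidate pages for objects in tempdb (the #Private table
-- appears under a decorated name in tempdb.sys.objects).
SELECT OBJECT_NAME(p.object_id, DB_ID('tempdb')) AS obj,
       au.first_page
FROM tempdb.sys.partitions AS p
JOIN tempdb.sys.system_internals_allocation_units AS au
    ON au.container_id = p.partition_id;

DBCC TRACEON (3604);            -- route DBCC output to the client
DBCC PAGE (tempdb, 1, 153, 3);  -- file and page ids are placeholders
```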
  • The Impact of Non-Updating Updates

    From time to time, I encounter a system design that always issues an UPDATE against the database after a user has finished working with a record – without checking to see if any of the data was in fact altered. The prevailing wisdom seems to be that “the database will sort it out”. This raises an interesting question: how smart is SQL Server in these circumstances? In this post, I’ll look at a generalisation of this problem: what is the impact of updating a column to the value it already contains? The specific questions I want to answer are: Does this kind of UPDATE generate any log activity? Do any data pages get marked as dirty (and so eventually get written out to disk)? Does SQL Server bother doing the update at all? Read More...
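A minimal repro of a non-updating update looks like this (the table name is assumed; sys.fn_dblog is an undocumented function, used here only to peek at the log records generated inside the transaction):

```sql
BEGIN TRANSACTION;

-- Update a column to the value it already contains.
UPDATE dbo.SomeTable
SET SomeColumn = SomeColumn;

-- Inspect log activity for this transaction (undocumented).
SELECT [Current LSN], Operation, AllocUnitName
FROM sys.fn_dblog(NULL, NULL);

ROLLBACK TRANSACTION;
```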
  • Iterators, Query Plans, and Why They Run Backwards

    SQL Server uses an extensible architecture for query optimisation and execution, using ‘iterators’ as basic building-blocks. Iterators are probably most familiar in their graphical showplan representation, where each icon represents a single iterator. They also show up in XML query plan output as RelOp nodes. Each iterator performs a single simple function, such as applying a filtering condition, or performing an aggregation. It can represent a logical operation, a physical operation, or (most often) both. For example, ‘Aggregate’ is a logical operation, and Stream Aggregate and Hash Aggregate are physical operations. Similarly, ‘Inner Join’ is a logical operation; Nested Loops a physical one. Both graphical plans and XML showplan output show these properties: Read More...
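To see the RelOp nodes mentioned above for yourself, request an XML showplan for any query (the query below is purely illustrative):

```sql
SET STATISTICS XML ON;

-- Each iterator in the resulting plan appears as a RelOp node,
-- with both its logical and physical operation recorded.
SELECT COUNT_BIG(*)
FROM Production.Product AS P;

SET STATISTICS XML OFF;
```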
  • An Interesting MERGE Bug

    Summary: Investigating an optimiser transformation that exposes a bug in SQL Server’s MERGE implementation. I came across a Connect item today (by fellow SQL Server Central member ‘ALZDBA’) describing how using a combination of relatively new features can produce incorrect results or even an access violation inside SQL Server. Reproducing the bug We’ll look first at how MERGE can produce an incorrect output. We’ll need two tables: one that contains two rows of existing data; and a second that contains four rows of change information. The overall process looks like this: This is a classic MERGE (or “upsert”) requirement. If the row already exists, the new value is added to it, otherwise a new row is inserted. Here’s the code to generate the sample tables and data: Read More...
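The classic MERGE (“upsert”) pattern described above has this general shape (table and column names here are assumed for illustration, not taken from the Connect item): if a matching row exists, the change value is added to it; otherwise a new row is inserted.

```sql
MERGE dbo.Target AS T
USING dbo.Changes AS C
    ON C.KeyCol = T.KeyCol
WHEN MATCHED THEN
    -- existing row: add the new value to it
    UPDATE SET T.Amount = T.Amount + C.Amount
WHEN NOT MATCHED BY TARGET THEN
    -- no existing row: insert one
    INSERT (KeyCol, Amount)
    VALUES (C.KeyCol, C.Amount);
```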