THE SQL Server Blog Spot on the Web


Adam Machanic

Adam Machanic, Boston-based SQL Server developer, shares his experiences with programming, monitoring, and performance tuning SQL Server. And the occasional battle with the query optimizer.

  • SQLBits 2015 - Query Tuning Mastery: Clash of the Row Goals - Demos

    Thanks to everyone who attended my talk yesterday! The demos are attached to this post. Let me know if you have any questions.

    Until next time...

  • New Book! T-SQL Querying with Itzik Ben-Gan, Dejan Sarka, Kevin Farlee

    I'm excited to announce that after a year of work, T-SQL Querying is now available in e-book form (dead trees will ship next month)! 

    As you might guess from the author list, this book goes deep into various aspects of T-SQL -- both from querying and query processing perspectives. The book includes Itzik's signature T-SQL coverage, data warehousing information from Dejan, and in-memory OLTP from Kevin. The sections I contributed focus on my favorite topics: parallelism and SQLCLR programming.

    I'm quite proud of what we've accomplished and I hope you'll enjoy the fruits of our labor. 

  • Parallel Performance in London: SQLBits XIV, Superheroes Edition!

    I'm happy to announce that I'll be returning this year to the UK's premier SQL Server conference, SQLBits. This year's show is in London and has a Superheroes theme; it should be a great time.

    On Wednesday March 4th, I'll be delivering "Better Performance Through Parallelism" as a full-day preconference seminar. This seminar teaches you how to properly leverage SQL Server's parallel processing capabilities to greatly improve performance of your biggest queries. It includes a large amount of internals and background information, detailed query plan and tuning guidelines, a discussion on server configuration, as well as information on how to properly monitor parallel queries in your environment.


    I delivered this seminar at this year's PASS Summit in Seattle, and here is what some of the attendees had to say:

    "this presentation was awesome. it was packed with information and delivered in a clear, concise and easily understood manner. there just wasn't enough time which was a bummer since i found the entire day truly fascinating and didn't want to stop soaking in"

    "My favorite presentation of the conference. Very organized, detailed and full of useful information. I found the demos to be absolutely revealing."

    "Extremely knowledgeable speaker, with quite a lot of material. Handled the room well. Will definitely recommend any session that he presents."

    "This was in the top 3 best sessions I attended at PASS or at a PASS related event. Everything from the background of parallelism to performing analysis of a parallel query execution plan was great."


    The SQLbits marketing committee came up with the great idea of having preconference speakers create videos to help promote our sessions. I had a bit of fun with mine...

    Hope to see many of you in London!
  • PASS Summit 2014: Manhandling Parallelism - Demos

    Just a quick post to say thank you to everyone who attended today's Manhandling Parallelism talk! I hope you had even half as much fun as I did. For those of you who didn't attend, the talk was recorded and broadcast on "PASS TV," and I'm sure it will be available somewhere soon for on-demand viewing.

    The demos are attached.

    Let me know in the comments area if you have any questions!


  • PASS Summit 2014 and the Contradiction of Long Running Things

    As database professionals, I find that one simple thing unites us in our mission, above all else: we absolutely hate long running things.

    I'm with you. I'm here for you. I want to help you reject long running things.

    PASS Summit 2014, the premier SQL Server conference of the year, is returning to Seattle this November. And I am returning to PASS Summit, after a one-year hiatus, with 8 hours of fun and exciting performance tuning content to help you expunge the long running things from your life.


    Better Performance Through Parallelism (Full-Day, 400 Level Preconference Seminar)

    Big queries? Lots of data? No problem. Your server probably has a tremendous amount of CPU resources (most do, these days). And your queries are most likely not using anywhere close to everything they can to help speed up your requests. Want solutions? This advanced seminar is designed to give you a deep and thorough understanding of SQL Server's parallel processing capabilities, along with exactly why and how you can exploit things to massively improve performance -- by a factor of 10x or more in many cases.

    You'll learn about how Windows thinks about your CPUs, how SQL Server thinks about your CPUs, how SQL Server interfaces with Windows, how the query optimizer makes its parallel plan decisions, how the query processor interfaces with SQLOS, and how to leverage some basic Computer Science theorems to help the query processor do a much better job on your behalf. And you'll learn how to troubleshoot and take action when something goes wrong.

    It's a lot of deep material packed into a single day, and that's why it's 400 level. I want to pack your brain full of information. This is a seminar I've delivered a few times now, and I'm going to be making various enhancements for PASS -- including adding more content on Columnstores for SQL Server 2014, some of the new DMVs, and some interesting twists on a few of the techniques. Will be a fun day!


    Query Tuning Mastery: Manhandling Parallelism, 2014 Edition (400 Level Breakout Session)

    Where "Better Performance Through Parallelism" ends, "Manhandling Parallelism" takes over. This session will have only a tiny bit of overlap with the seminar, and pretty much assumes that you know everything I've taught in the full-day session. From that common starting point I'll attack the query optimizer and query processor from various angles, exploiting parallelism in new and interesting ways. You'll learn how to hack the optimizer's costing model, how to fix a less than ideal implementation in the storage engine, and how to take control using a series of specially designed CLR functions.

    This session is all about applied internals. You may or may not want to rush some of these techniques directly into your production queries; the idea is to show you what can be done if you push the envelope, help you learn some of what makes SQL Server tick, and hopefully have some fun with it.


    And as for the contradiction mentioned in the title of this post?

    Before we reject long running things, I'm going to ask you to embrace long running things. Because you will be the long running thing.

    SQL Long Run (400 Level Outside Run)

    You're about to sit in a convention center for a week filling your brain with technical information. You're about to attend a conference that's full of networking opportunities.

    Can you simultaneously kick off the networking while helping to counteract your forthcoming period of physical inactivity? Why yes, you can!

    SQL Long Run is a simple concept: strap on some running shoes and go for an 11-14 mile run with whatever SQL Server people are in Seattle and want to run with us on the morning of Sunday, November 2. (The day before PASS Summit precons begin.) Have fun, break a sweat, and chat with some fellow runners and SQL geeks.


    See you in Seattle!


  • PASS Summit Abstract Feedback

    Last week PASS announced its selections for its 2014 Summit. (I had a couple of abstracts selected, but I'll go into detail on those in a future post.)

    As usual, very little feedback was provided to speakers on why abstracts were or were not selected. But this time a number of people demanded action. And as it turns out, PASS was well-equipped to respond.

    The abstract review process PASS uses involves reviewers submitting comments on each abstract. Why PASS has never sent these out, and has actively refused in the past, is beyond me. But the good news is that clearer heads have prevailed. PASS announced yesterday that anyone who submitted for Summit could e-mail and request his or her comments. A huge step in the right direction!

    Naturally I e-mailed the moment I saw the announcement. The comments I received were interesting, but not always actionable. Nonetheless, I very much appreciate the ability to see them.

    Further, as someone who has written about abstract writing in the past, and who happily provides public feedback to others, it seems only fair that, now that I've received a bunch of feedback myself, I post it for your review.

    So without further ado, here are my four submissions and their comments.


    Better Performance Through Parallelism [Level 400 Full-Day]
    Today's server hardware ships with a truly massive amount of CPU power. And while SQL Server is designed to automatically and seamlessly take advantage of available processing resources, there are a number of options, settings, and patterns that can be exploited to optimize parallel processing of your data. This full-day seminar starts with an overview of CPU scheduler internals from the perspective of both Windows and the SQL Server Operating System (SQLOS). Next, you will learn about intra-query parallelism, the method that SQL Server uses to break up large tasks into many smaller pieces, so that each piece can be processed simultaneously. You will come to understand how to read parallel query plans and how to interpret parallel wait statistics, as well as best practices for the various parallelism settings within SQL Server. Finally, you will be introduced to techniques for exploiting parallelism at the query level: patterns that can help the optimizer do a better job of parallelizing your query. After attending this seminar, you will be in full control of your CPUs, able to compel your server to use every clock tick to your end users' advantage.

    • "This is an extremely important topic for database professionals and the importance of it's role in sustainability can't be over stated - great topic" >> Thank you!
    • "Excellently written abstract, and excellent idea for a session. Seemed a bit narrower than I would like, but I have no doubt it would be an awesome, very deep dive session" >> Deep and fairly focused is in fact the idea here. I think that narrow focus is in fact a key attribute of 400-500 level sessions.
    • "Very well presented and has lots of potential for a full room." >> This is interesting abstract feedback. Did this reviewer attend a previous delivery? Or does the comment mean that the abstract itself is well presented?
    • "This could be a bit much for a for a full day pre-con" >> Now this is excellent feedback. I have delivered this particular seminar several times already, but I am going to be enhancing it for this year's PASS Summit. I will definitely think about this feedback and make sure I don't pack too much material in. Sometimes my seminars have been a bit overstuffed and attendees have found the final hour to be somewhat overwhelming as I rushed to finish in time. That's no fun for anyone and I'm actively working on doing a better job in that area.
    • "great topic - very inetrestting [sic]. well written abstract with good deal of details. clear goals. and good balance of demo %" >> I don't recall what I put for goals (PASS asks for three of them) but I'm glad I did the right thing here.


    Data, Faster: SQL Server Performance Techniques with SQLCLR [Level 400 Breakout]
    Sometimes it seems like business users are constantly more demanding: They want more data, more analytics, more insight, and the output needs to be faster, faster, faster. And so we database developers struggle to write ever more complex queries, tune our hardware, carefully choose indexes...and sometimes it’s just not enough. Where do you go when you’ve hit the performance wall? As more and more developers are discovering, a great choice is to make use of SQL Server’s CLR capabilities. User-defined aggregates, table valued functions, and user-defined types are more than just logical constructs; properly applied, they can make complex, logic-driven queries many times faster. This session will show you when and how to use these powerful tools in ways you may not have previously considered. You’ll learn how to speed up various types of analytical queries, do much faster XML processing, and build smart caches that will supercharge your user-defined functions. If you’re ready to take your SQL Server development skills to the next level, this session is definitely your starting point.

    • "really advance session on CLR - I canimagine [sic] this appealing to advanced users alot [sic]" >> Thank you!
    • "Not much to say, a very well written abstract and a great idea for a session." >> Thank you!
    • "I would definitely see this. I believe SQL CLR it is a hidden gem. This session seems a great fit for any developer." >> Thank you!
    • "Well written abstract and I have a good idea of what I would learn.  Level is appropriate.  Appeal to a wide audience is where I see the problem." >> Now things are getting interesting. Audience appeal. I agree! What I would absolutely love here would be a suggestion on how to better position this topic so that people will be more interested. That's not the PASS reviewer's job, naturally. But it would have been very nice for the reviewer to type just a few more words. (Hint, hint. Are you out there, reviewer?)
    • "Abstract good use some rework good topic for under used features" >> I'm a bit confused by this one. Is the abstract good or does it need rework? Again, I'd love to get more from this reviewer.
    • "very interesting topic. unique yet very relevant. clear well defined and relevant goals" >> Thank you!


    Query Tuning Mastery: Manhandling Parallelism, 2014 Edition [Level 400 Breakout]
    When it comes to driving ultimate performance for your biggest queries, parallelism is the name of the game. Don't let SQL Server decide when and where to apply its parallel magic; be your own query optimizer and take control! The 2012 edition of this talk--one of the most popular sessions at PASS Summit--discussed problems with parallel costing, intricacies of plan selection and execution, and the extremely effective Parallel CROSS APPLY pattern. But these things were just the beginning. Two more years of research have yielded new insights, new techniques, and new performance gains. In this talk you'll get a quick review of the 2012 content before diving into brand-new costing hacks, tricks for leveraging parallel scans, and a set of SQLCLR functions that will give you more control over intraquery parallelism than you ever thought possible. Get ready: It's time to Manhandle Parallelism all over again.

    • "This session has certainly seem a bit of adapation [sic] and I think that is onenof [sic] it's strength. I think this could be one the best attended sessions at PASS this year - it certainly has my vote." >> This feedback is very interesting, as it seems that the reviewer thinks that this is an adapted version of the previously delivered session. That was certainly NOT my goal -- this will be a brand-new session -- so here I have an action item. I've messed something up in the wording and I need to figure out how to make it clearer next time.
    • "Some of the abstract (like telling us it was popular before) are not really what I like to see in an abstract. I am also a bit concerned with the topic which suggests that we should typically take control over query optimization. With that said, it does seem like a fairly decent session idea." >> Another very interesting piece of feedback. Is marketing in an abstract a bad thing? I think it's the entire point of an abstract. But naturally not everyone agrees. Did I go overboard? I'm not sure. But I'll certainly give it some thought. Should you typically take control over query optimization? That's another interesting question. Perhaps I'll talk about it in the session!
    • "Good abstract. Seems like a very nice session for the summit. Quite important topic for anyone writing queries." >> Thank you!
    • "Well written abstract, goals, level, and additional data." >> Thank you!
    • "Well written with clear and concise goals." >> Thank you!
    • "very interesting and current topic - relevant. great abstract - good level of details and contents seem very interesting. clear and well defined goals" >> Thank you!
    • "Granted, presentations are updated all of the time, but I wonder at what point the updates prevent the use of the 'Previously Presented' option.  For the option to be selected, I would expect less than fifteen percent of the content has changed and any more would preclude its use." >> Aha. Another reviewer who has read my abstract to mean that this is merely an update of the 2012 session. One such comment might have been an outlier. Two of them? There is definitely an issue with the wording here, and I need to give it some serious thought.

    The Need for Speed: Making Big Queries Faster [Level 400 Full-Day]
    Tuning large and complex queries is a special art, often requiring uncommon techniques that aren’t regularly taught in general-purpose performance courses and books. This full-day session is focused specifically on filling in those blanks, providing you with tools to handle your biggest, most important user demands: analytical, decision support, and reporting queries. You'll learn how to look at even the scariest of query plans and quickly figure out which areas are causing your headaches. You’ll learn how to fix the issues using indexing strategies, optimizer manipulations, and T-SQL rewrites. And you’ll learn to think about performance tuning in new and different ways. Attend this seminar to eliminate bottlenecks and make your biggest queries your fastest queries.

    • "great outcomes - great amount of demo's [sic] good abstract. Makes prerequisites clear. I'd attend." >> Thank you!
    • "There are other several sessions on this very same topic. This abstract doesn't indicate if it will cover 2014 or not and I think it is important at this point." >> Very interesting feedback. I thought that by drilling in and only covering "big" queries the abstract would differentiate itself from all of the other "performance tuning" sessions that are submitted every year, but clearly that didn't happen. The 2014 feedback is a good point too. Leaving that out was deliberate. By the time abstracts were due, I had not yet touched 2014 in a production environment (and, full disclosure, I still haven't). I do not, as a general rule, present on topics that I haven't extensively worked with, so I couldn't put it into the abstract. I left it vague on purpose, hoping that perhaps over the summer I'd get enough production experience to add some 2014 material to the mix. I'm not sure if that was a good decision or not, but it is what it is.
    • "Thanks for the abstract. It’s good to have someone speaking about the Performance and tuning. The abstract content and the goals are quite interesting and It will be sure a great session in PASS Summit 2014. Thanks." >> Thank you!

    And that's that. Now I'll turn things over to you. Have any additional feedback for me? Post it below!

  • TechEd North America 2014 (Houston) - Demos - DBI-B489 and DBI-B333

    Just a quick note to thank everyone who attended both of my sessions at this year's TechEd show in Houston!

    Attached are demos for both the query tuning and window functions talks.

    Enjoy, and let me know if you have any questions.


  • SQL Saturday Chicago (#291) - Clash of the Row Goals - Demos

    Thank you to everyone who joined me today at SQL Saturday Chicago for my "Clash of the Row Goals" session. I'm honored that so many people would attend a talk with such an incredibly arcane title -- and always glad to find a curious, interested, and intelligent audience.

    The demos I showed during the session are attached to this post. If you have any followup questions, feel free to leave a comment below.


  • SQL Saturday Chicago...and the Gentle Art of Planscaping!

    Three weeks from now I'll be visiting Chicago to participate in SQL Saturday. I'll offer two different learning opportunities:

    On Friday, April 25, you can join me for a full day seminar on parallelism and performance in SQL Server. This class covers in-depth background on the hows and whys behind SQL Server's parallel processing, followed by an investigation of advanced techniques that you can use to improve performance of some queries by an order of magnitude or more. The class is filling up, so make sure to register soon!

    On Saturday, April 26, I'll deliver my newest talk, "Clash of the Row Goals." I unveiled it the first time this past weekend at SQL Saturday Boston, and Joe Chang happened to blog about it a few hours ago. In this talk you'll learn how to shape and modify query plans by leveraging an understanding of query optimizer internals -- without using query hints. I've decided to call this technique planscaping. The idea is to create beautiful query plans that perform well and don't break at unexpected times. That's a good thing, right?

    Hope to see you in Chicago!

  • Demos: SQL Saturday Boston (#262) - Clash of the Row Goals!

    Thanks to everyone who joined me yesterday afternoon in Cambridge for my Clash of the Row Goals talk. It was great to receive such a welcome reception for a rather out there topic.

    Three things I should clarify for those who attended:

    • The Many() function appeared to create an invalid estimate in one case; but that was due to auto-parameterization (which one of the attendees in the front row correctly called out at the time; but I'm not sure everyone heard her)
    • Regarding the demo that failed at the end: As it turns out -- it didn't! I was a bit under-caffeinated and misunderstood my own notes. I've updated the scripts to make things clearer.
    • I blamed the demo failure, jokingly, on Azure. Turns out at least one attendee took me seriously; it showed up on a feedback card. Fact is, I was running the entire show from my laptop. My head might have been in the clouds, but my demos were strictly on terra firma.

    The demo scripts are attached. Enjoy, and let me know if you have any questions or comments. I'd especially love to hear how it goes if you decide to try any of these techniques on one of your problem queries.

  • Query Performance and Parallelism Seminars in Boston and Chicago

    You're a DBA or database developer working on a big (or even big-ish -- doesn't have to be huge) database. Your end-users are doing all sorts of data analysis, and even though you have a pretty beefy server, your queries just don't seem to run as fast as they should.

    The reason (I bet)? Your queries aren't taking full advantage of the CPU power available in today's servers. SQL Server has powerful parallel processing capabilities, but left to its own devices (and defaults) you're never going to see the full potential. Luckily, this can be fixed. You can make your queries much faster. (Often an order of magnitude or more!) And I can show you how.

    Join me in Boston or in Chicago, the Friday before the upcoming SQL Saturday events in those cities, and we'll go deep into parallel processing. You'll learn about Windows server and how it deals with processes and threads, about SQLOS and its thread scheduling internals, how the SQL Server query processor works, and how you can use all of this knowledge to make your queries run faster and your server behave more consistently, even under load.

    These full-day seminars cost only $150 and include lunch, coffee, and a printed course book -- plus a full day of intense content.

    Full information, including an outline and registration details, is available at the links below:

    Hope to see you there!

  • SQLRally Nordic and SQLRally Amsterdam: Wrap Up and Demos

    First and foremost: Huge thanks, and huge apologies, to everyone who attended my sessions at these events. I promised to post materials last week, and there is no good excuse for tardiness. My dog did not eat my computer. I don't have a dog. And if I did, she would far prefer a nice rib eye to a hard chunk of plastic.

    Now, on to the purpose of this post...

    Last week I was lucky enough to have a first visit to each of two amazing cities, Stockholm and Amsterdam. Both cities, as mentioned previously on my blog, hosted SQLRally events, and I did a precon plus two talks at each event.

    The events in both cities were well done and the audiences seemed quite sharp. Very nice events at which to be a speaker! I hope to return to both areas very, very soon. (Hint, hint, event organizers!)

    Precon attendees: You should already know where to get your slides and demos.

    Breakout session attendees: Find the demos for both of my talks attached here.

    Thanks again to everyone who I met and/or taught. It was really a fantastic week. Skål and/or proost -- depending on your disposition!

  • Query Performance Sessions in Stockholm and Amsterdam

    As previously mentioned, I'll be delivering my full-day "No More Guessing" performance troubleshooting seminar at both the Stockholm and Amsterdam SQLRally events. In addition to the seminar, I'll be doing two breakout sessions at each event.

    It's going to be a busy week! But luckily for me, I'm doing the same things in each city:


    Using Your Brain to Beat SQL Server

    This session is a joint effort with one of the world's greatest SQL superheroes, Thomas Kejser. The session is all about query optimizer deficiencies and how to win the battle for excellent performance by applying your own intelligence instead of blindly trusting that SQL Server will do the right thing. I think this is going to be a really fun and useful session.


    Data, Faster: SQL Server Performance Techniques with SQLCLR

    Remember when SQLCLR was first put into SQL Server and everyone was wondering what they should use it for? I found my answer: raw performance. In this session I'll teach you a SQLCLR pattern that can help reduce the run time of your heaviest and most logic-filled queries, often by an order of magnitude or more.


    Looking forward to seeing you in Europe next month!

  • Hey Amsterdam and Stockholm: Just Say No to Guessing About Performance!

    Have SQL Server performance problems? What are you going to look at? Can you quickly isolate the root cause? Or do you ever find yourself stuck, not sure of what to do next?

    That's what the No More Guessing seminar is all about.

    This seminar teaches you to use:

    • Baselining: An analytical methodology based on statistical analysis and evidence rather than graphs and your eyes.
    • Performance counters. The right set of counters to get you where you need to be without being overwhelmed.
    • Wait stats: Including what waits really mean and an extremely in-depth overview of the 15 most common waits you will encounter on real production systems.
    • DMVs: The 15 DMVs that you can use to find out why things are slow on your server right now
    • Query plans: The top things to look for in your biggest, most confusing plans so that you can quickly find the culprit. And, naturally, how to fix the problem.
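    As a tiny taste of the wait-stats piece, a query along these lines lists the top waits on an instance (the exclusion list here is just a sketch showing a few common benign background waits; a real filter is much longer):

    ```sql
    -- Top waits by total wait time, with signal (CPU queue) time separated out
    SELECT TOP (15)
        wait_type,
        waiting_tasks_count,
        wait_time_ms,
        wait_time_ms - signal_wait_time_ms AS resource_wait_ms
    FROM sys.dm_os_wait_stats
    WHERE wait_type NOT IN      -- a small sample of the many benign "background" waits
        (N'SLEEP_TASK', N'LAZYWRITER_SLEEP', N'BROKER_TASK_STOP', N'XE_TIMER_EVENT')
    ORDER BY wait_time_ms DESC;
    ```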

    All of these techniques will work with any version of SQL Server from 2005 onward, although 2008 and beyond are recommended. (The seminar covers some specific very useful features added in both 2008 and 2012. As soon as 2014 is ready, I look forward to adding some really cool stuff from there as well. SQL Server 2005 just isn't as exciting as it once was.)

    This November I am pleased to be teaching this seminar twice in Europe, at the two SQLRally events:

    I'm also delivering two breakout sessions at each show. More information on those soon.

    Hope to see you in Europe!

  • Next-Level Parallel Plan Forcing: An Alternative to 8649

    "Even experts with decades of SQL Server experience and detailed internal knowledge will want to be careful with this trace flag.  I cannot recommend you use it directly in production unless advised by Microsoft, but you might like to use it on a test system as an extreme last resort, perhaps to generate a plan guide or USE PLAN hint for use in production (after careful review)."

    So wrote Paul White in his often referenced article, "Forcing a Parallel Query Execution Plan." His article focuses on the various reasons that you might not get a parallel query plan, chief among them the optimizer simply not doing a great job with its own cost model. (My session from the 2012 PASS conference, available on YouTube, also discusses the issue in some detail, and from a different perspective. You might want to both watch it and read Paul's article, prior to reading any further here.)

    The trace flag Paul mentions, 8649, is incredibly useful. It allows us to tell the query optimizer to disregard its cost-based comparison of potential serial and parallel versions of our plans, thereby skipping right to the good stuff: a parallel plan (when it's possible to generate one). Alas, the flag is undocumented, unknown, and its full impact difficult to understand. And for those cowboys who are willing to take the plunge and use it in a production environment, there are still other issues lying in wait:

    • This flag is not something you want to enable on a system-wide basis. You can use DBCC TRACEON on a case-by-case basis, but as Paul shows in his post it's much nicer to use the QUERYTRACEON hint. Unfortunately, this hint was, just like the flag, entirely undocumented until only recently. It is now documented only for a very small number of flags; 8649 isn't one of them.
    • Just like DBCC TRACEON, QUERYTRACEON requires significant system-level privileges: system administrator, to be exact. This means that you're forced to either give that permission to all of your users, or do some module signing. Clearly the latter approach is far superior, but it's still a pain.
    • QUERYTRACEON, being a query-level hint, can't be encapsulated in a view or inline table-valued function. So if we want to force these to go parallel, we're pretty much out of luck.
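    For reference, the hint-based form looks like this. The table names are placeholders, and to repeat Paul's warning: 8649 is undocumented, so treat this strictly as a test-system sketch:

    ```sql
    -- Test systems only: 8649 is undocumented and unsupported in production
    SELECT t.*
    FROM dbo.big_table AS t               -- hypothetical table
    JOIN dbo.other_table AS o             -- hypothetical table
        ON o.id = t.id
    OPTION (QUERYTRACEON 8649);           -- skip the serial-vs-parallel cost comparison
    ```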

    Clearly, while invaluable for testing, 8649 just isn't the parallel panacea we need.

    Before we get into the details of the solution I'll be presenting here, it’s probably a good idea to do a quick review of the situation.

    Here’s the problem, in brief:


    Big plan, no parallelism. Ouch. Or maybe I should say that more slowly. OOOOOOOOuuuuuuuucccccccchhhhhhhh. Because you’re going to be waiting for quite a while for this plan to finish.

    Why did this happen? We can find out by forcing a parallel version of the plan using 8649 and evaluating the metadata.


    When the query optimizer decides whether or not to select a parallel plan, it first comes up with a serial plan cost, then compares it to the parallel plan cost. If the parallel plan cost is lower, that's the one that gets used. This parallel plan cost is based on the serial cost, but with a few key modifications:

    • CPU costs for each iterator are divided by the degree of parallelism...except when they sit beneath a parallel nested loop. (Except when the input to the parallel nested loop is guaranteed to be one row. That’s a very special parallelism edge case.) Notice the nested loop iterator above? Strike one. (Note: I/O costs aren’t divided by anything.)
    • Sometimes, the parallel plan will have slightly different iterators than the serial version, due to the optimizer doing its job. Notice, above, that the iterator feeding the nested loop has transitioned from a Merge Join to a Hash Match? That’s going to change the cost. In this case, it’s more expensive. Strike two.
    • Parallel iterators are added into the plan as necessary. See that Gather Streams? It’s not free. Strike three.
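To make the arithmetic concrete, here's a small Python model of the costing rules just described. It's a sketch, not SQL Server's actual (undocumented) costing code, and the iterator costs below are invented purely for illustration:

```python
# Simplified model of the serial-vs-parallel cost comparison described
# above. The numbers are made up; only the structure of the math matters.

def parallel_cost(iterators, dop):
    """Estimate a parallel plan's cost from its iterators.

    Each iterator is (cpu_cost, io_cost, under_parallel_nested_loop).
    CPU cost is divided by the degree of parallelism (DOP), except for
    iterators beneath a parallel nested loop; I/O cost is never divided.
    """
    total = 0.0
    for cpu, io, under_loop in iterators:
        total += io + (cpu if under_loop else cpu / dop)
    return total

# A toy plan: one scan (CPU 80, I/O 20) and one join that sits under
# a parallel nested loop (CPU 40, I/O 5), so its CPU cost is not divided.
plan = [(80.0, 20.0, False), (40.0, 5.0, True)]

serial = sum(cpu + io for cpu, io, _ in plan)
parallel = parallel_cost(plan, dop=8) + 2.5  # + a Gather Streams iterator
```

With these toy numbers the parallel plan wins (77.5 versus 145), but pile enough CPU cost under the nested loop and the serial plan comes out "cheaper," which is exactly the trap described above.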

    Adding all of these modifications together, our parallel plan has a higher cost than our serial plan. Cost is the query optimizer's way of weighing the relative merits of one plan against another, and just like you might choose the less expensive option at a store, so does the query optimizer. A plan with a cost of 10 less than some other plan must perform better, right? Well…no. Not in the real world, and especially not when we're dealing with plans that have estimated costs in the millions.

    Unfortunately, as I mention in the video linked above, the costing model is rather broken. And the workaround I suggest in the video--using a TOP with a variable in conjunction with OPTIMIZE FOR--can work, but it has some problems. The biggest issue, as far as I’m concerned? It requires use of a local variable. Which, just like 8649, means that it can’t be used in a view or inline TVF.

    So what’s a SQL developer to do?

    Recently it hit me. If I could only create a query fragment that had certain properties, I could apply it as needed, just like 8649. Here’s what I set out to create:

    • High CPU cost, low I/O cost. This is key. The query fragment had to be able to benefit from the query optimizer’s math.
    • No variables. See above.
    • A single, self-contained unit. No tables or other outside objects. As much as possible, avoidance of future maintenance issues seemed like a good approach.
    • No impact on estimates in the core parts of the plan. This query fragment had to impact plan selection purely on the basis of cost, and without causing the optimizer to make strange choices about join types, execution order, and so on.

    After quite a bit of trial and error I arrived at my solution, which I encapsulated into the following table-valued function:

    CREATE FUNCTION dbo.make_parallel()
    RETURNS TABLE AS
    RETURN
        WITH
        a(x) AS
        (
            SELECT a0.*
            FROM
            (
                VALUES
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
                    (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)
            ) AS a0(x)
        ),
        b(x) AS
        (
            SELECT TOP(9223372036854775807)
                1
            FROM
                a AS a1,
                a AS a2,
                a AS a3,
                a AS a4
            WHERE
                a1.x % 2 = 0
        )
        SELECT
            SUM(b1.x) AS x
        FROM
            b AS b1
        HAVING
            SUM(b1.x) IS NULL
    GO

    What this does: The function starts with a set of 1024 rows, defined in the row value constructor in CTE [a]. This set of rows is cross-joined to itself four times. The resultant Cartesian set contains 1,099,511,627,776 rows, all of which are forced to pass through a Top iterator as well as a Stream Aggregate. This is, naturally, a hugely expensive operation that generates a very high estimated cost.

    Except that in reality there are only 1024 rows. Notice the predicate in CTE [b]: (a1.x % 2 = 0). If you've studied a bit of math you know that 1 divided by 2 has a remainder of 1, not 0. But luckily for us, the query optimizer has no way of evaluating that at compile time. It instead asks its sibling, the query processor, to do that work. The plan involves scanning each of the four cross joined sets for a match, but of course no match is ever found. And since cross joining an empty set to any other set results in an empty set, the query processor has no need to scan any but the first set of rows it encounters. So at run time 1024 rows are touched, and that's that.
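The two claims above, the size of the Cartesian product and the impossibility of the predicate, are easy to verify outside of SQL Server. A quick sanity check in plain Python:

```python
# Four cross-joined copies of the 1024-row set yield 1024^4 rows.
rows = 1024
cartesian = rows ** 4  # 1,099,511,627,776

# The predicate a1.x % 2 = 0 can never be true: every value in the
# set is 1, and 1 divided by 2 leaves a remainder of 1, not 0.
never_matches = all(x % 2 != 0 for x in [1] * rows)
```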

    Here's the catch (and it's a good one, for us): Since the query optimizer doesn't know that our predicate will never return true, it also doesn't know how many rows will actually be touched. And it's forced to assume the worst, that the full cross product will be processed. Therefore, this function delivers a rather large cost estimate of just under 1,000,000. (Serial cost; parallel will be less, due to the aforementioned math.) This number, as a cost goal, is somewhat arbitrary. I originally came up with a function that delivered a cost in the billions, but it added a lot of complexity to the query plans I was working with. So I scaled it back a bit. 1,000,000 should be fine in just about every case. The example above is fairly typical; I usually see these cost problems crop up with very small relative differentials between the serial and parallel plan cost. The really important thing is that 100% of the estimated cost from this function is CPU time. That means that we can take full advantage of the way the optimizer works.

    Of course, almost none of the cost is real. This UDF will add a couple of dozen milliseconds to your plan. It will also add around 100ms to the compile time. In my opinion, that doesn’t matter. If you’re playing in the big parallel workload arena you’re not doing thousands of batch requests a second. You’re trying to get your two hour query down to a reasonable amount of time. No one is going to care about a dozen milliseconds.

    This function is also engineered so that the output number of rows is guaranteed to be no greater than one. See that SUM, with no GROUP BY? The query optimizer knows that that can’t return more than one row. And that’s a good thing. It gives us that parallelism edge case I mentioned above. (Nested loop with an input guaranteed to be exactly one row.) Another thing? No rows will ever actually be aggregated in that SUM. Its result will always be NULL. But the query optimizer has no way of knowing that, and it comes up with a plan where the entire backing tree needs to be evaluated. That’s why the HAVING clause is there.
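Python's None makes a convenient stand-in for SQL's NULL if you want to see this aggregate behavior in miniature. The helper below is an analogy, not a full model of T-SQL semantics:

```python
def sql_sum(values):
    """Mimic T-SQL SUM over a set: NULL (None) for an empty input,
    a total otherwise."""
    values = list(values)
    return sum(values) if values else None

# The % 2 = 0 filter rejects every row (all values are 1), so the
# aggregate sees an empty set and yields NULL; that is the single
# "row" the HAVING SUM(...) IS NULL clause lets through.
result = sql_sum(x for x in [1] * 1024 if x % 2 == 0)
```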

    Using this function is quite simple. You take your query, wrap it in a derived table expression, and CROSS APPLY into it. No correlation required. Let’s pretend that the query we want to force parallel looks something like this:

    SELECT
        *
    FROM TableA AS a
    INNER JOIN TableB AS b ON
        a.PK = b.PK

    Using the make_parallel function to force this to go parallel is as simple as:

    SELECT
        x.*
    FROM dbo.make_parallel() AS mp
    CROSS APPLY
    (
        SELECT
            *
        FROM TableA AS a
        INNER JOIN TableB AS b ON
            a.PK = b.PK
    ) AS x

    The reason we CROSS APPLY from the function into the query is to keep the high cost on the outer side of any parallel nested loops. This way, the query optimizer’s parallelism math works its magic the right way, yielding the parallel plan we’re after. CROSS APPLY in this case—uncorrelated—can only be optimized as a nested loop itself, and that loop only makes sense if there is at least one row feeding it. Therefore, this query is logically forced to process the TVF first, followed by the inside of table expression [x].

    Note that just as with trace flag 8649, there are a number of parallel inhibitors that are still going to keep this from working. And note that unlike when using the trace flag, the base cost of just under 1,000,000 means that even if you have predicates in certain cases that make a parallel plan less than ideal, you’re still going to get a parallel plan. Using this function is effectively applying a big huge hammer to a problematic nail. Use it with caution, make sure it’s appropriate, and don't bash your thumb.

    So what does a query plan look like, once this has been applied? Here’s the same query from the screen shots above, with the function in play:


    Hopefully (without squinting too much) you can see the key attributes: There is a new subtree on top of the original query. That’s thanks to the TVF. This subtree is evaluated, and feeds the top Distribute Streams iterator, which then feeds the Nested Loops iterator. Since the input to that Distribute Streams is guaranteed to be one row, it uses Broadcast partitioning. But there is no correlation here, so there is really nothing to broadcast; the net effect of this action is to prepare a set of threads to do the work of processing your query in parallel.

    Under the Nested Loops? The same exact plan shape that was produced when I used the trace flag. Same estimates, same iterators, and so on and so forth. The UDF necessarily impacts the plan as a whole, but it works as intended and does not impact any of the parts of the plan we actually care about -- the parts we need to make faster in order to get data to our end users.

    I’m really excited about this technique. It means that I can play fewer games, worry less about privileges and the potentially negative impact of using undocumented hints, and concentrate more on the data I need the query to return than the physical manner in which it’s being processed.

    Bear in mind that there are myriad other problems with SQL Server’s parallel processing capabilities. This technique merely solves one of them, and in all honesty it’s neither the cleanest nor best solution that I can imagine. It’s a start. My hope is that these issues will eventually be addressed by the query optimizer and query processor teams. In the meantime, we as end-users have no choice but to continue to push the product in an effort to get the best possible performance, today, for the queries that we need to run today. Tomorrow, perhaps, will hold a better story.

    Disclaimer: This technique works for me. It may not work for you. It may cause some problem that I haven't yet considered. It may cause your server to literally explode, sending fragments of plastic and metal all over your data center. The function and technique are provided as-is, with no guarantees or warranties, and I take no responsibility for what you do with them. That said, if you DO try this out I'd love to hear about your experience, in the comments section below!


    Special thanks to Gokhan Varol for helping me test this technique.
