As I wrote that title, which was supposed to be a cute reference to parallelization, I realized it could also be the title for a blog post about my most recent trip, with delays, cancellations, missed connections and lost luggage both coming and going. But that will be another blog post that I will have to write about later, because the trip isn't even over yet. I am enroute from Philadelphia to Denver, 6 miles up, while I am writing this.
Parallelization is a frequently discussed topic when dealing with SQL Server Performance, and one of the most heretofore confusing issues is relationship between the configuration option 'max degree of parallelism' and the query hint MAXDOP. Neither of these options forces SQL Server to use a specific number of processors, unless of course you set either value to 1. Any value greater than one just gives SQL Server an upper limit, but during execution, a smaller number can be used depending on the existing workload.
The confusion I want to discuss now is what does the MAXDOP hint really allow? MAXDOP can be used in both queries and index building operations to set the upper limit on processor use, but I had received conflicting reports from various people about whether the MAXDOP hint could exceed the configuration option or whether or it could be used to only limit it still further. The last response I had received from my contacts at Microsoft had indicates that using the MAXDOP hint would not allow SQL Server to exceed the serverwide configuration value, i.e. the MAXDOP hint could be used to set a lower limit for processor use but it could never increase your SQL Server's configured upper limit.
One of my students last week wanted to figure out a way to not allow any queries to run in parallel, but to allow parallelism for index builds. If hints couldn't be used to exceed the serverwide 'max degree of parallelism', they would have to configure the server for a high DOP, and then hint every single query for MAXDOP 1; and this solution was not acceptable. Of course I agreed that it was unacceptable, so I thought I would revisit the question. I sent another email off to my favorite MS contacts, and while waiting for a response, I decided to test it out myself.
Although my laptop only has a single CPU, there is a traceflag you can use to simulate multiple CPUs. Using the SQL Server Configuration Manager, you can add the startup parameter -P#, where # is the number of processors you want to simulate, and SQL Server will create that number of schedulers. I usually run my machine with -P4. I used a script provided by my friend Lara Rubbelke, a performance consultant with Digineer, in Minneapolis. It uses a database called Performance created by Itzik Ben-Gan that you can download from his site at www.insidetsql.com.
SQL Server 2005 provides a way for you to observe the number of CPUs used during execution when you look at an actual plan instead of an estimated plan. Actual plans, produced by SET STATISTICS PROFILE ON or SET STATISTICS XML ON, involve executing the query, while estimated plans, produced by the SET SHOWPLAN options, do not run the query. If you look at the actual graphical execution plan you can also see the number of processors actually used by looking at the details for the very first (leftmost) icon in the query plan. You can enable this option with Cntl-K.
So I set my serverwide setting to a maximum of 2 CPUs:
EXEC sp_configure 'max degree of parallelism', 2
and then I ran this query batch:
SET STATISTICS XML ON
SELECT COUNT(*), ShipperID, EmpID
FROM Orders o INNER JOIN Customers c
ON o.custid = c.custid
WHERE c.custname LIKE 'Cust_1008%'
GROUP BY ShipperID, EmpID
OPTION (MAXDOP 4)
I saw the following details in my XML plan output:
<QueryPlan DegreeOfParallelism="4" .......
I then tried building an index and specifying MAXDOP:
CREATE INDEX IX_custid ON Orders(custid)
WITH (MAXDOP = 4)
I again saw the actual number of processors used was 4.
I finally got the 'official' answer from Microsoft, after my test was completed, which verified what I had observed. Here is the algorithm used to determine the Degree of Parallelism (DOP) used at runtime to execute your query:
1. Set DOP to the value of the MAXDOP query_hint if there is one, otherwise set the DOP to the 'max degree of parallelism' value.
2. Set DOP to the minimum from (DOP, # of schedulers).
3. Adjust DOP due to workload; it is possible to use a lower DOP if your server is loaded, i.e. there are not enough threads to run the query in the specified DOP. In the worst case SQL Server will run the query in serial.
Basically the same algorithm is also used for index builds.
Now back to the blog title... the engineer at Microsoft who developed the original parallel query implementation was not a native English speaker. When he talked about this great new feature he had developed, he sometimes wanted to use the word 'parallel' as a verb, and in his great enthusiasm he would sometimes leave off a syllable. So instead of saying 'We are going to be able to PARALLELIZE your queries', he would say 'We are going to be able to PARALYZE your queries'!