In commenting on my previous post, Greg Linwood and GrumpyOldDBA raised questions about various implications of parallelism. In this post, I’ll look at the impact of different degrees of parallelism on the query performance. I’ll limit my examination on the same query that uses a Cartesian product temp table, as discussed in the previous two posts ([1] and [2]). The details of the test setup can be found in [1]. The SQL Server instance is SQL Server 2008 R2 (10.50.1600) Enterprise X64 Edition. I’ll look at the performance impact of concurrent executions in a future post.
The law of diminishing marginal return dictates that as you increase the number of processors in a parallel plan, sooner or later you’ll reach a point after which you’ll see a decrease in the performance gain from adding another processor to the parallel plan. Note that it is the decrease in the performance gain, not the decrease in the performance itself.
The question is, for a specific query, does SQL Server know where that inflection point is and therefore make the right decision in picking the optimal degree of parallelism? SQL Server Books Online thinks so. It says:
“When SQL Server runs on a computer with more than one microprocessor or CPU, it detects the best degree of parallelism, that is, the number of processors employed to run a single statement, for each parallel plan execution.”
But how does that work out in practice? This is a valid question to ask because determining the optimal degree of parallelism is an incredibly difficult problem to solve. We obviously can’t answer this question in generality. Nor should we even attempt.
So let’s take a look at the following query (reproduced from the previous post, and see that post for more details):
select id_server_config,
id_thread_name,
id_tran_type,
id_tran_status,
id_users,
id_cpus,
COUNT(*) as cnt
from tran_stats t1 join #tmp t2
on t1.server_config = t2.server_config and
t1.thread_name = t2.thread_name and
t1.tran_type = t2.tran_type and
t1.tran_status = t2.tran_status and
t1.users = t2.users and
t1.cpus = t2.cpus
group by id_server_config,
id_thread_name,
id_tran_type,
id_tran_status,
id_users,
id_cpus
Let’s run this query on a server with 80 processors with the max degree of parallelism (maxdop) set to different values, and see it performs. (Note that the tran_stats table has 200,000,000 rows, and the query runs with all the data already cached in memory). The following chart summarizes the result:
A couple of things stand out on this chart.
First of all, the diminishing marginal return on parallelism is no mistake. After maxdop = 8, the additional performance gain by increasing degrees of parallelism is rather small. And after maxdop = 24, the gain is virtually non-existent. Arguably, the optimal degree of parallelism for this query on this server is somewhere between 8 and 24.
The second salient point is that when maxdop is set to 0, SQL Server sets the actual degree of parallelism to 64 when running the query. As per Books Online, 64 is the maximum value SQL Server would set maxdop to.
Why is setting maxdop to 64 not optimal for this query? Let’s look at the following chart that shows the % total CPU usage on the server (with 80 processors) at various maxdop settings:
For this query, as we increase the maxdop setting, the total CPU usage steadily goes up. Even though after maxdop = 24, we practically gains no performance, the total CPU usage still goes up. We just end up wasting a tremendous amount of CPU cycles for nothing.
Obviously we need to gain more experience to be sure, but it doesn’t look like a good idea to leave maxdop at 0 on a box with a lot of processors.