There are a lot of questions on hyperthreading, but not a lot of answers. There is no shortage of opinions, but very few are based on significant first hand experience or solid test data.
We know that the hyperthreading technology in the older generations of the Intel processors was not well received by the SQL Server community. Hyperthreading in the newer generations of the Intel’s Nehalem-based Xeon processors, however, is decidedly better implemented, and it appears to be much better received by the SQL Server community.
I have done some tests with the Intel Westmere-EX processors, exploring how its hyperthreading feature may influence various types of SQL Server workloads. And my experience from these tests is generally positive. In fact, I have not seen any significantly negative performance impact that could be directly attributed to enabling hyperthreading. I plan to post some of my empirical observations. The intent is not to settle any issues. Rather in keeping with my own tradition, it is to contribute some data points to this very important subject.
In this post, I focus on the performance impact of enabling hyperthreading on reporting queries. Since there is a large number of variables that could heavily influence the results of any hyperthreading related test, quite a bit of attention is paid to ensure repeatability of the test results.
The tests are set up as follows:
- The tests use a single common query (shown below). The focus is not on the single query performance, but how streams of this query perform.
- There are two key test variables:
- Degree of parallelism. It is controlled with the option (maxdop) clause appended to the query.
- Number of concurrent query streams. It is controlled through a C# program by spawning as many threads, each of which drives the query in a serial loop for a configurable amount of time.
- There are three hyperthreading (HT) test configurations:
- 40 core with HT. That is 80 logical cpus. HT is enabled in the BIOS.
- 40 cores without HT. That is 40 logical cpus. HT is disabled in the BIOS.
- 20 cores with HT. That is 40 logical cpus. HT is enbaled in the BIOS.
- The query execution plans are logged by the test driver and verified to be constant across each test run and each query execution.
- The performance statistics of each query execution, regardless of its maxdop setting or which stream it is in, is captured by the test driver and loaded into a table for analysis and overall metric calculation.
- The test database consists of a 200-million-row main table tran_stats, exactly as described in this previous post. The SQL scripts for creating and populating this table are in the attached zip file. The test database also includes a dimension table which basically is the #tmp table in this same previous post. The scripts to create and populate this dimension table are also included in the attached zip file.
- The test results are summarized with a query processing throughput metric, Queries per Hour, representing the number of queries processed in an hour for the given test configuration.
- All tests are run while the data are fully cached. The storage subsystem is not a factor in these tests.
Essentially, with this test workload we want to see how it behaves when we increase the load level by increasing the number of concurrent query streams for a given maxdop setting, with and without hyperthreading on a DL580 G7 with four E7-4870 (10 core) processors and 264GB of RAM. The OS is Windows 2008 R2 Enterprise x64 Edition with SP2 and DBMS is SQL Server 2008 R2 Enterprise x64 Edition (10.50.1600).
The main test query (without the maxdop option clause) is as follows:
select COUNT(*) from (
COUNT(*) as cnt
from tran_stats t1 join dimension t2
on t1.server_config = t2.server_config and
t1.thread_name = t2.thread_name and
t1.tran_type = t2.tran_type and
t1.tran_status = t2.tran_status and
t1.users = t2.users and
t1.cpus = t2.cpus
group by id_server_config,id_thread_name,id_tran_type,id_tran_status,id_users,id_cpus
) as v
I pick this particular query because it is a rather plain reporting query. Nothing exotic and something you’d see often in the real world. I also happen to be playing with it recently for other purposes.
By the way, in the spirit of full disclosure, if there is interest in the actual test driver program, I can post it here.
So, how do the results look like?
40 cores with HT vs. 40 cores without HT
For maxdop = 2, the query throughput results are summarized in the following chart.
Most notably, when the load level is stressed with 64 concurrent streams, the system manages to push through ~478 queries per hour with HT. Compare that to ~374 queries per hour without HT at the same load level. That is about 28% increase in the processing power (for this specific configuration, always remember that!). That’s a substantial improvement.
Also note that for streams 1 through 16, there is little to no difference whether or not HT is enabled. That is expected. After all, if the queries are not competing for processors, whether you have 40 or 80 processors should not make a big difference. This also means that if you just run your test with a single query, you probably won’t see any difference, one way or another. And if you draw a conclusion based on that single query result, the conclusion is likely to be misleading.
The next three charts show the results for maxdop=4, 8, and 20, respectively. The results look similar to that for maxdop=2 in terms of the general trend and pattern, though the extent of performance improvement varies. The improvement is generally between 8% and 25%.
40 cores without HT vs. 20 cores with HT
In this case, they both appear as 40 logical processors at the OS level and to the SQL Server instance except that one has 40 cores and the other has 20 cores. So which configuration should have better performance capacity?
You probably would pick 40 cores without HT. And you’d be correct. The following charts are the supporting evidence.
When pushed, 40 cores without HT outperforms 20 cores with HT substantially by as much as ~48%.
So overall in these tests, the performance impact of hyperthreading on the query throughput is positive, and in some cases very substantial.
To reiterate, I’d be careful--if at all--extrapolating the observations here. Hopefully, though, as the evidence accumulates from the community, we can have a better comfort level in how we use this feature effectively.