There has been relatively litle activity in TPC Benchmarks recently with the exception of the raft of Dell TPC-H results with Exa Solutions.
It could be that systems today are so powerful that few people feel the need for benchmarks.
IBM published an 8-way Xeon E7 (Westmere-EX) TPC-E result of 4593 in August, slightly higher
than the Fujitsu result of 4555, published in May 2011.
Both systems have 2TB memory. IBM prices 16GB DIMMs at $899 each, $115K for 2TB or $57.5K per TB. (I think a 16MB DIMM was $600+ back in 1995!)
The Fujistu system has 384 SSDs of the 60GB SLC variety, $1014 each,
and IBM employed 143 SSDs of the 200GB eMLC variety, $1800 each for 24-28TB raw capacity respectively.
Except for unusually write intensive situations, eMLC or even regular MLC is probably
good enough for most environments.
HP published a TPC-H 1TB of 219,887.p QphH
for their 8-way ProLiant DL980 G7 with the Xeon E7-4870,
26% higher in the overall composite score than the IBM x3580 with the Xeon E7-8870 (essentially the same processor).
The HP scores 16% higher in power and 37.7% higher in throughput.
Both throughput tests were with 7 streams.
The HP system had Hyper-Threading enabled (80 physical cores, 160 logical)
while the IBM system did not.
Both systems had 2TB memory, more than sufficient to hold the entire database, data and indexes in memory.
The IBM system had 7 PCI-E SSDs and
the HP system has 416 HDDs over 26 D2700 disk enclosures, 10 LSI SAS RAID controllers,
3 P411 and 1 dual-port 8Gbps FC controller.
Also of interest are TPC-H 1TB reports published for the 16-way SPARC M8000 (June 2011)
with SPARC64 VII+ processors and the 4-way SPARC T4-4 (Sep 2011).
The table below shows configuration information for recent TPC-H 1000GB results.
|TPC-H 1000GB||IBM x3850 X5||HP ProLiant DL980 G7||IBM Power 780||SPARC M8000||SPARC T4-4|
|DBMS ||SQL 2K8R2 EE||SQL 2K8R2 EE||Sybase IQ ASE 15.2||Oracle 11g R2||Oracle 11g R2|
|Processors||8 Xeon E7||8 Xeon E7||8 POWER7||16 SPARC64 VII+||4 SPARC T4|
|Cores Threads ||80-80||80-160||32-128||64-128||32-256|
|IO Controllers ||7||13||12||4 Arrays||4 Arrays|
|HDD/SSD||7 SSD||416 HDD||52 SSD||4x80 SSD||4x80 SSD|
The figure below shows TPC-H 1000GB power, throughput and QphH composite scores for 4 x Xeon 7560 (32 cores, 64 threads),
two 8 x Xeon E7 (80 cores, 80 and 160 threads) systems, 8 x POWER7 (32 cores, 128 threads)
16 SPARC64 VII+ (64 cores, 128 threads) and the 4 SPARC T4 (32 cores, 256 threads).
TPC-H SF 1000 Results
The HP 8-way Xeon and both Oracle/Sun systems, one with 16 sockets
and the newest with 4 SPARC T4 processors, are comparable, within 10%.
An important point is that both Oracle/Sun and the IBM Power systems are configured with 512GB memory
versus 2TB for the 8-way Xeon E7 systems, which enough to keep all data and indexes in memory.
There is still disk IO for the initial data load and tempdb intermediate results.
This good indication that Oracle and Sybase have been reasonably optimized on IO, in particular,
when to use an index and when not to.
I had previously raised the issue that the SQL Server query optimizer should consider
the different characteristics of in-memory, DW optimized HDD storage (100MB/s per disk sequential)
Sun clearly made tremendous improvements from the SPARC 64 VII+ to the T4,
with the 4-way new system essentially matching the previous 16-way.
Of course, the Sun had been lagging at the individual processor socket level until now.
The most interesting aspect is that the SPARC T4 has 8 threads per core.
The expectation is that server applications have a great deal of pointer chasing code,
that is: fetch memory which determines next address to fetch with inherently poor locality.
A modern microprocessor with core frequency 3GHz corresponds to a 0.33 nano-second clock cycle.
Local node memory access time might be 50ns, or 150 CPU-clocks.
Remote node memory acess time might be 100ns for a neighboring node to over 250ns for multi-hop nodes
after cache-coherency is taken into account.
So depending on how many instructions are required for each non-cached memory access,
we can expect each thread or logical core to have many dead cycles, possibly enough to justify 8 threads per core.
What is surprising is that Oracle published a TPC-H benchmark with their new T4-4
and not a TPC-C/E which is more likely to emphasize the pointer chasing code than DW.
Below are the 22 individual query times for the above systems in the power test (1 stream).
TPC-H SF 1000 Queries 1-22
Below are the 22 individual query power times for just the two 8 Xeon E7 systems.
Overall, the HP system (with HT enabled) has 16% TPC-H power score, but the IBM system without HT
is faster or comparable in 9 of the 22 queries.
Not considering the difference in system architecture, the net might be attributed to HT?
TPC-H SF 1000 IBM and HP 8-way Xeon E7
Below are the 22 individual query power times for the HP 8 Xeon E7 and Oracle SPARC T4-4 systems.
TPC-H SF 1000 8-way HP Xeon E7 and 4-way SPARC T4