Earlier, JK blogged and ranted about buying bigger hardware.
Let me assure people that the talent for writing highly efficient and scalable (over many processor cores) code was drained long ago, and not just by the recent emergence of multi-core processors. For that matter, I am not sure it was ever prevalent, because developers built code on small single-socket (and single-core in the old days) boxes without consideration for the issues that only occur in SMP or NUMA systems. Of course, it is not necessarily a bad thing that a person of modest skills can build an application, so long as it is rebuilt before moving to the big time.
But we should look at the flip side, from the point of view of both the software vendor (who pays for developers to write code) and the CIO (of the company that pays the sys admin and DBA to maintain the production servers). The high-level people responsible for running a business or department are almost never proficient coders (with the exception of Bill G and a few others). To rise in business or administration, over the course of time, they learn to assess value on different metrics. The simplest metric is money and, related to it, head count.
Let’s start with the ISV that sells software that has high (perceived) value. Suppose the probable full deployment cost might be on the order of $10M (£, € or ¥). The software vendor might want to target a price of $1M for the software license, plus 10% per year for support. Now if this software were super efficient, it might run on 2-socket database servers costing $10K each (two servers in a cluster plus two for the DR cluster, totaling $40K), plus storage, and a $4K web server (again, 2 for redundancy, plus 2 more for DR).
The CIO will then ask: why am I paying $1M for software that runs on a $10K database server and a $4K web server? Now suppose this software were so grossly inefficient that it required 8-way database servers costing $200K each (a total of 4: a 2-node cluster plus another pair for DR) and 10 web servers (plus another 10 for DR).
Well then, now it seems perfectly reasonable that the software price is $1M. If the application still did not run well, then it certainly justifies having lots of consultants on permanent assignment to make sure it works.
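To put rough numbers on the two scenarios above, here is a minimal sketch of the arithmetic. The function name is hypothetical, storage is left out (as it is in the figures above), and I am assuming the web servers in the inefficient case cost the same $4K each, which the scenario does not actually state.

```python
# Round figures from the scenario above; storage is excluded.
LICENSE = 1_000_000  # target software license price


def hardware_cost(db_unit_price, db_count, web_unit_price, web_count):
    """Total server hardware cost for one deployment (hypothetical helper)."""
    return db_unit_price * db_count + web_unit_price * web_count


# Efficient software: 2-node DB cluster + 2 DR nodes, 2 web servers + 2 for DR.
efficient = hardware_cost(10_000, 4, 4_000, 4)

# Inefficient software: four 8-way DB servers, 10 web servers + 10 for DR.
# Assumption: the web servers are still $4K each.
inefficient = hardware_cost(200_000, 4, 4_000, 20)

for label, hw in (("efficient", efficient), ("inefficient", inefficient)):
    print(f"{label}: hardware ${hw:,}, license is {LICENSE / hw:.1f}x the hardware")
```

On these assumptions the license costs many times the hardware in the efficient case and is roughly on par with it in the inefficient case, which is the comparison the CIO ends up making.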
Now let’s look at this from the CIO point of view. Suppose your company has two or more enterprise-wide database applications, each administered by a separate DBA. You, being a super proficient SQL Server performance expert, enable your database to run on a 2-way quad-core system. The other guy, who has no such skills, proposed and deployed an Oracle RAC solution spanning eight 4-socket quad-core servers. (The intent here is not to pick on Oracle; Larry is exceptional at making money and has a really nice yacht to boot.) Who will know that, with proper tuning, it would also have run on a single 2-socket system (this could still be a 2-node RAC for redundancy)?
When it comes time for the annual review, which DBA will rate higher, the one that maintains a simple $200K (hardware + database licensing) environment or the one that handles the complex $2M environment? The CIO and HR know that the other DBA has a complex environment because of how long it took to get set up and all the very expensive consultants that had to be hired to do it. You did yours without expensive outside help.
From the CIO and HR point of view, what is an appropriate pay scale for each DBA? Does either the CIO or HR have the technical knowledge to put a value on what you did to save the company money? Or are they using other metrics?
If you think all this is highly irrational, take a long look at how things work in your organization and comment.
Anyways, below is my latest hardware list for consideration and comparison.
2-socket systems
| Vendor | Dell | Dell | Dell | HP | HP |
|---|---|---|---|---|---|
| Model | PowerEdge 2900 | PowerEdge T610 | PowerEdge T710 | ProLiant DL370G6 | ProLiant DL385G6 |
| CPU Series | Xeon 5400 | Xeon 5500 | Xeon 5500 | Xeon 5500 | Opteron 2400 |
| Architecture | Core 2 | Nehalem | Nehalem | Nehalem | Istanbul |
| Cores/Socket | 4 | 4 | 4 | 4 | 6 |
| Hyper-Thread | No | Yes | Yes | Yes | No |
| DIMM sockets | 12 | 12 | 18 | 18 | 16 |
| IOH | 5000P? | 5520 | 2 x 5520? | 2 x 5520 | |
| PCI-E | Gen 1 | Gen 2 | Gen 2 | Gen 2 | ? |
| x16 | | | 1 | 2 | |
| x8 | 1 | 2 | 4 | 1+1 (NIC) | 2 |
| x4 | 3 | 3+1 | 1+1 | 6 | 4 |
| PCI-X | 2 | | | | |
| Int. HDD | 8+2 | 8 | 8/16 | 6+6+2 | 6/16 |
| Configuration | | | | | |
| Price | $4,537 | $5,546 | $5,417 | $8,809 | $6,858 |
| CPU | 2 x E5440 | 2 x X5550 | 2 x X5550 | 2 x X5550 | 2 x 2435 |
| Memory | 12 x 4 GB | 12 x 4 GB | 12 x 4 GB | 12 x 4 GB | 12 x 4 GB |
4-socket systems (and one 8)
| Vendor | Dell | Dell | HP | HP | HP |
|---|---|---|---|---|---|
| Model | PowerEdge R900 | PowerEdge R905 | ProLiant DL580G5 | ProLiant DL585G6 | ProLiant DL785G6 |
| CPU Series | Xeon 7400 | Opteron 8300 | Xeon 7400 | Opteron 8400 | Opteron 8400 |
| Architecture | Dunnington | Istanbul | Dunnington | Istanbul | Istanbul |
| Cores/Socket | 6 | 6 | 6 | 4 | 6 |
| Hyper-Thread | No | No | No | No | No |
| DIMM sockets | 32 | 32 | 32 | 32 | 64 |
| IOH | 7300 | | | | |
| PCI-E | Gen 1 | | | | |
| x16 | | | | | 3 |
| x8 | 4 (2x2) | 2 | | 3 | 3 |
| x4 | 3 | 5 | | 4 | 5 |
| PCI-X | 2 | | | 2 | |
| Configuration | | | | | |
| Price | $20,236 | $16,437 | $26,268 | $23,570 | $57,285 |
| CPU | 4 x X7460 | 4 x 8435 2.6GHz | 4 x X7460 | 4 x 8439 | 8 x 8439 2.8GHz |
| Memory | 32 x 4 GB | 32 x 4 GB | 32 x 4 GB | 32 x 4 GB | 64 x 4 GB |
big iron
| Vendor | HP | Unisys | Unisys | NEC |
|---|---|---|---|---|
| Model | ProLiant DL785G6 | ES7000 7600R | ES7000 7600R | Express5800 A1160 |
| System Price | $48,997 | $66,729 | $135,003 | $145,596 |
| Memory Price | +8,288 | +19,136 | $46,376 | $50,344 |
| CPU | 8 x 8439 2.8GHz | 8 x X7460 | 16 x X7460 | 16 x X7460 |
| Memory | 64 x 4 GB (256GB) | 256GB | 512GB | 512GB |
2-socket systems
The Dell PowerEdge 2900 is included for comparison with the previous generation. Dell came out first in a tower chassis with the T610, which did not seem to be the replacement for the 2900, especially considering that there was an R710 in 2U. Just recently, Dell released the T410 and T710, filling out the Xeon 5500 series 2-socket tower chassis lineup. There is only a very small price difference between the T610 and T710 at the base model. When configured with dual power supplies, the T710 is actually slightly less expensive than the T610.
Earlier, I said I liked the ProLiant ML/DL370G6 because it implemented two 5520 IOH devices for 72 PCI-E gen 2 lanes. But I did not like the 2 x16 slots, because database servers cannot really use these extra-wide slots. A combination like 7 x8 + 4 x4 slots would have been better. Also, using one of the x8 slots for the 4 included GbE ports is a waste. This should have occupied a x4 slot, or better yet, just implement the pair of GbE ports on the ICH, which attaches off the ESI port, instead of squandering a valuable PCI-E slot. We can then use the PCI-E slots for our choice of IO.
The T710 appears to have 56 PCI-E lanes configured (1x16+4x8+2x4) but there is no documentation that actually says the T710 implements two of the 5520 IOH devices. Still, 5 wide slots (1x16 + 4x8) are better than 2 on the T610, but 6 x8 would have been better. Let the workstation people have the x16 slots. A pair of available x4 slots would be nice too (one is used by the internal storage controller).
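The lane counts quoted above are easy to check with a little arithmetic. The sketch below is only that: the helper names are made up, the slot counts are taken from the table and the text, and the 36 lanes per 5520 IOH is an assumption implied by "two 5520 IOH devices for 72 lanes". A lane total above 36 is a hint that a second IOH is present, not proof, since a PCI-E switch could also fan out extra slots.

```python
import math

# Assumption: one 5520 IOH supplies 36 PCI-E lanes (two give the 72 quoted above).
LANES_PER_5520_IOH = 36


def total_lanes(slots):
    """Sum lanes for a {slot_width: slot_count} mapping (hypothetical helper)."""
    return sum(width * count for width, count in slots.items())


def min_ioh_count(lanes, lanes_per_ioh=LANES_PER_5520_IOH):
    """Fewest IOH devices that could supply this many lanes without a switch."""
    return math.ceil(lanes / lanes_per_ioh)


t710 = {16: 1, 8: 4, 4: 2}      # 1 x16 + 4 x8 + 2 x4
dl370g6 = {16: 2, 8: 2, 4: 6}   # 2 x16 + 2 x8 (one taken by the NIC) + 6 x4
proposed = {8: 7, 4: 4}         # the 7 x8 + 4 x4 layout suggested above

for name, slots in (("T710", t710), ("DL370G6", dl370g6), ("proposed", proposed)):
    lanes = total_lanes(slots)
    print(f"{name}: {lanes} lanes, at least {min_ioh_count(lanes)} IOH")
```

Note that the suggested 7 x8 + 4 x4 layout uses exactly the same 72 lanes as the DL370G6 does today, just split into slot widths a database server can actually fill.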
Finally, there is the ProLiant DL385G6, which supports the new six-core Istanbul. This makes the DL385G6 a really powerful web server, but I would prefer more IO bandwidth for databases. Also, I do not know if the PCI-E slots are Gen1 or Gen2.
The Dell R805 now also supports the six-core Opteron 2400 series. Price with 2 x 2435 (2.6GHz) and 32GB memory is $3734 (probably another $460 to bring it to the 48GB reference used above, because the 2900 has 12 DIMM slots).
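One crude way to line these configurations up is price per core. The sketch below uses the prices and core counts from the 2-socket table, plus the R805 at my estimated $3,734 + $460. The function name is hypothetical, and price per core says nothing about performance per core (a Nehalem core is not a Core 2 core), so treat it as a first-pass filter only.

```python
# (price in USD, sockets, cores per socket) from the 2-socket table above.
systems = {
    "PowerEdge 2900": (4537, 2, 4),
    "PowerEdge T610": (5546, 2, 4),
    "PowerEdge T710": (5417, 2, 4),
    "ProLiant DL370G6": (8809, 2, 4),
    "ProLiant DL385G6": (6858, 2, 6),
    # Estimate from the text: $3,734 at 32GB plus roughly $460 to reach 48GB.
    "PowerEdge R805 (est.)": (3734 + 460, 2, 6),
}


def price_per_core(price, sockets, cores_per_socket):
    """System price divided by total physical cores (hypothetical helper)."""
    return price / (sockets * cores_per_socket)


# Print cheapest-per-core first.
for name, spec in sorted(systems.items(), key=lambda kv: price_per_core(*kv[1])):
    print(f"{name}: ${price_per_core(*spec):,.0f} per core")
```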
4-socket systems
The 4-way landscape will change when the Nehalem-EX (Intel Xeon 7500?) systems come out. For now, we have the Intel Xeon 7400 series and the AMD Opteron 8400, both at 6 cores. Previously, I criticized Intel for its myopic, obsessive focus on the 4-way platform, i.e., not making the 6-core Dunnington available in 2-way systems. In earlier generations, the 4-way processor had nothing special over the 2-way, and was usually 1 year behind. Anyways, AMD is first to reach 6-core in 2-way. Hopefully, with the powerful 8-core Nehalem-EX, there will be 2-socket systems. For a long time, Microsoft has recommended 4-way as the default choice for database servers. I think this should now be a 2-way, once we get to 6-core or more.
Such a 2-way system should handle most loads. And even if it turns out that a larger system is needed, the 2-way didn’t cost much and can always be used for other purposes. When consolidating small databases, a few 2-way systems are more flexible than one big system.
Big Iron
The big iron systems are shown in comparison to the 8-way ProLiant 785. Notice that there is only a slight price premium going from two 4-way Opteron systems to one 8-way system ($47K and $57K respectively). It used to be that the premium was much larger. The price of the Unisys and NEC systems appears to be about $33-37K for each 4-socket node. It used to be that a 4-socket node in the big-iron systems was around $80-90K. Right now we cannot really use the full power of the 16-socket system with the six-core Xeon 7400, because Windows Server 2008 can only support 64 cores. Soon R2 will be out, and we can see how SQL Server scales.
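For anyone who wants to check the per-node prices and the core ceiling, here is a small sketch using the system prices (without memory) from the tables. The function names are hypothetical, and the 64-core limit is simply the Windows Server 2008 figure quoted above.

```python
WS2008_MAX_CORES = 64  # limit quoted above for Windows Server 2008

# (system price in USD without memory, sockets) from the big iron table.
big_iron = {
    "Unisys ES7000 7600R, 8-socket": (66729, 8),
    "Unisys ES7000 7600R, 16-socket": (135003, 16),
    "NEC Express5800 A1160, 16-socket": (145596, 16),
}


def price_per_4_socket_node(system_price, sockets):
    """System price divided by the number of 4-socket nodes (hypothetical helper)."""
    return system_price / (sockets / 4)


def usable_cores(sockets, cores_per_socket, os_limit=WS2008_MAX_CORES):
    """Cores the OS can schedule on, capped at the OS limit."""
    return min(sockets * cores_per_socket, os_limit)


for name, (price, sockets) in big_iron.items():
    print(f"{name}: ${price_per_4_socket_node(price, sockets):,.0f} per 4-socket node")

# Two 4-way DL585G6 boxes versus one 8-way DL785G6 (prices from the 4-socket table).
two_4way, one_8way = 2 * 23570, 57285
print(f"8-way premium over two 4-ways: ${one_8way - two_4way:,}")

# 16 sockets of six-core Xeon 7400 is 96 cores, of which only 64 are usable.
print(f"16-socket X7460: {usable_cores(16, 6)} of {16 * 6} cores usable")
```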
However, I really think we need the better QPI interconnect technology of the Nehalem-EX to properly benefit from 128+ cores.
For a long time AMD crowed about how the Opteron HT interconnect and memory bandwidth scaled with the number of sockets, while Intel was constrained by the FSB. Yet the largest Opteron systems were the 8-socket boxes from HP and Sun. The ProLiant 785 was very impressive on TPC-H, yet strangely there have been no published TPC-C or TPC-E results. Now with Istanbul, AMD mentions they have added HT-assist, similar in function to the snoop filter on Intel chipsets. Without this, there is excessive traffic to maintain cache coherency. This is not a simple matter, and I expect it will take AMD an iteration or two to work out the bugs. Intel had similar difficulties with the snoop filter in their chipsets.