THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Joe Chang

Laptop for database performance consultants

Today, it is actually possible to build a highly capable database system in a laptop form factor. There is no point to running a production database on a laptop. The purpose of this is so that consultants (i.e., me), can investigate database performance issues without direct access to a full sized server. It is only necessary to have the characteristics of a proper database server, rather than be an exact replica.

Unfortunately, the commercially available laptops do not support the desired configuration, so I am making an open appeal to laptops vendors. What I would like is:

1) Quad-core processor with hyper-threading (8 logical processors),
2) 8-16GB memory (4 SODIMM so we do not need really expensive 8GB single rank DIMMs) 
3) 8x64GB (raw capacity) SSDs on a PCI-E Gen 2 x8 interface (for the main database, not the OS)
- alternatively, 2-4 x4 externally accessible PCI-E ports for external SSDs
- or 2 x4 SAS 6Gbps ports for external SATA SSDs 
4) 2-3 SATA ports for HDD/SSD/DVD etc for OS boot etc
5) 1-2 e-SATA
6) 2 1GbE

Below is a representation of the system, if this helps clarify.
SandyBridgeLaptop
SandyBridgeLaptop

The Sandy-Bridge integrated graphics should be sufficient, but high-resolution 1920x1200 graphics and dual-display are desired. (I could live with 1920x1080).
There should also be a SATA hard disk for the OS (or SATA SSD without the 2.5in HDD form factor if space constrained) as the primary SSD array should be dedicated to the database.
Other desirable elements would be 1 or 2 e-SATA port, to support backup and restores with consuming the valuable main SSD array,
and 2x1GbE ports (so I can test code for parallel network transfers.

The multiple processor cores allow parallel execution plans. Due to a quirk of the SQL Server query optimizer, 8 or logical processors are more likely to generate a parallel execution plan in some cases.
Ideally, the main SSD array is comprised of 2 devices, one on each PCI-E x4 channel.

The point of the storage system is to demonstrate 2GB/sec+ bandwidth, and 100-200K IOPS. One of the sad fact is even today storage vendors promote $100K+ storage systems that end up delivering less than 400-700MB/s bandwidth and less than 10K IOPS. So it is important to demonstrate what a proper database storage system should be capable of.
Note that is it not necessary to have massive memory.  A system with sufficient memory and a powerful storage system can run any query, while a system with very large memory but weak storage can only run read queries that fit in memory. And even if data fits in memory, the performance could still fall off a cliff on tempdb IO.

Based on component costs, the laptop without PCI-E SSD should be less than $2000, and the SSD array should be less than $1000 per PCI-E x4 unit (4x64GB).
It would really help if the PCI-E SSD could be powered off from SW, i.e., without having to remove it. This why I want to boot off the SATA port, be  it HDD or SSD.

NAND notes
per below, 2 SSDs on SATA ports do not cut the mustard,
The spec above call for 8 SSDs. Each SSD is comprised of 8 NAND packages, and each package is comprised of 8 die. So there are 64 die in one SSD, and IO is distributed over 8 SSDs, or a total of 512 individual die.
The performance of a single NAND die is nothing special and even pathetic on writes. However, a single NAND die is really small and really cheap. That is why it is essential to employ high parallelism at the SSD unit level. And then, employ parallelism over multiple SSD units.
An alternative solution is for the laptop to expose 2-4 PCI-E x4 ports (2 Gen 2 or 4 Gen 1) to connect to something like the OCZ IBIS, or an SAS controller with 2 x4 external SAS ports.

System notes
The laptop will have 1 Intel quad-core Sandy-Bridge processor, which has 2 memory channels supporting 16GB dual-rank DDR3 memory. The processor has 16 PCI-E gen 2, DMI g2 (essentially 4 PCI-E g2 lanes) and integrated graphics. There must be a 6-series (or C20x) PCH, which connects upstream on the DMI. Downstream, there are 6 SATA ports (2 of which can be 6Gbps), 1 GbE port, and 8 PCI-E g2 lanes. So on the PCH, we can attach 2 HDD or SSD at 6Gbps, plus support 2 eSATA connections. There is only a single 1GbE port, so if we want 2, we have to employ a separate GbE chip.

While the total PCH down stream ports exceeds the upstream, it ok for our purposes to support 2 internal SATA SSDs at 6Gbps, 2 eSATA ports and 2 GbE, plus USB etc. The key is how the 16 PCI-E gen 2 lanes are employed. In the available high-end laptops, most vendors attach a high-end graphics chip (to all 16 lanes?). We absolutely need 8 PCI-E lanes for our high performance SDD storage array. I would be happy with the integrated graphics, but if the other 8 PCI-E lanes were attached to graphics, I could live with it.

The final comment (for now) is that even though it is possible to attach more than 2 SSD off the PCH, we need then bandwidth on the main set of PCI-E ports. It is insufficient for all storage to be clogging the DMI and PCH.

Thunderbolt
Thunderbolt is 2x2 PCI-E g2 lanes, so technically thats almost what I need (8 preferred, but 6 acceptable).
What is missing from the documentation is were Thunderbolt attaches.
If directly to the SandyBridge processor (with bridge chip for external?), then that's OK,
if off the PCH, then that is not good enough for the reasons I outlined above.

Also, we need serious SSDs to attach off TB, does the Apple SSD cut mustard?

The diagram below shows the Thunderbolt controller connected to the PCH, but also states that other configurations are possible. The problem is that most high-end laptops are designed with high-end graphics, which we do not want squandering all 16 PCI-E lanes.

thunderbolt

A Thunderbolt controller attached to the PCH is capable of supporting x4 PCI-E gen 2, but cannot also simultaneously support saturation volume traffic from internal storage (SATA ports), and network (not to mention eSATA). I should add that I intend to place the log on the SATA port HDD/SSD, along with the OS, hence I do not want the main SSD array generating traffic over the DMI-PCH connection.

A Thunderbolt SDK is supposed to released very soon, so we can find out more. I am inclined to think that Thunderbolt is really a docking station connector, being able to route both video and IO over a single connector. If we only need to route IO traffic, then there are already 2 very suitable protocols for this, i.e., eSATA for consumer, and SAS for servers, each with a decent base of products. Of course, I might like a 4 bay disk enclosure for 2.5in SSDs on 1x4 SAS, or an 8-bay split over 2 x4 ports. Most of the existing disk enclosures carry over from hard disk environment, with either 12-15 3.5in bays or 24-25 2.5in bays.

Published Friday, September 02, 2011 6:39 PM by jchang
Filed under:

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Benjamin Williams said:

If you could list what software you would load on this laptop outside of SSMS, I would be most grateful.  All of my searches and questions on LinkedIn groups have not been fruitful.

September 2, 2011 6:37 PM
 

nascar not going to whitehouse said:

But I thought an iMacNCheese can do everything!

But seriously folks:

"Due to a quirk of the SQL Server query optimizer"

which versions?

and of course....the incredibly bad performing mac kernel

for another thread....

September 2, 2011 7:56 PM
 

Thomas Kejser said:

hi Joe

You might want to have a look a the Dell Alienware series then. They come with the i7-2630QM and you can always rip out the SATAII drives in them and put some fast Intel MLC in there. Might not get you all the way to 2GB/sec, but depending on their bus speed, you can probably get pretty close. If you need to hit the 2GB mark, a thunderbolt.

Unfortunately, I have only been able to locate it with 8GB RAM.

September 3, 2011 3:32 AM
 

Bob Duffy said:

The Dell XPS L702 also has the i7QM processor, 8GB RAM and TWO disk slots that can take SSD. We got a bunch of them for 700 euro each in a special offer.

Be warned they are HUGE to carry though and dont have a VGA port. You need a little lead for the min display port.

September 3, 2011 6:40 AM
 

jchang said:

The Dell Alienware M17x supports 16GB and the 18x supports upto 32GB, of which 16GB is only +$400 over base, while 32GB is really expensive (4x8GB single rank). My issue with the Alienware is that I cannot nothing that specifically says it has 2 6Gbps SATA ports. The Intel 6-series chipsets has 6 SATA ports, 2 of which can be 6Gbps. Two 6Gbps SATA SSDs would get me to 1GB/s, but I am not sure on the IOPS because SSD vendors quote 4KB random IO, not 8KB.

(I just got a Dell PowerEdge T110 II, which is the server version of Sandy-Bridge. This does not have 6Gbps SATA ports, and furthermore, there is a BIOS incompatibility with the OCZ RevoDrive 3 X2)

So I think it is necessary to attach 4-8 SSDs off the PCI-E g2 port, not off the PCH SATA ports. To fit 4-8 SSDs (8 chips each, we have to discard the 2.5in HD form factor, and go with just the PCB)

September 3, 2011 8:52 AM
 

Adam Machanic said:

That's not a "quirk." It's very much by design.

September 3, 2011 9:56 AM
 

jchang said:

A quirk can be peculiar, or accidental,

The SQL Server query optimizer is designed around a model that assumes a weak IO system which saturates with 1 thread/core.

This might have been acceptable for the MSDE/Express builds, and possibly even Standard Edition, but Enterprise Edition should reflect an "enterprise" storage system, which does not saturate with 1 thread. Of course, storage vendors and some users persist in selling expensive storage systems with really cr@ppy performance, so perhaps the SQL Server query optimizer needs to consider the actual storage system, and not just go by edition.

September 4, 2011 9:27 AM
 

John Paul Cook said:

I have a Mac running Windows 7. I haven't tried putting SQL Server on it. It is the best laptop hardware I've ever had. I ordered it with a matte 1920x1200 display.

Lynn Langit put SQL Express on her Mac: http://blogs.msdn.com/b/socaldevgal/archive/2011/01/18/a-macbook-air-for-a-microsoft-evangelist.aspx

September 4, 2011 9:29 AM
 

Adam Machanic said:

Joe, no argument from me about the optimizer being quirky and based on assumptions that have been totally out of date for five or more years. But the number of CPUs piece of the costing equation does make some sense in my opinion. The optimizer does need to consider, somehow, what the impact will be of parallelism depending on the DOP.

The reason that 8 or more CPUs tends to generate parallel plans more often is that there are two sides of parallel coin: stream distribution and stream gathering. BOTH have a cost, and if you don't distribute to enough threads to split up the work, gathering the streams back together will be expensive enough that the entire parallel operation will be at a loss. This is one of the reasons that parallelism tended to perform so badly back in SQL Server 2000; most servers weren't properly equipped to deal with parallel plans.

By the way, if you want to fake it on your laptop, just ask SQL Server to create more schedulers using the appropriate startup option. No big deal.

September 4, 2011 6:38 PM
 

jchang said:

I discussed parallel execution plans, and the relation to IO costing before, but in summary, an execution plan has IO anc CPU cost components. Depending on the execution plan, it is common for the IO component to be a large portion of the overall cost.

In a parallel plan, the CPU component is reduced by the degree of parallelism (with adjustment). There is no reduction in the IO oomponent (except for sort and hash). And as you mention, the parallelism operations. Usually the repartition streams is most signficant.

The main point, because there is no reduction in IO (plan) cost, the reduction in CPU components may not be sufficient to offset the additional parallelism operations (unless there are very many "cores").

In a sevver system with "enterprise"-class storage (which can be HDD or SSD), a single thread does not saturate the IO, so a parallel plan should have reduced IO cost (time in seconds)

Hence the SQL Server query optimizeer is inhibiting parallel execution plans when it should not.

the -P startup option is nice but I would settle for 4 physical cores and 8 logical. I do think there is some interest in having a Knight's something processor with very many simple.cores, but my main system would be the one I described

September 4, 2011 7:01 PM
 

jchang said:

the macbook is nice, I believe it is the thinnest 17in, and if they are sticking with 16:10, then good for them, and dunce caps for the PC vendors who have gone entirely 16:9.

But like I said above, 2 SSD's on SATA ports, especially if its only 3Gbps SATA, does not cut mustard in database IO.

For real database IO performance, I called for SSDs matched to a PCI-E gen2 x8. A current generation 6Gbps SATA SSD (at min capacity of 64GB) is comprised of 8 NAND packages (8 die per package), which is a reasonable match the 6Gbps SATA port.

The performance of a single NAND die is nothing special, and even pathetic on write performance. However, a NAND die is really small and really cheap. That is why it is essential to employ high parallelism at the SSD unit level. And then, employ parallelism over multiple SSD units. (For some reason, people with desktop backgrounds cannot grasp the concept of IO over multiple channels, always going for a single fast storage device)

To match with a current generation quad-core, I called for 8 of the 8 chip/package SSDs. Consider the OCZ RevoDrive 3 X2, which unfortunately has a BIOS incompatibility with the Dell PE T110 II. This has 4 SATA SSD controllers on a PCI-E gen 2 interface.

I call for 8, it is possible 6 current SATA SSDs on the PCI-E g2 x8 is sufficient, plus an additional SSD/HDD on the SATA port.

September 4, 2011 9:51 PM
 

John Paul Cook said:

How about using the Thunderbolt port on the Macbook Pro to connect fast external storage?

September 5, 2011 2:34 PM
 

Ashton said:

Joe, have you looked at the Lenovo W520?  Has 4 memory slots (and I think supports 8GB chips for total of 32GB), has SATA 6.0Gbps support, can remove the optical drive and add 2nd HDD or SDD (I added an OCZ Vertex in the primary and a large HDD in the 2nd bay).

It also has Quad Core with Hyperthreading.  

September 6, 2011 11:16 AM
 

jchang said:

first, let me thank everyone for commenting, bringing up Thunderbolt etc. Now I will reiterate: A single SSD or even 2, attached to the SATA ports (even 6Gpbs ports) does not cut the mustard (for the database and tempdb data files)

For that, I want 6-8 SATA SSDs or equivalent (each 64GB SSD comprised of 8 NAND packages of 8 die each, or 64 NAND die per 64GB SSD). Note that a 64GB SSD can have the same sequential bandwidth as larger 126-256GB SSDs, but the larger SSDs (with 16 or 32 x 8 die) have better random read IOPS. Hence the requirement is an 8 x 64GB array of SSDs, or minimally 6 x 64GB SSD. And this should be distributed over multiple PCI-E lanes, one per (8 NAND package) SSD.

I did say that I still wanted the normal 2 SATA ports for either SSD or HDD for OS boot plus the database log file.

But the database and tempdb data files will go on the big 6-8 x 64GB array.

September 6, 2011 3:52 PM
 

jchang said:

I like to keep my systems clean, only the important software that actually get used. For the more important systems used to generate published data, software is kept to a minimum. I also lookup for signs of poorly written software, ie, memory leak, excessive/unusual CPU, interrupts and IO activity.

Check task manager to see how many processes are active, a minimum set might be low 40's, I have seen people with 70+

But then I must have the MLB high-def,

for a while I was really annoyed by the constant Flash updates, but that seems to have subsided.

September 8, 2011 11:23 AM
 

Mvaneyl said:

Hi, you know.. I like the sony 3d, but then I saw this: http://www.amazon.com/ASUS-G74SX-3DE-17-3-Inch-Gaming-Laptop/dp/B00542SWQ8/ref=wl_it_dp_o?ie=UTF8&coliid=I3QHDUKZLQ7Q01&colid=2W0AKMNNGYVVY, how do you campare it? it seems that the asus configuration plus 3d is better than the sony vaio f3d... or the HP ENVY 3D appear with better configuration, im confused..

September 9, 2011 12:45 PM
 

jchang said:

it sounds like you are more interested in 3D/gaming performance, in which this is the wrong post to bring up that subject.

The key point of a performance database laptop is to not squander the valuable PCI-E g2 lanes on a non-requirement like graphics

September 9, 2011 10:26 PM
 

Paul Randal said:

We got pretty close with our Dell M6500 Covets - 1900x1200 NVidia (with 32-bit tru-color screens - for photography), 4-core i7-920XM, 16GB, 2x256GB Samsung PM800 MLC drives and space for on-board blu-ray. Heavy, crappy battery life, but it's a great server in a laptop for VMs and heavy perf demos.

September 10, 2011 9:31 AM
 

John Paul Cook said:

Dell no longer sells 1900x1200 displays. That's what they told me before I bought the MacBook Pro.

September 11, 2011 4:06 PM
 

jchang said:

It is really annoying how vague Dell (and other vendors too) can be on key system specs. When the Dell Precision M4600/6600 came, I could not tell if it supported 6Gbps SATA on 2 ports, even though the Intel chipset specifically says so. Apparently now Dell supports 6Gbps SATA SSDs on the Mx600 (denoted by SATA3 ,even though this is not correct SATA-IO.org terminology).

So far, my own testing on a "entry" server with 1 quad-core Sandy Bridge, 16GB memory shows that 2 OCZ Vertex 3 120GB (6Gpbs interface) delivers very good DW performance. Some queries show 15K IOPS at 64K per IO, for 960MB/s, or 480MB/s per SSD. Neither the previous generation Vertex 2 or the Crucial C300 could deliver as good performance. Tomorrow I should have the Crucial M4 (same as Dell uses in the Mx600 line?).

It does appear that more than 2 current generation SATA SSDs are required for best IO performance.

I know OCZ has the Vertex 3 Max IOPS series, which will deliver better small block random IO performance, but the TPC-H queries generate 64K IO, so I do not think there will be much difference between the Vertex 3 regular and Max IOPS. more later.

I do strongly prefer 1900x1200 display, not sure if I am ready for a Mac, if I could get 4 SSDs on the Thunderbolt + 2 internal SSDs, then I may be ready.

September 12, 2011 12:32 AM
 

Aaron said:

have you considered NOT using a laptop, but using a small portable desktop?

September 15, 2011 9:19 AM
 

Paulo Condeça said:

I have implemented a different approach:)

To sum up, Server with intel Corei7 CPU + 12GB RAM + UPS + Plenty of storage :)

Server has lots of VM's attached to it, it is configured to WakeOnLan through the internet.

In the end, you can remote connect to it with any device that as an internet connection (ex: laptop/netbook/iphone/ipad etc...)

Regards,

Paulo Condeça.

September 15, 2011 9:41 AM
 

jchang said:

regarding a small portable desktop - the main issues is that the small form factor desktops typically do not has have PCI-E g2 x4/x8 slots, and as I said regarding the minimum config, 4 SSDs equivalent to 6Gpbs SATA + 1 SATA port for either HDD or SSD. The best option is to get an entry server, which would have the PCI-E g2 x8 slots. Next, the smallest form factor I could find to fit what I need is a mini-tower. Its too bad I could not find a pizza box chassis with 2 PCI-E slots. I did consider the Dell R210 II, 1U, 1 PCI-E g2 x16 slot and 4 internal 2.5in bays. The internal SATA ports are only 3Gbps, per Intel C202 PCH (idiots!) What I am not sure is if I get the optional H200 SAS/SATA controller, does this consume the PCI-E x16 slot, or does it attach to the PCI-E lanes on the C202? If the later, then I could have 4 SSDs 6Gbps SATA interface, plus one PCI-E SSD, which could potentially equal 4 SATA SSDs.

But I opted for the T110 II in mini-tower because it is really quite because of the large diameter fans, and it is $500 less than similar R210 II.

Regarding VM, real IO performance is not going to happen on VM, most of the IO performance material put out by the VM player is rubbish.

September 18, 2011 1:07 PM
 

Glenn Berry said:

Unfortunately, the laptop market seems to be in a race to the bottom, as far as features and prices go, driven by tablets. Even the high-end workstation laptops, such as the Dell Precision M4600 don't have nearly the feature set you are looking for, as you already know.

I don't think there would be much of a market for a laptop system like you are looking for, because of the high cost, and high weight.

I think you may be able to put together or buy a nice Sandy Bridge-EP (Xeon E5-1600 series) tower system with PCI-E 3.0, QPI 1.1, etc. that could be used to put together a killer DW system for not very much money.

September 24, 2011 6:15 PM
 

Thomas Kejser said:

Joe, did you see the new OCZ drives? These are bootable, so you can forget about that crappy old SATA:

http://hothardware.com/Reviews/OCZ-ZDrive-R4-Performance-Preview/

Of course, you need an x8 lane, Full length, half height, for them. I am not sure if there are any pizza box formats out there which will do it?

September 28, 2011 9:54 AM
 

Joe said:

Right now I have the OCZ RevoDrive 3 X2, which seems to have a comptability issue with the Dell T110 II BIOS/UEFI. The orginal RevoDrive worked with the Dell T710, so I suspect this is an issue with the new Dell Sandy-Bridge UEFI, meaning the Z-Drive R4 would probably have the same issue. As soon as this is resolved, I will take another look. The Dell T110 II has 2 PCI-E g2 x8 slots, so 2 RevoDrive 3 X2 (240GB at $700 each) is probably the best solution. The pizza box R210 II has 1 x16 slot, so the higher performance but more expensive Z4 is probably the right solution.

September 28, 2011 10:42 AM
 

Joe said:

The non-mobile options are: 1) desktop Small FF, Ultra SFF, Server mini-tower and server 1U rack. Below are dimensions and weights from Dell. The USFF dimensions and weight are great, but it has no PCI-E slots. I am thinking the desktop SFF is best because it would fit in a large backpack. The 1U server has low volume, but slightly unwieldy dimensions for transport

Small Form Factor (SFF)

Dimensions Inches/(cm) 11.4x3.7x12.3 / (29.0x9.3x31.2)

Min. Weight (lbs/kg) 12.57 /5.70

Ultra Small Form Factor (USFF)

Dimensions Inches/(cm) 9.3x2.6x9.4 / (23.7x6.5x24.0)

Min. Weight (lbs/kg) 7.20 / 3.27

PowerEdge R210 II

Dimensions Inches/(cm) 17.1x1.66x15.5 / (4.2x43.4x39.4)

Weight (lbs/kg) 17.76 / 8.05

PowerEdge T110 II

Dimensions Inches/(cm) 18.25x7.45x16.55 / (46.3x 18.9x42.0)

Weight (lbs) 15.50

September 29, 2011 11:40 AM

Leave a Comment

(required) 
(required) 
Submit

About jchang

Reverse engineering the SQL Server Cost Based Optimizer (Query Optimizer), NUMA System Architecture, performance tools developer - SQL ExecStats, mucking with the data distribution statistics histogram - decoding STATS_STREAM, Parallel Execution plans, microprocessors, SSD, HDD, SAN, storage performance, performance modeling and prediction, database architecture, SQL Server engine

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement