THE SQL Server Blog Spot on the Web


Joe Chang

New Fusion ioDrive2 and ioDrive2 Duo

Fusion-io just announced the new ioDrive2 and ioDrive2 Duo in October 2011 (at some conference of no importance). The MLC models will be available in late November, with the SLC models to follow. See the Fusion-io press release for more info.

Below are the Fusion-io ioDrive2 and ioDrive2 Duo specifications. The general idea seems to be for the ioDrive2 to match the realizable bandwidth of a PCI-E gen2 x4 slot (1.6GB/s) and for the ioDrive2 Duo to match the bandwidth of a PCI-E gen2 x8 slot (3.2GB/s). I assume there is a good explanation for why most models have specifications slightly below the corresponding PCI-E limits.

The exception is the 365GB model, at about 50% of the PCI-E g2 x4 limit. Suppose the 785GB model implements parallelism with 16 channels and 4 die per channel. Rather than building the 365GB model with the same 16 channels but a different NAND package with 2 die each, Fusion-io may have simply implemented 8 channels using the same 4-die package. Let's see if Fusion explains this detail.
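The arithmetic behind this guess can be sketched as follows. The per-channel rate and the channel/die layout are my assumptions for illustration; Fusion-io has not disclosed these internals.

```python
# Hypothetical NAND layout sketch. The 75 MB/s effective per-channel read
# rate and the channel counts are assumptions chosen to roughly match the
# published aggregate numbers, not disclosed Fusion-io internals.
PER_CHANNEL_MB_S = 75

def aggregate_read_bw(channels, per_channel=PER_CHANNEL_MB_S):
    """Aggregate read bandwidth (MB/s) if all channels run fully in parallel."""
    return channels * per_channel

print(aggregate_read_bw(16))  # 1200 MB/s -- in line with the 785GB spec (1.2 GB/s)
print(aggregate_read_bw(8))   # 600 MB/s -- same ballpark as the 365GB spec (710 MB/s)
```

Halving the channel count halves the aggregate bandwidth, which is consistent with the 365GB model sitting at roughly half the x4 limit while the larger models approach it.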

Fusion-IO ioDrive2

ioDrive2 Capacity      | 400GB    | 600GB    | 365GB    | 785GB    | 1.2TB
NAND Type              | SLC      | SLC      | MLC      | MLC      | MLC
Read Bandwidth (64kB)  | 1.4 GB/s | 1.5 GB/s | 710 MB/s | 1.2 GB/s | 1.3 GB/s
Write Bandwidth (64kB) | 1.3 GB/s | 1.3 GB/s | 560 MB/s | 1.0 GB/s | 1.2 GB/s
Read IOPS (512 Byte)   | 351,000  | 352,000  | 84,000   | 87,000   | 92,000
Write IOPS (512 Byte)  | 511,000  | 514,000  | 502,000  | 509,000  | 512,000
Read Access Latency    | 47 µs    | 47 µs    | 68 µs    | 68 µs    | 68 µs
Write Access Latency   | 15 µs    | 15 µs    | 15 µs    | 15 µs    | 15 µs
Bus Interface          | PCI-E Gen 2 x4 (all models)
Price                  | $?       | ?        | $5,950?  | $?       | ?

(SLC = Single Level Cell, MLC = Multi Level Cell)

Fusion-IO ioDrive2 Duo

ioDrive2 Duo Capacity  | 1.2TB    | 2.4TB
NAND Type              | SLC      | MLC
Read Bandwidth (64kB)  | 3.0 GB/s | 2.6 GB/s
Write Bandwidth (64kB) | 2.6 GB/s | 2.4 GB/s
Read IOPS (512 Byte)   | 702,000  | 179,000
Write IOPS (512 Byte)  | 937,000  | 922,000
Read Access Latency    | 47 µs    | 68 µs
Write Access Latency   | 15 µs    | 15 µs
Bus Interface          | PCI-E Gen 2 x8 (both models)
Price                  | $?       | ?

Comparing the SLC and MLC models, the SLC models have much better 512-byte read IOPS, with only moderately better bandwidth and read latency. Not mentioned, but common knowledge, is that SLC NAND has much greater write-cycle endurance than MLC NAND.

It is my opinion that most database workloads, both transaction processing and DW, can accommodate the characteristics and limitations of MLC NAND in return for the lower cost per TB. I would consider budgeting for a replacement set of SSDs if analysis shows that the MLC life-cycle does not match the expected system life-cycle. Of course, I am also an advocate of replacing the main production database server on a 2-3 year cycle instead of the traditional (bean-counter) 5-year practice.

The difference in read IOPS at 512B is probably not important. If the ioDrive2 MLC models can drive 70K+ read IOPS at 8KB, then it does not matter what the 512B IOPS figure is.
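A quick sanity check on that claim. The 70K figure is the hypothetical threshold from the paragraph above, and 1 MB is taken as 10^6 bytes:

```python
def iops_to_mb_s(iops, block_kb):
    """Bandwidth (MB/s) implied by an IOPS rate at a given block size."""
    return iops * block_kb * 1024 / 1e6

def bw_limited_iops(mb_s, block_kb):
    """Upper bound on IOPS if the device is purely bandwidth-limited."""
    return mb_s * 1e6 / (block_kb * 1024)

print(iops_to_mb_s(70_000, 8))        # 573.44 MB/s for 70K reads/s at 8KB
print(int(bw_limited_iops(1200, 8)))  # ~146K 8KB IOPS, if the 785GB model's
                                      # 1.2 GB/s (64KB) rate held at 8KB
```

Even the smallest MLC model's 710 MB/s would support well over 70K 8KB reads/s if it were bandwidth-limited, so the low 512B read IOPS number is unlikely to matter for database page IO.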

One point from the press release: "new intelligent self-healing feature called Adaptive FlashBack provides complete chip level fault tolerance, which enables ioMemory to repair itself after a single chip or a multi chip failure without interrupting business continuity." For DW systems, I would like to do away with RAID entirely when using SSDs, instead having two systems with no RAID on the SSD units. By this, I mean fault tolerance should be pushed into the SSD at the unit level. Depending on the failure rate of the controller, perhaps there could be two controllers on each SSD unit.

For a critical transaction processing system, it would be nice if Fusion could provide failure statistics for units that have been in production for more than 30 days (or whatever the infant-mortality period is), on the assumption that most environments will spend a certain amount of time spinning up a new production system. If the failure rate for a system with 2-10 SSDs is less than 1 per year, then perhaps even a transaction processing system using mirroring for high availability can do without RAID on the SSD?

ioDrive2 and ioDrive2 Duo
I do think that it is a great idea for Fusion to offer both the ioDrive2 and ioDrive2 Duo product lines, matched to PCI-E gen2 x4 and x8 bandwidths respectively. The reason is that server systems typically have a mix of PCI-E x4 and x8 slots, with no clear explanation of the reasoning for the exact mix, other than perhaps whatever was demanded by the customer complaining the loudest.

By having both the ioDrive2 and the Duo, it is possible to fully utilize the bandwidth from all available slots, correctly balanced. It would have been an even better idea if the Duo were actually a daughter card that plugs onto the ioDrive2 base unit, so the base model could be converted to a Duo, but Fusion apparently neglected to solicit my advice on this matter.

I am also inclined to think that there should be an ioDrive2 Duo MLC model at 1.2TB, on the assumption that its performance would be similar to the 2.4TB model, just as the ioDrive2 785GB and 1.2TB models have similar performance specifications. The reason is that a database server should be configured with serious brute-force IO capability, that is, all open PCI-E gen2 slots should be populated. But not every system will need the x8 slots populated with the 2.4TB MLC model, hence the viability of a 1.2TB model as well.

If Fusion should be interested in precise quantitative analysis of SQL Server performance, instead of the rubbish whitepapers put out by typical system vendors, well, I can turn around a good performance report very quickly. Of course, I would need to keep the cards a while for continuing analysis...

Published Tuesday, October 04, 2011 7:06 PM by jchang




Adam Machanic said:

"at some conference of no importance."

Love it :-)

October 4, 2011 7:16 PM

Oliver Aaltonen said:

For what it's worth, we've deployed ~100 of the first-generation ioDrive and ioDrive Duos for our customers' database servers (running RHEL, but the hardware's the same), and we have yet to see a single failure in well over a year of production use.

I can recall only two instances where the ioDrives acted funky, and both were on systems with known PCIe issues at the hardware level (i.e. bad riser, motherboard, and/or firmware). The ioDrives disappeared after a reboot, and never re-appeared. I've never seen them fail on a system that's up and running.

October 5, 2011 11:58 AM

jchang said:

Adam: well, perhaps of some interest, as they do have that Iron Man impersonator.

Oliver: yes, this helps a lot. There are 365.2425 x 24 = 8765.82 hours per year. For 1M-hr MTBF, we should expect about 9 failures per 1000 units each year. Please let us know if this is still the case next year too!

The reason we have RAID is that in the old days, HDD MTBF was more like 100K-hr, so a storage system with 1000 drives would encounter roughly 100 failures per year, i.e., not good. If we do have solid evidence for Fusion at 1M-hr MTBF (actual production, not elevated-temperature testing), this means a system with 10 ioDrives has about a 1 in 10 expectation of failure in one year of operation, which should be sufficient to support the no-RAID proposal.
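The arithmetic in the two paragraphs above, assuming a constant (exponential) failure rate, which is only reasonable after the infant-mortality period:

```python
HOURS_PER_YEAR = 365.2425 * 24  # 8765.82

def expected_failures_per_year(units, mtbf_hours):
    """Expected annual failures for a population, assuming a constant failure rate."""
    return units * HOURS_PER_YEAR / mtbf_hours

print(expected_failures_per_year(1000, 1_000_000))  # ~8.8 per 1000 SSDs at 1M-hr MTBF
print(expected_failures_per_year(1000, 100_000))    # ~88 per 1000 HDDs at 100K-hr MTBF
print(expected_failures_per_year(10, 1_000_000))    # ~0.09 per year, roughly 1-in-11
                                                    # odds for a 10-ioDrive system
```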

Furthermore, I am guessing that the most likely mode of SSD failure is inability to write new data, while old data remains readable (or nearly so). Wouldn't it be a neat trick if SQL Server could detect that one SSD has write-failed, write changes only to the other working units, and then migrate data off the failed SSD?

October 5, 2011 12:26 PM

Adam Machanic said:

BTW, in my current project we have two servers, each with 7 Fusion-IO cards. In one of the servers a couple of months ago we had 3 drives totally fail in a two week period. Not sure if it was a bad batch, or what. I'm still a big fan of the cards.

October 5, 2011 12:55 PM

jchang said:

I have heard of multiple disks failing simultaneously, which would lead me to believe it was due to environmental factors - under-voltage, a failed fan, etc. The MTBF figures I talked about should apply to the statistical failure rate. For 3 units in a 2-week period, I am inclined to suggest a bad component, a capacitor perhaps? Still, that's a good point. RAID is only a good solution for statistical failure events, such that 2 or 3 disks failing nearly simultaneously (a second unit failing before the spare is installed and the RAID rebuilt) does not happen. For systemic failure events, I think it is better to rely on a failover system with its own storage (i.e., not a cluster). Given that there are two complete systems for failover, there should be no need to go overboard on single-system HA.

October 5, 2011 4:34 PM

Dave Wujcik said:

This is a cost-effective, fast, reliable solution for accelerating high-IO loads such as database/VDI, versus having to use hundreds of spinning disks to try to keep up with the same performance. You can RAID all the SSDs you want; you'll never get sub-50µs latencies out of it, or any reliability. The low latency is really what makes the product.

-- Dave

October 6, 2011 12:39 PM

jchang said:

Most RDBMS engines, even the one from Ironman, have from the beginning been properly optimized for very large HDD arrays, so that properly written code is not hindered in performance. Of course, crappy code always performs like crap.

See my summary of TPC-E benchmark results, which shows that SSD may improve performance in the range of 2-10% depending on memory (a larger impact when memory-constrained, less with 1TB of memory). The theory is that with SSDs, disk latency is lower, so there are fewer transactions in flight at any given point in time, which improves efficiency.

The advantage of SSDs is that unconstrained performance can now be achieved at a much lower cost point. An unconstrained HDD system might have 500 disks. For 15K 146GB SAS direct-attach, figure $500 per disk amortizing the disk enclosure, so $250K, with a raw capacity of 73TB, or RAID 10 capacity of 36TB.

Now compare consumer and enterprise SSD at $4K and $15K per TB respectively. If we really need 36TB net capacity, then the HDD solution is good. But if we only need 1-10TB, then the SSD solution is better.
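Putting the figures from the two paragraphs above together (all prices are the rough numbers quoted above):

```python
# Rough cost comparison using the figures from the discussion above.
HDD_SYSTEM_COST = 500 * 500        # 500 disks at ~$500 each = $250K
HDD_NET_TB = 36                    # RAID 10 over 73TB raw
CONSUMER_SSD_PER_TB = 4_000
ENTERPRISE_SSD_PER_TB = 15_000

def ssd_cost(net_tb, price_per_tb):
    """Cost of an SSD configuration for a given net capacity."""
    return net_tb * price_per_tb

for tb in (1, 10, 36):
    print(f"{tb}TB net: consumer SSD ${ssd_cost(tb, CONSUMER_SSD_PER_TB):,}, "
          f"enterprise SSD ${ssd_cost(tb, ENTERPRISE_SSD_PER_TB):,}, "
          f"HDD array ${HDD_SYSTEM_COST:,}")
```

At 1-10TB net, even enterprise SSD comes in well under the $250K HDD array; only at the full 36TB does the HDD array beat enterprise SSD pricing.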

October 6, 2011 3:13 PM

Robert (Bob) Leithiser said:

An item of major significance is that devices at this level of write latency introduce a new paradigm for data mining. Whereas the model to date for data mining has been aggregating large amounts of data and then looking for patterns, the design pattern is transformed by the amazing write latencies of these devices. Now, instead of generating aggregates and analytical derivative values in batch, the fast write latencies allow the immediate storage of the calculated data.

I've written some complex triggers that leverage the low latency of PCIE-SSD to do expensive calculations including rolling 20 day Bollinger Band and moving averages as quote data is received. These are plenty fast enough to keep up with 20 Mbps Internet pipes used to collect the raw data. Such a technique can support contrarian financial trading bots that anticipate market reversals before they start by integrating real-time quote data with longer-term trends and correlations. This type of real-time monitoring is potentially more profitable and less risky than simply riding trends as do most of the current trading bots.
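As a sketch of the kind of incremental calculation described here (the 20-period window and 2-sigma bands are the conventional Bollinger parameters; the class and method names are mine, not from the comment):

```python
from collections import deque
from statistics import mean, pstdev

class BollingerBands:
    """Rolling N-period Bollinger Bands, updated one quote at a time."""
    def __init__(self, period=20, k=2.0):
        self.window = deque(maxlen=period)  # keeps only the last `period` quotes
        self.k = k

    def update(self, price):
        """Add a quote; return (lower, middle, upper) once the window is full."""
        self.window.append(price)
        if len(self.window) < self.window.maxlen:
            return None  # not enough data yet
        m = mean(self.window)
        sd = pstdev(self.window)
        return (m - self.k * sd, m, m + self.k * sd)

bb = BollingerBands()
bands = None
for quote in range(1, 25):   # toy quote stream standing in for real-time data
    bands = bb.update(float(quote))
print(bands)  # bands over the last 20 quotes (5.0 .. 24.0)
```

Each update is O(period) work on an in-memory window, so persisting the resulting bands per quote is exactly the kind of small, frequent write that low-latency PCIe SSDs handle well.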

I believe that a key to proactive BI which includes decision-making with simulative validation rather than just decision-analysis is in frameworks that continually aggregate derivatives associated with event sequences. If you maintain sufficient analytics in real-time relative to the data collection then the relational state changes between significant events can be monitored constantly to allow actionable BI that adapts decision criteria dynamically without the need for post-mortem analysis of correlative data that is prone to be out of date by the time it is analyzed.

Something to think about…

October 30, 2011 12:04 AM



About jchang

Reverse engineering the SQL Server Cost Based Optimizer (Query Optimizer), NUMA System Architecture, performance tools developer - SQL ExecStats, mucking with the data distribution statistics histogram - decoding STATS_STREAM, Parallel Execution plans, microprocessors, SSD, HDD, SAN, storage performance, performance modeling and prediction, database architecture, SQL Server engine
