Lies, damned lies, and statistics!
If you have read my three previous posts (1, 2, 3), you may have come away with the impression that on a drive presented from a high-end, enterprise-class disk array, Windows file fragmentation does not have a significant performance impact. And I gave you empirical data (oh yeah, statistics) to support that impression.
But that is not the whole story! No, I didn’t lie to you. The numbers I presented were solid. It’s just that the story is not yet finished.
In the previous posts on the performance impact of file fragmentation, I presented I/O throughput data as the evidence. The arguments were valid, especially since we did see file fragmentation degrade I/O throughput on directly attached storage. But I/O throughput is only one I/O performance metric, and it is not enough to look at throughput alone.
Let me start with an analogy. Suppose we have an eight-lane super highway going from New York City to Los Angeles. As we pump (okay, drive) more cars from NYC to LA, we take measurements at a checkpoint in LA to find out how many cars cross that checkpoint every hour, i.e. we measure the throughput of the super highway. Now, instead of building the eight-lane super highway straight from NYC to LA, we have it take a detour via Boston. At that same checkpoint in LA, we again measure the throughput. Everything else being equal, we should get the same throughput.
However, for a given car, the trip from NYC to LA would take a lot longer on this detoured highway.
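To make the analogy concrete, here is a minimal Python sketch of the two highways. The trip times (40 and 45 hours), the one-car-per-minute departure rate, and the function name measure_highway are all made up for illustration; the only point is that the checkpoint count stays the same while the per-car trip time grows.

```python
def measure_highway(trip_minutes, gap_minutes=1, cars=10_000, window_minutes=60):
    """Return (cars per hour at the LA checkpoint, trip time in minutes per car).

    All values are hypothetical: one car leaves NYC every `gap_minutes`,
    and every car needs exactly `trip_minutes` to reach the checkpoint.
    """
    departures = [i * gap_minutes for i in range(cars)]
    arrivals = [d + trip_minutes for d in departures]

    # Start counting when the first car arrives, i.e. once the road is full.
    start = arrivals[0]
    counted = sum(1 for a in arrivals if start <= a < start + window_minutes)

    throughput_per_hour = counted * 60 / window_minutes
    return throughput_per_hour, trip_minutes


direct = measure_highway(trip_minutes=40 * 60)   # assumed straight route
detour = measure_highway(trip_minutes=45 * 60)   # assumed detour via Boston

print("direct: %.0f cars/hour, %d minutes per car" % direct)
print("detour: %.0f cars/hour, %d minutes per car" % detour)
```

Both routes report the same number of cars per hour at the checkpoint, because the arrival rate is set by the departure rate, not by the length of the road. Only the time each individual car spends on the road changes.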
An I/O path is similar to a super highway. While its throughput is an important measure, how long it takes for an I/O request to complete—I/O latency or response time—is also an important measure. The question is, will file fragmentation take your I/O traffic for a detour?
Indeed, empirical test data show that when a file is severely fragmented, the maximum I/O latency of large sequential reads and writes (e.g. 256KB reads and writes) can suffer significantly. The following chart shows the impact of file fragmentation on the maximum I/O latency. The data were obtained from the same tests whose throughputs were reported in Part III of this series of posts.
Clearly, when the test file was fragmented into numerous disconnected 128KB pieces, some of the 256KB reads suffered terrible latency degradation. And if your applications happen to be issuing these I/Os, you would most likely experience a performance degradation regardless of whether the I/O path can maintain the same I/O throughput.
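If you want to look at both metrics on your own system, a rough Python sketch along the following lines times each 256KB sequential read individually and reports the maximum latency next to the overall throughput. This is not the tool used for the tests in this series; the function name time_sequential_reads is hypothetical, and because the sketch does not bypass the file system cache, its numbers are only indicative.

```python
import sys
import time


def time_sequential_reads(path, block_size=256 * 1024):
    """Read `path` sequentially in `block_size` chunks and time every read.

    Assumption: plain unbuffered Python reads. The OS file cache is NOT
    bypassed, so repeated runs may be served from memory.
    """
    latencies = []
    total_bytes = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            t0 = time.perf_counter()
            chunk = f.read(block_size)
            t1 = time.perf_counter()
            if not chunk:
                break
            latencies.append(t1 - t0)
            total_bytes += len(chunk)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero

    return {
        "reads": len(latencies),
        "bytes": total_bytes,
        "throughput_mb_s": total_bytes / elapsed / 1e6,
        "avg_latency_ms": 1000 * sum(latencies) / len(latencies) if latencies else 0.0,
        "max_latency_ms": 1000 * max(latencies) if latencies else 0.0,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python this_script.py <path-to-test-file>
    for name, value in time_sequential_reads(sys.argv[1]).items():
        print(name, value)
```

The throughput figure averages over the whole run, so a handful of very slow reads barely moves it. The maximum latency is where those slow reads show up.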
Having some valid statistics may seem to add force to an argument, which makes it all too easy to mislead when the whole story is not told, even though technically everything is valid and nobody is lying. By the way, this is a trick often employed by vendors, who tend to conveniently ignore the bad news or intentionally bury it under statistics on the good news. In my book, that would be lies, damned lies, and statistics.