Recently the sp_Blitz procedure on one of my OLTP servers returned an alarming notification about high latency on one of the disks (more than 100 ms per IO). Our chief storage guy didn’t understand what I was talking about – according to his measurements, the average latency was only about 15 ms. To investigate the issue, I recorded two snapshots of sys.dm_io_virtual_file_stats and calculated the latency per read and per write separately. The results appeared even more alarming: while the average read latency was only 9 ms, the average write latency skyrocketed to 260 ms. Even stranger, PerfMon results for the same time interval showed a similar read latency but only 17 ms per write! We almost opened a support request over it – in informal phone conversations, colleagues at Microsoft couldn’t explain it either.
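For reference, here is a minimal sketch of the kind of calculation I did. The temp table name and the delay are mine; the DMV and its columns are standard:

-- Snapshot 1: capture the cumulative counters
SELECT database_id, file_id,
       num_of_reads, io_stall_read_ms,
       num_of_writes, io_stall_write_ms
INTO #io_snap
FROM sys.dm_io_virtual_file_stats(NULL, NULL);

WAITFOR DELAY '00:05:00';  -- measurement interval; pick what suits you

-- Snapshot 2: per-operation latency over the interval, reads vs. writes
SELECT s2.database_id, s2.file_id,
       1.0 * (s2.io_stall_read_ms  - s1.io_stall_read_ms)
           / NULLIF(s2.num_of_reads  - s1.num_of_reads,  0) AS avg_read_ms,
       1.0 * (s2.io_stall_write_ms - s1.io_stall_write_ms)
           / NULLIF(s2.num_of_writes - s1.num_of_writes, 0) AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS s2
JOIN #io_snap AS s1
  ON s1.database_id = s2.database_id
 AND s1.file_id     = s2.file_id;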
Finally, we found the answer after a more thorough analysis of the PerfMon file. It turns out the calculation formulas are different. sys.dm_io_virtual_file_stats contains the total number of writes and the total write latency (stall), so dividing the stall by the number of writes gives the average latency per single write operation. PerfMon, on the other hand, measures the Avg. Disk sec/Write counter once every sample interval (the default is 15 seconds; I used a 1-second interval to see a more detailed picture), and the average write latency it reports for a period is simply the average of the samples collected during that period.

The disk in question contains only the data files of a single database. We noticed that while the number of read operations per second stays fairly even across the entire interval, writes behave completely differently. Most of the time we see zero writes – and, accordingly, zero write latency. But once a minute there is a massive burst: SQL Server issues almost 4,000 write IOs per second for several seconds in a row (sometimes 2, sometimes 10). My guess is that this is a checkpoint. During these peaks latency jumps to huge values – up to 1.8 seconds (!) per write. The catch: when we have 59 samples with zero latency (no writes, remember?) and 1 second with 120 ms latency, PerfMon calculates the average as (59*0 + 1*120)/60 = 2 ms write latency, while sys.dm_io_virtual_file_stats reports 120 ms per write. And indeed, when I recalculated the average write latency from PerfMon taking into account only the samples with a non-zero number of writes, the results came very close to what sys.dm_io_virtual_file_stats showed.
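To make the difference concrete, here is a sketch of all three calculations over PerfMon samples loaded into a table. The table and its columns are hypothetical – assume the counters were exported from the .blg file (relog can convert it to CSV) and imported into perfmon_samples:

-- Hypothetical table: perfmon_samples(sample_time, writes_per_sec, avg_write_ms)
SELECT
    -- PerfMon/PAL style: plain average of samples, zero-write samples included
    AVG(avg_write_ms)                                        AS perfmon_style_ms,
    -- DMV style: each sample weighted by the number of writes it contained
    SUM(avg_write_ms * writes_per_sec)
        / NULLIF(SUM(writes_per_sec), 0)                     AS dmv_style_ms,
    -- Only samples that actually had writes (what I compared against the DMV)
    AVG(CASE WHEN writes_per_sec > 0 THEN avg_write_ms END)  AS nonzero_only_ms
FROM perfmon_samples;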
So who is right in this story, and should we do something about the disk? Both are right – they just calculate different numbers. sys.dm_io_virtual_file_stats gives the average latency per IO, but provides no indication of whether the load is steady or consists of peaks and idle periods. In its numbers the peaks have much more influence, because peaks in latency are usually accompanied by peaks in the number of IO operations, so those samples carry a higher relative weight. PerfMon, on the other hand, when it comes to averaging over a period (the PAL tool, for example, does this automatically), smooths the peaks out, because for it every sample – the one with zero writes and the one with 10K writes per second – has the same relative weight.
Here is how our latency graph looks:
My first thought was: “Checkpoint is a background process, it shouldn’t really affect user experience – the storage guy is right, leave the disk alone.” But then I decided to check read latencies: indeed, when the disk is so busy that a write takes 1.8 seconds, there is no reason to assume that reads fare any better. As I said in the beginning, the overall average read latency was 9 ms. But separated into write-idle and checkpoint periods, the results were completely different: during write-idle periods read latency was 1.5 ms, while during checkpoints it jumped to 120 ms. So user experience is affected – read queries that go to disk can run very slowly, or even appear stuck, during checkpoints. Beware.
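For completeness, this is roughly how the read latency can be split by period over the same hypothetical perfmon_samples table (extended with reads_per_sec and avg_read_ms columns), treating any sample with writes as part of a checkpoint burst:

SELECT
    CASE WHEN writes_per_sec > 0 THEN 'checkpoint' ELSE 'write-idle' END AS period_type,
    -- weight each sample by the number of reads it contained
    SUM(avg_read_ms * reads_per_sec)
        / NULLIF(SUM(reads_per_sec), 0) AS avg_read_ms
FROM perfmon_samples
GROUP BY CASE WHEN writes_per_sec > 0 THEN 'checkpoint' ELSE 'write-idle' END;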