<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://www2.sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Beware of SQL Server and PerfMon differences in disk latency calculation</title><link>http://www2.sqlblog.com/blogs/michael_zilberstein/archive/2012/11/28/46419.aspx</link><description>Recently sp_blitz procedure on one of my OLTP servers returned alarming notification about high latency on one of the disks (more than 100ms per IO). Our chief storage guy didn’t understand what I was talking about – according to his measures, average</description><dc:language>en</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>re: Beware of SQL Server and PerfMon differences in disk latency calculation</title><link>http://www2.sqlblog.com/blogs/michael_zilberstein/archive/2012/11/28/46419.aspx#46423</link><pubDate>Wed, 28 Nov 2012 08:42:31 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:46423</guid><dc:creator>Greg Linwood</dc:creator><description>&lt;p&gt;Interesting article. The main reason for using io_virtual_file_stats vs perfmon is to try to obtain information at the database file level vs the whole disk or volume level. &lt;/p&gt;
&lt;p&gt;Ideally we would be able to capture perfmon style calculations (transient rather than aggregated) but at database file level rather than at disk / volume level.&lt;/p&gt;
&lt;p&gt;Even better would be to be able to capture this information at the index level but that might be getting a little greedy for a paying customer&lt;/p&gt;
</description></item><item><title>re: Beware of SQL Server and PerfMon differences in disk latency calculation</title><link>http://www2.sqlblog.com/blogs/michael_zilberstein/archive/2012/11/28/46419.aspx#46429</link><pubDate>Wed, 28 Nov 2012 15:00:51 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:46429</guid><dc:creator>Michael Zilberstein</dc:creator><description>&lt;p&gt;Greg, there is more than one perspective here. As a consultant, I need some &amp;quot;quick and dirty&amp;quot; diagnostics tools that can pinpoint problems without spending much time on capturing and analyzing traces. Procedures like sp_blitz are ideal here. Of course if I find bottleneck on the disk level, I'll dig in file level and find which files to reallocate to other disks.&lt;/p&gt;
&lt;p&gt;BTW, in my case next step isn't reallocating files but checking storage. If storage is configured properly, write latency should be less than read one. Reads go to disk while writes should go to storage cache. I guess, here we have storage configured to write-through or low queue depth configuration value.&lt;/p&gt;
</description></item><item><title>re: Beware of SQL Server and PerfMon differences in disk latency calculation</title><link>http://www2.sqlblog.com/blogs/michael_zilberstein/archive/2012/11/28/46419.aspx#46434</link><pubDate>Wed, 28 Nov 2012 22:19:01 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:46434</guid><dc:creator>Greg Gonzalez</dc:creator><description>&lt;p&gt;Hi Michael,&lt;/p&gt;
&lt;p&gt;I wanted to point out that you can easily convert the sys.dm_io_virtual_file_stats data to the same numbers as PerfMon using this formula:&lt;/p&gt;
&lt;p&gt;(x2 – x1) / (y2 – y1) / 1000&lt;/p&gt;
&lt;p&gt;Where:&lt;/p&gt;
&lt;p&gt;x1 = io_stall_ms start value&lt;/p&gt;
&lt;p&gt;x2 = io_stall_ms end value&lt;/p&gt;
&lt;p&gt;y1 = num_ios start value&lt;/p&gt;
&lt;p&gt;y2 = num_ios end value&lt;/p&gt;
&lt;p&gt;Perflib is effectively performing this calculation every time Perfmon queries it, based on the sample interval, but using latency ms and number of IOs at the disk level. If you were to sample virtual_file_stats at the exact same frequency and perform this calculation, you’d see very close to the same numbers. You’d obviously need to take disks and database file locations into account when doing the comparison.&lt;/p&gt;
&lt;p&gt;I hope this is helpful!&lt;/p&gt;
</description></item><item><title>re: Beware of SQL Server and PerfMon differences in disk latency calculation</title><link>http://www2.sqlblog.com/blogs/michael_zilberstein/archive/2012/11/28/46419.aspx#46436</link><pubDate>Wed, 28 Nov 2012 22:40:07 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:46436</guid><dc:creator>Michael Zilberstein</dc:creator><description>&lt;p&gt;Hi Greg,&lt;/p&gt;
&lt;p&gt;You're of course right, that's pure math. But 2 points:&lt;/p&gt;
&lt;p&gt;1. Your counter isn't &amp;quot;Disc sec\write&amp;quot; but &amp;quot;Disc sec\write\sec&amp;quot;.&lt;/p&gt;
&lt;p&gt;2. Thinking straightforward, you don't expect &amp;quot;disc sec\write&amp;quot; counter to be influenced by frequency of sampling. While in this case sampling once every 2 seconds instead of every second will multiply the result by factor of 2.&lt;/p&gt;
</description></item><item><title>re: Beware of SQL Server and PerfMon differences in disk latency calculation</title><link>http://www2.sqlblog.com/blogs/michael_zilberstein/archive/2012/11/28/46419.aspx#46438</link><pubDate>Thu, 29 Nov 2012 00:44:14 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:46438</guid><dc:creator>Greg Gonzalez</dc:creator><description>&lt;p&gt;#1: Not sure I follow exactly, but I'm converting avg ms per IO to avg seconds per IO to keep in sync with the Perflib counter which uses seconds, not ms.&lt;/p&gt;
&lt;p&gt;#2: It's not influenced by sample frequency. If you take the delta of stalls ms and IOs over any period and divide, you'll get the avg ms per IO over that period. The sample frequency is irrelevant. Your assertion that PerfMon does a different calculation which weights samples disproportionately is not accurate. It calcs deltas between samples and divides, and the formula works the same way over 1 second or 60. As you said, it's pure math ;-)&lt;/p&gt;
&lt;p&gt;Here is a shot of the SQL Sentry dashboard showing total Windows disk and total SQL Server file latency, collected from Perflib and file_stats respectively:&lt;/p&gt;
&lt;p&gt;&lt;a rel="nofollow" target="_new" href="http://www.sqlsentry.net/images/temp/windowssqllatency.png"&gt;http://www.sqlsentry.net/images/temp/windowssqllatency.png&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Notice the patterns are almost identical. We convert the raw disk latency counter to ms, and the file_stats latency is calc'd using the formula above, both over a 20 second sample by default. This is a dedicated SQL system with multiple active disks and databases.&lt;/p&gt;
&lt;p&gt;If you're going to compare the data from Windows and SQL, you must use the exact same frequency, and the samples must be taken simultaneously (or as close as you can get). If you don't, you can and will see timing related discrepancies. That may explain what you saw originally, not sure.&lt;/p&gt;
&lt;p&gt;If you're interested we cover this issue and more fun stuff in the patent app for our disk analysis module:&lt;/p&gt;
&lt;p&gt;&lt;a rel="nofollow" target="_new" href="http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&amp;amp;Sect2=HITOFF&amp;amp;d=PG01&amp;amp;p=1&amp;amp;u=%2Fnetahtml%2FPTO%2Fsrchnum.html&amp;amp;r=1&amp;amp;f=G&amp;amp;l=50&amp;amp;s1=%2220100214298%22.PGNR.&amp;amp;OS=DN/20100214298&amp;amp;RS=DN/20100214298"&gt;http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&amp;amp;Sect2=HITOFF&amp;amp;d=PG01&amp;amp;p=1&amp;amp;u=%2Fnetahtml%2FPTO%2Fsrchnum.html&amp;amp;r=1&amp;amp;f=G&amp;amp;l=50&amp;amp;s1=%2220100214298%22.PGNR.&amp;amp;OS=DN/20100214298&amp;amp;RS=DN/20100214298&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Cheers&lt;/p&gt;
</description></item><item><title>re: Beware of SQL Server and PerfMon differences in disk latency calculation</title><link>http://www2.sqlblog.com/blogs/michael_zilberstein/archive/2012/11/28/46419.aspx#46439</link><pubDate>Thu, 29 Nov 2012 01:16:37 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:46439</guid><dc:creator>Michael Zilberstein</dc:creator><description>&lt;p&gt;#1. You're right, just converted milliseconds to seconds.&lt;/p&gt;
&lt;p&gt;#2. I see you point. But that's not the way you usually work with PerfMon. You don't take 2 snapshots (and as far as I understand you have to write code in order to do it - there is no total cumulative counter for disk latency like in virtual_file_stats) but rather record counter values every 10 or 15 seconds (every single second in my case) and then start to analyze. So when you come to analyze interval, you have no other way but to calculate some sort of average between counters. That's what PAL tool does, for example. And that's the only way you can compare it to virtual_file_stats from SQL because contrary to PerfMon, it contains only cumulative numbers. &lt;/p&gt;
&lt;p&gt;Actually, there is way to receive same results from PerfMon. Number of writes should be captured along with latency. And _weighted_ average of latency metric values should be calculated instead of simple average. Weighted - by number of write operations. So when num of disk writes is zero, relative latency weight in the formula will also be zero at that sample.&lt;/p&gt;
</description></item><item><title>re: Beware of SQL Server and PerfMon differences in disk latency calculation</title><link>http://www2.sqlblog.com/blogs/michael_zilberstein/archive/2012/11/28/46419.aspx#46440</link><pubDate>Thu, 29 Nov 2012 01:21:10 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:46440</guid><dc:creator>Michael Zilberstein</dc:creator><description>&lt;p&gt;Patent application is really funny read - seems like same English language but so much different :-).&lt;/p&gt;
&lt;p&gt;Greg, do you have historical analysis module in your tool or it contains only real-time monitoring? If you have and user chooses some custom time period - how do you calculate latency during that period? Or you just present raw data (metrics) without calculating single latency number?&lt;/p&gt;
</description></item><item><title>re: Beware of SQL Server and PerfMon differences in disk latency calculation</title><link>http://www2.sqlblog.com/blogs/michael_zilberstein/archive/2012/11/28/46419.aspx#46454</link><pubDate>Fri, 30 Nov 2012 00:17:27 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:46454</guid><dc:creator>Greg Gonzalez</dc:creator><description>&lt;p&gt;That's right, you can easily &amp;quot;back in&amp;quot; to the total latency per sample by multiplying the avg sec per IO times the number of IOs for the same interval. These samples can then be added up and divided by the total IOs for the range to get the same type of number you'd get from file_stats.&lt;/p&gt;
&lt;p&gt;If you look under the covers at the raw Perflib data retrieved by Perfmon (or our app), the Avg Disk sec/Read|Write counter returns both the running total IOs and the running total latency in ms (BaseValue and RawValue properties respectively)... this is exactly the same way file_stats stores it, as running totals. PerfMon (or the app) then calls a function on the samples using the formula above to come up with the numbers you see.&lt;/p&gt;
&lt;p&gt;I just wanted to ensure it was clear that there is no inherent problem with either data source... it all comes down to how you are processing and using the data.&lt;/p&gt;
&lt;p&gt;Yes, it's amazing how the lawyers can take simple concepts and make them seem complex ;-)&lt;/p&gt;
&lt;p&gt;Yes, SQL Sentry is all about historical data and analysis -- you can look at these and other metrics over any date range. Ranges &amp;lt;=30 minutes show raw samples (20 sec for latency), and &amp;gt;30 min ranges show averages over varying &amp;quot;resolution intervals&amp;quot; depending on the range size.&lt;/p&gt;
</description></item></channel></rss>