THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Kevin Kline

Final Revelations from the Recent TPC-H Benchmark

I hope you’ve enjoyed my multipart write up on HP and Microsoft’s most recent TPC-H Benchmark disclosure report.  I’m a huge proponent of benchmarking and of reading the vendor disclosure reports.  So I hope I was able to convey the highlights from the report as well as give you a few reasons as to why it’s important to review their findings.  I’ve also greatly enjoyed your feedback and learned from it.  Keep it up!

The only other major concept used in the construction of the benchmark environment worth noting at this point, imo, is the astronomically high number of files per filegroup and the use of multiple filegroups per database.  Here’s an excerpt of the CREATE DATABASE script:

CREATE DATABASE tpch3000g ON PRIMARY

(NAME = tpch3000g_root,

FILENAME = "G:\TPCH_Roots\tpch3000g.mdf",

SIZE = 10MB,

FILEGROWTH = 10MB),

 

FILEGROUP STAGING_FG

(NAME = tpch3000g_stage1, FILENAME = 'G:\mnt\ntfs\1\load\stage.mdf', SIZE = 110000MB),

..

(NAME = tpch3000g_stage32, FILENAME = 'G:\mnt\ntfs\32\load\stage.mdf', SIZE = 110000MB),

 

FILEGROUP LINEITEM_FG

(NAME = tpch3000g_li1, FILENAME = 'G:\mnt\li\1\', SIZE = 22500MB),

(NAME = tpch3000g_li192, FILENAME = 'G:\mnt\li\192\', SIZE = 22500MB),

 

FILEGROUP GENERAL_FG

(NAME = tpch3000g_gen1, FILENAME = 'G:\mnt\gen\1\', SIZE = 7800MB),

(NAME = tpch3000g_gen192, FILENAME = 'G:\mnt\gen\192\', SIZE = 7800MB)

 

LOG ON

(NAME = tpch3000g_log1, FILENAME = "L:\\", SIZE = 297000MB)

 

 

As you can see, each filegroup had as many as 192 files of hundreds of GB in size.  I’ve heard a number of IO tuning recommendations related to the use and configuration of files and filegroups, the most common advice being to configure N-1 filegroups per database (where N is the number of CPUs on the server).  Since the benchmarked server had 32 dual-core Itanium 2 9050 processors, I’m not sure how the specific use of 192 files and 4 major filegroups reconciled with this advice.  I’d appreciate any insight you might have.

There are a lot of other interesting but very minor tidbits to glean from reviewing the disclosure report.  For example, the benchmarking team substituted their own user-defined function called “big_count” in place of the system-supplied function COUNT.  Perhaps this new UDF was better at counting very large numbers of records?

Thanks again for your comments.  I hope this was of some help to you.

Cheers,

-Kevin

Published Monday, June 25, 2007 11:48 AM by KKline
Filed under:

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Adam Machanic said:

Hi Kevin,

Just one comment: The COUNT_BIG function mentioned in the document is not custom--it has shipped with SQL Server since SQL Server 2000.  It's the same as COUNT, but outputs a BIGINT rather than an INT.  They probably needed the extra precision for the test.

June 25, 2007 1:09 PM
 

mrdenny said:

I would assume that the additional data files in the LINEITEM_FG and GENERAL_FG file groups would be there to simply spread the load across additional mount points to increase the number of spindles behind each file group so that the final checkpoint at the end of the process would have the maxamium bandwidth available to it.

Based on the naming structor of the folders it appears that Microsft setup each file on it's own array and mounted those arrays as mount points on a single drive letter.

June 25, 2007 1:19 PM
 

Denis Gobo said:

If I remember correctly you could use COUNT_BIG(*) in indexed views but not COUNT(*)

June 25, 2007 1:22 PM
 

Adam Machanic said:

mrdenny:

Seems about right to me.  I think that the usual advice for N files per DB falls in line with the fact that we usually don't have access to 192 arrays (at least, I haven't seen that in a production environment as of the time of this writing!)

June 25, 2007 1:55 PM
 

Chuck Boyce said:

<<

Since the benchmarked server had 32 dual-core Itanium 2 9050 processors, I’m not sure how the specific use of 192 files and 4 major filegroups reconciled with this advice.  I’d appreciate any insight you might have.

>>

It just makes me curious as to why they did this.

The recommendation is actually<A HREF="http://www.microsoft.com/technet/prodtechnol/sql/2005/physdbstor.mspx">1 db file per core</A> per filegroup, so I'm even more puzzled?

FILEGROUP STAGING_FG is affected as half the number of cores and FILEGROUP LINEITEM_FG has three times more filegroup files than the number of cores.

Chuck

June 26, 2007 7:11 AM
 

WesleyB said:

Shea posted an excellent test about the number of filegroups http://sqlblog.com/blogs/linchi_shea/archive/2007/01/29/how-many-data-files-should-i-create-for-a-user-database.aspx.

Many whitepapers mention this best practice but I would like to know the technical reason behind this which is not really mentioned as far as I remember.

June 27, 2007 7:36 AM
 

Adam Machanic said:

WesleyB:

From what I understand there are two reasons for this:

A) SQL Server can use more threads and better parallelize reads against multiple files than against a single file. This is especially true if they're on separate arrays.

B) There are benefits with regard to page allocations.  If you're doing a lot of small allocations in a single file you may encounter blocking.  More files means less blocking in this scenario.

June 27, 2007 9:20 AM
 

Denis Gobo said:

When I added 7 files (8 total) for tempdb (on a 8 processor box) I saw a big improvement for big queries

Read this

Lining up the number of data files with CPU’s has scalability advantages for allocation intensive workloads.

• It is recommended to have .25 to 1 data files (per filegroup) for each CPU on the host server.

• This is especially true for TEMPDB where the recommendation is 1 data file per CPU.

• Dual core counts as 2 CPUs; logical procs (hyperthreading) do not.

more here: http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/storage-top-10.mspx

June 27, 2007 9:53 AM
 

KKline said:

This is excellent feedback guys!  Thanks so much for this great information...

-Kev

July 13, 2007 1:14 PM

Leave a Comment

(required) 
(required) 
Submit

About KKline

Kevin Kline is a well-known database industry expert, author, and speaker. Kevin is a long-time Microsoft MVP and was one of the founders of PASS, www.sqlpass.org.

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement