|
|
|
|
Aaron is a senior consultant for SQL Sentry, Inc., makers of performance monitoring and event management software for SQL Server, Analysis Services, and Windows. He has been blogging here at sqlblog.com since 2006, focusing on manageability, performance, and new features; has been a Microsoft MVP since 1997; tweets as @AaronBertrand; and speaks frequently at user group meetings and SQL Saturday events.
Background
Several weeks ago, I ran some tests on the new implementation of Unicode Compression in SQL Server 2008 R2 - both space-related and performance-related. Of course it was pointed out that my performance tests were somewhat flawed, because I was using data from a very wide and atypical variety of characters sets. I also didn't like that I was comparing too many different things in the same pass. So, I thought I would run one more battery of tests, and right a couple of wrongs.
So this time I decided to do more narrow testing : simply SQL 2008 RTM vs. SQL 2008 R2 (August CTP, build 10.50.1092), both on x64.
The Test
The tables are otherwise identical, as I populated them with the exact same 500,000 rows. So, update and read performance will be based on identical data (except for the compression parts, of course). Instead of comparing page to row to none, and then using different type of indexes for each test, I created two tables in each database: one with compression = page (which includes row), and one with no compression. All four tables have one clustered index and one non-clustered index:
USE CompressionTesting; GO
CREATE TABLE dbo.Customers_Compressed ( CustomerID INT PRIMARY KEY, FirstName NVARCHAR(64), LastName NVARCHAR(64), Address1 NVARCHAR(255), Address2 NVARCHAR(255), City NVARCHAR(64), [Region] NVARCHAR(64), PostalCode NVARCHAR(10), Country NVARCHAR(64), Phone VARCHAR(16), Email VARCHAR(320), [Password] VARBINARY(16), DateCreated SMALLDATETIME, DateModified SMALLDATETIME ) WITH (DATA_COMPRESSION = PAGE);
|
I ran the above script in both instances, and then again for a table called dbo.Customers_Uncompressed. As you can guess, the only difference is that the table called dbo.Customers_Uncompressed does not have the option WITH (DATA_COMPRESSION = PAGE).
Instead of spending hours perfecting a script that would simulate random data, I decided to use a tool off the shelf. The one I was most familiar with was Red-Gate's SQL Data Generator, though there are several other ways to avoid re-inventing the wheel, including built-in support in Visual Studio Team System 2008 Database Edition (read more on GertD's blog).
I made a copy of one of the tables above (structure only) to prime a single source table on each instance. This way I could populate all four real tables that I wanted to measure, with everything else being the same. So on each instance I created this table in tempdb:
USE [tempdb]; GO
CREATE TABLE dbo.PrimeData ( CustomerID INT IDENTITY(1,1) PRIMARY KEY, FirstName NVARCHAR(64), LastName NVARCHAR(64), Address1 NVARCHAR(255), Address2 NVARCHAR(255), City NVARCHAR(64), [Region] NVARCHAR(64), PostalCode NVARCHAR(10), Country NVARCHAR(64), Phone VARCHAR(16), Email VARCHAR(320), [Password] VARBINARY(16), DateCreated SMALLDATETIME, DateModified SMALLDATETIME );
|
Then I pointed SQL Data Generator at this table on my R2 instance. I made some adjustments to the defaults; first, I changed the target rows from 1,000 to 500,000. And instead of using the wide range of characters I used before, I adjusted sample data to include mainly Finnish last names, from this page. (I simply created a new NamesLast.txt file, and pointed the tool at that file instead of the default file.) Since most last names didn't have any Unicode characters at all, I threw in a few more names from various character sets in this WikiPedia entry. While not as widely random as my previous experiment, I still think
there is enough of a variance here to represent a typical
internationalized application. I did a similar thing with the first
names, but left addresses and all of the other columns alone. Once I was happy with the way the sample data would be generated, I let 'er rip. It populated the 500,000 rows in about 20 seconds. To get the exact same data over to the RTM instance, I did not run the generator tool again, as that would create different data. Instead, I ran the following script on the RTM instance to prime the local table with the exact same data:
USE [tempdb]; GO
SET IDENTITY_INSERT dbo.PrimeData ON;
INSERT dbo.PrimeData ( [CustomerID], [FirstName], [LastName], [Address1], [Address2], [City], [Region], [PostalCode], [Country], [Phone], [Email], [Password], [DateCreated], [DateModified] ) SELECT * FROM [R2_LinkedServer].tempdb.dbo.PrimeData;
SET IDENTITY_INSERT dbo.PrimeData OFF;
|
Now I was ready to populate the main tables. But before I started, I wanted to set up a server-side trace on each instance, so I could measure various aspects of the operations. This trace setup looks alarmingly like one you might have seen before in my previous Unicode Compression testing. The only thing I've dropped is filtering on login; I am not running anything else on this server, so I didn't need to worry about that. So I start this trace on both the 2008 RTM and the 2008 R2 instance:
USE [master]; GO
DECLARE @rc INT, @TraceID INT, @TraceName SYSNAME = N'C:\temp\CT_' + CONVERT(SYSNAME, SERVERPROPERTY('InstanceName')), @MaxFS BIGINT = 5;
EXEC @rc = sp_trace_create @TraceID OUTPUT, 0, @TraceName, @MaxFS, NULL;
EXEC sp_trace_setevent @TraceID, 12, 15, 1; EXEC sp_trace_setevent @TraceID, 12, 16, 1; EXEC sp_trace_setevent @TraceID, 12, 1, 1; EXEC sp_trace_setevent @TraceID, 12, 17, 1; EXEC sp_trace_setevent @TraceID, 12, 14, 1; EXEC sp_trace_setevent @TraceID, 12, 18, 1; EXEC sp_trace_setevent @TraceID, 12, 11, 1; EXEC sp_trace_setevent @TraceID, 12, 35, 1; EXEC sp_trace_setevent @TraceID, 12, 12, 1; EXEC sp_trace_setevent @TraceID, 12, 13, 1;
EXEC sp_trace_setfilter @TraceID, 35, 0, 6, N'CompressionTesting';
EXEC sp_trace_setstatus @TraceID, 1;
SELECT TraceID = @TraceID, TraceFile = @TraceName + N'.trc';
|
Now, for the tests themselves. I am measuring writing to and reading from each copy of the table. It starts with an initial population of the 500,000 rows, then some selects, then some updates. Finally, I dump the index stats from sys.dm_db_partition_stats into a table. When all four versions of the script are done, I turn off the trace on both systems. Here is a sample script:
USE CompressionTesting; GO
-- R2, compressed INSERT dbo.Customers_Compressed ( [CustomerID], [FirstName], [LastName], [Address1], [Address2], [City], [Region], [PostalCode], [Country], [Phone], [Email], [Password], [DateCreated], [DateModified] ) SELECT * FROM tempdb.dbo.PrimeData;
-- R2, compressed SELECT TOP 1 * FROM dbo.Customers_Compressed ORDER BY NEWID(); GO 10
-- R2, compressed SELECT TOP 1 email FROM dbo.Customers_Compressed ORDER BY LastName, FirstName; GO 10
-- R2, compressed SELECT TOP 1 * FROM dbo.Customers_Compressed WHERE FirstName LIKE 'a%'; GO 10
-- R2, compressed SELECT TOP 1 * FROM dbo.Customers_Compressed WHERE LastName LIKE '%a'; GO 10
-- R2, compressed UPDATE dbo.Customers_Compressed SET LastName = REPLACE(LastName, 'a', 'b') WHERE LastName LIKE '%a%'; UPDATE dbo.Customers_Compressed SET LastName = REPLACE(LastName, 'c', 'b') WHERE LastName LIKE '%c%'; UPDATE dbo.Customers_Compressed SET LastName = REPLACE(LastName, 'd', 'b') WHERE LastName LIKE '%d%'; UPDATE dbo.Customers_Compressed SET LastName = REPLACE(LastName, 'e', 'b') WHERE LastName LIKE '%e%'; UPDATE dbo.Customers_Compressed SET LastName = REPLACE(LastName, 'f', 'b') WHERE LastName LIKE '%f%'; UPDATE dbo.Customers_Compressed SET LastName = REPLACE(LastName, 'g', 'b') WHERE LastName LIKE '%g%'; UPDATE dbo.Customers_Compressed SET LastName = REPLACE(LastName, 'h', 'b') WHERE LastName LIKE '%h%'; UPDATE dbo.Customers_Compressed SET LastName = REPLACE(LastName, 'i', 'b') WHERE LastName LIKE '%i%'; UPDATE dbo.Customers_Compressed SET LastName = REPLACE(LastName, 'j', 'b') WHERE LastName LIKE '%j%'; UPDATE dbo.Customers_Compressed SET LastName = REPLACE(LastName, 'k', 'b') WHERE LastName LIKE '%k%';
-- R2, compressed UPDATE dbo.Customers_Compressed SET FirstName = REPLACE(FirstName, 'a','c') WHERE FirstName LIKE '%a%'; UPDATE dbo.Customers_Compressed SET FirstName = REPLACE(FirstName, 'b','c') WHERE FirstName LIKE '%b%'; UPDATE dbo.Customers_Compressed SET FirstName = REPLACE(FirstName, 'd','c') WHERE FirstName LIKE '%d%'; UPDATE dbo.Customers_Compressed SET FirstName = REPLACE(FirstName, 'e','c') WHERE FirstName LIKE '%e%'; UPDATE dbo.Customers_Compressed SET FirstName = REPLACE(FirstName, 'f','c') WHERE FirstName LIKE '%f%'; UPDATE dbo.Customers_Compressed SET FirstName = REPLACE(FirstName, 'g','c') WHERE FirstName LIKE '%g%'; UPDATE dbo.Customers_Compressed SET FirstName = REPLACE(FirstName, 'h','c') WHERE FirstName LIKE '%h%'; UPDATE dbo.Customers_Compressed SET FirstName = REPLACE(FirstName, 'i','c') WHERE FirstName LIKE '%i%'; UPDATE dbo.Customers_Compressed SET FirstName = REPLACE(FirstName, 'j','c') WHERE FirstName LIKE '%j%'; UPDATE dbo.Customers_Compressed SET FirstName = REPLACE(FirstName, 'k','c') WHERE FirstName LIKE '%k%';
-- R2, compressed, drop INSERT dbo.indexstats SELECT 'R2, compressed', index_id, in_row_data_page_count, used_page_count, reserved_page_count, row_count FROM sys.dm_db_partition_stats WHERE [object_id] = OBJECT_ID('dbo.Customers_Compressed');
|
Of course the difference being that on the RTM instance the comments say RTM (in case I get my trace data results confused), and the scripts that run against the uncompressed version of the table are slightly different.
The Results
Given the results of my previous tests, the outcome was a little surprising to me this time. First we'll deal with space savings on the clustered index. It is no surprise at all that using database compression in R2 yielded significant space savings. In this case, the uncompressed version of the table weighed in at 122 MB, while the compressed version came in at around 81 MB (~33% savings). On RTM, however, compression didn't buy us much at all: the table went from 122 MB to 121 MB. (The nonclustered index was static with a constant space utilization of about 56 MB.) Here is the overall layout for all four scenarios:
Space used, in kilobytes
And then for performance, I graphed out total CPU usage, total duration, and total I/O:
CPU time, in milliseconds
 Elapsed time, in seconds
Number of 8KB read or write operations
Conclusion
In my tests, Unicode Compression in 2008 R2 provides great I/O and space gains at a very marginal CPU and duration overhead (IMHO). Duration itself wasn't so bad for R2 compressed vs. RTM compressed, especially given that RTM compressed didn't buy us any measurable space savings. Again, whether or not data compression is right for you (in 2008 or 2008 R2) will depend on a lot of factors, including the characteristics of your current hardware, workloads, data distribution and access patterns. I just hope I'm adding some insight into how you can test this for yourself in your own environment (it is not my intention to say, "See, you *should* use compression!").
Comment Notification
If you would like to receive an email when updates are made to this post, please register here
Subscribe to this post's comments using
About AaronBertrand
...about me...
|
|
|
|
|