
Denis Gobo

A Quick Look At Compression In SQL 2008

To use compression effectively you have to understand what your data looks like. If your data is mostly unique, compression might not help much; if values are repeated on a page, it can be very beneficial. In my own tests compression improved performance for some tables and not so much for others. I won't explain how compression works internally; that is all covered in Books Online. The goal of this post is to give you some code you can run to see how page-level and row-level compression compare to the uncompressed data.
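Before you rebuild anything, SQL Server 2008 also ships with sp_estimate_data_compression_savings, which samples a table and estimates its size under a given compression setting. A minimal sketch, here pointed at the IndexCloseValues table created later in this post:

--estimate PAGE compression savings without actually rebuilding the table
EXEC sp_estimate_data_compression_savings
    @schema_name = 'dbo',
    @object_name = 'IndexCloseValues',
    @index_id = NULL,            --NULL means all indexes
    @partition_number = NULL,    --NULL means all partitions
    @data_compression = 'PAGE'   --or 'ROW' or 'NONE'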


Here is some code that you can run to see how compression works. The data I have used is similar to what we use at my job (but not the same; I used fake tickers, so don't try to invest in those ;-)). I have used just the 10 industry groups from the Industry Classification Benchmark and left out supersectors, sectors, and subsectors, otherwise we would have many more rows per table. I also did not create fake ISIN, SEDOL, and CUSIP columns, since that would have bloated the code in this post. The code will generate 2,780,650 rows per table, which is a nice size to test with.

Compression is really nice when you get charged $30 per GB for the SAN, so besides performance there is also a cost benefit in storage.


Here is the code. This first block creates the database, sets up the lookup tables, and creates the tables for our tests:

 

use master
go
create database CompressionTest2
go
use CompressionTest2
go

--our Industry Classification Benchmark table
create table IndustryGroup(GroupCode char(4) not null primary key,
GroupDescription varchar(40))
GO

--just the 10 industry groups
insert IndustryGroup values('0001','Oil & Gas')
insert IndustryGroup values('1000','Basic Materials')
insert IndustryGroup values('2000','Industrials')
insert IndustryGroup values('3000','Consumer Goods')
insert IndustryGroup values('4000','Healthcare')
insert IndustryGroup values('5000','Consumer Services')
insert IndustryGroup values('6000','Telecommunications')
insert IndustryGroup values('7000','Utilities')
insert IndustryGroup values('8000','Financials')
insert IndustryGroup values('9000','Technology')
GO

--currency table
create table Currency (CurrencyCode char(3) not null primary key, CurrencyDescription varchar(30))
GO

--just a handful of currencies
insert Currency values('USD','U.S. Dollar')
insert Currency values('AUD','Australian Dollar')
insert Currency values('CAD','Canadian Dollar')
insert Currency values('JPY','Japanese Yen')
insert Currency values('MXN','Mexican Peso')
insert Currency values('GBP','U.K. Sterling')
insert Currency values('EUR','European Euro')
insert Currency values('ISK','Iceland Krona')
insert Currency values('BGN','Bulgarian Lev')
insert Currency values('RON','Romanian Leu')
insert Currency values('INR','Indian Rupee')
insert Currency values('RUB','Russian Ruble')
insert Currency values('BHD','Bahrain Dinar')
insert Currency values('EGP','Egyptian Pound')
insert Currency values('JOD','Jordan Dinar')
insert Currency values('KWD','Kuwait Dinar')
insert Currency values('MAD','Morocco Dirham')
insert Currency values('OMR','Oman Rial')
insert Currency values('QAR','Qatari Rial')
GO

--market cap table
create table MarketCap (MarketCap varchar(20) not null primary key, MarketCapDescription varchar(100))
GO

--left out a bunch of market caps
insert MarketCap values('ALL','broad')
insert MarketCap values('MID','mid cap')
insert MarketCap values('MCR','micro cap')
insert MarketCap values('SML','small cap')
insert MarketCap values('LRG','large cap')

--calendar table, weekdays only
create table Calendar (CalendarDate date not null primary key)
GO

--master..spt_values (type = 'p') only holds the numbers 0 through 2047,
--so we union two ranges of weekdays to cover January 1992 through early 2003
insert Calendar
select dateadd(d,number,'19920101') from master..spt_values
where type = 'p'
and datepart(dw,dateadd(d,number,'19920101')) not in (1,7)
union
select dateadd(d,number,'19970810') from master..spt_values
where type = 'p'
and datepart(dw,dateadd(d,number,'19970810')) not in (1,7)
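As an aside, spt_values is an undocumented table, which is why the union of two ranges is needed above. If you would rather not depend on it, a tally built with row_number works just as well. A sketch (run it instead of the insert above, not in addition; the 4100-day cutoff is an arbitrary choice that covers roughly the same range):

;with Numbers as (
select top (4100) row_number() over (order by (select null)) - 1 as n
from sys.all_objects a cross join sys.all_objects b
)
insert Calendar
select dateadd(d, n, '19920101')
from Numbers
--same weekday filter as above, assumes the default DATEFIRST setting
where datepart(dw, dateadd(d, n, '19920101')) not in (1,7)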


--the table that we will test against
create table IndexCloseValues (CalendarDate date not null,
IndexSymbol varchar(30) not null,
GroupCode char(4) not null,
CurrencyCode char(3) not null,
MarketCap varchar(20) not null,
CloseValue decimal(30,10),
TotalReturnClose decimal(30,10))

--2780650 rows
insert IndexCloseValues
select CalendarDate,
CurrencyCode + left(GroupCode,1) + MarketCap as Symbol, --fake tickers
GroupCode,CurrencyCode,MarketCap,
ROW_NUMBER() OVER(ORDER BY CalendarDate,GroupCode,CurrencyCode,MarketCap) * 0.00100,
ROW_NUMBER() OVER(ORDER BY CalendarDate,GroupCode,CurrencyCode,MarketCap) * 0.247001
from IndustryGroup i cross join Currency c
cross join MarketCap
cross join Calendar
GO
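A quick sanity check: the insert above is a cross join of the four lookup tables, so the row count should be the product of their sizes (10 industry groups x 19 currencies x 5 market caps x the number of trading days in the Calendar table). A sketch to confirm it:

--the two counts below should match
select (select count(*) from IndustryGroup) *
       (select count(*) from Currency) *
       (select count(*) from MarketCap) *
       (select count(*) from Calendar) as ExpectedRows,
       (select count(*) from IndexCloseValues) as ActualRows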

 

--create a copy for page compression
select * into IndexCloseValuesPage
from IndexCloseValues
GO

--create a copy for row compression
select * into IndexCloseValuesRow
from IndexCloseValues
GO


--add unique constraint
ALTER TABLE IndexCloseValues WITH NOCHECK
ADD CONSTRAINT UQ_IndexCloseValues UNIQUE CLUSTERED (CalendarDate,IndexSymbol)
GO

--add unique constraint
ALTER TABLE IndexCloseValuesPage WITH NOCHECK
ADD CONSTRAINT UQ_IndexCloseValuesPage UNIQUE CLUSTERED (CalendarDate,IndexSymbol)
GO

--page compression
ALTER TABLE IndexCloseValuesPage
REBUILD WITH (DATA_COMPRESSION = PAGE);
GO

--add unique constraint
ALTER TABLE IndexCloseValuesRow WITH NOCHECK
ADD CONSTRAINT UQ_IndexCloseValuesRow UNIQUE CLUSTERED (CalendarDate,IndexSymbol)
GO

--row compression
ALTER TABLE IndexCloseValuesRow
REBUILD WITH (DATA_COMPRESSION = ROW);
GO
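Before measuring anything, you can double-check that the rebuilds took effect; sys.partitions exposes a data_compression_desc column that reports NONE, ROW, or PAGE for each partition. A minimal sketch:

--verify the compression setting of each test table
select object_name(p.object_id) as TableName,
       p.index_id,
       p.data_compression_desc
from sys.partitions p
where p.object_id in (object_id('IndexCloseValues'),
                      object_id('IndexCloseValuesPage'),
                      object_id('IndexCloseValuesRow'))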

Now that everything is set up we can look at how big the tables are. Run the code below (old school, I know):

dbcc showcontig('IndexCloseValues')
dbcc showcontig('IndexCloseValuesPage')
dbcc showcontig('IndexCloseValuesRow')


Table: 'IndexCloseValues' (213575799); index ID: 1, database ID: 16
TABLE level scan performed.
- Pages Scanned................................: 23767
- Extents Scanned..............................: 2972
- Extent Switches..............................: 2971
- Avg. Pages per Extent........................: 8.0
- Scan Density [Best Count:Actual Count].......: 99.97% [2971:2972]
- Logical Scan Fragmentation ..................: 0.01%
- Extent Scan Fragmentation ...................: 0.17%
- Avg. Bytes Free per Page.....................: 23.3
- Avg. Page Density (full).....................: 99.71%

Table: 'IndexCloseValuesPage' (245575913); index ID: 1, database ID: 16
TABLE level scan performed.
- Pages Scanned................................: 9307
- Extents Scanned..............................: 1165
- Extent Switches..............................: 1164
- Avg. Pages per Extent........................: 8.0
- Scan Density [Best Count:Actual Count].......: 99.91% [1164:1165]
- Logical Scan Fragmentation ..................: 0.04%
- Extent Scan Fragmentation ...................: 0.77%
- Avg. Bytes Free per Page.....................: 10.7
- Avg. Page Density (full).....................: 99.87%

Table: 'IndexCloseValuesRow' (261575970); index ID: 1, database ID: 16
TABLE level scan performed.
- Pages Scanned................................: 13255
- Extents Scanned..............................: 1657
- Extent Switches..............................: 1656
- Avg. Pages per Extent........................: 8.0
- Scan Density [Best Count:Actual Count].......: 100.00% [1657:1657]
- Logical Scan Fragmentation ..................: 0.01%
- Extent Scan Fragmentation ...................: 0.36%
- Avg. Bytes Free per Page.....................: 11.4
- Avg. Page Density (full).....................: 99.86%
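Since DBCC SHOWCONTIG is deprecated, the same page counts are also available from the sys.dm_db_index_physical_stats DMV. A sketch of the equivalent query (SAMPLED mode, so avg_page_space_used_in_percent is populated):

--newer alternative to DBCC SHOWCONTIG
select object_name(s.object_id) as TableName,
       s.page_count,
       s.avg_page_space_used_in_percent,
       s.avg_fragmentation_in_percent
from sys.dm_db_index_physical_stats(db_id(), NULL, NULL, NULL, 'SAMPLED') s
where s.object_id in (object_id('IndexCloseValues'),
                      object_id('IndexCloseValuesPage'),
                      object_id('IndexCloseValuesRow'))
and s.index_id = 1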

As you can see compression really reduced the number of pages: the uncompressed table uses 23,767 pages (about 186 MB at 8 KB per page), page compression brings that down to 9,307 pages (about 73 MB, a 61% reduction), and row compression to 13,255 pages (about 104 MB, a 44% reduction). Page-level compression looks really good here.


First, let's look at the execution plans. Running the following block of code...

select * from IndexCloseValues
where CalendarDate = '19920101'

select * from IndexCloseValuesPage
where CalendarDate = '19920101'

select * from IndexCloseValuesRow
where CalendarDate = '19920101'

 

...will result in this plan.

[execution plan image: Execution1day]

Increasing the query to return three weeks of data (January 1 through January 21) instead of a single day...

 

select * from IndexCloseValues
where CalendarDate between '19920101' and '19920121'

select * from IndexCloseValuesPage
where CalendarDate between '19920101' and '19920121'

select * from IndexCloseValuesRow
where CalendarDate between '19920101' and '19920121'

...will result in this plan.

[execution plan image: Execution1month]

 

You can draw your own conclusions from those images.

 

Let's look at some reads. First, turn STATISTICS IO on:

SET STATISTICS IO ON
GO


 

select * from IndexCloseValues
where CalendarDate = '19920101'

select * from IndexCloseValuesPage
where CalendarDate = '19920101'

select * from IndexCloseValuesRow
where CalendarDate = '19920101'


(950 row(s) affected)
Table 'IndexCloseValues'. Scan count 1, logical reads 12, physical reads 0......

(950 row(s) affected)
Table 'IndexCloseValuesPage'. Scan count 1, logical reads 7, physical reads 0......

(950 row(s) affected)
Table 'IndexCloseValuesRow'. Scan count 1, logical reads 8, physical reads 0......

 

Those are some nice numbers for the reads. Now we will increase the date range to three weeks:
 

select * from IndexCloseValues
where CalendarDate between '19920101' and '19920121'

select * from IndexCloseValuesPage
where CalendarDate between '19920101' and '19920121'

select * from IndexCloseValuesRow
where CalendarDate between '19920101' and '19920121'


(14250 row(s) affected)
Table 'IndexCloseValues'. Scan count 1, logical reads 125, physical reads 0......

(14250 row(s) affected)
Table 'IndexCloseValuesPage'. Scan count 1, logical reads 52, physical reads 0......

(14250 row(s) affected)
Table 'IndexCloseValuesRow'. Scan count 1, logical reads 69, physical reads 0......

 

When selecting more data the numbers look even better: page compression cuts the logical reads from 125 to 52 (a 58% reduction) and row compression cuts them to 69 (a 45% reduction).

Turn STATISTICS IO off again:

SET STATISTICS IO OFF
GO


So as you can see, page compression reduces the reads by more than half.
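If you are wondering whether the extra CPU work of decompressing pages slows these queries down, you can wrap the same selects in SET STATISTICS TIME. A sketch:

SET STATISTICS TIME ON
GO

select * from IndexCloseValues
where CalendarDate between '19920101' and '19920121'

select * from IndexCloseValuesPage
where CalendarDate between '19920101' and '19920121'

select * from IndexCloseValuesRow
where CalendarDate between '19920101' and '19920121'

SET STATISTICS TIME OFF
GO

The CPU time and elapsed time lines in the Messages tab will show whether the saved reads outweigh the decompression overhead.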

Published Tuesday, April 07, 2009 4:32 PM by Denis Gobo


Comments

 

Barfieldmv said:

I would like to know the performance of the select queries that were run in the last test. How long did the actual queries take, and was there any slowdown from using compression?

May 6, 2009 5:33 AM
 

Denis Gobo said:

Barfieldmv, here are the times:

Uncompressed: 229 ms
Page compression: 205 ms
Row compression: 196 ms

May 11, 2009 3:53 PM
 

Dhiraj said:

How can I find out the effect of data compression on CPU cost?

June 22, 2010 5:47 AM
 

Aamir Hasan said:

If we have a lot of indexes on our table, will it still be helpful? Nice example.

www.aspxtutorial.com

August 29, 2011 1:55 PM
 



About Denis Gobo

I was born in Croatia in 1970. When I was one I moved to Amsterdam (and yes, Ajax is THE team in Holland), and finally in 1993 I came to the US. I lived in New York City for a bunch of years and currently live in Princeton, New Jersey with my wife and 3 kids. I work for Dow Jones as a database architect in the indexes department; one drawback: since our data goes back all the way to May 1896, I cannot use smalldatetimes ;-( I have been working with SQL Server since version 6.5, and compared to all the other bloggers here I am a n00b. Some of you might know me from http://sqlservercode.blogspot.com/ or from some of the newsgroups, where I go by the name Denis the SQL Menace. If you are a Tek-Tips user then you might know me by the name SQLDenis; I am one of the guys answering SQL questions in the SQL Programming forum.
