SQLblog.com - The SQL Server blog spot on the web

Kalen Delaney

Geek City: Build a Big Table with a Columnstore Index

I was looking all over to find a big table to use for my columnstore examples in my SQL Server 2012 book, and the only one I found was from someone at Microsoft and was not publicly available. When I demonstrate code in my writings, I like to have the code available for anyone to use, so that was no use.

Finally I realized I was just going to have to do it myself. I actually based the script on some work by the awesome Benjamin Nevarez (blog | twitter), but I needed to make a couple of changes to the table structure, and then the loop for adding the data took some tweaking to make sure that uniqueness was possible, in case you need to test how the metadata changes when a columnstore index is built on a table with a unique clustered index, compared to a nonunique clustered index, compared to a heap.

I have just finished the chapter on indexes, and decided to make this script available. The initial table data is based on the data in the Microsoft AdventureWorksDW2012 sample database, which you can download here. (Note that the DW version is the first one under “Other Available Downloads”; it’s not the one under “Recommended Download”.)

Here is the section of the script that populates most of the FactInternetSalesBig table. (The original 60K rows were simply copied from the original FactInternetSales table.) Note that all the statements have to be run in a single batch, because the local variable’s scope is the batch, and because the GO 9 applies to the single batch that precedes it. Of course, you can change the number of iterations to end up with a different size table; mine ends up at just over 30 million rows. In addition to modifying the SalesOrderNumber value on each iteration, I also change the value of the RevisionNumber column to indicate which pass through the insert loop is being executed.



-- Copy the new big table into itself 9 times
-- (RevisionNumberValue is a one-row table that persists the pass number across batches)
DECLARE @RevisionNumber nchar(2);
SELECT @RevisionNumber = RevisionNumber + 1 FROM RevisionNumberValue;
SELECT @RevisionNumber as RevisionNumber;
INSERT INTO dbo.FactInternetSalesBig WITH (TABLOCK)
SELECT ProductKey
,OrderDateKey
,DueDateKey
,ShipDateKey
,CustomerKey
,PromotionKey
,CurrencyKey
,SalesTerritoryKey
,SalesOrderNumber + @RevisionNumber
,SalesOrderLineNumber
,@RevisionNumber
,OrderQuantity
,UnitPrice
,ExtendedAmount
,UnitPriceDiscountPct
,DiscountAmount
,ProductStandardCost
,TotalProductCost
,SalesAmount
,TaxAmt
,Freight
,CarrierTrackingNumber
,CustomerPONumber
,OrderDate
,DueDate
,ShipDate
FROM dbo.FactInternetSalesBig;
UPDATE RevisionNumberValue SET RevisionNumber = RevisionNumber + 1;
GO 9
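In case you're reading along without the attachment: the loop depends on a one-row helper table, RevisionNumberValue, that carries the pass count across batches, since a local variable goes out of scope at each GO. A minimal version of that table looks like this (the attached script has the full setup; this is just a sketch):

```
-- One-row table that survives across batches, unlike a local variable;
-- starting at 0 means the first pass reads 0 + 1 = 1
CREATE TABLE dbo.RevisionNumberValue (RevisionNumber int NOT NULL);
INSERT INTO dbo.RevisionNumberValue VALUES (0);
```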

Make sure you have enough log space. My log grew from 1GB to just over 8GB when running the script to build the 30 million row table and columnstore index, with no clustered index.
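If you want to keep an eye on the log while the script runs, DBCC SQLPERF will report the log size and percent used from a separate session:

```
-- Run in a separate session to monitor log file size and space used
DBCC SQLPERF(LOGSPACE);
```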

The downloadable script has commented options to build either a clustered or nonclustered index before you build the columnstore index. I suggest building the clustered index before the columnstore index, because building a clustered index rebuilds any existing nonclustered indexes, which would include the columnstore index. Building the columnstore index can take a while; on my system it was about 10 minutes. (Building the original 30+ million row table took even longer.)
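The order I'm suggesting looks like this (the index names and key columns here are just illustrative; use the ones in the downloadable script):

```
-- Build the clustered index first...
CREATE UNIQUE CLUSTERED INDEX CL_FactBig
    ON dbo.FactInternetSalesBig (SalesOrderNumber, SalesOrderLineNumber);

-- ...then the columnstore index, so the clustered index build doesn't rebuild it
CREATE NONCLUSTERED COLUMNSTORE INDEX CSI_FactBig
    ON dbo.FactInternetSalesBig
    (ProductKey, OrderDateKey, CustomerKey, SalesAmount);
```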

After I finish the chapter on partitioning, I'll have a script to partition this table, but feel free to do that yourself!

Have fun!

~Kalen

Published Thursday, April 26, 2012 12:18 PM by Kalen Delaney

Attachment(s): Create Big Table and CS Index with unique data.txt


Comments

 

Mike Tarlton said:

Thank you for making it available. I think I'll use it and modify it a little in accordance with my needs. I will comment again with my results.

April 27, 2012 5:33 PM
 

Andrei said:

Hi Kalen:

Thank you very much for your post! I did everything you suggested and then ran this query over the LAN, with the columnstore index in place (SQL Server 2012 running on a remote quad-core machine with 16GB RAM):

SELECT [SalesOrderNumber], SUM([SalesAmount])
FROM [AdventureWorksDW2012].[dbo].[FactInternetSalesBig]
GROUP BY SalesOrderNumber

It successfully returns 14161408 rows in 3 minutes.

When I ran the same query against my local copy of the AdventureWorksDW2012 database, but without a columnstore index and with no LAN involved (everything local on my workstation, including SQL Server 2012), it took 6 minutes. My workstation is also a quad-core with 16GB RAM.

Andrei,

http://apandre.wordpress.com/

May 9, 2012 5:05 PM
 

Kalen Delaney said:

Thanks for the data point, Andrei!

May 9, 2012 7:06 PM
 

Andrei said:

Hi Kalen:

Just to compare apples to apples, I reran the same query locally on the server I mentioned above (so no LAN involved). It returns those 14161408 rows in less than 100 seconds, almost twice as fast as over the LAN.

Andrei,

http://apandre.wordpress.com/

May 10, 2012 2:58 PM
 

Paul Brewer said:

Hi Kalen,

We managed to get star-join optimizations appearing in our BI query execution plans at work after much database redesign. Attempting to compare their performance against columnstore indexes was a nightmare, though. With no decent documentation to go on, it's nearly impossible to reverse engineer the query optimizer's logic as to when, how, and why it makes best use of columnstore indexes.

I did try, see http://paulbrewer.dynalias.com/SitePages/CS_Index.aspx

Really looking forward to seeing your book when it comes out. I've read all your previous ones and they are the backbone of my skill set. I just passed the Microsoft Certified Master of SQL Server 2008 knowledge exam, thanks in no small part to your books.

Regards

Paul      

May 23, 2012 11:45 AM
 

Darryll Petrancuri said:

Kalen: I have a question for you that unfortunately is likely to have an 'it depends' answer. That being said, I'd like your first-blush, off-the-cuff answer nonetheless.

I am working in an environment where, for better or worse, the BI queries done against our DW are entirely ad hoc (ROLAP via MicroStrategy). We have used columnstore indexes in a fact table partitioned at the day grain, and this is working very nicely for us.

The question I have is regarding our dimension tables. Because the queries are entirely ad hoc, every single attribute in our dimensions is a key/value pair, and every one of the keys (integers) is indexed in a separate index. Upserts take a long time because of the sheer number of attributes/indexes. Read-only considerations due to the use of columnstore indexes notwithstanding, do you think we'd see better performance if we basically truncated a dimension, did straight inserts, and then added a columnstore index over all the key columns, instead of relying on more 'traditional' approaches to dimension maintenance/ETL?

I look forward to your learned feedback.

Respectfully,

Darryll

July 12, 2012 10:19 PM
