THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Piotr Rodak

if datepart(dw, getdate()) in (6, 7)
    use pubs;



Partitioned views

I must say I am a big fan of views. They give flexibility of changing the logic of the data retrieval without affecting DAL layers in your application. I posted series of articles about updateable views and triggers on views, their application in SSIS for example.

Recently we finished development of system that loads data from several external sources, makes some calculations on the data and sends data over FTP to downstream systems. I designed the whole data interface to use views and stored procedures. This way, even though we used ORM to perform some of the activities, it was not accessing bare tables. This is good, because you can implement simpler and more flexible security model. The other advantage is that you if you use some simple techniques, you can load data to table and then expose the new data through a view without exposing the logic to the upper layers of your application.

It is quite common requirement to be able to maintain multiple versions of data within table. The requirement is also that data import should not affect other processes which read current version of data. With views it is really simple to achieve. There are many ways of doing this, but let’s have a look at partitioned views technique.

Partitioned view is a view that selects data from more than one table. These tables have to follow specific requirements. Amongst the other, there has to be so called partitioning column defined, which has trusted CHECK constraint defined on it. The tables have to be joined by UNION ALL operator. If you want the view to be updateable, the partitioning column has to be part of the primary key of all tables belonging to the view. And you have to select all columns from the table.

OK, but what is the purpose of creating partitioned views? The short answer is performance. Tables tend to grow bigger and bigger. It becomes more and more expensive to run index updates, backups, and queries against them. If you are able to partition your data and you do it in a right way, you can alleviate some of the issues related to big tables. Classic example would be partitioning by date – you can keep month worth of data in each table and span the view over 12 tables to cover the whole year. Nice thing about partitioned views is that you can index underlying tables in a different way, so for example history tables have more indexes allowing for faster data retrieval, while current month table can be optimized for faster inserts and updates.

Let’s have a look at this scenario. I will use as an example table Sales.CurrencyRate from AdventureWorks database. This table contains 13532 rows in my database, but let’s imagine we have 135 million rows.

When I run this query, I get 37 rows spanning years 2001 - 2004:

   1: select count(*) NumberOfRates, Year([CurrencyRateDate]) RateYear, Month([CurrencyRateDate]) RateMonth 
   2: from Sales.CurrencyRate    
   3: group by Year([CurrencyRateDate]), Month([CurrencyRateDate])   
   4: order by RateYear, RateMonth

We will create four tables, each for each year that is represented by CurrencyRateDate column.

A slightly abbreviated DDL for the Sales.CurrencyRate table is as follows:

   1: CREATE TABLE [Sales].[CurrencyRate](
   2:     [CurrencyRateID] [int] IDENTITY(1,1) NOT NULL,
   3:     [CurrencyRateDate] [datetime] NOT NULL,
   4:     [FromCurrencyCode] [nchar](3) COLLATE Latin1_General_CI_AS NOT NULL,
   5:     [ToCurrencyCode] [nchar](3) COLLATE Latin1_General_CI_AS NOT NULL,
   6:     [AverageRate] [money] NOT NULL,
   7:     [EndOfDayRate] [money] NOT NULL,
   8:     [ModifiedDate] [datetime] NOT NULL CONSTRAINT [DF_CurrencyRate_ModifiedDate]  DEFAULT (getdate()),
   9:  CONSTRAINT [PK_CurrencyRate_CurrencyRateID] PRIMARY KEY CLUSTERED 
  10: (
  11:     [CurrencyRateID] ASC
  12: )WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
  13: ) ON [PRIMARY]

If we want to have updateable partitioned view, we need to modify the constraint to include date part, which will allow the execution engine to identify partition that has to be modified during DML operation. The easiest is to add the CurrencyRateDate column to the primary key.

Let’s create four tables then, one for each year of the data. I will not place here whole DDL for all tables, you can find it in the attached file. This is however the first partition table:

   1: CREATE TABLE [Sales].[CurrencyRate_2001](
   2:     [CurrencyRateID] [int] IDENTITY(1,1) NOT NULL,
   3:     [CurrencyRateDate] [datetime] NOT NULL,
   4:     [FromCurrencyCode] [nchar](3) COLLATE Latin1_General_CI_AS NOT NULL,
   5:     [ToCurrencyCode] [nchar](3) COLLATE Latin1_General_CI_AS NOT NULL,
   6:     [AverageRate] [money] NOT NULL,
   7:     [EndOfDayRate] [money] NOT NULL,
   8:     [ModifiedDate] [datetime] NOT NULL CONSTRAINT [DF_CurrencyRate_2001_ModifiedDate]  DEFAULT (getdate()),
   9:     constraint CHK_CurrencyRateDate_2001 CHECK (CurrencyRateDate>= '20010101' and CurrencyRateDate < '20020101'),
  10:  CONSTRAINT [PK_CurrencyRate_2001_CurrencyRate_2001ID] PRIMARY KEY CLUSTERED 
  11: (
  12:     [CurrencyRateID] ASC,
  13:     [CurrencyRateDate] ASC
  14: )
  15: ) ON [PRIMARY]

As you see above, the CurrencyRateDate column has been included into the primary key of the table and I added constraint CHK_CurrencyRateDate_2001 that will take care of the range of dates that can exist in the table. You can use inequality constraint if you’d like to store rows for dates from 2001 or older. Tables for 2002, 2003 and 2004 contain constraints disallowing inserting data outside of the ranges, similar to the the constraint shown above.

I removed identity column from partition tables to make the view updateable – this is one of requirements.

The view is defined as follows:

   1: create view Sales.vCurrencyRate
   2: as
   3: select * from Sales.CurrencyRate_2001
   4: union all 
   5: select * from Sales.CurrencyRate_2002
   6: union all
   7: select * from Sales.CurrencyRate_2003
   8: union all
   9: select * from Sales.CurrencyRate_2004

 

Let’s insert some data to the view – that is, the underlying partition tables:

   1: insert Sales.vCurrencyRate
   2: (
   3:     [CurrencyRateID],
   4:     [CurrencyRateDate],
   5:     [FromCurrencyCode],
   6:     [ToCurrencyCode],
   7:     [AverageRate],
   8:     [EndOfDayRate],
   9:     [ModifiedDate]
  10: )
  11: select [CurrencyRateID],
  12:     [CurrencyRateDate],
  13:     [FromCurrencyCode],
  14:     [ToCurrencyCode],
  15:     [AverageRate],
  16:     [EndOfDayRate],
  17:     [ModifiedDate] from Sales.CurrencyRate

This is the execution plan created for the insert:

PVInsert1

As you see, database engine split data amongst all partition tables according to the partitioning column constraints. This way, you can easily move data from one table to several smaller partitions. Of course, you wouldn’t want to run simple insert statement as above for 135 million rows, but you get the idea.

Partitioned views can be very useful, but apart from the conditions that I listed earlier in this article, there are some catches that might bite you and you have to carefully ponder on your design if you want to make them updateable.

These are some of these gotchas:

Tables referenced in updateable partitioned views cannot have identity column. If they have, you will get following error message:

Msg 4433, Level 16, State 4, Line 3
Cannot INSERT into partitioned view 'AdventureWorks.Sales.vCurrencyRate' because table '[AdventureWorks].[Sales].[CurrencyRate_2001]' has an IDENTITY constraint.

Tables cannot have computed columns if you want to have updateable partitioned view. If you try to insert rows into view on tables that have such columns, the error message is following:

Msg 4406, Level 16, State 1, Line 1
Update or insert of view or function 'Sales.vSalesOrderHeader' failed because it contains a derived or constant field.

The partitioning column has to be defined using equality or inequality operators. For example, you cannot use datepart(year, …) to defined date range for the column, you have to use either BETWEEN.. AND.. or >=.. < operator. Otherwise the error message will be as follows:

Msg 4436, Level 16, State 12, Line 3
UNION ALL view 'AdventureWorks.Sales.vCurrencyRate' is not updatable because a partitioning column was not found.

 

 

The partitioned views do not have to be updateable. You can design tables with computed columns, IDENTITY columns and still use partitioned views to offload certain queries. One of the scenarios is for example daily upload of data. You can create a staging table which will accept the data and after ETL is finished, you alter view definition to include new table and possibly exclude one with older data.

I attach the script for your convenience. You can run it on AdventureWorks database or use it as a template for your code.

Published Saturday, September 11, 2010 2:27 AM by Piotr Rodak

Attachment(s): partitioned views.zip

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

omar said:

thanks! awesome demonstration of partitioned view.

June 27, 2011 9:03 PM

Leave a Comment

(required) 
(required) 
Submit
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement