THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Kalen Delaney

Geek City: Included Columns

When an issue comes up more than once in a short period of time, I figure it is a message from the Universe. I might even assume it is a gift of blog fodder. So I won't turn down the gift, even if it means another juicy post before the end of year, much to Denis' disappointment.

The issue is SQL Server 2005 "Included Columns". There was a post on the public newsgroups about them a couple of days ago, and just yesterday, there was a similar post on the SQL Server MVP private newsgroup.  Simply put, the questions were expressing concern that there were no built-in tools in SQL Server 2005 to list which columns were included columns. The information is of course available in the metadata views, but it can require a join of at least three of them to get a simple list of which columns are "included columns" in a given index.  The poster on the MVP newsgroup wanted to know if someone had already written such a query and could share it, to save him some work.

There actually was a third post about included columns that I came across,  that was asking about the reasons for using included columns. I will address that question first, and then provide some scripts to return information about included columns.

You need to be aware of two 'features' of indexes to fully appreciate included columns.

First: all indexes have a limit of no more than 16 key columns, with a combined total of no more than 900 bytes.

Second: a 'covering' index, which is a nonclustered index that contains all the columns referenced in a query from one table, can provide an incredible performance advantage. If all the information a query needs is contained in the index keys, SQL Server will never need to actually access the table data, and not having to do this table lookup can be a very good thing. (Disclaimer: covering indexes are not the solution to ALL query performance problems, and I don't have time today to provide a full discussion of covering indexes.)

SQL Server 2005 allows you to get around the 16 column and 900 byte limit and add additional column to a nonclustered index to provide greater opportunity for covering indexes. The syntax would look something like this:

CREATE INDEX bigindex on mybigrowtable(keycolumn) INCLUDE (bigcolumn1, bigcolumn2)

If all the columns you would like to have in your index fit within the limits, there is no technical reason for having included columns; all your columns could just be regular key columns. However, there are some other issues to consider.

1) Space requirements: Normal key columns, as part of the key, are propagated up through all levels of your indexes. So if you have some very large columns, even if they fit in the 900 byte limit, you can save space by defining them as included columns.  Here is an example, using the AdventureWorks database:

USE AdventureWorks

-- set the database to bulk_logged recovery
--  prior to copying tables

ALTER DATABASE AdventureWorks
    SET RECOVERY bulk_logged
GO
---------------------
IF ( OBJECT_ID('Person.Address1') is not null)
    DROP TABLE Person.Address1
GO

IF ( OBJECT_ID('Person.Address2') is not null)
    DROP TABLE Person.Address2

--------------------------------------

-- Create two copies of the Person.Address table and enlarge one of the columns

SELECT * INTO Person.Address1
    FROM Person.Address
GO
ALTER TABLE Person.Address1
    ALTER COLUMN AddressLine2 nchar(350)
GO

-- Create an index with two included columns

CREATE INDEX IX_Address_City
    on [Person].[Address1] (City, StateProvinceID)
    INCLUDE(AddressLine1, AddressLine2)
GO
-------------------

SELECT * INTO Person.Address2
    FROM Person.Address
GO
ALTER TABLE Person.Address2
    ALTER COLUMN AddressLine2 nchar(350)
GO
-- Create a 'regular' index with no included columns;
-- All four columns are keys, and included at all index levels
CREATE INDEX IX_Address_City
    on [Person].[Address2]
        (City, StateProvinceID, AddressLine1, AddressLine2)
GO

-- Note that the Address2 index uses about 1.5 MB
-- more than  the index on Address1, because it contains
-- all 4 keys in all levels of the index.

EXEC sp_spaceused 'Person.Address1'
EXEC sp_spaceused 'Person.Address2'
GO

2) Maintenance costs: Normal key columns are maintained in sorted order. An index on (lastname, state, city) would have 3 sort columns and every new row would have to be put in the proper location based on all 3 columns. But an index on lastname, with (state, city) as included columns, would not have to maintain sort order on state and city and that could improve the performance of data modification operations. I haven't done any tests yet to determine how much savings you might realize, but it will be faster with included columns.

The downside of included columns not being sorted, is that those columns will not be useful for searching if those columns are in the WHERE clause, in particular if those columns are using in an inequality expression.

So now, how can you retrieve information about which columns are included and which are key columns?  Sp_helpindex does not display this information, in fact, it makes no mention of included columns. The information is available in a catalog view called sys.index_columns, in a column called 'is_included_column'. The code below creates a view that you can use to return information about all the columns in your indexes.

CREATE VIEW index_column_info
AS
  SELECT object_name = object_name(ic.object_id),
                  index_name = i.name,
                  'column' = c.name,
                  'column usage' = CASE ic.is_included_column
                               WHEN 0 then 'KEY'
                               ELSE 'INCLUDED'
                   END
   FROM sys.index_columns ic JOIN sys.columns c
              ON ic.object_id = c.object_id
              AND ic.column_id = c.column_id
       JOIN sys.indexes i
              ON i.object_id = ic.object_id
              AND i.index_id = ic.index_id

You can select from this view and supply a table name and/or an index name or id in a WHERE clause:

SELECT * FROM index_column_info
WHERE object_name = 'Address1'

A student in one of my classes earlier this year wanted a way to get input back that looked like the sp_helpindex output. I am attaching a script to build a stored procedure called new_helpindex which will do that. However, the new procedure is used a bit differently than sp_helpindex. sp_helpindex requires a schema name and new_helpindex does not. So to get information about my new Address1 table created above, the two procedures would be called as shown:

EXEC new_helpindex 'address1'
EXEC sp_helpindex 'person.address1'

If you want the new_helpindex procedure to accept schema names, and to return exactly the same details as sp_helpindex (plus the included columns), it is certainly possible, and I leave that as an exercise to my readers.

Have fun!

~Kalen

Published Monday, December 31, 2007 12:53 PM by Kalen Delaney

Attachment(s): new_helpindex.txt

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Dimitri Furman said:

Kalen,

Thanks for a very informative post.

It appears there is a typo in the second paragraph, where computed columns are mentioned instead of included columns.

January 2, 2008 3:17 PM
 

Kalen Delaney said:

Thanks Dimitri -- I think I've fixed it now.

Interestingly enough, there were similar problems with computed columns, in that earlier versions had no good way to see the definition of a computed column. sp_help would tell you which columns were computed, but the actual definition was tricky to dig up.

Thanks!

Kalen

January 2, 2008 3:35 PM
 

Alex Kuznetsov said:

Kalen,

That was an intersting post. I came across two more potential uses of included columns:

1. Suppose you want you NCI index cover a query. Suppose you want to make sure that a column, let's say CustomerID, is stored in the index.

Even if your table is currently clustered on CustomerID, it is still a good practice to explicitly include CustomerID in your NCI. The reason is simple: sometimes you can drop the CI and build another CI on another column(s). Your index should still store CustomerID, so that it still covers your query.

2. Suppose you have a table:

create table a(email_alias varchar(20), full_name varchar(20), /*many other columns*/)

Suppose you need to make sure email_alias are unique. Obviously you will create a unique index on email_alias. Suppose you also need an index on (email_alias, full_name) to cover several frequently run queries. In SQL Server 2005 you can have one and the same index accomplish both goals. Use a new INCLUDE option in CREATE INDEX statement:

create unique index a1 on a(email_alias) include(full_name)

Both columns will be stored in the index, which will guarantee index covering, but the uniqueness of email_alias will be also preserved:

--- succeeds

insert into a values('jsmith', 'Jack Smith')

go

--- fails

insert into a values('jsmith', 'Jared Smith')

Msg 2601, Level 14, State 1, Line 1

Cannot insert duplicate key row in object 'dbo.a' with unique index 'a1'.

The statement has been terminated.

It is very important to keep the number of indexes as low as possible, and the new INCLUDE option comes very handy in accomplishing that goal.

January 3, 2008 10:24 PM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Favorite Non-technical Sites or Blogs

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement