THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Jamie Thomson

This is the blog of Jamie Thomson, a freelance data mangler in London

Increment a variable in the SET clause of an UPDATE statement [T-SQL]

There’s a very nifty feature of T-SQL that, in my experience, not everyone knows about but is handy for generating surrogate key values as part of an ETL routine. I myself had to ask on Twitter what the syntax was because I couldn’t remember it (thanks to Matija Lah for reminding me) so I think its worth blogging in case I forget it again.

To summarize the feature, you can use a variable in the expression of the SET clause in an UPDATE statement while updating it at the same time. The syntax is:

@variable = column = expression

where expression can include @variable.

It will probably crystalise in your brain if you see it in action, so here is a little demo:

/*Create a table and populate with data*/
DROP TABLE [#DimProduct]; --will fail first time you run the script, so just run it again
CREATE TABLE [#DimProduct] ([ProductId] INT, [ProductCode] VARCHAR(10));
DECLARE @i INT = 0;
INSERT [#DimProduct]([ProductId],[ProductCode])VALUES(1,'qwerty'); /*Simulating an existing record*/
WHILE @i <10
BEGIN
  
/*Simulating some new rows for which we want to generate a a surrogate key*/
  
INSERT [#DimProduct]([ProductCode])VALUES(REPLACE('qwerty@i','@i',@i));
  
SET @i += 1;
END
/*Take a look at the data*/
SELECT * FROM [#DimProduct]

/*Generate surrogate keys using an UPDATE*/
DECLARE    @MaxSurrogateKey    INT = (
              
SELECT  ISNULL(MAX([ProductId]),0)
              
FROM    [#DimProduct]
);
UPDATE [#DimProduct]
SET     @MaxSurrogateKey = [ProductId] = @MaxSurrogateKey + 1 /****The clever bit****/
WHERE  [ProductId] IS NULL;

/*Take another look at the data*/
SELECT * FROM [#DimProduct]

Here’s the output:

image

Notice that our UPDATE statement has generated contiguous surrogate key values for all rows that previously had a NULL surrogate key. Pretty neat huh? Well, I think so anyway!

@jamiet

Published Tuesday, May 28, 2013 1:24 PM by jamiet
Filed under: ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Dave Ballantyne said:

Though I feel that I may been comparing chalk with cheese a bit ...

There are some caveats around this:

A) Concurrency,  since the Select and Update are now two distinct operations you could get duplicate values if there are multiple concurrent tasks

B) Non- guaranteed order.  Much like the 'quirky-update' update order is not guaranteed so any old order *could* happen.

C) Not sure what this gives over an identity for example.

May 28, 2013 9:44 AM
 

jamiet said:

Dave,

Thanks for the comment.

Quite right about concurrency, this is intended to be used in an ETL routine and in such case concurrency won't be an issue.

Non-guaranteed order - I don't care about the order. These are meaningless identifiers, thus the order is meaningless as well. I'm always wary of people who derive meaning from the order of a surrogate key - treat it what it is - a meaningless identifier.

Advantage over an identity - if identities were sufficient for all use cases then we wouldn't have sequences in SQL2012. I personally dislike identities so I like that there is another option for people that are not on SQL2012.

cheers

JT

May 28, 2013 9:56 AM
 

Adam Machanic said:

Dave: This is nothing more than the "quirky update" packaged up in slightly different clothes, isn't it?

Jamie: "If identities were sufficient for all use cases then we wouldn't have sequences?" Wrong. If Oracle didn't have sequences then we wouldn't have sequences.

I'm not sure why you wouldn't use a ROW_NUMBER here. It would be supported, documented, and deterministic. Clever is not always a good thing.

--Adam

May 28, 2013 10:07 AM
 

M A Srinivas said:

This type of updneate I used to do in 2000 . Not New

May 29, 2013 5:20 AM
 

jamiet said:

When and where did I imply that it was new?

May 29, 2013 5:25 AM
 

David Howard said:

Nice post Jamie.  If you don't care about determinism, then this seems like a nice approach.  ROW_NUMBER would be necessarily more complex since it can't be used directly in a SET statement.  Also, this functionality is documented in Books Online, so I'm not sure why it wouldn't be supported. It's probably not as "supportable" though, just because I'm guessing most developers haven't heard of it.

May 29, 2013 12:39 PM
 

Adam Machanic said:

@David: The syntax is documented, but only for single-row cases. The behavior for multirow sets is not documented, not guaranteed, and not supported. It's considered to be a side-effect.

ROW_NUMBER would be slightly more complex syntax wise. But barely. An extra set of parenthesis is not exactly a big deal in my humble opinion.

--Adam

May 29, 2013 1:58 PM
 

Ian Yates said:

I recall Jeff Moden from SQL Server Central promoting this sort of update a while ago.  See http://www.sqlservercentral.com/articles/T-SQL/68467/.  There are various caveats to do with clustered indices and partitioned tables (IIRC).  Nice trick, although I agree with Adam - I can't see myself using this.  However, I do really appreciate your blog.  Keep 'em coming :)

May 29, 2013 11:20 PM
 

Marc Brooks said:

@Adam:  Is this what you were expecting?

;WITH DP

AS (SELECT [ProductID], ROW_NUMBER() OVER(ORDER BY [ProductCode]) AS RN FROM [#DimProduct])

UPDATE DP

SET    [ProductId] = RN

May 30, 2013 4:10 PM
 

Adam Machanic said:

@Marc: Something like that. You'd still need to add in the base number, as Jamie did.

June 12, 2013 12:50 PM
 

Timo said:

Thanks for this. Just what I was looking/googling for.

November 12, 2013 7:08 AM
 

premkumar said:

excellent work

February 6, 2014 12:08 AM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement