THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Kevin Kline

Why Do I Keep Seeing This Mistake?

One of the fundamentals of loop optimization is that you should move stable operations outside of the loop.  What I want to know is - if this is such a fundamental rule, why do so many people break it?!?

If you are familiar with other programming languages, then you are probably aware of loop optimization techniques. You should try to put all operations outside of the loop if they don’t need to change within the loop. This reduces the amount of unnecessary repetitive work. SQL Server optimizer doesn’t automatically recognize such inefficiencies and clean the code for you (compilers of some other languages do). You have to write efficient loops yourself as in the following example.

These scripts print a table of square roots for all numbers from 1 to 100.  Notice the boldfaced code below:

-- Loop with code inside = inefficient

SET NOCOUNT ON
DECLARE @message VARCHAR(25),  @counter SMALLINT
SELECT  @counter = 0
WHILE @counter < 100
BEGIN
SET @counter = @counter + 1
SET @message = REPLICATE( '-', 25 )
PRINT  @message
SET @message = str( @counter, 10 ) + str( SQRT( CONVERT( FLOAT, @counter ) ), 10, 4 )
PRINT  @message
END

-- Elapsed time: 376 ms

Compare the above Transact-SQL script to the one below, where the boldfaced code is moved outside of the loop:

-- Loop with code outside = efficient

SET NOCOUNT ON
DECLARE @separator VARCHAR(25), @message   VARCHAR(25), @counter   SMALLINT
SELECT  @counter = 0, @separator = REPLICATE( '-', 25 )
WHILE @counter < 100
BEGIN
SET @counter = @counter + 1
PRINT  @separator
SET @message = Str( @counter, 10 ) + Str( SQRT( CONVERT( FLOAT, @counter ) ), 10, 4 )
PRINT  @message
END

-- Elapsed time: 36 ms

The second script executes the REPLICATE( ‘-‘, 25 ) function only once, compared to 100 times in the first script. Results produced by both scripts are identical:

-------------------------
1    1.0000
-------------------------
2    1.4142
-------------------------
3    1.7321
-------------------------
4    2.0000
. . .
. . .
. . .
-------------------------
99    9.9499
-------------------------
100   10.0000


Of course, there are a million and one ways to perform any such algorithm.  But I'm still surprised that otherwise experienced and competent database programmers are still embedding very stable elements of their code inside of extensive looping operations rather than outside of them.  Thoughts?

Cheers,

-Kev

 Twitter @ kekline

Published Friday, July 10, 2009 1:31 PM by KKline
Filed under:

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Mike C said:

I really enjoy seeing variable declarations inside the loop, since BEGIN...END doesn't change the scope for variables:

BEGIN

 DECLARE @message VARCHAR(25);

 SET @message = REPLICATE( '-', 25 )

END

SET @message = 'abcdefg';

The declarations don't really affect performance much, but the code is just plain confusing to anyone coming from any other language on the planet!

July 10, 2009 4:30 PM
 

KKline said:

Hi Mike, help me out here.  You're saying that the variable declaration isn't affected by the BEGIN..END, so why not put them inside a WHILE loop.  Is that correct?

Perhaps I'm just confused, but it seems like the test results in the example would stand on their own, as well as the logical assumption that doing the REPLICATE 100 times will have a much different performance profile than doing it only 1 time.

And I'm not sure if you're saying, in your final statement, that -this- example is terrible & confusing or that Transact-SQL is terrible & confusing?

Could you spell it out for me?

Thanks,

-Kev

July 10, 2009 6:43 PM
 

Madhivanan said:

The code

SET  @counter = 0, @separator = REPLICATE( '-', 25 )

should be

SELECT @counter = 0, @separator = REPLICATE( '-', 25 )

July 11, 2009 9:08 AM
 

RBarryYoung said:

Frankly, I am at a loss to explain why you would use loops at all for this, when it can easily be done in a set-based manner:

--====== SET-Based:

SET NOCOUNT ON

;WITH cteNumbers as (

SELECT TOP(100)

ROW_NUMBER() OVER(ORDER BY object_id) as N

FROM sys.system_columns

ORDER BY object_id

)

SELECT Str( N, 10 ) + Str( SQRT( CONVERT( FLOAT, N ) ), 10, 4 )

FROM cteNumbers

WHERE N <= 100

Or, if you really need the separator lines:

--====== SET-Based w separators:

SET NOCOUNT ON

DECLARE @separator VARCHAR(25)

SET  @separator = REPLICATE( '-', 25 )

;WITH cteNumbers as (

SELECT TOP(200)

ROW_NUMBER() OVER(ORDER BY object_id) as N

FROM sys.system_columns

ORDER BY object_id

)

SELECT

CASE WHEN N%2=1 THEN @separator

ELSE Str( N/2, 10 ) + Str( SQRT( CONVERT( FLOAT, N/2 ) ), 10, 4 )

END

FROM cteNumbers

WHERE N <= 200

select CONVERT(varchar(28), getdate(), 121) as [End Set2]

Both of these take 0 msec on my system, so that seems like an even better optimization.

July 11, 2009 12:18 PM
 

KKline said:

Barry, dude, you're missing the point.  The code was simply an -EXAMPLE-, as in an instance serving to illustrate a rule or method; a specimen.  The point was that coders sometimes put stupid stuff inside of a loop where it repeats zillions of times rather than keeping it outside of the loop where it is performed only once, with the same net effect.  I still see coders doing this all the time.

And yes there are many effective ways to do set based coding of the same problem.  Good job on pointing your solution out.  I actually documented another method, in my now ancient book "Transact-SQL Programming", that did the same thing without any variables.

Any guesses at that approach?

BTW, my no-variables approach, didn't perform as well as your code.  So excellent job!

July 11, 2009 6:48 PM
 

KKline said:

Madhivanan,

SET works just fine, even on SQL2008.  It's just old syntax.  There's nothing wrong with it or flawed, imo...

-Kevin

July 11, 2009 6:49 PM
 

Alexander Kuznetsov said:

Hi Kevin,

It does not work for me:

DECLARE @i INT, @j INT;

SET @i=1, @j=2;

Msg 102, Level 15, State 1, Line 2

Incorrect syntax near ','.

SQL 2005

July 11, 2009 8:10 PM
 

Stuart Ainsworth said:

Hey Kevin,

Madhivanan and Alexander are right; it's a typo in your second example.  You can't use SET to assign values to multiple variables in T-SQL (at least not since SQL 7).

As to why experienced programmers make this mistake, I think Barry's response is an indicator; good T-SQL programmers are unaccustomed to working with loops.  They try to find ways around them as best they can, and when they succumb to using a looping method, it's not easy to see the obvious errors.  Forest for the trees sort of thing.  At least that's my opinion.

Good post.

Stu

July 11, 2009 10:11 PM
 

Mike C said:

I'm saying the fact that BEGIN...END doesn't define a scope for variables can be extremely confusing, especially when you're coming from a language with code-grouping constructs that do define scope ("{"..."}" in C-style languages, for instance).  I wouldn't expect variable declarations to affect performance by any noticeable amount, but the fact that you can put variable declarations in a BEGIN...END construct and access them outside of the BEGIN...END can make for some confusing code.

July 12, 2009 9:30 AM
 

KKline said:

Excellent point, Mike!  (And thanks for explaining.)

In fact, Transact-SQL has several other problem areas of scope that are very similar.  For example, in Oracle and on a couple other DBMS'es, you can create nested transactions.  SQL Server allows you to create what -appears- to be a nested transaction, but in fact is not nested.  

July 12, 2009 1:05 PM
 

Phong Ha said:

You have my vote on the set based solution; however, may I add, since we are on the topic of best programming practices, that we create a table of numbers for computational purposes such as this, instead of going to the sys.system_columns for an object_id, which has no relevance to the solution at hand.

July 12, 2009 3:43 PM
 

KKline said:

Phong, The use of sys.system_columns is a simple shortcut since it provides a ready-made table with numeric values.

Your other suggesion, however, is an excellent one - worthy of an entirely separate blogpost.  =^)

Best regards,

-Kev

July 13, 2009 12:11 AM
 

daveballantyne said:

Leading on from what Mike said about scope , try this.

This had me scratching my head for a while on a bug

declare @LoopCount integer

Select @LoopCount = 0

while(@LoopCount <10)

begin

  Declare @TestTab table(

     col integer not null

  )

  insert into @TestTab values(@LoopCount)

  Select * from @TestTab

  Select @LoopCount = @LoopCount+1

end

Back to you main point , though , long gone is the thought that "Every cycle is sacred".  More prevelant is the attitude that the computer is so powerful you dont have to worry about it.

Still it seperates the men from the boys <G>

July 13, 2009 4:44 AM
 

Jeff Moden said:

Although I understand that the original intent of this post was to explain how to optimize loops a bit, I'm with Barry... the best way to optimize a loop is to get rid of the loop.  Then you don't have to worry about errors like using SET for more than one variable assignement ;-)

Here's the set based version complete with separator lines...

SELECT '-------------' + CHAR(10)

     + STR(t.Number,3,0)

     + STR(SQRT(t.Number),10,4) AS [Square roots for 1 to 100]

  FROM Master.dbo.spt_Values t --A Tally table would work here

 WHERE t.Number BETWEEN 1 and 100

   AND t.Type = 'P'

Also, sys.SysColumns isn't being used as a source of "ready made numbers" in Barry's code.  sys.SysColumns is nothing more than a source of rows for the Row_Number to work against.  There is no guarantee that the numbers in sys.SysColumns will be contiguous and that's why the Row_Number was needed.  It's a time honored way of genning a Tally table on the fly.

As to why people continue to make the mistake of including what should be a static operation in a loop, consider this... people still use loops for such simple things... did you expect them to actually be able to optimize the loop when they can't avoid the loop to begin with? ;-)

July 18, 2009 6:46 PM
 

Charles Kincaid said:

I agree that the example uses a loop when a set based solution is entirely better.  That's not Kevin's point.  Having redundant initialization inside a loop, any loop, is a mistake.  It happens in all languages.

Could be cust-and-paste error or not knowing the scope rule for you language.  Still Kevin has a good point and is getting chewed for choosing an iconic example.

July 20, 2009 10:59 AM
 

KKline said:

Thank you Charles!  =^D

July 20, 2009 11:34 PM
 

Peter said:

I do agree ditchig the loop is the better solution.

But I also recognise the fact that the compiler should be smarter and simple be able to do those 'place out of loop', 'vanish loop' operations by itself. Just like C/C++ compilers are able to do, I also very much dislike the current query cache plan implementation and lack of control over the optimizer. It can be made SO much better.

For pointers follow what is done in compilers and in processors branch prediction and caching logic. If it can't be quickly done in the first pass, have a low priority background thread optimize it several more passed, like hotspot optimizations do for JIT compilers.

If a batch or statement is executed 1000 times over a day, surely spending a bit of time optimizing in the background instead of using the initial plan would be preferable and help overal performance. There is nothing magic to it, the functional meaning however bad written is fixed and thus must be transformable to better code.

If we humans can do it, the CPU can do it better with the insane logic capacities and speed they have these days. This is no excuse to write code that does not express function clearly, but today its often the other way around. People having to test several solutions in order to pick the one the optimizer handles best. Experience helps in getting it done faster, but it is still an excersise that distracts from the real thing, writing functional clear and precise code and have the computer do the rest.

July 21, 2009 10:52 AM
 

KKline said:

Interesting idea, Peter.  I wonder why no one has "productized" this feature into an existing IDE?  

I know with our SQL Optimizer tool, which transforms a lot of poorly written SQL into better variations that return the same result set, you first have to start with a sort of transformation table that lays out which types of code patterns are equal to other types of code patterns.  For example, most all WHERE xxx IN (subquery) code patterns can be rewritten as WHERE EXISTS (subquery).  Sometimes the WHERE EXISTS performs better than the WHERE...IN.

Sounds like a lot of work, but an opportunity for someone.  =^)

Best regards,

-Kev

July 29, 2009 6:00 PM
 

KKline said:

Obrigado para este borne excelente do blogue! -Kevin

August 13, 2009 9:00 AM
 

Kevin Kline said:

Before I jump onto the Goals and Themeword meme started by my buddy, Thomas LaRock ( blog | twitter ),

January 5, 2010 11:42 PM

Leave a Comment

(required) 
(required) 
Submit

About KKline

Kevin Kline is a well-known database industry expert, author, and speaker. Kevin is a long-time Microsoft MVP and was one of the founders of PASS, www.sqlpass.org.

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement