Did you know that SQL Server allows stored procedures to have up to 2100 parameters? And more important: do you care? Well, some people do care, and Joe Celko seems to be one of them.
If you are a regular reader of SQL Server newsgroups, you probably know Joe Celko from his always unfriendly and often incorrect replies. Here is a typical example, one that I have seen several times recently, in a paraphrased form:
Question: I want to send a list of values to my stored procedure, but WHERE ColumnName IN (@ValueList) does not work – how to solve this?
Answer: SQL Server can handle over 1000 parameters. You should use WHERE ColumnName IN (@Parm1, @Parm2, …, @ParmN).
Joe Celko is the only one I have ever seen giving this advise. Many people will then jump into the discussion, challenging Joe’s advise. To which Joe will always reply that he has received a smart stored procedure that will solve a Sudoku puzzle, and that takes 81 parameters (one for each cell in the puzzle) as its input – unfortunately, Joe has so far refused to actually publish his code to let other people verify his claims.
The test setup
I still wanted to see for myself how passing 81 parameters into a stored procedure compares to other methods of passing in the same input, so I wrote three simple test procedures. Each of these procedures takes a Sudoku puzzle as input, but in three different forms. Each of the three then uses the input to populate a temporary table (#Problem) with the puzzle, and then performs a pretty standard pivot query to output the puzzle in the usual form.
After verifying that all of the procedures worked as expected, I uncommented the pivot query to reduce the output for my performance tests. I then set the tests. I selected two real Sudoku puzzles (an easy one, with 34 cells given, and a hard one with only 27 cells given) and added two nonsensical ones of my own (one with only 5 cells given, and one with 72 cells). For each combination of a puzzle and a procedure, I coded a loop that calls the procedure a thousand times and records the elapsed time in a table. These twelve loops were than enclosed in an endless loop. Once satisfied with the code, I hit the execute button, pushed my laptop out of the way and went on to other stuff.
Some 24-odd hours later, I interrupted the test script. Each of the twelve “thousand calls” tests had been executed 400 times. I dusted of a query I originally wrote for testing the performance of code for the Spatial Data chapter in “Expert SQL Server 2005 Development” to calculate the average duration per single call, disregarding the fastest and slowest 10% of the measurements to exclude the influence of semi-random other activities on my laptop.
(Note that all code to create the stored procedures and run the tests is in the attachment to this post, so you can always repeat these tests on your machine.)
The contenders
The first contender is of course the procedure with 81 parameters that Joe Celko is such an avid fan of. Creating this procedure involved a lot of copying and pasting, a lot of editing numbers in the copied and pasted code, and a lot of tedious debugging until I had finally found and corrected all locations where I had goofed up the copy and paste or where I had failed to edit a number after pasting. The resulting code is very long, tedious to read and maintain, and screams “Hey! You forgot to normalize these repeating groups into their own table” all the way. Manually typing the EXEC statements to call this procedure with test data was also very cumbersome and error-prone. In a real situation, the procedure would probably be called from some user-friendly front end code. I’m far from an expert in front end code, but I expect this code to be very long as well, since it has to check and optionally fill and pass 81 parameters.
The second contender uses a pretty standard CSV string as input, with the additional requirement that each value in the CSV is three characters: row@, column#, value. The procedure uses a variation of one of the scripts found on Erland Sommarskog’s site to parse the CSV list into a tabular format. This code is lots shorter, and as a result easier on the eyes and easier to maintain. Typing in the EXEC statements for testing is still pretty cumbersome (though I found a way to cheat – simply copy the parameter list for the Celko version, do some global search and replace to remove various characters, and the end was exactly the string I needed to call this procedure). The front end code will probably be lots shorter, since it can use a simple loop to process the input and build the CSV parameter.
The third and last contender takes a CHAR(81) parameter as input. The first 9 characters of this parameter describe the top row, using a space to depict an empty cell; the second set of 9 characters is for the second row, and so forth. Parsing this parameter turned out to be even easier than parsing the CSV parameter. Another observation I made is that is was much easier to manually enter the parameter for the tests – just read the puzzle left to right and bottom to top and type either a number or a space for each cell. This was absolutely painless, and I didn’t make a single mistake. Of course, this is irrelevant for the expected real situation where the parameter is built by the front end – the code to do this will probably be about as complex as that for the CSV parameter.
Performance results
If you’re as eager to see the test results, you’ll probably have skipped the previous section. No problem, just promise to go back and read it later, m’kay?
|
Test version |
Joe Celko’s 81 parameters |
Single CSV parameter |
Single CHAR(81) parameter |
|
Almost empty (5 cells) |
1.08 ms |
1.40 ms |
1.05 ms |
|
Hard puzzle (27 cells) |
1.80 ms |
1.78 ms |
1.35 ms |
|
Easy puzzle (34 cells) |
2.04 ms |
1.90 ms |
1.45 ms |
|
Almost full (72 cells) |
3.34 ms |
2.56 ms |
1.99 ms |
As you see, using lots of parameters is faster than using a single CSV parameter only if you don’t actually pass values in these parameters. As soon as you use the parameters, performance of a procedure with lots of parameters deteriorates quickly.
You can also see that the CHAR(81) parameter wins in all cases.
Network bandwidth
My testing was all carried out on my laptop. The results will for the most part me a result of the time needed to process the input, not on network capacity. However, it is easy to see by just looking at the EXEC statements that the CHAR(81) version uses the least network resources, Celko’s version with 81 parameters uses the most, and the CSV versions sits nicely in between.
Final thoughts
You may have noted that I have not included a version with an XML input parameter in my tests. I probably should have done that, but I have to admit that I still have so much to learn on how to handle XML in a SQL Server database that I didn’t feel comfortable enough to sit down and write one myself. But your submissions are welcomed – if you feel that you can write an efficient version of this procedure that accepts its input in XML format, do not hesitate to write me. As soon as I can spare the time to set up the laptop for another all-nighter of performance testing, I’ll rerun the script with your XML solution included and post the results back to this site,
While writing this post, I found a newsgroup posting by Joe Celko where he reveals a snippet of “his” Sudoku solver. And guess what? I was wrong when I thought that I could guess how his procedure looks. It turns out that he does not use defaults for his parameter; you always have to supply them all, using 0 for an empty cell. I didn’t want to repeat all the tests at this time. I expect that this will reduce performance even more, though not by much – but it will also cause a huge increase in network usage!
I also saw that the parameters in Joe Celko’s version were declared as integer, so that each parameter will use 4 bytes instead of just 1. This will definitely affect both the performance of the procedure and the network pressure.
Conclusion
If you have to pass a long list of parameters to a stored procedure or function, you should not use a long collection of parameters. It makes the code harder to write and maintain, prone to subtle errors, longer (which will affect parse and compile time, though I did not include this in my test), uses far more network resources than any of the other alternatives (except, maybe, XML), and gets terribly slow as more parameters are actually used.
Joe celko will probably find that he too can shorten the amount of code in his Sudoku solver *and* increase performance by using a different strategy to pass the puzzle. Of course, in the case of solving a Sudoku, those two 2 milliseconds extra execution time won’t really matter, not the few hundred extra bytes travelling over the network. But if you ever encounter a similar multi-parameter problem in a procedure that will be called from a web page that will get hundreds of hits per second, those 2 milliseconds and those extra bytes in the network can suddenly become a huge bottleneck!
SQL Server may support up to 2100 parameters – but that does not imply that it is a good idea to actually use them!