While Adam was living it up at VSLive in New York last week, I was consulting in Atlanta, trying to find way to make an application 'run faster'. I won't go into the gory details of everything I looked at, but one there is one aspect of the system I'll tell you about.
When the company's DBA mentioned that one of the big reports they were running had to mask the Social Security Number in all the data rows as they were returned, I heard some warning bells go off in my head. We then started a server-side trace and ran the job, which took several hours and filled 4 50MB trace files. While I was working on a different problem, the DBA took a look at the trace files. He then told me that the first of the four files was the only one I needed to look at because the other 3 were filled with nothing by calls to the function that was masking the numbers. All the other work done in the 4 hour report was contained in first 10 percent of the first file.
I remembered tracing functions at one time long ago, maybe when I was writing the Inside SQL Server 2000 book. But I thought it was time to do it again. I took the pubs database and created a copy of the some of the data in the authors table, which includes a Social Security number called au_id, and then copied the table into itself enough times to get about 100,000 rows. I then created a simplified version of the masking function that my client had been using.
To test the cost of the function, I first selected from the table applying the underlying function code to the SSN column. I then selected from the table, actually calling the function in each row.
When I traced only the event TSQL:BatchCompleted, I was amazed to see that when using the function the CPU cost and the duration were more than TWICE as high as the values when just using the underlying code. The reads were about the same, as expected, since both statements were accessing the same data.
I expected the cost to be higher, but not twice as high. But I was still to be truly astounded. When I then changed the trace definition to trace every statement completed in a procedure (SP:StmtCompleted) the cost of the query with the function call was vastly higher. Every call to the function was treated as a call to a procedure, and traced as a statement in a procedure. The CPU was 5 times higher and the duration increased by a factor of 30!
So, if one warning I can give you is to be careful when using scalar functions on a large result set, the more important warning here seems to be that you should be very careful what you trace. The act of tracing can impose a performance penalty that can sometimes mask the problem you're actually trying to track down.
I am attaching a script file that includes the code for building the new big table, and the code for the masking function so you can try it and see the performance on your own SQL Server. (The blog software won't let me upload a .sql file, so I have just given it a .txt extension.)
-- Kalen