I showed why T-SQL scalar user-defined functions are bad for performance in two previous posts. In this post, I will show that CLR scalar user-defined functions are bad as well (though not always quite as bad as T-SQL scalar user-defined functions).

I will admit that I had not really planned to cover CLR in this series. But shortly after publishing the first part, I received an email from Adam Machanic, which basically said that I should make clear that the information in that post does not apply to CLR functions. So I dutifully added a comment to that post, and included a similar disclaimer in the second part – but I also planned to run some tests, for it is my strong belief that you should never take anything for granted, no matter how knowledgeable the person telling you it is. And if I am running those tests anyway, why not share my findings with you?

**No data access**

The first part of this post focused on scalar user-defined functions (UDFs) that don’t include any data access; only computations, string handling, etc. I used a very unrepresentative example: multiplying an integer value by 3. Here is a CLR equivalent (using C#) of that function:

[Microsoft.SqlServer.Server.SqlFunction(DataAccess=DataAccessKind.None,

IsDeterministic=true,SystemDataAccess=SystemDataAccessKind.None,IsPrecise=true)]

public static SqlInt32 CLRTriple(SqlInt32
Value)

{

return
Value * 3;

}

In order to test the performance of this version of the function and compare it to the alternatives (T-SQL function and inline logic), I ran this batch against the LargeTable table, which is still populated with ten million rows.

SET STATISTICS TIME ON;

SELECT MAX(dbo.Triple(DataVal)) AS MaxTriple

FROM dbo.LargeTable;

SELECT MAX(dbo.CLRTriple(DataVal)) AS MaxTriple

FROM dbo.LargeTable;

SELECT MAX(3 * DataVal) AS MaxTriple

FROM dbo.LargeTable;

SET STATISTICS TIME OFF;

The execution plans immediately show a huge advantage of CLR UDFs over their T-SQL counterparts: they don’t inhibit parallelism!

But other than that, this execution plan is doing pretty much the same as all other execution plans in this series of blog posts: lying until it is black in its face. Because that is what execution plans do when UDFs are involved, and CLR functions are no exception. For the real comparison, I ran the same batch without outputting the execution plan (to eliminate their overhead) and checked the amount of CPU time used and the time elapsed. For the T-SQL function, CPU time was 35,709 ms, and elapsed was 38,621 ms. The CLR version used less CPU time, only 13,072 ms – a huge saving. And thanks to the parallelism, the saving in elapsed time was even more: only 2,071 ms left.

However, the version that avoids any UDF and instead places the logic inline still wins, with a CPU time of only 5,741 ms, and an elapsed time of 768 ms – almost three times as fast as the CLR version, and over 45 times as fast as the T-SQL version.

Given the total lack of complexity in the UDF, the huge performance difference between the CLR and T-SQL versions of the function cannot be explained by CLR code being faster than T-SQL code (though that is definitely the case – see below). The only explanation I can come up with is that invoking a T-SQL involves a lot more overhead than invoking a CLR function. However, the CLR function is still not entirely free of overhead, otherwise it should have performed about the same as the version with the logic inline.

**Complex calculations**

Up until now, the conclusion is that, even though scalar CLR functions definitely perform much better than scalar T-SQL functions, the best performance is still achieved by placing the logic inline. But there are exceptions to this rule. As already mentioned above, CLR code executes a lot faster than T-SQL code – so if you have a function that performs extremely complicated logic, it may well be worth your time to do a thorough comparison between inlined logic and a CLR function.

Because I wanted to use a realistic example, I decided to dig up the code I wrote many years ago, when I needed to compute the distance between points on the earth on SQL Server 2005 (before spatial data types were introduced). The formula used to calculate the distance between two points, given their latitude and longitude, is as follows:

This formula assumes that the input and result are all measured in radians. The data set I used for testing has locations with latitude and longitude measured in degrees, and I prefer to see the distance reported in kilometers, so we’ll have to add some unit conversions to get the correct results.

I first implemented this calculation as a T-SQL scalar user-defined function:

CREATE FUNCTION dbo.Distance

(@Lat1 FLOAT,

@Lon1
FLOAT,

@Lat2
FLOAT,

@Lon2
FLOAT)

RETURNS FLOAT

AS

BEGIN;

DECLARE @Lat1R FLOAT,

@Lon1R FLOAT,

@Lat2R FLOAT,

@Lon2R FLOAT,

@DistR FLOAT,

@Dist FLOAT;

-- Convert from degrees to radians

SET @Lat1R = RADIANS(@Lat1);

SET @Lon1R = RADIANS(@Lon1);

SET @Lat2R = RADIANS(@Lat2);

SET @Lon2R = RADIANS(@Lon2);

-- Calculate the distance (in radians)

SET @DistR = 2 * ASIN(SQRT(POWER(SIN((@Lat1R - @Lat2R) / 2), 2)

+ (COS(@Lat1R) * COS(@Lat2R) * POWER(SIN((@Lon1R - @Lon2R) / 2), 2))));

-- Convert distance from radians to kilometers

-- Explanation: Distance in radians = distance in nautical miles * (pi /
(180 * 60)), so

-- distance in
nautical miles = distance in radians * 180 * 60 / pi

-- One nautical mile
is 1.852 kilometers, so

-- distance in km
= (distance in radians * 180 * 60 / pi) * 1.852

-- And since 180 * 60
* 1.852 = 20001.6, this can be simplified to

-- distance in km
= distance in radians * 20001.6 / pi

SET @Dist = @DistR * 20001.6 / PI();

RETURN @Dist;

END;

GO

To test its performance, I used a table that Microsoft included as part of the product samples with SQL Server 2005 (I believe these samples are still available on CodePlex). This table includes geographic data for almost 22,993 places in the United States of America. For my test, I chose to calculate the distance between each pair of places in the state Texas; since there are 1,276 places in this state, the function will be invoked 1,276 * 1,276 = 1,628,176 times. To make sure that my test is not influenced by the time to send results to the client or render them on my screen, I included the MAX function, so that only a single number is sent to the client.

SET STATISTICS TIME ON;

SELECT MAX(dbo.Distance(a.Lat, a.Lon, b.Lat, b.Lon))

FROM dbo.Place AS a

CROSS JOIN dbo.Place AS b

WHERE a.State = 'TX'

AND b.State = 'TX';

SET STATISTICS TIME OFF;

This query took over half a minute to finish. 31,449 ms CPU time, and 34,392 ms elapsed time. The time per execution of the user-defined function is decent enough (only about 21 ms), but because of the sheer number of executions, the total running time is still pretty long.

Before moving to the CLR version of the distance calculation, let’s first try what happens if I avoid the user-defined function and instead place the calculation inline. After all, this proved to be a useful technique in the first part of this series. For this query, the downside of inlining the logic is that it results in a pretty much unmaintainable query – but it does indeed result in an impressive performance boost!

SET STATISTICS TIME ON;

SELECT MAX(2 * ASIN(SQRT(POWER(SIN((RADIANS(a.Lat) - RADIANS(b.Lat)) / 2), 2)

+ (COS(RADIANS(a.Lat)) * COS(RADIANS(b.Lat))

* POWER(SIN((RADIANS(a.Lon) - RADIANS(b.Lon)) / 2), 2)

))) * 20001.6 / PI())

FROM dbo.Place AS a

CROSS JOIN dbo.Place AS b

WHERE a.State = 'TX'

AND b.State = 'TX';

SET STATISTICS TIME OFF;

The CPU time is way down, from over 31 seconds to less than 8: 7,956 ms. Elapsed time is down even more, because the optimizer can parallelize this plan – only 1,557 ms! That is a very nice reduction, but still no reason to stop. As the amount of data increases, even this version can run into performance problems. For instance, when I simply omitted the WHERE clause to make SQL Server find the maximum distance between any of the almost 23,000 places in the database (involving over 500 million distance computations), the query ran for almost five minutes, with an elapsed time of 282,580 ms, and a CPU time of 2,155,995 ms! That’s a good reason to try to optimize this further.

And that brings us back to the subject of this post: CLR scalar user-defined functions. I have implemented the same distance calculation as a C# function:

using System;

using System.Data;

using System.Data.SqlClient;

using System.Data.SqlTypes;

using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions

{

[Microsoft.SqlServer.Server.SqlFunction(IsDeterministic = true)]

public static SqlDouble
DistCLR

(SqlDouble
Lat1, SqlDouble Lon1,

SqlDouble
Lat2, SqlDouble Lon2)

{

// This is
just a wrapper for the T-SQL interface.

// Call the
function that does the real work.

double
Dist = SpatialDist(Lat1.Value, Lon1.Value,

Lat2.Value,
Lon2.Value);

return new SqlDouble(Dist);

}

public static double
SpatialDist

(double
Lat1, double Lon1,

double
Lat2, double Lon2)

{

// Convert
degrees to radians

double
Lat1R = Lat1 * Math.PI / 180;

double
Lon1R = Lon1 * Math.PI / 180;

double
Lat2R = Lat2 * Math.PI / 180;

double
Lon2R = Lon2 * Math.PI / 180;

// Calculate
distance in radians

double
DistR =

2 * Math.Asin(Math.Sqrt(Math.Pow(Math.Sin((Lat1R - Lat2R) / 2), 2)

+ (Math.Cos(Lat1R)
* Math.Cos(Lat2R)

* Math.Pow(Math.Sin((Lon1R - Lon2R) / 2), 2))));

// Convert
from radians to kilometers

double
Dist = DistR * 20001.6 / Math.PI;

return
Dist;

}

}

The SQL for testing this is almost identical to the SQL for testing the T-SQL scalar user-defined function – only the function name is changed.

SET STATISTICS TIME ON;

SELECT MAX(dbo.DistCLR(a.Lat, a.Lon, b.Lat, b.Lon))

FROM dbo.Place AS a

CROSS JOIN dbo.Place AS b

WHERE a.State = 'TX'

AND b.State = 'TX';

SET STATISTICS TIME OFF;

The results show that this version is even faster than the T-SQL query with the computation inline, although only marginally. The CPU time is 7,798 ms, and the elapsed time is 1,459 ms. This means that the distance of the maximum calculation across all the places in the United States should also become slightly faster, and indeed it does – the elapsed time is 257,081 ms, and the CPU time is 2,010,774 ms. Still very slow, so if I had to tackle this problem for a real client I would start looking for other optimizations – but in the context of this blog post, I only wanted to show that using CLR scalar user-defined functions for complex computations can sometimes perform better than trying to do those computations in T-SQL only.

**Bottom line**

In the first two parts of this series, I have shown that T-SQL scalar user-defined functions, both with and without data access, are terrible for performance. In this part, I have looked at CLR scalar user-defined functions without data access. As a rule of thumb, those are bad for performance too – but unlike their T-SQL counterparts, when complex calculations are involved, the performance hit of having to invoke the function for each individual row processed by a query can sometimes be offset by the performance gain by moving the complex calculation to the faster CLR environment. If you consider embedding a complex calculation in a CLR scalar user-defined function, I recommend extended performance testing of both the UDF and the equivalent query with the same logic in inlined T-SQL.

Of course, performance is not always the only deciding factor. Maybe you already have the CLR function because it’s also called from your .Net code; in that case, you can save a lot of development and maintenance effort by reusing that component in your T-SQL code. Maybe you need to do something that is only possible from CLR code (like calling a web service – as long as you are aware that this will have a REALLY detrimental effect on performance!), or something that is simply so much easier in CLR code (like using standard methods for regular expression pattern matching, that would be extremely complicated to code in T-SQL code). Or maybe your UDF will only be called five times per day, and never in a query that processes more than a single row. At the end of the day, *you* will have to make your decision, based on all the factors that are relevant in your specific scenario.

The next part of this series will look at CLR scalar user-defined functions with data access; after that, we can finally start to cover the various kinds of table-valued user-defined functions available in SQL Server (the good and the ugly).

## Comments

## Adam Machanic said:

Nice post, Hugo!

## Jānis said:

T-SQL function is not determined as it don't have "With Schemabinding"..

## Alejandro Mesa said:

Nice series, Hugo!

Would you mind testing it one more time, but this time making the UDF determistic by adding the function option SCHEMABINDING, as Jānis suugested?

May be that is the cause for the marginal difference between both elapsed times.

Improving query plans with the SCHEMABINDING option on T-SQL UDFs

http://blogs.msdn.com/b/sqlprogrammability/archive/2006/05/12/596424.aspx

## Hugo Kornelis said:

Jānis and Alejandro make a great point. I had indeed overlooked the missing schemabinding, that makes the dbo.Distance function non-deterministic. To check the effect, I created another function, with the same body but "WITH SCHEMABINDING" added. Then I tested both versions repeatedly.

The first thing that struck me was that the results appear to fluctuate. Many tests (for either of the functions) took about 15 seconds, but every so often, suddenly a few test runs would jump to around 30 seconds. That would then last for two or three tests before falling back to the normal 15-second time.

So I ran a lot of iterations of the two tests. 56, to be precise. Then, I tallied all results. For the original (not schembound, not deterministic) version, the average CPU time was 16.83 seconds, and the average elapsed time was 18.36 second. For the schemabound and deterministic alternative, those numbers were 16.89 and 18.39, so the schemabound version appears to be a tad slower. However, this difference is so small that I think it is just statistical error.

For the record, the minimum CPU and elapsed time for the functions were 12.78, 14.06, 12.64 and 13.99; and the maximum values were 31.79, 34.84, 32.93 and 35.89. Just to indicate how wide apart some observation were.

## Paul White said:

Hi Hugo,

Regarding the sentence, "The only explanation I can come up with is that invoking a T-SQL involves a lot more overhead than invoking a CLR function.". See the Books Online :Performance of CLR Integration" topic (http://msdn.microsoft.com/en-us/library/ms131075.aspx).

There's some useful information there, including the statement, "CLR functions benefit from a quicker invocation path than that of Transact-SQL user-defined functions.".

SQLCLR scalar functions without data access are indeed highly preferable to T-SQL scalar functions, and can often perform better than native T-SQL code (without function calls) for surprisingly simple activities. One recent example that springs to mind was implementing DATEFROMPARTS in CLR versus a bunch of nested T-SQL functions.

I do think your summary statement, "As a rule of thumb, those [CLR scalar functions] are bad for performance too" is over-stated. Avoiding a function altogether (or using an in-line TVF) will often out-perform a 'pointless' CLR scalar function, but failing to apply common sense should not really result in something as strong as a "rule of thumb", should it?

Paul

## Robert Folkerts said:

What on Earth does, "lying until it is black in its face" mean? If I lie a great deal, I may turn red in the face from a lack of breathing or embarrassment, but how do you lie until you are black in the face?

## greg elliott said:

you need to add "with schemabinding" to your function definitions. otherwise there is a huge overhead in each call, as SQL checks to see if the function definition has changed.

It has been our experience that this change removes the overhead.

## greg elliott said:

so the schemabound version appears to be a tad slower.

===============

we have not seen this in our testing. quite the opposite, with dramatic improvements after adding with schemabinidng.