SQLblog.com - The SQL Server blog spot on the web

SELECT Hints, Tips, Tricks FROM Hugo Kornelis WHERE RDBMS = 'SQL Server'

T-SQL User-Defined Functions: the good, the bad, and the ugly (part 4)

Scalar user-defined functions are bad for performance. I already showed that for T-SQL scalar user-defined functions without and with data access, and for most CLR scalar user-defined functions without data access, and in this blog post I will show that CLR scalar user-defined functions with data access fit into that picture.

 

First attempt

 

Sticking to my simplistic example of finding the triple of an integer value by reading it from a pre-populated lookup table, and following the standard recommendations and templates that Microsoft publishes for CLR scalar user-defined functions, my first attempt at this function looked like this:

 

using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

[Microsoft.SqlServer.Server.SqlFunction(DataAccess=DataAccessKind.Read,
    IsDeterministic=true, SystemDataAccess=SystemDataAccessKind.None, IsPrecise=true)]
public static SqlInt32 CLRTripFromTbl1(SqlInt32 Value)
{
    SqlInt32 Result;
    using (SqlConnection conn = new SqlConnection("context connection=true"))
    {
        conn.Open();

        SqlCommand cmd = new SqlCommand("SELECT Triple FROM dbo.Triples WHERE Value = @Value;", conn);
        cmd.Parameters.Add(new SqlParameter("Value", Value));

        Result = (SqlInt32)((int)cmd.ExecuteScalar());
    }

    return Result;
}

 

Below is the code I used to test the performance of this version in comparison to the T-SQL scalar user-defined function and the version without UDF I used earlier (in part 2 – where I also explain why the query without UDF is more complex than needed). Note that I added a WHERE clause that limits the test to just 10% of the table (one million rows instead of all ten million) to limit the testing time.

 

SET STATISTICS TIME ON;

-- T-SQL UDF
SELECT MAX(dbo.TripFromTbl(DataVal)) AS MaxTriple
FROM   dbo.LargeTable
WHERE  KeyVal <= 1000000;

-- CLR UDF
SELECT MAX(dbo.CLRTripFromTbl1(DataVal)) AS MaxTriple
FROM   dbo.LargeTable
WHERE  KeyVal <= 1000000;

-- No UDF
WITH  TheTrips
AS   (SELECT (SELECT t.Triple
              FROM   dbo.Triples AS t
              WHERE  t.Value = l.DataVal) AS TheTrip
       FROM   dbo.LargeTable AS l
       WHERE  l.KeyVal <= 1000000)
SELECT MAX(TheTrip) AS MaxTriple
FROM   TheTrips;

SET STATISTICS TIME OFF;

 

If you run this and check the CPU and elapsed time, you’ll find a huge (and very nasty) surprise. I expected the CLR version to be about as bad as the T-SQL user-defined function with data access, but I was wrong – it is more than ten times slower! For processing just these one million rows (with the data already in cache), the version without UDF took about 0.2 seconds elapsed, 0.8 seconds CPU (the CPU time being higher than the elapsed time is due to parallelism); the version with T-SQL UDF took about 26 seconds (elapsed and CPU – no parallelism in this execution plan) – and the CLR version took over 350 seconds! This is also the reason why I added the WHERE clause; without it, the CLR version would probably have taken more than half an hour to finish.

 

For completeness' sake, I also checked the execution plan and the SET STATISTICS IO output of the CLR versions. All the issues I saw with the T-SQL scalar user-defined function with data access plague the CLR versions as well – no indication at all of the cost of the function in the execution plan, and no report of the amount of I/O in the SET STATISTICS IO output (though it is still visible in a profiler trace). Also, like the T-SQL scalar function with or without data access (but unlike the CLR scalar function without data access), you will never get a parallel plan on a query that invokes a CLR scalar user-defined function with data access. And, unlike a T-SQL UDF with data access, even the estimated execution plan will not show any hint of the data access performed in the function.

 

The parallel loopback misunderstanding

 

When I discussed these results with some people I consider more knowledgeable on CLR than myself, someone told me that by using a loopback connection instead of the context connection, it is possible to forgo the DataAccessKind.Read switch, which impedes parallelism.

 

Well – if I could give a prize for the worst advice ever (okay, it later turned out to be the most misunderstood advice ever – more about that later), this would definitely qualify. I created and tested this UDF:

 

[Microsoft.SqlServer.Server.SqlFunction(DataAccess = DataAccessKind.Read,
    IsDeterministic = true, SystemDataAccess = SystemDataAccessKind.None, IsPrecise = true)]
public static SqlInt32 CLRTripFromTbl2(SqlInt32 Value)
{
    SqlInt32 Result;
    using (SqlConnection conn = new SqlConnection(@"Data Source=perFact\SQL2012; Initial Catalog=udf; Integrated Security=True;"))
    {
        conn.Open();

        SqlCommand cmd = new SqlCommand("SELECT Triple FROM dbo.Triples WHERE Value = @Value;", conn);
        cmd.Parameters.Add(new SqlParameter("Value", Value));

        Result = (SqlInt32)((int)cmd.ExecuteScalar());
    }

    return Result;
}

 

You may note that the DataAccessKind.Read switch is still present. That’s because, when testing this with DataAccessKind.None, I got a run-time error. After restoring the DataAccessKind.Read switch, I at least got this code to run – but not very fast! When testing this UDF, I had to reduce the number of rows to process even further – processing just 1% of the table (a hundred thousand rows) took 168 seconds elapsed (though “only” 54 seconds CPU), so the one million row test I used earlier would have taken almost half an hour, and if I had used the full test set, the test would have run for almost 5 hours! That makes this test with the loopback connection almost five times slower than the one with the context connection – which was already about ten times slower than the T-SQL UDF (which, in turn, was well over a hundred times slower than the version without UDF).

 

It later turned out that I had misunderstood the advice. For the record, this advice was given by Adam Machanic, who is a true expert in everything CLR. It was not his advice that was bad; it was my ability to understand what he meant. He later clarified that his suggestion was to move the data access to a separate thread. He even pointed me to the library he created for this – see this link.

I have to be honest. I looked at the code. And looked at it again. Then I looked at the sample code. And again. And yet another time. And then I admitted defeat. My understanding of CLR is simply too limited to enable me to adapt my “triples” example to use this technique, so I will not be able to include it in performance comparisons. If anyone is willing to give it a try, then by all means do – and please let me know the results!

 

Cache as cache can

 

Another tip, also from Adam (and this time not misunderstood by me!), is to avoid multiple lookups of the same value by implementing a cache in the CLR code. This is very easy to do for anyone with a fair bit of CLR skills – and I think that I have proven that you can even replace those CLR skills with a judicious amount of Google (or Bing). Here is how my code (after several failed attempts) eventually looked:

 

// This is the initial size of the cache; it will grow when needed.
private const int InitialSize = 40;

public static TripleCache tc = new TripleCache(InitialSize);

[Microsoft.SqlServer.Server.SqlFunction(DataAccess = DataAccessKind.Read,
    IsDeterministic = true, SystemDataAccess = SystemDataAccessKind.None, IsPrecise = true)]
public static SqlInt32 CLRTripFromTbl3(SqlInt32 Value)
{
    SqlInt32 Result = tc[Value].Value;
    return Result;
}

public class TripleCache
{
    // Dictionary to contain the cache.
    static Dictionary<SqlInt32, WeakReference> _cache;

    public TripleCache(int count)
    {
        // Pass the initial size on to the dictionary as its starting capacity.
        _cache = new Dictionary<SqlInt32, WeakReference>(count);
    }

    // Access a data object from the cache.
    public Triple this[SqlInt32 index]
    {
        get
        {
            Triple tr;
            // Try to read the triple from the cache.
            try
            {
                tr = _cache[index].Target as Triple;
            }
            catch
            {
                // Triple not yet in cache; read from table and add to cache.
                tr = new Triple(index);
                _cache.Add(index, new WeakReference(tr));
            }
            if (tr == null)
            {
                // Triple has been in cache but was evicted - read from table again and renew cache.
                tr = new Triple(index);
                _cache[index] = new WeakReference(tr);
            }
            return tr;
        }
    }
}

// This class reads the triple value from the table.
public class Triple
{
    private SqlInt32 _triple;

    public Triple(SqlInt32 single)
    {
        using (SqlConnection conn = new SqlConnection("context connection=true"))
        {
            conn.Open();

            SqlCommand cmd = new SqlCommand("SELECT Triple FROM dbo.Triples WHERE Value = @Value;", conn);
            cmd.Parameters.Add(new SqlParameter("Value", single));

            _triple = (SqlInt32)((int)cmd.ExecuteScalar());
        }
    }

    // Simple property.
    public SqlInt32 Value
    {
        get
        {
            return _triple;
        }
    }
}
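As Alex and Adam discuss in the comments below, the try/catch lookup can be replaced by TryGetValue, so a cache miss does not pay the cost of a thrown exception. I have not benchmarked this variant myself, so treat the following rewrite of the indexer as a sketch rather than a measured improvement:

```csharp
// Hypothetical rewrite of the TripleCache indexer using TryGetValue,
// avoiding the exception overhead on a cache miss (untested by me).
public Triple this[SqlInt32 index]
{
    get
    {
        Triple tr = null;
        WeakReference wr;
        if (_cache.TryGetValue(index, out wr))
        {
            // Key is known, but the target may already have been collected.
            tr = wr.Target as Triple;
        }
        if (tr == null)
        {
            // Not cached, or evicted: read from the table and (re)cache.
            tr = new Triple(index);
            _cache[index] = new WeakReference(tr);
        }
        return tr;
    }
}
```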

 

With the test data used so far, this method really performs very well. The test with one million rows finished in 3.5 seconds (again, both elapsed and CPU – we still do not get a parallel plan for this query). That is a lot faster than all other UDF versions tested so far – though still not even near the performance we can get by not using a UDF at all.

 

To be fair, the test data is *very* nice for this specific method. The lookup table used holds two hundred thousand rows, but the sample data is skewed to use only ten distinct values. So this last CLR version reads those ten values into its cache and then keeps serving them from cache over and over again. While using only a minimal amount of memory for its cache, it reduces the amount of actual I/O to only ten queries. In short, my test case was the ideal use case for the caching technique.
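If you want to confirm this skew on your own copy of the test data, a quick aggregation shows the distribution (assuming the table and column names used throughout this series):

```sql
-- For the skewed sample data, this should return only ten distinct DataVal values.
SELECT   DataVal, COUNT(*) AS NumRows
FROM     dbo.LargeTable
WHERE    KeyVal <= 1000000
GROUP BY DataVal
ORDER BY DataVal;
```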

 

I wanted to know how the cached version holds up when the circumstances are less ideal, so I also executed this query:

 

SELECT MAX(dbo.CLRTripFromTbl3(KeyVal % 90000)) AS MaxTriple
FROM   dbo.LargeTable
WHERE  KeyVal <= 1000000;

 

By using “KeyVal % 90000” instead of “DataVal”, I now ensure that 90,000 different rows from the lookup table are accessed, each 11 or 12 times. This means that, even with the cache, a lot of database queries are still needed – though still fewer than without the cache. The results were devastating! The above query ran for almost seven MINUTES (415 seconds, to be exact, using 405 seconds CPU time). At this point, I started to think that maybe the cache didn’t work properly with this number of entries. The WeakReference class I used is supposed to allow cached objects to be reclaimed in order to free up some memory; maybe that was happening so aggressively that the cached data was evicted every time before it could be reused? To test this, I ran the same query another time; now the elapsed time was down to “only” 326 seconds, almost 100 seconds less. So it looks like the cache is still working, saving the need to fetch the same data multiple times – but the overhead of creating and using the cache costs more than the saved database calls; the simple CLR UDF that just calls the database every time is much faster!

(To be fair, all methods see some performance loss when tested with the equivalent of the above query, just not as much. The T-SQL UDF takes 27 seconds, the CLR UDF takes 364 seconds, and the version with inlined logic instead of a UDF now takes almost 0.5 seconds – slower, but still comfortably running rings around all the competitors!)
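For reference, the inline (no-UDF) equivalent of this skewed test can be derived from the CTE query shown earlier; my assumption of what it looks like:

```sql
-- Inline version of the skewed test: the correlated subquery replaces the UDF call.
WITH  TheTrips
AS   (SELECT (SELECT t.Triple
              FROM   dbo.Triples AS t
              WHERE  t.Value = l.KeyVal % 90000) AS TheTrip
       FROM   dbo.LargeTable AS l
       WHERE  l.KeyVal <= 1000000)
SELECT MAX(TheTrip) AS MaxTriple
FROM   TheTrips;
```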

 

Now obviously, the two tests I used are both extreme cases. In many cases where you might consider implementing a CLR UDF with data access, the number of distinct rows read will be somewhere between ten and ninety thousand – so you really should run your own tests, on your own system and with your own typical data distribution.

 

There are also a few other caveats you should consider before deciding to implement this caching technique for your scalar CLR user-defined functions:

·         The cache persists across query executions. So if I execute the same query again, I will get even better performance, because all the data is still available in the CLR cache. If you run a profiler trace while executing this query a second time, you will see no data being read at all. For performance, this can be a good thing – but beware! It also means that you will have to find a way to deal with stale data. For triples, this is not really an issue. But there are also many lookup tables where the data may change. Like, for instance, a table with exchange rates for foreign currencies. How would your bank fare if, even after entering the new exchange rates for the failing European currencies, lookup queries still returned the old values?

·         The assembly that implements the cache has to be loaded with PERMISSION_SET = UNSAFE. Your DBA might not like that. And for good reasons. If you consider implementing techniques such as this cache, or Adam’s parallelizer, be aware that these techniques trade safety for speed. You may crash your AppDomain, which may cause the loss of all data in the cache or even kill executing queries. You may have to implement code to detect and repair this – if performance is really important, and you really have to use a UDF with data access, you could consider accepting the risks and downsides. But I won’t. Even after discussing this with Adam, I still don’t feel that I really understand the risks – and if I don’t understand the risks, then there is no way I will allow the code to run on my machine (except in test environments).
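For those who want to experiment anyway, loading an assembly as UNSAFE looks roughly like this. The assembly name and path here are hypothetical, and note that UNSAFE assemblies also require either the TRUSTWORTHY database option or a signing-based permission setup:

```sql
-- Hypothetical names; adjust to your own assembly and file path.
ALTER DATABASE udf SET TRUSTWORTHY ON;  -- or use certificate / asymmetric key signing

CREATE ASSEMBLY TripleCache
FROM 'C:\Assemblies\TripleCache.dll'
WITH PERMISSION_SET = UNSAFE;
```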

 

Bottom line

 

Out of the box, CLR user-defined functions will not perform well when data access is involved. They can be up to ten times slower than their T-SQL counterparts, and those were already pretty bad.

 

Previously, we have seen that CLR user-defined functions can offer a good performance improvement when heavy computations are involved. So, what to do if your required functionality includes a combination of data retrieval and heavy computations? My recommendation would be to try to separate the two parts. Retrieve all the data in the T-SQL query, without using any user-defined functions, then pass the required arguments to a CLR user-defined function without data access to do the computation. Or do the computation inline – even with heavy computations, inlining the logic sometimes is still faster!
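A minimal sketch of that separation, using the tables from this series; dbo.CLRHeavyComputation stands in for a hypothetical computation-only CLR UDF declared with DataAccessKind.None:

```sql
-- The join retrieves the lookup value; the (hypothetical) UDF only computes.
SELECT MAX(dbo.CLRHeavyComputation(t.Triple)) AS MaxResult
FROM   dbo.LargeTable AS l
INNER JOIN dbo.Triples AS t
      ON t.Value = l.DataVal
WHERE l.KeyVal <= 1000000;
```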

 

One way to speed up a CLR user-defined function with data access is to implement a cache, to prevent unnecessary roundtrips to the server. This does require you to think very carefully about the risk of serving stale data, and you may also find it very hard to convince your DBA to allow you to load an unsafe assembly – but when implemented successfully, it can result in a tremendous boost in performance, especially if the typical use of the function involves repeatedly reading a small set of data. But beware – when the number of distinct rows that are used (and will hence be read in cache) increases, and the number of cache hits decreases, caching can easily become a bane rather than a boon, because of the overhead involved.

 

This concludes my discussion of scalar user-defined functions. In the next episodes of this series, we will look at table-valued user-defined functions. And we will find that those come in different flavors, with VERY different performance characteristics. Stay tuned!

Published Saturday, August 25, 2012 12:07 AM by Hugo Kornelis

Comments

 

Adam Machanic said:

Hugo, I wish you would have written to me before publishing so I could help you get that loopback code working. You need to do the loopback call in a different thread -- which is why I built the threading library (doing that work properly, so that you don't cause problems for yourself down the road, is not trivial).

August 24, 2012 6:08 PM
 

AlexK said:

Hi Hugo,

Typically we get better performance with ContainsKey method instead of catching exceptions, as you did in your TripleCache class. What do you think?

August 25, 2012 10:03 PM
 

Hugo Kornelis said:

@Adam: Yup, I got that. And I did look at trying to implement the threading library. But after seeing the code and the supplied demos, I realised that for me (with my limited understanding of CLR), it would take a lot of time to get to understand it sufficiently before I could adapt it to work as part of a scalar UDF. And I did not want to postpone publishing this post even more, it was already delayed more than I wanted.

@Alex: As I said before, I am absolutely not an expert in CLR. I wrote the code by adapting a snippet I found (somewhere in the MSDN library) through Google. I was not aware that it's possible to use ContainsKey instead of Try Catch. But now that I do, I'll have to play around with it and see how it affects performance. Thanks for the suggestion!

August 26, 2012 4:50 AM
 

Adam Machanic said:

@Alex: I think that should only improve performance if you have a large number of "misses" (i.e. lookups for keys that don't exist). But shouldn't TryGetValue be best in every case?

August 26, 2012 8:56 AM
 

AlexK said:

@Adam: the cache is the slowest when it is just warming up. This is when it needs speeding up most, and this is exactly when we have most misses.

I agree that TryGetValue might be a very good choice. However, I would expect big difference between throwing exceptions and avoiding them, and somewhat smaller differences between different ways of avoiding exceptions.

I have not benchmarked it recently, so I could be wrong.

What do you think?

August 26, 2012 3:26 PM
 

Adam Machanic said:

@Alex: Depends on how you design the cache. A lazy loading cache might get a lot of misses at the beginning; a more ambitious pre-loading scheme may avoid that. If it's possible to do. It all depends...

August 26, 2012 5:17 PM
 

Danny Cordell said:

I am looking forward to the article on table-valued UDF's.

December 28, 2012 12:35 PM
 

JP said:

Are you still planning a fifth part to the series? This has been a great summary on the pros and cons of user defined functions.

January 9, 2013 10:52 AM
 

Hugo Kornelis said:

JP: I definitely want to continue the series. But I cannot commit myself to any time frame at this time. I am now working on something else that takes up almost all the time I have available for community work.

Once that is done, I hope to find enough time and energy to resume my blogging.

January 10, 2013 4:47 AM


About Hugo Kornelis

Hugo is co-founder and R&D lead of perFact BV, a Dutch company that strives to improve analysis methods and to develop computer-aided tools that will generate completely functional applications from the analysis deliverable. The chosen platform for this development is SQL Server. In his spare time, Hugo likes to visit the SQL Server newsgroups, in order to share and enhance his knowledge of SQL Server.