<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://www2.sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Adam Machanic : Performance, Architecture</title><link>http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Performance/Architecture/default.aspx</link><description>Tags: Performance, Architecture</description><dc:language>en</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>Rowset string concatenation: Which method is best?</title><link>http://www2.sqlblog.com/blogs/adam_machanic/archive/2006/07/12/rowset-string-concatenation-which-method-is-best.aspx</link><pubDate>Thu, 13 Jul 2006 01:15:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:77</guid><dc:creator>Adam Machanic</dc:creator><slash:comments>25</slash:comments><comments>http://www2.sqlblog.com/blogs/adam_machanic/comments/77.aspx</comments><wfw:commentRss>http://www2.sqlblog.com/blogs/adam_machanic/commentrss.aspx?PostID=77</wfw:commentRss><description>&lt;p&gt;Yeah, yeah, yeah, let's get this out of the way right from the start:
Don't concatenate rows into delimited strings in SQL Server. Do it
client side.
&lt;/p&gt;&lt;p&gt;
Except if you really have to create delimited strings in SQL Server.  In which case you should read on.
&lt;/p&gt;&lt;p&gt;
There was a little &lt;a href="http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=42415" target="#"&gt;discussion&lt;/a&gt; on &lt;a href="http://www.sqlteam.com/" target="#"&gt;SQLTeam&lt;/a&gt;
about the best way to concatenate. I recommended a scalar UDF solution,
whereas Rob Volk recommended a solution involving a temp table.
&lt;/p&gt;&lt;p&gt;I mentioned my dislike for the temp table solution for a couple
of reasons. First of all, it relies on a clustered index for ordering.
That will probably work in this example, but is not guaranteed to
always work and relying on indexes rather than ORDER BY for ordering is
definitely not a habit I want anyone to get into. The clustered index
as it was described in Rob's example also has another problem that I
didn't even notice until I was writing this entry. But I'll get to that
in a moment. The second reason I dislike the temp table is that I felt
it would be less efficient than the scalar UDF.
&lt;/p&gt;&lt;p&gt;
Rob didn't agree about the efficiency.  And so I set out to prove him wrong...
&lt;/p&gt;&lt;p&gt;
We'll use the Authors table in Pubs.  I want a comma-delimited list, per state, of the last name of each author who lives there.
&lt;/p&gt;&lt;p&gt;
First, the scalar UDF:
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;USE pubs&lt;br&gt;GO&lt;br&gt;&lt;br&gt;CREATE FUNCTION dbo.ConcatAuthors(@State CHAR(2))&lt;br&gt;RETURNS VARCHAR(8000)&lt;br&gt;AS&lt;br&gt;BEGIN&lt;br&gt;	DECLARE @Output VARCHAR(8000)&lt;br&gt;	SET @Output = ''&lt;br&gt;&lt;br&gt;	SELECT @Output =	CASE @Output &lt;br&gt;				WHEN '' THEN au_lname &lt;br&gt;				ELSE @Output + ', ' + au_lname &lt;br&gt;				END&lt;br&gt;	FROM Authors&lt;br&gt;	WHERE State = @State&lt;br&gt;	ORDER BY au_lname&lt;br&gt;&lt;br&gt;	RETURN @Output&lt;br&gt;END&lt;br&gt;GO&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;
To find the list I want:
&lt;/p&gt;&lt;pre class="code"&gt;SELECT DISTINCT State, dbo.ConcatAuthors(State)&lt;br&gt;FROM Authors&lt;br&gt;ORDER BY State&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;... And the adaptation of Rob's temp table method... I did change
two things due to problems I discovered during testing. One, I've
altered the au_lname column to VARCHAR(8000); the column in the Authors
table is VARCHAR(40), not large enough for all of the California
authors. What if we were dealing with a much larger dataset? Second, I
added an IDENTITY column, and I'm clustering on that instead of the
actual data to get the ordering. I'm doing so because of the
VARCHAR(8000). Index rows can be a maximum of 900 bytes, so if we had
enough data to exceed that length, this method would fail.
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;CREATE TABLE #AuthorConcat&lt;br&gt;(&lt;br&gt;	State CHAR(2) NOT NULL,&lt;br&gt;	au_lname VARCHAR(8000) NOT NULL,&lt;br&gt;	Ident INT IDENTITY(1,1) NOT NULL PRIMARY KEY&lt;br&gt;)&lt;br&gt;&lt;br&gt;INSERT #AuthorConcat &lt;br&gt;(&lt;br&gt;	State,&lt;br&gt;	au_lname&lt;br&gt;)&lt;br&gt;SELECT&lt;br&gt;	State, &lt;br&gt;	au_lname&lt;br&gt;FROM Authors&lt;br&gt;ORDER BY &lt;br&gt;	State, &lt;br&gt;	au_lname&lt;br&gt;&lt;br&gt;DECLARE @Authors VARCHAR(8000)&lt;br&gt;SET @Authors = ''&lt;br&gt;DECLARE @State CHAR(2)&lt;br&gt;SET @State = ''&lt;br&gt;&lt;br&gt;UPDATE #AuthorConcat&lt;br&gt;SET @Authors = au_lname =	CASE &lt;br&gt;				WHEN @State = State THEN @Authors + ', ' + au_lname &lt;br&gt;				ELSE au_lname END,&lt;br&gt;	@State = State&lt;br&gt;&lt;br&gt;SELECT State, MAX(au_lname) &lt;br&gt;FROM #AuthorConcat&lt;br&gt;GROUP BY State&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;Clever, but more complex and harder to read than the scalar UDF
version. Output is identical, but that's not why we're here. Which one
is more efficient?
&lt;/p&gt;&lt;p&gt;
Drumroll, please...
&lt;/p&gt;&lt;p&gt;Results were tabulated using STATISTICS IO, STATISTICS TIME, and
Query Analyzer's Show Execution Plan. DBCC DROPCLEANBUFFERS and DBCC
FREEPROCCACHE were run before each test.
&lt;/p&gt;&lt;p&gt;
&lt;b&gt;Scalar UDF Method&lt;/b&gt;
&lt;br&gt;
Total cost: 0.0492
&lt;br&gt;
Total Scan count: 1
&lt;br&gt;
Total Logical reads: 2
&lt;br&gt;
Total Physical reads: 2
&lt;br&gt;
Total time: 25 ms
&lt;/p&gt;&lt;p&gt;
&lt;b&gt;Temp Table Method&lt;/b&gt;
&lt;br&gt;
Total cost: 0.2131
&lt;br&gt;
Total Scan count: 4
&lt;br&gt;
Total Logical reads: 9
&lt;br&gt;
Total Physical reads: 2
&lt;br&gt;
Total time: 88 ms
&lt;/p&gt;&lt;p&gt;So in conclusion, neither method is incredibly taxing with the
tiny Pubs dataset, but I think I have proven that the UDF is far more
efficient.
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;hr&gt;Update, February 28, 2005: Modified the adapation of Rob Volk's
method to use a CREATE TABLE instead of SELECT INTO, as the latter is
not necessarily guaranteed to insert rows in the right order for the
sake of this example. Thanks to "PW" on SQLServerCentral for pointing
this problem out. Note that this changed the total costs very slightly
-- for the better -- but the UDF still performs better by quite a large
margin.&lt;br&gt;&lt;img src="http://www2.sqlblog.com/aggbug.aspx?PostID=77" width="1" height="1"&gt;</description><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Architecture/default.aspx">Architecture</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Performance/default.aspx">Performance</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/T-SQL/default.aspx">T-SQL</category></item></channel></rss>