It's been quite a while since the LIKE vs ? Puzzle,
and I feel like it's time for another one. Response was overwhelming
last time, and I'm back with a much tougher puzzle and a much bigger
prize. So get ready, because I'm going to really make you stretch your
brain and your T-SQL skills for this one.
But first, a bit of background. String concatenation is something I've talked about on this blog before,
and it is an incredibly popular topic; my post on the subject has
gotten more hits than any other single post I've ever done. TechNet
blogger Ward Pond also understands the popularity of the topic, having discussed concatenation at least five times on his blog. And as illustrated in this excellent article by Anith Sen, there are a number of methods available to the intrepid SQL Server explorer; the techniques Ward and I show are just the tip of the iceberg.
While all of that is great, there is a deep and troublesome hidden
problem: Nowhere in my post, nor Anith's article, nor Ward's series,
will you find a technique that completely solves the concatenation
problem. The FOR XML PATH('') method--by far the most popular SQL
Server 2005 "trick" I see repeated over and over in these articles and
on forums--is a bit limiting. It doesn't help when we need to "group"
our string concatenation, i.e. concatenate strings for a number of key
values and return multiple rows in the result. And FOR XML PATH('')
also leaves us a bit flat if we need to return aggregated data with the
concatenated strings or--even more interesting--concatenate multiple
different columns in the same output.
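For anyone who hasn't seen it, the basic FOR XML PATH('') trick looks roughly like the sketch below. It is only a minimal illustration, assuming the AdventureWorks Production.Product table, and it builds one comma-delimited string for the entire table: the ungrouped, single-row case described above. The AllProductNames alias is a made-up name for the example.

```sql
-- Minimal sketch of ungrouped concatenation with FOR XML PATH('').
-- Assumes the AdventureWorks database; AllProductNames is an illustrative alias.
SELECT
    STUFF
    (
        (
            -- The expression has no column name, so PATH('') emits the
            -- values back to back with no XML element tags around them.
            SELECT ',' + p.Name
            FROM Production.Product AS p
            ORDER BY p.Name
            FOR XML PATH('')
        ),
        1,  -- start at the first character...
        1,  -- ...and replace one character (the leading comma)...
        ''  -- ...with an empty string
    ) AS AllProductNames;
-- Caveat: in this plain form, characters such as & and < come back
-- XML-entitized (&amp;, &lt;).
```

This returns exactly one row with one column, which is fine for a single list and is precisely where the trouble starts once you want one list per key value.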
Sure, there are ways to
solve this problem, but they usually require temp tables, user-defined
functions (CLR or otherwise), tables of numbers, cursors, or other
adjunct objects. And while some of these solutions are certainly
workable, they lack the beauty of a single, self-contained solution. In
the interest of solving this problem I recently created a challenge for
myself: Figure out how to do "grouped" concatenation using nothing more
than a single T-SQL statement. No temp tables. No UDFs. No procedural
code. But rather than do the work all alone and simply post my
solution, I've decided to invite you to join me in the quest. Are you
up for it?
Here are the rules of the game:
- You are
to create a single T-SQL statement that concatenates values from the
AdventureWorks (note: not AdventureWorks2008) Sales.SalesOrderHeader,
Sales.SalesOrderDetail, Production.Product, and Person.Contact tables.
- The output should have the following columns, in the following order, and no other columns:
- CustomerID: The customer's CustomerID (this is the unique key in the output)
- FirstName: The customer's first name
- LastName: The customer's last name
- OrderCount: Number of orders placed by the customer
- TotalDollarAmount: Total dollar amount of all orders placed by the customer (based on the SalesOrderHeader.SubTotal column)
- TotalProductQuantity: Total number of items purchased by the customer in all orders (based on SalesOrderDetail.OrderQty)
- OrderNumbers: Comma-delimited list containing the order numbers
(SalesOrderHeader.SalesOrderNumber) for each of the orders placed by
the customer
- The numbers within the list should be alphabetized. The list
should have neither leading nor trailing commas, and each element in
the list should be separated by a single comma with no spaces or other
white space before or after the comma
- ProductNames: Comma-delimited list containing the unique names of all products ordered by the customer in all orders
- The names within the list should be alphabetized. The list should have neither leading nor trailing commas, and each
element in the list should be separated by a single comma with no
spaces or other white space before or after the comma
- No tables--permanent, temporary, or variable--are to be
created. No dynamic SQL is to be used. No user-defined functions,
views, or stored procedures are allowed. No variables may be declared.
To put it simply, no permanent or temporary objects of any kind, at any
scope, are to be explicitly created. No procedural statements of any
kind--cursors or control-of-flow--are allowed. This must be a
standalone statement in the AdventureWorks database; nothing more and nothing less.
- Aside from the previous stipulation, any SQL
Server 2005 or 2008 feature is fair game. Documented or not, if it
ships with the product and can be used in a standalone T-SQL statement,
you can use it. If you do use a version-specific feature, please let me
know (especially if it's a SQL Server 2005 feature that's gone in
2008). Bonus points may be given for solutions that work on either
version, but I'll make that decision after reviewing the submissions.
- Submissions will be judged first and foremost on correctness, then on a combination
of performance, readability, and ability to apply your technique as a
- Just to be absolutely clear: If your submission violates the
rules, outputs the wrong data, or does not precisely follow the output
guidelines listed above, it will be ignored. Last time I spent a lot of
energy going back and forth with people helping them get there, and I
just don't have the bandwidth to do that again. So double-check your
submission before you send it to me.
- Make your submission readable or you will lose credit even if
performance is amazing. I don't appreciate looking at a mess, and it's
good for your career to learn how to write code that others can
maintain. Hint: Learn to indent your code properly; lack of indentation
is the biggest mistake I see people make with regard to readability.
- Take your time. You have two weeks to work on this. I've
already come up with three different solutions that have vastly
different performance characteristics. Perhaps your first shot isn't
the best choice?
- The entry deadline is March 16, 2009, midnight GMT. No exceptions.
- Submissions should be e-mailed to me, using a .SQL file attachment.
- Do not paste your solution into the body of your e-mail.
- The subject of your e-mail should be "Grouped String Challenge Submission".
- E-mail your submissions to [my first name] [at] [this site].
- Again, be careful and don't violate these guidelines or your submission will be ignored.
- ... What's that? You want a prize? Fine, fine ...
- The prize, for the best submission, is a full MSDN subscription, valued at around $10,000. How's that for inspiration?
... and that's that! One final note: Please do not post your solution
in the comments here or on another blog, before the deadline has been
reached! Last time several people did that and it was incredibly
annoying both for me and those contestants trying to think through the
problem. You won't be doing yourself any favors by trying to mess up
Once the deadline is reached I will test all of
the submissions, tabulate the results, and post back here in early
April. I promise, I won't let the thing stagnate for months like I did last time.
Have fun with it, be creative, and feel free to post comments here with any questions you might have. I found this to be a fairly difficult but very interesting exercise and I hope you agree. Enjoy, and I'm looking forward to seeing what you can do!