<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://www2.sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Merrill Aldrich : cross join</title><link>http://www2.sqlblog.com/blogs/merrill_aldrich/archive/tags/cross+join/default.aspx</link><description>Tags: cross join</description><dc:language>en</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>Random Join, STAT</title><link>http://www2.sqlblog.com/blogs/merrill_aldrich/archive/2009/07/20/random-people-stat.aspx</link><pubDate>Mon, 20 Jul 2009 16:27:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:15369</guid><dc:creator>merrillaldrich</dc:creator><slash:comments>1</slash:comments><comments>http://www2.sqlblog.com/blogs/merrill_aldrich/comments/15369.aspx</comments><wfw:commentRss>http://www2.sqlblog.com/blogs/merrill_aldrich/commentrss.aspx?PostID=15369</wfw:commentRss><description>&lt;P&gt;I've been tinkering with a little puzzle, mainly for fun: suppose you need to generate a large number of "mock" names either for generating test data or for masking data that's restored from a production system into a test environment. For this task I have a set of tables from the &lt;A href="http://www.census.gov/genealogy/names/"&gt;US Census web site&lt;/A&gt;, that contain, each separately, the most common first and last names of men and women from the 1990 census. So here's the puzzle: how can I generate 1,000,000 random first and last name combinations from those tables, in a reasonable time.&lt;/P&gt;
&lt;P&gt;I loaded the lists into SQL Server into simple tables:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;&lt;SPAN style="COLOR:blue;"&gt;CREATE TABLE &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;[dbo].[femalefirstnames]&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;(
   &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;[name] [nvarchar]&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;(&lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;14&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;) NOT NULL,
 &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;CONSTRAINT &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;[pk_femalefirstnames] &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;PRIMARY KEY CLUSTERED &lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;(
   &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;[name] &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;ASC
   &lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;)
);
&lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;GO

&lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;CREATE TABLE &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;[dbo].[lastnames]&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;(
   &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;[name] [nvarchar]&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;(&lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;14&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;) NOT NULL,
 &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;CONSTRAINT &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;[pk_lastnames] &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;PRIMARY KEY CLUSTERED &lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;(
   &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;[name] &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;ASC
   &lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;)
);
&lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;GO

&lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;CREATE TABLE &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;[dbo].[malefirstnames]&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;(
   &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;[name] [nvarchar]&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;(&lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;14&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;) NOT NULL,
 &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;CONSTRAINT &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;[pk_malefirstnames] &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;PRIMARY KEY CLUSTERED &lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;(
   &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;[name] &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;ASC
   &lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;)
);
&lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;GO
&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;A naive approach -- always a place to start -- would be simply to cross-join the last names against the first names, and pick the top 'x' results. This does work, and luckily the TOP clause will optimize so that SQL Server doesn't materialize the entire cross-join set:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;&lt;SPAN style="COLOR:blue;"&gt;SELECT TOP &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;1000000 
   [last].name &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;AS &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;lastname&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;,
   &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;[first].name &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;AS &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;firstname
&lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;FROM &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;dbo.lastnames &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;AS &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;[last]
&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;CROSS &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;JOIN &lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;( 
   &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;SELECT &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;name &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;FROM &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;dbo.femalefirstnames 
   &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;UNION &lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;ALL 
   &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;SELECT &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;name &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;FROM &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;dbo.malefirstnames&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;) &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;AS &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;[first]&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;;
&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The issue is that this is unlikely to produce a useful distribution of names. It tends to repeat the names on one side of the join for each name on the other, so, for example, you are very likely to get every single last name (about 100,000) paired with just one first name, then every last name paired with the second first name, and so on. It also is likely to produce the same set every time. In my test, in 1,000,000 names I had only 11 unique first names:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;&lt;SPAN style="COLOR:blue;"&gt;SELECT &lt;/SPAN&gt;&lt;SPAN style="COLOR:magenta;"&gt;COUNT&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;( &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;DISTINCT &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;firstname &lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;) 
&lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;FROM
&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;( &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;SELECT TOP &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;1000000 
   [last].name &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;AS &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;lastname&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;,
   &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;[first].name &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;AS &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;firstname
&lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;FROM &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;dbo.lastnames &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;AS &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;[last]
&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;CROSS &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;JOIN &lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;( 
   &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;SELECT &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;name &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;FROM &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;dbo.femalefirstnames 
   &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;UNION &lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;ALL 
   &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;SELECT &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;name &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;FROM &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;dbo.malefirstnames&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;) &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;AS &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;[first]
&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;) &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;AS &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;selectednames&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;;
&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Yields just 11.&lt;/P&gt;
&lt;P&gt;So, we need not just to cross join the lists but also to get SQL Server to randomize the pairings of names, even if it's less efficient for the optimizer. A second naive approach: cross the lists, but pick random pairs from the whole result set, rather than the first rows returned. This also works, but it takes a LOT of resources to produce the results, and it's very slow:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;&lt;SPAN style="COLOR:blue;"&gt;SELECT TOP &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;1000000 
   [last].name &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;AS &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;lastname&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;,
   &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;[first].name &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;AS &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;firstname
&lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;FROM &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;dbo.lastnames &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;AS &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;[last]
&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;CROSS &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;JOIN &lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;( 
   &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;SELECT &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;name &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;FROM &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;dbo.femalefirstnames 
   &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;UNION &lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;ALL 
   &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;SELECT &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;name &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;FROM &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;dbo.malefirstnames&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;) &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;AS &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;[first]
&lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;ORDER BY &lt;/SPAN&gt;&lt;SPAN style="COLOR:magenta;"&gt;NEWID&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;(); &lt;/SPAN&gt;&lt;SPAN style="COLOR:green;"&gt;--&amp;lt; this is a bad idea; the server will have 
               -- to materialize and sort the whole cross-join set
&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This is slow and demanding because where in the previous example the database engine can produce just 1,000,000 rows in memory, return them and stop working, as soon as I add the Order By clause, the entire cross join has to be materialized and sorted in order to deliver the top 1,000,000 ordered rows. The cross is about 500,000,000 rows. I let it run for five or ten minutes and then cancelled it, knowing there had to be a better way :-).&lt;/P&gt;
&lt;P&gt;So, the ideal scenario would be to get a more "random" set of joined pairs, but without materializing too many more than I really need. If sorting the entire cross join in random order is too expensive, sorting the separate base tables isn't so bad. If there were a way to create a join where one random row from the last names table links to just a few random first names, less data would be disarded than with the massive cross.&lt;/P&gt;
&lt;P&gt;This is where the row_number() function comes to the rescue: if we create a randomly ordered set of last names and number them on the fly, and do the same with the first names, then it's pretty simple to join them using the row_number() sequences.&lt;/P&gt;
&lt;P&gt;Here's a query that produces a randomly-numbered set of last names. The output is in a different order each time you run the query, but always numbered:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;&lt;SPAN style="COLOR:blue;"&gt;SELECT 
   &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;name &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;AS &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;lastname&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;, 
   &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;row_number&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;() &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;OVER &lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;( &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;ORDER BY &lt;/SPAN&gt;&lt;SPAN style="COLOR:magenta;"&gt;NEWID&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;() ) &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;randid
&lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;FROM &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;dbo.lastnames&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;;
&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Likewise for the set of first names:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;&lt;SPAN style="COLOR:blue;"&gt;SELECT 
   &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;name &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;AS &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;firstname&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;, 
   &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;row_number&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;() &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;OVER &lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;( &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;ORDER BY &lt;/SPAN&gt;&lt;SPAN style="COLOR:magenta;"&gt;NEWID&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;() ) &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;randid
&lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;FROM &lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;( &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;SELECT &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;name &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;FROM &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;dbo.femalefirstnames
   &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;UNION &lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;ALL
   &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;SELECT &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;name &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;FROM &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;dbo.malefirstnames
   &lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;) &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;AS &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;allfirstnames&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;;
&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;To create a large number of combinations, I can just join the first set to the second, using a between clause to pair each last name with a subset of the first names:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;&lt;SPAN style="COLOR:blue;"&gt;SELECT TOP &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;1000000 lastname&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;, &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;firstname 
&lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;FROM &lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;( 
   &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;SELECT 
       &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;name &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;AS &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;lastname&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;, 
       &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;row_number&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;() &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;OVER &lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;( &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;ORDER BY &lt;/SPAN&gt;&lt;SPAN style="COLOR:magenta;"&gt;NEWID&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;() ) &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;randid
   &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;FROM &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;dbo.lastnames 
&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;) &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;AS  &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;lastnames
&lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;INNER JOIN &lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;(
   &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;SELECT 
       &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;name &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;AS &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;firstname&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;, 
       &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;row_number&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;() &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;OVER &lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;( &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;ORDER BY &lt;/SPAN&gt;&lt;SPAN style="COLOR:magenta;"&gt;NEWID&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;() ) &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;randid 
   &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;FROM &lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;(
       &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;SELECT &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;name &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;FROM &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;dbo.femalefirstnames
       &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;UNION &lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;ALL
       &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;SELECT &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;name &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;FROM &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;dbo.malefirstnames
   &lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;) &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;AS &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;allfirstnames
&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;) &lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;AS &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;firstnames 
&lt;/SPAN&gt;&lt;SPAN style="COLOR:blue;"&gt;ON &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;lastnames.randid &lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;BETWEEN &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;firstnames.randid &lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;AND &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;firstnames.randid &lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;+ &lt;/SPAN&gt;&lt;SPAN style="COLOR:black;"&gt;999&lt;/SPAN&gt;&lt;SPAN style="COLOR:gray;"&gt;;
&lt;/SPAN&gt;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This makes a set of random pairs in about 10 seconds on my laptop. Not too bad!&lt;BR&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;img src="http://www2.sqlblog.com/aggbug.aspx?PostID=15369" width="1" height="1"&gt;</description><category domain="http://www2.sqlblog.com/blogs/merrill_aldrich/archive/tags/cross+join/default.aspx">cross join</category><category domain="http://www2.sqlblog.com/blogs/merrill_aldrich/archive/tags/random+ordering/default.aspx">random ordering</category></item></channel></rss>