THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Denis Gobo

Some Simple Code To Show The Difference Between Newid And Newsequentialid

Some Simple Code To Show The Difference Between Newid And Newsequentialid

In SQL Server 2000 we got the uniqueidentifier data type and the newid() function

Lots of people thought that newid() would be very handy to create some unique values across all databases.
Newid() is nice but it has a little side effect; it causes terrible page splits because it is a random value. SQL Server 2005 introduced newsequentialid() where each value generated by the function is always greater than the previous value.

Let's take a look


First we will create these two tables


CREATE TABLE TestGuid1 (Id UNIQUEIDENTIFIER not null DEFAULT newid(),
SomeDate DATETIME, batchNumber BIGINT)
 
CREATE TABLE TestGuid2 (Id UNIQUEIDENTIFIER not null DEFAULT newsequentialid(),
SomeDate DATETIME, batchNumber BIGINT)
 
 
now run this block of code to insert 1000 rows in each table

SET NOCOUNT ON
INSERT TestGuid1 (SomeDate,batchNumber) VALUES (GETDATE(),1)
go 1000
 
SET NOCOUNT ON
INSERT TestGuid2 (SomeDate,batchNumber) VALUES (GETDATE(),1)
go 1000



Create these two clustered indexes
CREATE CLUSTERED INDEX ix_id1 ON TestGuid1(id)
CREATE CLUSTERED INDEX ix_id2 ON TestGuid2(id)
 
Run the code below
DBCC showcontig ('TestGuid1') WITH tableresults
DBCC showcontig ('TestGuid2')  WITH tableresults


You will see that AvgerageFreeBytes is the same for both tables

What about the inserts themselves? Run this code below

SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate))
FROM TestGuid1
GROUP BY batchNumber
 
SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate))
FROM TestGuid2
GROUP BY batchNumber


The table with the Newsequentialid did the inserts about 40% faster.

Now we will insert 1000 rows in each table again

SET NOCOUNT ON
INSERT TestGuid1 (SomeDate,batchNumber) VALUES (GETDATE(),2)
go 1000
 
SET NOCOUNT ON
INSERT TestGuid2 (SomeDate,batchNumber) VALUES (GETDATE(),2)
go 1000


No we will look again what happened

DBCC showcontig ('TestGuid1') WITH tableresults
DBCC showcontig ('TestGuid2')  WITH tableresults


Wow the first table uses 21 pages while the second one uses 12.
AvgerageFreeBytes is 3524 bytes per page in the first table and only 96 bytes per page in the second table. Obviously newsequentialid is the better choice.

Running the code below you will see that newid is still slower than newsequentialid by about 16% or so for batch 2
SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate))
FROM TestGuid1
GROUP BY batchNumber
 
SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate))
FROM TestGuid2
GROUP BY batchNumber



Now we will insert 10000 rows and then look at freespace and duration again

SET NOCOUNT ON
INSERT TestGuid1 (SomeDate,batchNumber) VALUES (GETDATE(),3)
go 10000
 
SET NOCOUNT ON
INSERT TestGuid2 (SomeDate,batchNumber) VALUES (GETDATE(),3)
go 10000


DBCC showcontig ('TestGuid1') WITH tableresults
DBCC showcontig ('TestGuid2')  WITH tableresults


The first table uses 117 pages while the second one uses 80.
AvgerageFreeBytes is 2574 bytes per page in the first table and only 21 bytes per page in the second table.

Running the code below you will see that newid is still slower than newsequentialid by about 40% or so for batch 3
SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate))
FROM TestGuid1
GROUP BY batchNumber
 
SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate))
FROM TestGuid2
GROUP BY batchNumber



Clean up
DROP TABLE TestGuid2,TestGuid1

So that is all, do you use newid or newsequentialid and if you do use newid did you experience fragmentation or performance problems because of it?
Published Thursday, February 05, 2009 1:52 PM by Denis Gobo

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Rhys said:

Wow, this is something I doubt I would ever had considered. One possible future performance problem hopefully avoided!

February 5, 2009 1:47 PM
 

chase said:

Maybe I'm missing something, but whats the difference between an identity column, and a column that defaults to newsequentialid(), except for the fact that you can easily insert your own value if you wish?

February 5, 2009 1:48 PM
 

Denis Gobo said:

Chase,

an identity column is an integer data type and is not guaranteed to be unique

If I create a uniqueidentifier  in my DB and you create a uniqueidentifier in your DB then these two values should never collide

February 5, 2009 1:53 PM
 

Adam Machanic said:

Denis:

"If I create a uniqueidentifier  in my DB and you create a uniqueidentifier in your DB then these two values should never collide"

Are you sure?  There is no guarantee that they will never collide, even in the same table.  It may be a very, very small possibility, but it's still a possibility.  The things are more or less random, after all.

February 5, 2009 1:59 PM
 

Adam Machanic said:

chase:

I don't often see a good reason to use GUIDs instead of integers as surrogate keys but when I used to work with some C++ apps some of the developers liked GUIDs because they can be related back to object IDs.  Another good reason would be certain forms of replication, e.g. Merge.

February 5, 2009 2:00 PM
 

Denis Gobo said:

Adam, that is why I used should  :-)

February 5, 2009 2:03 PM
 

Denis Gobo said:

Adam one reason to use GUIDs is when you pass stuff around in URLS

when you have UserID = 23 it is probably likely that 22 or 24 also exist. When you see something like 770EA536-0B65-4813-B758-F25543C50DF5 how could you possibly 'guess' another value

February 5, 2009 2:05 PM
 

Adam Machanic said:

Denis: In that case, depending on how active the site is, I might use an INT internally and a GUID externally, to make sure that things perform as well as possible in the database (smaller keys, etc).

February 5, 2009 2:16 PM
 

Denis Gobo said:

Adam,

That is what I did also

one GUID + int table and the int used as PK for all the other tables

February 5, 2009 2:19 PM
 

Alexander Kuznetsov said:

Denis,

It is a very interesting read, but it would be even more interesting to insert simultaneously from multiple connections for like 30 seconds. That IMO would be closer to real life.

February 5, 2009 2:42 PM
 

Mark Brinton said:

"So that is all, do you use newid or newsequentialid and if you do use newid did you experience fragmentation or performance problems because of it?"

I'll say.  I learned a cautionary lesson about using COTs (commercial off-the-shelf) desktop databases in mission-critical healthcare apps.  As preface; if memory serves, uniqueidentifier/newid() was introduced in SQL Server 7.  Initially, it was generated using a call to a now-deprecated Win32 API which generated a monotonically increasing (sequential) guid based on the MAC address and time-stamp.  The underlying platform api call changed in Windows XP and, as a result, entirely new behavior resulted.  Imagine my rather nasty surprise, having characterized and tuned performance on Windows NT/SQL 7, to see this uber-fragmentation in our installed base of medical devices that where field upgraded to Windows XP/SQL 2000.  I had been off the project and did not authorize the field upgrade.  I was hastily yanked back and asked to find out what was going on.

February 5, 2009 3:17 PM
 

Armando Prato said:

My opinion is to avoid GUIDs - either NEWID or NEWSEQUENTIALID - as a primary key unless you have a specific business case for it (ie Replication, distributed databases, etc).

Using int leads to less database bloat

JOINs are more efficient  

There's no built in function (i.e. SCOPE_IDENTITY) to get the last generated GUID; you have write special code to get it

Functions such as MIN and MAX (and others) won't work against a GUID column

February 5, 2009 4:04 PM
 

Stephen Russell said:

GUIDs rock!  I come from the replication, distributed environment.

Func() against a key value ??  Can you go write on the board for all to see the definition of a key please.  

From a development standpoint when the app gens it's own guid and doesn't rely on the db to make one and give it back it frees up connection times.  

February 9, 2009 9:27 AM
 

John Richter said:

As I understand the newsequentialid, it uses a MAC address on the system to get a shot at global uniqueness--looks like a return to the old method.  (FWIW, last I knew, MAC's were not guaranteed unique, just supposed to be.  In any case, dup's were not very likely.)  I've used something like this to roll my own--just put the "unique" system ID in one field and an identity in another & put them together for a "world" unique key.  

February 9, 2009 10:14 AM
 

shakar said:

no

June 24, 2009 1:38 AM
 

Jonathan Dickinson said:

"The things are more or less random, after all."

Not so, a GUID consists more or less of the following:

- Your current time (nanoseconds).

- Your current time zone.

- Your MAC address (Microsoft implementation).

- Random number.

The same machine will never produce an identical GUID; ever. If Ethernet on your farm isn't messing around (MAC addresses are not colliding) and you have a Microsoft network: the machines on the network will also never produce an identical GUID.

I am not sure how the sequential IDs work; it might just be tweaking the bit position of the Time/TZ and the behavior of the PRNG; which would make these potentially collide with normal GUIDs.

June 9, 2010 2:35 AM

Leave a Comment

(required) 
(required) 
Submit

About Denis Gobo

I was born in Croatia in 1970, when I was one I moved to Amsterdam (and yes Ajax is THE team in Holland) and finally in 1993 I came to the US. I have lived in New York City for a bunch of years and currently live in Princeton, New Jersey with my wife and 3 kids. I work for Dow Jones as a Database architect in the indexes department, one drawback: since our data goes back all the way to May 1896 I cannot use smalldates ;-( I have been working with SQL server since version 6.5 and compared to all the other bloggers here I am a n00b. Some of you might know me from http://sqlservercode.blogspot.com/ or even from some of the newsgroups where I go by the name Denis the SQL Menace If you are a Tek-Tips user then you might know me by the name SQLDenis, I am one of the guys answering SQL Questions in the SQL Programming forum.

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement