<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://www2.sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Adam Machanic : Architecture</title><link>http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Architecture/default.aspx</link><description>Tags: Architecture</description><dc:language>en</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>Stored procedures are not parameterized views</title><link>http://www2.sqlblog.com/blogs/adam_machanic/archive/2006/07/12/stored-procedures-are-not-parameterized-views.aspx</link><pubDate>Thu, 13 Jul 2006 01:54:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:109</guid><dc:creator>Adam Machanic</dc:creator><slash:comments>6</slash:comments><comments>http://www2.sqlblog.com/blogs/adam_machanic/comments/109.aspx</comments><wfw:commentRss>http://www2.sqlblog.com/blogs/adam_machanic/commentrss.aspx?PostID=109</wfw:commentRss><description>Peter van Ooijen over at CodeBetter.com &lt;a href="http://codebetter.com/blogs/peter.van.ooijen/archive/2006/05/29/145697.aspx"&gt;posted in his blog about some observations he had when working with stored procedures in a recent project&lt;/a&gt;.
What I found to be interesting about his post was his comment that a
stored procedure can be, "a view with parameters."&amp;nbsp; I've run into this
assertion before, and it's something I think needs some clarification
for a lot of developers. I do not feel that there is any real
similarity between stored procedures and views -- they are entirely
different types of objects in an SQL database, and should not be
considered forms of one another in any way. &lt;br&gt;&lt;br&gt;Following is an edited version of the response I left in Peter's blog; I thought it warranted its own post:&lt;br&gt;&lt;br&gt;Stored procedures are not -- and never can be -- "parameterized views". &amp;nbsp;A view 
in an SQL database can be treated the same as a table in virtually every 
context.&amp;nbsp; Consider:&lt;br&gt;&lt;br&gt;&lt;div style="margin-left:40px;font-family:Courier New;"&gt;SELECT * &lt;br&gt;FROM Tbl &lt;br&gt;&lt;/div&gt;&lt;br&gt;vs. &lt;br&gt;&lt;br&gt;&lt;div style="margin-left:40px;font-family:Courier New;"&gt;SELECT * &lt;br&gt;FROM 
View &lt;br&gt;&lt;/div&gt;&lt;br&gt;One
of the great things about working with views and tables is that the
person querying the database does not need to know whether the base
object being queried is a view or a table. For the sake of writing SQL
queries, they are one and the same. Both a view and a table have
well-defined columns, with well-defined datatypes.&lt;br&gt;&lt;br&gt;These
assertions cannot be made for a stored procedure as compared with a
view.&amp;nbsp; A stored procedure is related to a view only in as much as both
are defined using SQL syntax. But beyond there, the two diverge into
completely different types of entities. First of all, consider:&lt;br&gt;&lt;br&gt;&lt;div style="margin-left:40px;"&gt;&lt;span style="font-family:Courier New;"&gt;SELECT * 
&lt;/span&gt;&lt;br style="font-family:Courier New;"&gt;&lt;span style="font-family:Courier New;"&gt;FROM StoredProcedure &lt;/span&gt;&lt;br&gt;&lt;/div&gt;&lt;br&gt;This
will not work, and will only result in an "invalid object name"
exception. The reason? Stored procedures expose no explicit output
contract.&amp;nbsp; Thanks to conditional branching, dynamic SQL, and SELECT *,
a stored procedure can output vastly different results beween
invocations, or based on different input parameters. It is quite
possible to code a stored procedure that will output no result sets for
one set of input parameters, two result sets for another, and four for
another.&amp;nbsp; Or, it's possible to change the returned result sets, e.g. by
outputting different column names or datatypes. Please note, this is an
&lt;span style="font-style:italic;"&gt;extremely&lt;/span&gt; poor (and very
dangerous) coding habit to get into -- but the point is, it is
impossible to verify the output of a stored procedure for a given set
of input parameters without running it.&lt;br&gt;&lt;br&gt;Furthermore, a stored
procedure "late binds" to the base objects being queried.&amp;nbsp; This adds to
the difficulty in verifying the output of a stored procedure, and is
why you can create the following stored procedure without getting an
exception (until you try to run it, of course):&lt;br&gt;&lt;br&gt;&lt;div style="margin-left:40px;font-family:Courier New;"&gt;CREATE PROC XYZ &lt;br&gt;
AS &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; SELECT 
* &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; FROM ThisTableDoesNotExist &lt;br&gt;
GO &lt;br&gt;&lt;/div&gt;
&lt;br&gt;These stored procedure behaviors are in stark 
contrast to the way views work.&amp;nbsp; Views provide a couple of means of verification: &lt;br&gt;&lt;ul&gt;&lt;li&gt;The output columns/data types can be verified, and bound to, before actually 
querying the view &lt;/li&gt;&lt;li&gt;A view can be "schema bound", meaning that the 
underlying base tables (or other views) which the view is based on cannot be 
changed, schema-wise, unless the view is dropped. &amp;nbsp; &lt;/li&gt;&lt;/ul&gt;For
the first point, simply query the INFORMATION_SCHEMA.COLUMNS or
sys.Columns views, and column information can be determined for a view
without having to query it.&lt;br&gt;&lt;br&gt;The second point adds to the first
in a few ways. Schema binding brings to views a certain sense of "early
binding," which as I mentioned is missing in stored procedures.
Although no view can be created if one of its base objects does not
exist, schema binding takes it one step further and guarantees that the
base objects used to create the view &lt;span style="font-style:italic;"&gt;must exist, and must not be changed&lt;/span&gt;,
for as long as the view is present in the database. This means that if
a schemabound view is created that outputs a certain set of columns
with certain datatypes, it is guaranteed to do so for as long as it
exists in the database -- in other words, its contract is bound to the
schema, and changes to other objects cannot affect it.&amp;nbsp; This is a
powerful guarantee, which stored procedures fail to make.&lt;br&gt;&lt;br&gt;So now
the question is, if a stored procedures isn't a parameterized view then
what is? The answer, as of SQL Server 2000 (and continuing in 2005), is
the table-valued UDF. &amp;nbsp;A table-valued UDF is parameterized, has an
explicit and verifyable output contract*, and can be schema bound. &amp;nbsp;If
you are looking to implement a solution that makes use of a form of
parameterized views, stored procedures are probably not the right
choice.&amp;nbsp; I think that table-valued UDFs are quite underused and deserve
a second (or first!) look from many T-SQL developers who may have
glossed over them in the past.&lt;br&gt;&lt;br&gt;*
Note: Unlike for a view, the column list for a table-valued UDF cannot
be queried from the INFORMATION_SCHEMA.COLUMNS table. The column list
is, however, available from sys.Columns.&lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;img src="http://www2.sqlblog.com/aggbug.aspx?PostID=109" width="1" height="1"&gt;</description><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/T-SQL/default.aspx">T-SQL</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Architecture/default.aspx">Architecture</category></item><item><title>Rowset string concatenation: Which method is best?</title><link>http://www2.sqlblog.com/blogs/adam_machanic/archive/2006/07/12/rowset-string-concatenation-which-method-is-best.aspx</link><pubDate>Thu, 13 Jul 2006 01:15:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:77</guid><dc:creator>Adam Machanic</dc:creator><slash:comments>25</slash:comments><comments>http://www2.sqlblog.com/blogs/adam_machanic/comments/77.aspx</comments><wfw:commentRss>http://www2.sqlblog.com/blogs/adam_machanic/commentrss.aspx?PostID=77</wfw:commentRss><description>&lt;p&gt;Yeah, yeah, yeah, let's get this out of the way right from the start:
Don't concatenate rows into delimited strings in SQL Server. Do it
client side.
&lt;/p&gt;&lt;p&gt;
Except if you really have to create delimited strings in SQL Server.  In which case you should read on.
&lt;/p&gt;&lt;p&gt;
There was a little &lt;a href="http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=42415" target="#"&gt;discussion&lt;/a&gt; on &lt;a href="http://www.sqlteam.com/" target="#"&gt;SQLTeam&lt;/a&gt;
about the best way to concatenate. I recommended a scalar UDF solution,
whereas Rob Volk recommended a solution involving a temp table.
&lt;/p&gt;&lt;p&gt;I mentioned my dislike for the temp table solution for a couple
of reasons. First of all, it relies on a clustered index for ordering.
That will probably work in this example, but is not guaranteed to
always work and relying on indexes rather than ORDER BY for ordering is
definitely not a habit I want anyone to get into. The clustered index
as it was described in Rob's example also has another problem that I
didn't even notice until I was writing this entry. But I'll get to that
in a moment. The second reason I dislike the temp table is that I felt
it would be less efficient than the scalar UDF.
&lt;/p&gt;&lt;p&gt;
Rob didn't agree about the efficiency.  And so I set out to prove him wrong...
&lt;/p&gt;&lt;p&gt;
We'll use the Authors table in Pubs.  I want a comma-delimited list, per state, of the last name of each author who lives there.
&lt;/p&gt;&lt;p&gt;
First, the scalar UDF:
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;USE pubs&lt;br&gt;GO&lt;br&gt;&lt;br&gt;CREATE FUNCTION dbo.ConcatAuthors(@State CHAR(2))&lt;br&gt;RETURNS VARCHAR(8000)&lt;br&gt;AS&lt;br&gt;BEGIN&lt;br&gt;	DECLARE @Output VARCHAR(8000)&lt;br&gt;	SET @Output = ''&lt;br&gt;&lt;br&gt;	SELECT @Output =	CASE @Output &lt;br&gt;				WHEN '' THEN au_lname &lt;br&gt;				ELSE @Output + ', ' + au_lname &lt;br&gt;				END&lt;br&gt;	FROM Authors&lt;br&gt;	WHERE State = @State&lt;br&gt;	ORDER BY au_lname&lt;br&gt;&lt;br&gt;	RETURN @Output&lt;br&gt;END&lt;br&gt;GO&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;
To find the list I want:
&lt;/p&gt;&lt;pre class="code"&gt;SELECT DISTINCT State, dbo.ConcatAuthors(State)&lt;br&gt;FROM Authors&lt;br&gt;ORDER BY State&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;... And the adaptation of Rob's temp table method... I did change
two things due to problems I discovered during testing. One, I've
altered the au_lname column to VARCHAR(8000); the column in the Authors
table is VARCHAR(40), not large enough for all of the California
authors. What if we were dealing with a much larger dataset? Second, I
added an IDENTITY column, and I'm clustering on that instead of the
actual data to get the ordering. I'm doing so because of the
VARCHAR(8000). Index rows can be a maximum of 900 bytes, so if we had
enough data to exceed that length, this method would fail.
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;CREATE TABLE #AuthorConcat&lt;br&gt;(&lt;br&gt;	State CHAR(2) NOT NULL,&lt;br&gt;	au_lname VARCHAR(8000) NOT NULL,&lt;br&gt;	Ident INT IDENTITY(1,1) NOT NULL PRIMARY KEY&lt;br&gt;)&lt;br&gt;&lt;br&gt;INSERT #AuthorConcat &lt;br&gt;(&lt;br&gt;	State,&lt;br&gt;	au_lname&lt;br&gt;)&lt;br&gt;SELECT&lt;br&gt;	State, &lt;br&gt;	au_lname&lt;br&gt;FROM Authors&lt;br&gt;ORDER BY &lt;br&gt;	State, &lt;br&gt;	au_lname&lt;br&gt;&lt;br&gt;DECLARE @Authors VARCHAR(8000)&lt;br&gt;SET @Authors = ''&lt;br&gt;DECLARE @State CHAR(2)&lt;br&gt;SET @State = ''&lt;br&gt;&lt;br&gt;UPDATE #AuthorConcat&lt;br&gt;SET @Authors = au_lname =	CASE &lt;br&gt;				WHEN @State = State THEN @Authors + ', ' + au_lname &lt;br&gt;				ELSE au_lname END,&lt;br&gt;	@State = State&lt;br&gt;&lt;br&gt;SELECT State, MAX(au_lname) &lt;br&gt;FROM #AuthorConcat&lt;br&gt;GROUP BY State&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;Clever, but more complex and harder to read than the scalar UDF
version. Output is identical, but that's not why we're here. Which one
is more efficient?
&lt;/p&gt;&lt;p&gt;
Drumroll, please...
&lt;/p&gt;&lt;p&gt;Results were tabulated using STATISTICS IO, STATISTICS TIME, and
Query Analyzer's Show Execution Plan. DBCC DROPCLEANBUFFERS and DBCC
FREEPROCCACHE were run before each test.
&lt;/p&gt;&lt;p&gt;
&lt;b&gt;Scalar UDF Method&lt;/b&gt;
&lt;br&gt;
Total cost: 0.0492
&lt;br&gt;
Total Scan count: 1
&lt;br&gt;
Total Logical reads: 2
&lt;br&gt;
Total Physical reads: 2
&lt;br&gt;
Total time: 25 ms
&lt;/p&gt;&lt;p&gt;
&lt;b&gt;Temp Table Method&lt;/b&gt;
&lt;br&gt;
Total cost: 0.2131
&lt;br&gt;
Total Scan count: 4
&lt;br&gt;
Total Logical reads: 9
&lt;br&gt;
Total Physical reads: 2
&lt;br&gt;
Total time: 88 ms
&lt;/p&gt;&lt;p&gt;So in conclusion, neither method is incredibly taxing with the
tiny Pubs dataset, but I think I have proven that the UDF is far more
efficient.
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;hr&gt;Update, February 28, 2005: Modified the adapation of Rob Volk's
method to use a CREATE TABLE instead of SELECT INTO, as the latter is
not necessarily guaranteed to insert rows in the right order for the
sake of this example. Thanks to "PW" on SQLServerCentral for pointing
this problem out. Note that this changed the total costs very slightly
-- for the better -- but the UDF still performs better by quite a large
margin.&lt;br&gt;&lt;img src="http://www2.sqlblog.com/aggbug.aspx?PostID=77" width="1" height="1"&gt;</description><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Performance/default.aspx">Performance</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/T-SQL/default.aspx">T-SQL</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Architecture/default.aspx">Architecture</category></item><item><title>No, stored procedures are NOT bad</title><link>http://www2.sqlblog.com/blogs/adam_machanic/archive/2006/07/12/no-stored-procedures-are-not-bad.aspx</link><pubDate>Thu, 13 Jul 2006 01:10:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:74</guid><dc:creator>Adam Machanic</dc:creator><slash:comments>10</slash:comments><comments>http://www2.sqlblog.com/blogs/adam_machanic/comments/74.aspx</comments><wfw:commentRss>http://www2.sqlblog.com/blogs/adam_machanic/commentrss.aspx?PostID=74</wfw:commentRss><description>I recently found a rather old post from &lt;a href="http://weblogs.asp.net/fbouma/" target="#"&gt;Frans Bouma&lt;/a&gt;'s blog, &lt;a href="http://weblogs.asp.net/fbouma/archive/2003/11/18/38178.aspx" target="#"&gt;"Stored procedures are bad, m'key?"&lt;/a&gt;.
&lt;p&gt;
Since comments are closed (he posted it almost a year ago), I have to reply here.
&lt;/p&gt;&lt;p&gt;
I'll state my bottom line at the top: Stored procedures are not only
not bad, they're necessary for maintaining loosely coupled, application
agnostic, flexible databases. And they're also necessary for
maintaining loosely coupled, database agnostic, flexible database-based
applications.
&lt;/p&gt;&lt;p&gt;Mr. Bouma's argument centers around three claims: Ad-hoc
queries are easier to construct in application code, stored procedures
don't provide greater security than the application already does, and
in certain cases stored procedure execution plan caching can be
detrimental to performance. And all three of these arguments are true!
But none of them has any bearing whatsoever on the real benefits of
stored procedures.
&lt;/p&gt;&lt;p&gt;
Let's have a quick refresher on Computer Science 101, as it applies to object-oriented design:  Loose &lt;a href="http://en.wikipedia.org/wiki/Coupling" target="#"&gt;coupling&lt;/a&gt; and high &lt;a href="http://en.wikipedia.org/wiki/Cohesion" target="#"&gt;cohesion&lt;/a&gt;.
What does that mean in plain English? Each module, or object, should
have one very specific job, which it can perform for a VARIETY of other
modules or objects on request (high cohesion). Furthermore, no module
or object (or as few as possible) should depend upon the inner workings
of another module or object. This latter problem is known as tight
coupling, and its consequences are dire. Change one piece of code, end
up changing every other piece of code that uses it.
&lt;/p&gt;&lt;p&gt;News flash for Mr. Bouma: This is the reason object-oriented
programming was invented. We try to separate components from one
another so that their functions can be re-used and re-applied to other
modules, other objects, and if we've done our job really well, even
other applications entirely.
&lt;/p&gt;&lt;p&gt;
So how does this apply to the relationship between a database and an application?
&lt;/p&gt;&lt;p&gt;The database, undoubtedly, should be completely application
unaware. The database is serving up data. It has no idea what
application is requesting the data, or whether an application is
requesting the data. It needs to have no such idea. All it needs to do
is keep serving and all is happy. Furthermore, the database doesn't
care if multiple applications request data, or if those applications
request the same data. The database is set up to provide a single point
of access for all data requests in its domain of knowledge (i.e. the
data in the database being queried).
&lt;/p&gt;&lt;p&gt;And now to the other side, the application itself. I am an
advocate of the application being as database agnostic as possible. I
don't believe this is entirely possible, but it is certainly a goal to
which we can aspire. Applications should request data from the database
using standardized, documented interfaces, after which that data should
be composed as quickly as possible (at the lowest level) into native
objects. This allows for changes in the data interface to have as
little ripple effect as possible in the application.
&lt;/p&gt;&lt;p&gt;Mr. Bouma makes the claim that, "changes to a relational model
will have always an impact on the application that targets that model".
Again, he is correct. Which is why applications should NOT target a
model. Applications should have &lt;i&gt;absolutely no knowledge&lt;/i&gt;
of database schema, including table names, column names, data types, or
any other information. This is the role of stored procedures. We can,
using stored procedures, &lt;i&gt;completely encapsulate&lt;/i&gt; this metadata and provide standardized interfaces into the data.
&lt;/p&gt;&lt;p&gt;
Need to change a datatype in the database? Perhaps you won't need to
change the output values that the application receives from the stored
procedures. Perhaps you won't need to change the parameter input
values. No change is needed in application code. Need to change a
column name? Same thing. Need to re-architect the &lt;i&gt;entire&lt;/i&gt; schema?  Again, just change your stored procedures.  The application will keep running as if nothing changed.
&lt;/p&gt;&lt;p&gt;
Contrast this to Mr. Bouma's suggestion that we drop stored procedures
altogether and instead build all queries within application code.
Suddenly, &lt;i&gt;any&lt;/i&gt; small change in the database needs to be completely regression tested throughout not only one app, but &lt;i&gt;every application&lt;/i&gt;
that uses the database. You've created an extremely tight, perhaps
unbreakable coupling between the application and the database.
Large-scale changes to the schema will most likely never be possible.
Will there ever really be time to re-write all of that application
code?
&lt;/p&gt;&lt;p&gt;Now, back to the three central arguments: Yes, ad-hoc queries
are easier to construct in application code; but it's also argued by
many database &lt;a href="http://www.celko.com/" target="#"&gt;experts&lt;/a&gt;
that ad-hoc queries are a sign of either poor database design, poor
application design, or both. Either way, dynamic SQL isn't too hard to
work with, and I've seen plenty of extremely ad-hoc applications in
which its use is hardly a stumbling block. Next, security. It's true
that if a user isn't authenticated in the application, he or she won't
be able to use &lt;i&gt;that application&lt;/i&gt; to access the data. So in that
case, the stored procedure does not provide greater access control. But
the same is certainly not true for every other application that uses
the same database. Unless, of course, code is duplicated across every
application. The database should be the final word on its data. This
includes data security and data integrity. Application code simply
cannot do this if the database need ever be shared. Finally, Bouma's
assertions about cache plans are simply not worth touching. The WITH
RECOMPILE option has been around long enough that DBAs and developers
know it's there and know how to use it.
&lt;/p&gt;&lt;p&gt;Now that we're at the bottom of this post, I'll re-state my
conclusion: Stored procedures are necessary for proper object-oriented
database-based application development. Not using them will force code
duplication, tight coupling between applications and databases, and
greatly increased man hours for every change made to either application
or database.&lt;/p&gt;&lt;img src="http://www2.sqlblog.com/aggbug.aspx?PostID=74" width="1" height="1"&gt;</description><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Architecture/default.aspx">Architecture</category></item><item><title>Caveats of the TEXT datatype</title><link>http://www2.sqlblog.com/blogs/adam_machanic/archive/2006/07/12/caveats-of-the-text-datatype.aspx</link><pubDate>Thu, 13 Jul 2006 01:02:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:72</guid><dc:creator>Adam Machanic</dc:creator><slash:comments>0</slash:comments><comments>http://www2.sqlblog.com/blogs/adam_machanic/comments/72.aspx</comments><wfw:commentRss>http://www2.sqlblog.com/blogs/adam_machanic/commentrss.aspx?PostID=72</wfw:commentRss><description>&lt;p&gt;Someone named "Krygim" posted the following question today in the microsoft.public.sqlserver.programming newsgroup:
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;div class="quote"&gt;
"Will defaulting a TEXT field to an empty string take up more space
than defaulting it to Null when no data is entered into the field.
[sic]"
&lt;/div&gt;
&lt;p&gt;
Before reading any further, I ask that you think for a moment and consider what you think the answer should be...
&lt;/p&gt;&lt;p&gt;If you answered "yes", you're correct. An empty TEXT column will
take up more space than a NULL one -- quite a bit more space, as it
turns out. Here's the test code I posted to prove it:
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;USE tempdb&lt;br&gt;&lt;br&gt;CREATE TABLE atable(acol TEXT NULL)&lt;br&gt;&lt;br&gt;INSERT atable(acol)&lt;br&gt;SELECT NULL&lt;br&gt;FROM 	pubs..authors a,&lt;br&gt;	pubs..authors b,&lt;br&gt;	pubs..authors c,&lt;br&gt;	pubs..authors d&lt;br&gt;&lt;br&gt;EXEC sp_spaceused 'atable'&lt;br&gt;&lt;br&gt;CREATE TABLE btable(acol TEXT NOT NULL)&lt;br&gt;&lt;br&gt;INSERT btable(acol)&lt;br&gt;SELECT ''&lt;br&gt;FROM	pubs..authors a,&lt;br&gt;	pubs..authors b,&lt;br&gt;	pubs..authors c,&lt;br&gt;	pubs..authors d&lt;br&gt;&lt;br&gt;EXEC sp_spaceused 'btable'&lt;br&gt;&lt;br&gt;DROP TABLE atable&lt;br&gt;DROP TABLE btable&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;Running this on my end tells me that the NULLs took up only 3 MB,
whereas the empty strings took up 33 MB -- 11 times more space used!
&lt;/p&gt;&lt;p&gt;
I did a quick scan of &lt;a href="http://www.amazon.com/exec/obidos/tg/detail/-/0735609985/qid=1098462012/sr=8-1/ref=sr_8_xs_ap_i1_xgl14/102-1036834-3046505?v=glance&amp;amp;s=books&amp;amp;n=507846" target="#"&gt;Inside Microsoft SQL Server 2000&lt;/a&gt; and discovered that for TEXT and IMAGE datatypes:
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;div class="quote"&gt;
"If the amount of data is less than 32 KB, the text pointer in the data row points to an 84-byte text root structure."
&lt;/div&gt;
&lt;p&gt;This, in addition to the 16-byte pointer to the off-row location of
the data, explains the huge difference in size between these two
tables.
&lt;/p&gt;&lt;p&gt;At this point, you're probably either wondering how to fix this
or you've drifted away and are half-reading, half-thinking about what's
for dinner. If you're in the latter category and not also in the former
category, you either know the answer already or are not enough of a
certified geek to be reading my blog. And if you're in the former
category and not in the latter category, your priorities are certainly
skewed, as we all know that food is more important than any stupid DBMS
space wasted due to empty strings problem.
&lt;/p&gt;&lt;p&gt;But for those still following me, the answer is one of my
favorite about-to-be-deprecated features of SQL Server, the mighty
text-in-row option... Re-run the second part of the previous example,
with a new line inserted:
&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;pre class="code"&gt;USE tempdb&lt;br&gt;&lt;br&gt;CREATE TABLE btable(acol TEXT NOT NULL)&lt;br&gt;&lt;br&gt;EXEC sp_tableoption 'btable', 'text in row', '24'&lt;br&gt;-- HINT: This is the new line&lt;br&gt;&lt;br&gt;INSERT btable(acol)&lt;br&gt;SELECT ''&lt;br&gt;FROM	pubs..authors a,&lt;br&gt;	pubs..authors b,&lt;br&gt;	pubs..authors c,&lt;br&gt;	pubs..authors d&lt;br&gt;&lt;br&gt;EXEC sp_spaceused 'btable'&lt;br&gt;&lt;br&gt;DROP TABLE btable&lt;br&gt;&lt;/pre&gt;
&lt;p&gt;There it is. Back down to 3 MB, even with the non-empty strings in
the TEXT column. By storing the small data in-row, we've eliminated the
16-byte off-row pointer and the 84-byte root structure, and other
assorted bytes that are used by the TEXT datatype that I don't know
about (the math didn't quite add up when I tried to calculate where all
of the space went).
&lt;/p&gt;&lt;p&gt;So what does this tell us? I'm thinking that as a best
practice, perhaps the text-in-row option should be used for every table
with LOB columns, and that it should be set to around 100 bytes. This
will still keep row sizes down when larger data is inserted (as it will
go off-row), but it will also keep overall IO way down if the amount of
rows with LOB data larger than 100 bytes is fairly sparse.
&lt;/p&gt;&lt;p&gt;
Comments?&lt;/p&gt;&lt;img src="http://www2.sqlblog.com/aggbug.aspx?PostID=72" width="1" height="1"&gt;</description><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/T-SQL/default.aspx">T-SQL</category><category domain="http://www2.sqlblog.com/blogs/adam_machanic/archive/tags/Architecture/default.aspx">Architecture</category></item></channel></rss>