<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://www2.sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>SELECT Hints, Tips, Tricks FROM Hugo Kornelis WHERE RDBMS = 'SQL Server' : Data modeling</title><link>http://www2.sqlblog.com/blogs/hugo_kornelis/archive/tags/Data+modeling/default.aspx</link><description>Tags: Data modeling</description><dc:language>en</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>Busy months ahead</title><link>http://www2.sqlblog.com/blogs/hugo_kornelis/archive/2012/02/16/busy-months-ahead.aspx</link><pubDate>Thu, 16 Feb 2012 19:48:01 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:41778</guid><dc:creator>Hugo Kornelis</dc:creator><slash:comments>2</slash:comments><comments>http://www2.sqlblog.com/blogs/hugo_kornelis/comments/41778.aspx</comments><wfw:commentRss>http://www2.sqlblog.com/blogs/hugo_kornelis/commentrss.aspx?PostID=41778</wfw:commentRss><description>Almost two months have passed since my last blog post. And while it’s true that I’ve had (much) longer breaks, I do have a good reason now. All the time that I would normally at least in part spend on preparing new blog posts is now reserved for preparing...(&lt;a href="http://www2.sqlblog.com/blogs/hugo_kornelis/archive/2012/02/16/busy-months-ahead.aspx"&gt;read more&lt;/a&gt;)&lt;img src="http://www2.sqlblog.com/aggbug.aspx?PostID=41778" width="1" height="1"&gt;</description><category domain="http://www2.sqlblog.com/blogs/hugo_kornelis/archive/tags/Conference/default.aspx">Conference</category><category domain="http://www2.sqlblog.com/blogs/hugo_kornelis/archive/tags/Data+modeling/default.aspx">Data modeling</category><category domain="http://www2.sqlblog.com/blogs/hugo_kornelis/archive/tags/Database+design/default.aspx">Database design</category><category domain="http://www2.sqlblog.com/blogs/hugo_kornelis/archive/tags/Indexing/default.aspx">Indexing</category><category domain="http://www2.sqlblog.com/blogs/hugo_kornelis/archive/tags/SQL+Server+2008/default.aspx">SQL Server 2008</category><category domain="http://www2.sqlblog.com/blogs/hugo_kornelis/archive/tags/SQL+Server+2012/default.aspx">SQL Server 2012</category><category domain="http://www2.sqlblog.com/blogs/hugo_kornelis/archive/tags/T-SQL/default.aspx">T-SQL</category></item><item><title>Principles of Modeling: the Reproducibility Principle</title><link>http://www2.sqlblog.com/blogs/hugo_kornelis/archive/2011/12/22/principles-of-modeling-the-reproducibility-principle.aspx</link><pubDate>Thu, 22 Dec 2011 20:48:07 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:40595</guid><dc:creator>Hugo Kornelis</dc:creator><slash:comments>5</slash:comments><comments>http://www2.sqlblog.com/blogs/hugo_kornelis/comments/40595.aspx</comments><wfw:commentRss>http://www2.sqlblog.com/blogs/hugo_kornelis/commentrss.aspx?PostID=40595</wfw:commentRss><description>A year or so ago, I watched a few episodes of a Dutch television program that had an interesting format. The name of the series was (or is, I have no idea if it still runs) “Sterren op het doek” (“Stars on Canvas”). Every episode featured a Dutch celebrity,...(&lt;a href="http://www2.sqlblog.com/blogs/hugo_kornelis/archive/2011/12/22/principles-of-modeling-the-reproducibility-principle.aspx"&gt;read more&lt;/a&gt;)&lt;img src="http://www2.sqlblog.com/aggbug.aspx?PostID=40595" width="1" height="1"&gt;</description><category domain="http://www2.sqlblog.com/blogs/hugo_kornelis/archive/tags/Conference/default.aspx">Conference</category><category domain="http://www2.sqlblog.com/blogs/hugo_kornelis/archive/tags/Data+modeling/default.aspx">Data modeling</category><category domain="http://www2.sqlblog.com/blogs/hugo_kornelis/archive/tags/Database+design/default.aspx">Database design</category></item><item><title>Principles of Modeling: the Concreteness Principle</title><link>http://www2.sqlblog.com/blogs/hugo_kornelis/archive/2011/10/29/principles-of-modeling-the-concreteness-principle.aspx</link><pubDate>Fri, 28 Oct 2011 21:41:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:39450</guid><dc:creator>Hugo Kornelis</dc:creator><slash:comments>3</slash:comments><comments>http://www2.sqlblog.com/blogs/hugo_kornelis/comments/39450.aspx</comments><wfw:commentRss>http://www2.sqlblog.com/blogs/hugo_kornelis/commentrss.aspx?PostID=39450</wfw:commentRss><description>In an earlier post , I talked about the Jargon Principle, one of three principles I learned in 1994 and that have not only helped make me a better modeler, but that I have found to be very valuable in many other situations as well. Today, I will cover...(&lt;a href="http://www2.sqlblog.com/blogs/hugo_kornelis/archive/2011/10/29/principles-of-modeling-the-concreteness-principle.aspx"&gt;read more&lt;/a&gt;)&lt;img src="http://www2.sqlblog.com/aggbug.aspx?PostID=39450" width="1" height="1"&gt;</description><category domain="http://www2.sqlblog.com/blogs/hugo_kornelis/archive/tags/Data+modeling/default.aspx">Data modeling</category><category domain="http://www2.sqlblog.com/blogs/hugo_kornelis/archive/tags/Database+design/default.aspx">Database design</category></item><item><title>Principles of Modeling: the Jargon Principle</title><link>http://www2.sqlblog.com/blogs/hugo_kornelis/archive/2011/10/20/principles-of-modeling-the-jargon-principle.aspx</link><pubDate>Thu, 20 Oct 2011 09:39:09 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:39229</guid><dc:creator>Hugo Kornelis</dc:creator><slash:comments>4</slash:comments><comments>http://www2.sqlblog.com/blogs/hugo_kornelis/comments/39229.aspx</comments><wfw:commentRss>http://www2.sqlblog.com/blogs/hugo_kornelis/commentrss.aspx?PostID=39229</wfw:commentRss><description>In one of my previous posts , I discussed whether data modeling is art or science, and I concluded that, unfortunately, the current state of affairs is that it’s closer to art than to science, whereas I would like to see the opposite. And I think that...(&lt;a href="http://www2.sqlblog.com/blogs/hugo_kornelis/archive/2011/10/20/principles-of-modeling-the-jargon-principle.aspx"&gt;read more&lt;/a&gt;)&lt;img src="http://www2.sqlblog.com/aggbug.aspx?PostID=39229" width="1" height="1"&gt;</description><category domain="http://www2.sqlblog.com/blogs/hugo_kornelis/archive/tags/Data+modeling/default.aspx">Data modeling</category><category domain="http://www2.sqlblog.com/blogs/hugo_kornelis/archive/tags/Database+design/default.aspx">Database design</category></item><item><title>Data modeling: art or science?</title><link>http://www2.sqlblog.com/blogs/hugo_kornelis/archive/2008/08/03/data-modeling-art-or-science.aspx</link><pubDate>Sat, 02 Aug 2008 22:04:44 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:8173</guid><dc:creator>Hugo Kornelis</dc:creator><slash:comments>11</slash:comments><comments>http://www2.sqlblog.com/blogs/hugo_kornelis/comments/8173.aspx</comments><wfw:commentRss>http://www2.sqlblog.com/blogs/hugo_kornelis/commentrss.aspx?PostID=8173</wfw:commentRss><description>&lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;font face="Times New Roman" size="3"&gt;When I started blogging here on &lt;/font&gt;&lt;a href="http://sqlblog.com/"&gt;&lt;font face="Times New Roman" color="#800080" size="3"&gt;sqlblog.com&lt;/font&gt;&lt;/a&gt;&lt;font size="3"&gt;&lt;font face="Times New Roman"&gt;, I intended to write about stuff like T-SQL, performance, and such; but also about data modeling and database design. In reality, the latter has hardly happened so far – but I will try to change that in the future. Starting off with this post, in which I will pose (and attempt to answer) the rather philosophical question of the title: is data modeling an art or a science?&lt;o:p&gt;&lt;/o:p&gt;&lt;/font&gt;&lt;/font&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;o:p&gt;&lt;font face="Times New Roman" size="3"&gt;&amp;nbsp;&lt;/font&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;font size="3"&gt;&lt;font face="Times New Roman"&gt;Before I can answer the question, or at least tell you how &lt;b style="mso-bidi-font-weight:normal;"&gt;I&lt;/b&gt; think about the subject, we need to get the terms straight. So that if we disagree, we’ll at least disagree over the same things instead of actually agreeing but not noticing. In the next two paragraphs, I’ll try to define what data modeling is and how it relates to database design, and what sets art apart from science in our jobs.&lt;i style="mso-bidi-font-style:normal;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/i&gt;&lt;/font&gt;&lt;/font&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;o:p&gt;&lt;font face="Times New Roman" size="3"&gt;&amp;nbsp;&lt;/font&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;strong&gt;&lt;span style="font-size:13.5pt;mso-ansi-language:en-us;"&gt;&lt;font face="Times New Roman"&gt;Data modeling and database design&lt;o:p&gt;&lt;/o:p&gt;&lt;/font&gt;&lt;/span&gt;&lt;/strong&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;o:p&gt;&lt;font face="Times New Roman" size="3"&gt;&amp;nbsp;&lt;/font&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;font face="Times New Roman" size="3"&gt;When you have to create a database, you will ideally go through two stages before you start to type your first &lt;/font&gt;&lt;/span&gt;&lt;span style="font-size:10pt;font-family:'Courier New';mso-no-proof:yes;"&gt;CREATE TABLE&lt;/span&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;font size="3"&gt;&lt;font face="Times New Roman"&gt; statement. The first stage is data modeling. In this stage, the information that has to be stored is inventoried and the structure of that information is determined. This results in a logical data model. The logical data model should focus on correctness and completeness; it should be completely implementation agnostic.&lt;o:p&gt;&lt;/o:p&gt;&lt;/font&gt;&lt;/font&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;o:p&gt;&lt;font face="Times New Roman" size="3"&gt;&amp;nbsp;&lt;/font&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;font face="Times New Roman" size="3"&gt;The second stage is a transformation of the logical data model to a database design. This stage usually starts with a mechanical conversion from the logical data model to a first draft of the implementation model (for instance, if the data model is represented as an &lt;/font&gt;&lt;a href="http://en.wikipedia.org/wiki/Entity-relationship_model"&gt;&lt;font face="Times New Roman" color="#800080" size="3"&gt;ERM diagram&lt;/font&gt;&lt;/a&gt;&lt;font face="Times New Roman" size="3"&gt; but the data has to be stored in an &lt;/font&gt;&lt;a href="http://en.wikipedia.org/wiki/Relational_database_management_system/"&gt;&lt;font face="Times New Roman" color="#800080" size="3"&gt;RDBMS&lt;/font&gt;&lt;/a&gt;&lt;font size="3"&gt;&lt;font face="Times New Roman"&gt;, all entity and all many-to-many relationships become tables and all attributes and 1-to-many relationships become columns). After that, optimization starts. Some of the optimizations will not affect the layout of tables and columns (examples are choosing indexes, or implementing partitioning), but some other optimizations will do just that (such as adding a surrogate key, denormalizing tables, or building indexed views to pre-aggregate some data). The transformation from logical data model to database design should focus on performance for an implementation on a specific platform. As long as care is taken that none of the changes made during this phase affect the actual &lt;i style="mso-bidi-font-style:normal;"&gt;meaning&lt;/i&gt; of the model, the resultant database design will be just as correct and complete (or incorrect and incomplete) as the logical data model that the database design is based on.&lt;o:p&gt;&lt;/o:p&gt;&lt;/font&gt;&lt;/font&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;o:p&gt;&lt;font face="Times New Roman" size="3"&gt;&amp;nbsp;&lt;/font&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;font size="3"&gt;&lt;font face="Times New Roman"&gt;Many people prefer to take a shortcut by combining these two stages. They produce a data model that is already geared towards implementation in a specific database. For instance by adding surrogate keys right into the first draft of the data model, because they already know (or think) that they will eventually be added anyway. I consider this to be bad practice for the following reasons:&lt;o:p&gt;&lt;/o:p&gt;&lt;/font&gt;&lt;/font&gt;&lt;/span&gt;&lt;/p&gt; &lt;ol style="margin-top:0cm;"&gt; &lt;li class="MsoNormal" style="margin:0cm 0cm 0pt;mso-list:l1 level1 lfo1;tab-stops:list 36.0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;font size="3"&gt;&lt;font face="Times New Roman"&gt;It increases the chance of errors. When you have to focus on correctness and performance at the same time, you can more easily lose track.&lt;o:p&gt;&lt;/o:p&gt;&lt;/font&gt;&lt;/font&gt;&lt;/span&gt;&lt;/li&gt; &lt;li class="MsoNormal" style="margin:0cm 0cm 0pt;mso-list:l1 level1 lfo1;tab-stops:list 36.0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;font size="3"&gt;&lt;font face="Times New Roman"&gt;It blurs the line. If a part of a data model can never be implemented at acceptable performance, an explicit decision has to be made (and hopefully documented) to either accept crappy performance or change the data model. If you model for performance, chances are you’ll choose a better performing alternative straight away and never document properly why you made a small digression from the original requirements.&lt;o:p&gt;&lt;/o:p&gt;&lt;/font&gt;&lt;/font&gt;&lt;/span&gt;&lt;/li&gt; &lt;li class="MsoNormal" style="margin:0cm 0cm 0pt;mso-list:l1 level1 lfo1;tab-stops:list 36.0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;font size="3"&gt;&lt;font face="Times New Roman"&gt;It might have negative impact on future performance. The next release of your DMBS, or maybe even the next service pack, may cause today’s performance winner to be tomorrows performance drainer. By separating the logical data model from the actual database design, you make it very easy to periodically review the choices you made for performance and assess whether they are still valid.&lt;o:p&gt;&lt;/o:p&gt;&lt;/font&gt;&lt;/font&gt;&lt;/span&gt;&lt;/li&gt; &lt;li class="MsoNormal" style="margin:0cm 0cm 0pt;mso-list:l1 level1 lfo1;tab-stops:list 36.0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;font size="3"&gt;&lt;font face="Times New Roman"&gt;It reduces portability and maintainability. If, one day, your boss informs you that you need to port an existing application to another RDBMS, you’ll bless the day you decided to separate the logical data model from the physical database design. Because you now only need to pull out the (still completely correct) logical data model, transform again, but this time apply optimization tricks for the new RDBMS. And also, as requirements change, it is (in my experience) easier to identify required changes in an implementation-independent logical data model and then move the changes over to the physical design, than to do that all at once if only the design is available.&lt;o:p&gt;&lt;/o:p&gt;&lt;/font&gt;&lt;/font&gt;&lt;/span&gt;&lt;/li&gt; &lt;li class="MsoNormal" style="margin:0cm 0cm 0pt;mso-list:l1 level1 lfo1;tab-stops:list 36.0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;font size="3"&gt;&lt;font face="Times New Roman"&gt;It may lead to improper choices. More often that I’d like, I have seen good modelers fall victim to bad habits. Such as, for instance, adding a surrogate key to &lt;b style="mso-bidi-font-weight:normal;"&gt;every&lt;/b&gt; table (or entity or whatever) in the data model. But just because surrogate keys are &lt;i style="mso-bidi-font-style:normal;"&gt;often&lt;/i&gt; good for performance (on SQL Server that is – I wouldn’t know about other DBMS’s) doesn’t mean they should &lt;i style="mso-bidi-font-style:normal;"&gt;always&lt;/i&gt; be used. And the next step (that I’ve witnessed too!) is forgetting to identify the real key because there already is a key (albeit a surrogate key).&lt;o:p&gt;&lt;/o:p&gt;&lt;/font&gt;&lt;/font&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;o:p&gt;&lt;font face="Times New Roman" size="3"&gt;&amp;nbsp;&lt;/font&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;strong&gt;&lt;span style="font-size:13.5pt;mso-ansi-language:en-us;"&gt;&lt;font face="Times New Roman"&gt;Art and science&lt;o:p&gt;&lt;/o:p&gt;&lt;/font&gt;&lt;/span&gt;&lt;/strong&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;o:p&gt;&lt;font face="Times New Roman" size="3"&gt;&amp;nbsp;&lt;/font&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;font face="Times New Roman" size="3"&gt;For the sake of this discussion, “art” (work created by an artist) is the result of some creative process, usually completely new and unique in some way. Most artists apply learned skills, though not always in the regular way. Artists usually need some kind of inspiration. There is no way to say whether a work of art is “good” or “bad”, as that is often in the eye of the beholder – and even if all beholders agree that the work sucks, you still can’t pinpoint what exactly the artist has done wrong. Examples of artists include painters, composers, architects, etc. But some people not usually associated with art also fit the above definition, such as politicians, blog authors, or scientists (when breaking new grounds, such as &lt;/font&gt;&lt;a href="http://en.wikipedia.org/wiki/Codd"&gt;&lt;font face="Times New Roman" color="#800080" size="3"&gt;Codd&lt;/font&gt;&lt;/a&gt;&lt;font size="3"&gt;&lt;font face="Times New Roman"&gt; did when he invented the relational model). Or a chef in a fancy restaurant who is experimenting with ingredients and cooking processes to find great new recipes to include on the menu.&lt;o:p&gt;&lt;/o:p&gt;&lt;/font&gt;&lt;/font&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;o:p&gt;&lt;font face="Times New Roman" size="3"&gt;&amp;nbsp;&lt;/font&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;font face="Times New Roman" size="3"&gt;“Science” does &lt;b style="mso-bidi-font-weight:normal;"&gt;&lt;i style="mso-bidi-font-style:normal;"&gt;NOT&lt;/i&gt;&lt;/b&gt; refer to the &lt;/font&gt;&lt;a href="http://en.wikipedia.org/wiki/Science"&gt;&lt;font face="Times New Roman" color="#800080" size="3"&gt;work of scientists&lt;/font&gt;&lt;/a&gt;&lt;font size="3"&gt;&lt;font face="Times New Roman"&gt;, but to work carried out by professionals, for which not creativity but predictability is the first criterion. Faced with the same task, a professional should consistently arrive at correct results. That doesn’t imply that he or she always &lt;b style="mso-bidi-font-weight:normal;"&gt;will&lt;/b&gt; get correct results, but if he or she doesn’t, then you can be sure that an error is made and that careful examination of the steps taken will reveal exactly what that error was. Examples of professionals include bakers, masons, pilots, etc. All people that you trust to deliver work of a consistent quality – you want your bread to taste the same each day, you want to be sure your home won’t collapse, and you expect to arrive safely whenever you embark a plane. And a regular restaurant cook is also supposed to cook the new meal the chef put on the menu exactly as the chef instructed.&lt;o:p&gt;&lt;/o:p&gt;&lt;/font&gt;&lt;/font&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;o:p&gt;&lt;font face="Times New Roman" size="3"&gt;&amp;nbsp;&lt;/font&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;strong&gt;&lt;span style="font-size:13.5pt;mso-ansi-language:en-us;"&gt;&lt;font face="Times New Roman"&gt;Data modeling: art or science?&lt;o:p&gt;&lt;/o:p&gt;&lt;/font&gt;&lt;/span&gt;&lt;/strong&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;o:p&gt;&lt;font face="Times New Roman" size="3"&gt;&amp;nbsp;&lt;/font&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;font size="3"&gt;&lt;font face="Times New Roman"&gt;Now that all the terms have been defined, it’s time to take a look at the question I started this blog post with – is data modeling an art or a science? And should it be?&lt;o:p&gt;&lt;/o:p&gt;&lt;/font&gt;&lt;/font&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;o:p&gt;&lt;font face="Times New Roman" size="3"&gt;&amp;nbsp;&lt;/font&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;font size="3"&gt;&lt;font face="Times New Roman"&gt;To start with the latter, I think it &lt;b style="mso-bidi-font-weight:normal;"&gt;&lt;i style="mso-bidi-font-style:normal;"&gt;should&lt;/i&gt;&lt;/b&gt; be a science. If a customer pays a hefty sum of money to a professional data modeler to deliver a data model, then, assuming the customer did everything one can reasonably expect&lt;sup&gt;1&lt;/sup&gt; to answer questions from the data modeler, the customer can expect the data model to be correct&lt;sup&gt;2&lt;/sup&gt;.&lt;o:p&gt;&lt;/o:p&gt;&lt;/font&gt;&lt;/font&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;o:p&gt;&lt;font face="Times New Roman" size="3"&gt;&amp;nbsp;&lt;/font&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt 18pt;text-indent:-18pt;mso-list:l0 level1 lfo2;tab-stops:list 18.0pt;"&gt;&lt;font face="Times New Roman"&gt;&lt;sup&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;span style="mso-list:ignore;"&gt;&lt;font size="3"&gt;1&lt;/font&gt;&lt;span style="font:7pt 'Times New Roman';"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/sup&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;font size="3"&gt;I consider it &lt;i style="mso-bidi-font-style:normal;"&gt;reasonable&lt;/i&gt; to expect that the customer ensures that all relevant questions asked by the data modeler are answered, providing the data modeler asks these questions in a language and a jargon familiar to the customer and/or the employees he interviews. I do not consider it reasonable to expect the customer (or his employees) to learn language, jargon, or diagramming style used by data modelers.&lt;o:p&gt;&lt;/o:p&gt;&lt;/font&gt;&lt;/span&gt;&lt;/font&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt 18pt;text-indent:-18pt;mso-list:l0 level1 lfo2;tab-stops:list 18.0pt;"&gt;&lt;font face="Times New Roman"&gt;&lt;sup&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;span style="mso-list:ignore;"&gt;&lt;font size="3"&gt;2&lt;/font&gt;&lt;span style="font:7pt 'Times New Roman';"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/sup&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;font size="3"&gt;A data model is &lt;i style="mso-bidi-font-style:normal;"&gt;correct&lt;/i&gt; if it allows any data collection that is allowed according to the business rules, and disallows any data collection that would violate any business rule.&lt;o:p&gt;&lt;/o:p&gt;&lt;/font&gt;&lt;/span&gt;&lt;/font&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;o:p&gt;&lt;font face="Times New Roman" size="3"&gt;&amp;nbsp;&lt;/font&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;font size="3"&gt;&lt;font face="Times New Roman"&gt;Unfortunately, my ideas of what data modeling &lt;b style="mso-bidi-font-weight:normal;"&gt;&lt;i style="mso-bidi-font-style:normal;"&gt;should&lt;/i&gt;&lt;/b&gt; be like appear not to be on par with the current state of reality. One of the key factors I named for “science” is predictability. And to have a predictable outcome, a process must have a solid specification. As in, “if situation X arises, you need to ask the customer question Y; in case of answer Z, add &lt;i style="mso-bidi-font-style:normal;"&gt;this&lt;/i&gt; and remove &lt;i style="mso-bidi-font-style:normal;"&gt;that&lt;/i&gt; in the data model”. Unfortunately, such exactly specified process steps are absent in most (and in all commonly used!) data modeling methods. However, barring those rules, you have to rely on the inspiration of the data modeler – will he or she realize that question Y has to be asked? And if answer Z is given, will the modeler realize that &lt;i style="mso-bidi-font-style:normal;"&gt;this&lt;/i&gt; has to be added and &lt;i style="mso-bidi-font-style:normal;"&gt;that&lt;/i&gt; has to be removed? And if that doesn’t happen, then who’s to blame? The modeler, for doing everything the (incomplete) rules that do exist prescribe, but lacking the inspiration to see what was required here? Or should we blame ourselves, our industry, for allowing ourselves to have data modeling as art for several decades already, and still accepting this as “the way it is”?&lt;o:p&gt;&lt;/o:p&gt;&lt;/font&gt;&lt;/font&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;o:p&gt;&lt;font face="Times New Roman" size="3"&gt;&amp;nbsp;&lt;/font&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;font face="Times New Roman" size="3"&gt;Many moons ago, when I was a youngster that had just landed a job as a &lt;/font&gt;&lt;a href="http://en.wikipedia.org/wiki/PL/I"&gt;&lt;font face="Times New Roman" color="#800080" size="3"&gt;PL/I&lt;/font&gt;&lt;/a&gt;&lt;font face="Times New Roman" size="3"&gt; programmer, I was sent to a course for &lt;/font&gt;&lt;a href="http://en.wikipedia.org/wiki/Jackson_Structured_Programming"&gt;&lt;font face="Times New Roman" color="#800080" size="3"&gt;Jackson Structured Programming&lt;/font&gt;&lt;/a&gt;&lt;font size="3"&gt;&lt;font face="Times New Roman"&gt;. This is a method to build programs that process one or more inputs and produce one or more outputs. Though it can be used for interactive programs as well, it’s main strength is for batch programs, accessing sequential files. The course was great – though the students would not always arrive at the exact same design, each design would definitely be either correct, or incorrect. All correct designs would yield the same end result when executed against the same data. And for all incorrect designs, the teacher was able to pinpoint where in the process an error was made. For me, this course changed programming from art into science.&lt;o:p&gt;&lt;/o:p&gt;&lt;/font&gt;&lt;/font&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;o:p&gt;&lt;font face="Times New Roman" size="3"&gt;&amp;nbsp;&lt;/font&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;font size="3"&gt;&lt;font face="Times New Roman"&gt;A few years later, I was sent to a data modeling course. Most of the course focused on how to represent data models (some variation of ERM was used). We were taught how to represent the data model, but not how to find it. At the end, we were given a case study and asked to make a data model, which we would then present to the class. When we were done and the first model was presented, I noticed some severe differences from my model – differences that would result in different combinations of data being allowed or rejected. So when the teacher made some minor adjustments and then said that this was a good model, I expected to get a low note for my work. Then the second student had model that differed from both the first and my model – and again, the teacher mostly agreed to the choices made. This caused me to regain some of my confidence – and indeed, when it was my turn to present my, once again very different, model, I too was told that this was a very good one. So we were left with three models, all very different, and according to the instructor, they were all “correct” – and yet, data that would be allowed in the one would be rejected by the other. So &lt;i style="mso-bidi-font-style:normal;"&gt;this&lt;/i&gt; course taught me that data modeling was not science, but pure art.&lt;o:p&gt;&lt;/o:p&gt;&lt;/font&gt;&lt;/font&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;o:p&gt;&lt;font face="Times New Roman" size="3"&gt;&amp;nbsp;&lt;/font&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;font size="3"&gt;&lt;font face="Times New Roman"&gt;This was over a decade ago. But has anything changed in between? Well, maybe it has – but if so, it must have been when I was not paying attention, for I still do not see any mainstream data modeling method that does provide the modeler with a clear set of instructions on what to do in every case, or the auditor with a similar set of instructions to check whether the modeler did a great job, or screwed up.&lt;o:p&gt;&lt;/o:p&gt;&lt;/font&gt;&lt;/font&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;o:p&gt;&lt;font face="Times New Roman" size="3"&gt;&amp;nbsp;&lt;/font&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;strong&gt;&lt;span style="font-size:13.5pt;mso-ansi-language:en-us;"&gt;&lt;font face="Times New Roman"&gt;Who’s to blame?&lt;o:p&gt;&lt;/o:p&gt;&lt;/font&gt;&lt;/span&gt;&lt;/strong&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;o:p&gt;&lt;font face="Times New Roman" size="3"&gt;&amp;nbsp;&lt;/font&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;font size="3"&gt;&lt;font face="Times New Roman"&gt;Another difference between art and science is the assignment of blame. If you buy a bread, have a house built, or embark on a plane, then you know who to blame if the bread is sour, if the house collapses, or if the plane lands on the wrong airport. But if you ask a painter to create a painting for your living and you don’t like the result, you can not tell him that he screwed up – because beauty is truly a matter of taste.&lt;o:p&gt;&lt;/o:p&gt;&lt;/font&gt;&lt;/font&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;o:p&gt;&lt;font face="Times New Roman" size="3"&gt;&amp;nbsp;&lt;/font&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;font size="3"&gt;&lt;font face="Times New Roman"&gt;Have you ever been to a shop where all colors paint can be made by combining adequate amounts of a few base colors? Suppose you go to such a shop with a small sample of dyed wood, asking them to mix you the &lt;i style="mso-bidi-font-style:normal;"&gt;exact&lt;/i&gt; same color. The shopkeeper rummages through some catalogs, compares some samples, and then scribbles on a note: “2.98 liters white (#129683), 0.15 liters of cyan (#867324), and 0.05 liters of red (#533010)”. He then tells you that you have to sign the note before he can proceed to mix the paint. So you sign. And then, once the paint has been mixed, you see that it’s the wrong color – and the shop keepers then waves the signed slip of paper, telling you it’s “exactly according to the specification you signed off”, so you can’t get your money back. Silly, right?&lt;o:p&gt;&lt;/o:p&gt;&lt;/font&gt;&lt;/font&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;o:p&gt;&lt;font face="Times New Roman" size="3"&gt;&amp;nbsp;&lt;/font&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;font face="Times New Roman" size="3"&gt;And yet, in almost every software project, there will be a moment when a data model is presented to the customer, usually in the form of an ERM diagram or a &lt;/font&gt;&lt;a href="http://en.wikipedia.org/wiki/Unified_Modeling_Language"&gt;&lt;font face="Times New Roman" color="#800080" size="3"&gt;UML&lt;/font&gt;&lt;/a&gt;&lt;font face="Times New Roman" size="3"&gt; &lt;/font&gt;&lt;a href="http://en.wikipedia.org/wiki/Class_diagram"&gt;&lt;font face="Times New Roman" color="#800080" size="3"&gt;class diagram&lt;/font&gt;&lt;/a&gt;&lt;font size="3"&gt;&lt;font face="Times New Roman"&gt;, and the customer is required to sign off for the project to continue. This is, with all respect, the same utter madness as the paint example. Let’s not forget that the customer is probably a specialist in his trade, be it banking, insurance, or selling ice cream, but not in reading ERM or UML diagrams. How is he supposed to check whether the diagram is an accurate representation of his business needs and business rules?&lt;o:p&gt;&lt;/o:p&gt;&lt;/font&gt;&lt;/font&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;o:p&gt;&lt;font face="Times New Roman" size="3"&gt;&amp;nbsp;&lt;/font&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;font size="3"&gt;&lt;font face="Times New Roman"&gt;The reason why data modelers require the customer to sign off the data model is, because they know that data modeling is not science but art. They know that the methods they use can’t guarantee correct results, even on correct inputs. So they require a signature on the data model, so that later, when nasty stuff starts hitting the fan, they can wave the signature in the customer’s face, telling him that he himself signed for the implemented model.&lt;o:p&gt;&lt;/o:p&gt;&lt;/font&gt;&lt;/font&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;o:p&gt;&lt;font face="Times New Roman" size="3"&gt;&amp;nbsp;&lt;/font&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;font size="3"&gt;&lt;font face="Times New Roman"&gt;In the paint shop, I’m sure that nobody would agree to sign the slip with the paint serial numbers. &lt;b style="mso-bidi-font-weight:normal;"&gt;I&lt;/b&gt; wouldn’t! I would agree, though, to place my signature on the sample I brought in, as that is a specification I can understand. Translating that to paint numbers and quantities is supposed to be the shopkeepers’ job, so let him take responsibility for that.&lt;o:p&gt;&lt;/o:p&gt;&lt;/font&gt;&lt;/font&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;o:p&gt;&lt;font face="Times New Roman" size="3"&gt;&amp;nbsp;&lt;/font&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;font size="3"&gt;&lt;font face="Times New Roman"&gt;So, I guess the real question is … why do customers still accept it when they are forced to sign for a model they are unable to understand? Why don’t they say that they will gladly sign for all the requirements they gave, and for all the answers they provided to questions that were asked in a language they understand, but that they insist on the data modeler taking responsibility for his part of the job?&lt;o:p&gt;&lt;/o:p&gt;&lt;/font&gt;&lt;/font&gt;&lt;/span&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="margin:0cm 0cm 0pt;"&gt;&lt;span style="mso-ansi-language:en-us;"&gt;&lt;o:p&gt;&lt;font face="Times New Roman" size="3"&gt;&amp;nbsp;&lt;/font&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt; &lt;p&gt;&lt;span style="font-size:12pt;font-family:'Times New Roman';mso-fareast-font-family:'Times New Roman';mso-ansi-language:en-us;mso-fareast-language:nl;mso-bidi-language:ar-sa;"&gt;Maybe the &lt;i style="mso-bidi-font-style:normal;"&gt;true&lt;/i&gt; art of data modeling is that data modelers are still able to get away with it…&lt;/span&gt;&lt;/p&gt;&lt;img src="http://www2.sqlblog.com/aggbug.aspx?PostID=8173" width="1" height="1"&gt;</description><category domain="http://www2.sqlblog.com/blogs/hugo_kornelis/archive/tags/Data+modeling/default.aspx">Data modeling</category><category domain="http://www2.sqlblog.com/blogs/hugo_kornelis/archive/tags/Database+design/default.aspx">Database design</category></item></channel></rss>