Today I had to help a customer get their upgraded invoicing system to work correctly… and, as happens quite often, the source of the problem was a bad denormalization.
From now on, whenever I can do it without revealing too much about the customer, I’ll try to write a post so that it is clear to anyone (developers in particular) what the dangers are of denormalizing without taking care of consistency.
Denormalization is not bad a priori: there are situations in which it can help performance, and in those cases it makes sense. Please note that I’m talking about denormalization, which is the conscious decision to denormalize after having normalized the database schema, because the actual system cannot give us the needed performance. I’m absolutely not talking about un-normalized situations. An un-normalized database is simply bad, period. It suffers from a plethora of problems, among which, of course, data consistency is one of the biggest. So this post is useful in both situations, but please keep in mind that they are two completely different things!
Now, if you decide to denormalize, you simply cannot avoid implementing a constraint that enforces the integrity of the information you’re going to duplicate. And it cannot be an application-level constraint; it must be a database-level constraint… otherwise you can find yourself in the following situation: you have a column “InvoiceType” that holds the information about the type of the entity; if it’s an invoice it holds “I”, and if it’s a pre-invoice (a sort of notice that an invoice will be issued) it holds “P”. A column named “InvoiceNumber” also exists, and it stores the – guess what? – invoice number. But of course pre-invoices don’t have any number, so for them this column should always be zero.
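Just to make the scenario concrete, here is a minimal sketch of what such a table could look like. The table name dbo.Invoices and the surrogate key InvoiceId are made up for the example; InvoiceType and InvoiceNumber are the columns described above.

CREATE TABLE dbo.Invoices
(
    InvoiceId     INT IDENTITY(1,1) PRIMARY KEY,  -- hypothetical surrogate key
    InvoiceType   CHAR(1) NOT NULL,               -- 'I' = invoice, 'P' = pre-invoice
    InvoiceNumber INT     NOT NULL                -- should be 0 for pre-invoices
);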
Now – you can probably see where I’m going – what happens if you have a row with InvoiceType = “P” and InvoiceNumber = 1234? You’re in trouble. Well, actually the company is in trouble, since no one can tell whether this is an Invoice or a Pre-Invoice. And of course this affects the Gross Margin, and you can bet the Boss won’t be happy to hear about it!
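Spotting the anomalies is easy enough; on the hypothetical table above, a query like the following is the kind of check that now has to be run to flag the suspicious rows:

SELECT InvoiceId, InvoiceType, InvoiceNumber
FROM dbo.Invoices
WHERE InvoiceType = 'P'
  AND InvoiceNumber <> 0;  -- pre-invoices must not carry an invoice number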
This should never have happened… but despite everything, it happened. A human error, a bug in the (ugly) application… it happened. But it would never, ever have happened if a simple CHECK CONSTRAINT had been used (as you can see, I’m not even discussing the fact that the table is not normalized).
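Just as a sketch, on the hypothetical dbo.Invoices table above the constraint could look like this (the constraint name is invented, and the exact rule of course depends on the real business rules):

ALTER TABLE dbo.Invoices ADD CONSTRAINT CK_Invoices_Type_Number
CHECK (
    (InvoiceType = 'P' AND InvoiceNumber = 0)  -- pre-invoices never have a number
 OR (InvoiceType = 'I' AND InvoiceNumber > 0)  -- invoices always have one
);

With something like this in place, neither a human error nor a buggy application can write a row that mixes the two states: the database simply refuses it.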
The situation created by this omission results in a more complex ETL phase for the BI solution we’ve built, plus additional work for a person whose job is simply to check and resolve the identified anomalies.
The conclusion is that, if denormalization is the chosen way, constraints to preserve information integrity must be put in place. To put it another way: if you decide to denormalize, you’re taking on the responsibility of keeping the information consistent. And that is not optional. Apply some basic logic, and the result is that constraints are not optional either! (You think they will slow you down during inserts, updates and deletes? They may… but that’s what you decided when you chose to denormalize!)