THE SQL Server Blog Spot on the Web


Alexander Kuznetsov

Denormalizing to enforce business rules: Part 1.

You cannot use CHECK constraints to compare columns in different rows or in different tables, unless you wrap such functionality in scalar UDFs invoked from a CHECK constraint. What if you actually need such a comparison to enforce a business rule? For example, suppose that you know a doctor's working hours, and you want to make sure that all appointments fit within them. Of course, you can use a trigger or a stored procedure to implement this business rule, but neither gives you a 100% guarantee that all your data is clean – someone can disable or drop your trigger, enter some dirty data, and re-enable or recreate the trigger, and someone can modify your table directly, bypassing your stored procedures. Either way you can end up with data violating the business rule without knowing about it.
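The opening restriction is easy to see in practice. As a quick illustration (a sketch using SQLite through Python's sqlite3 module rather than SQL Server, with hypothetical table names), the engine rejects a CHECK constraint that tries to look beyond the current row:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE WorkingHours "
            "(WorkingHoursID INTEGER PRIMARY KEY, FinishedAt TEXT)")
try:
    # A CHECK constraint may only look at the current row;
    # referencing another table via a subquery is rejected outright.
    con.execute("""
        CREATE TABLE Appointments (
            StartedAt TEXT
                CHECK (StartedAt < (SELECT MAX(FinishedAt) FROM WorkingHours))
        )""")
except sqlite3.OperationalError as e:
    print("rejected:", e)
```

SQL Server likewise rejects subqueries in CHECK constraints at CREATE TABLE time, which is why the workaround below stores the needed parent values in the child row instead.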

 

Let me demonstrate how to implement this business rule using only FK and CHECK constraints – that will guarantee that all the data satisfies the business rule as long as all the constraints are trusted.

 

Note: for more about trusted constraints read a post by SQL Server MVP Hugo Kornelis.

 

The table storing working hours has one additional constraint which will be used to make sure that all appointments fit within working hours, as follows:

 

CREATE TABLE Data.WorkingHours(
    WorkingHoursID INT NOT NULL,
    StartedAt DATETIME NOT NULL,
    FinishedAt DATETIME NOT NULL,
    ProfessionalsName VARCHAR(50),
    CONSTRAINT PK_WorkingHours PRIMARY KEY(WorkingHoursID),
    -- The purpose of UNQ_WorkingHours is as follows:
    -- a FOREIGN KEY constraint from a child table will refer to it.
    CONSTRAINT UNQ_WorkingHours UNIQUE(WorkingHoursID, StartedAt, FinishedAt),
    CONSTRAINT CHK_WorkingHours_ValidWindow CHECK(StartedAt < FinishedAt)
);
GO
SET NOCOUNT ON;
INSERT Data.WorkingHours(WorkingHoursID, ProfessionalsName, StartedAt, FinishedAt)
SELECT 1, 'Jane He', '20061025 08:00AM', '20061025 05:00PM' UNION ALL
SELECT 2, 'Jane He', '20061026 08:30AM', '20061026 05:00PM' UNION ALL
SELECT 3, 'Jane He', '20061027 08:30AM', '20061027 05:00PM';
GO

 

Now I am going to store the beginning and ending of working hours (that is StartedAt and FinishedAt columns) along with every appointment, so that I can use a CHECK constraint to compare them with the beginning and ending of appointments, as follows:

 

CREATE TABLE Data.Appointments(
    WorkingHoursID INT NOT NULL,
    StartedAt DATETIME NOT NULL,
    FinishedAt DATETIME NOT NULL,
    WorkingHourStartedAt DATETIME NOT NULL,
    WorkingHourFinishedAt DATETIME NOT NULL,
    CONSTRAINT FK_Appointments_WorkingHours
        FOREIGN KEY(WorkingHoursID, WorkingHourStartedAt, WorkingHourFinishedAt)
        REFERENCES Data.WorkingHours(WorkingHoursID, StartedAt, FinishedAt)
        ON UPDATE CASCADE,
    CONSTRAINT CHK_Appointments_ValidWindow CHECK(StartedAt < FinishedAt),
    CONSTRAINT CHK_TimesInRange
        CHECK((StartedAt BETWEEN WorkingHourStartedAt AND WorkingHourFinishedAt)
          AND (FinishedAt BETWEEN WorkingHourStartedAt AND WorkingHourFinishedAt))
);

 

Clearly, storing two columns from the parent table in the child table is denormalization. However, the FOREIGN KEY constraint guarantees that these columns always match the corresponding ones in the parent. On top of that, the ON UPDATE CASCADE clause allows me to modify values in the parent table and have the changes propagated into the child table automatically; I will demonstrate this shortly. The following inserts succeed, because the appointments fit within working hours:

INSERT Data.Appointments(WorkingHoursID, StartedAt, FinishedAt,
    WorkingHourStartedAt, WorkingHourFinishedAt)
SELECT 1, '20061025 09:00AM', '20061025 05:00PM', '20061025 08:00AM', '20061025 05:00PM' UNION ALL
SELECT 2, '20061026 08:30AM', '20061026 01:00PM', '20061026 08:30AM', '20061026 05:00PM' UNION ALL
SELECT 3, '20061027 10:30AM', '20061027 01:00PM', '20061027 08:30AM', '20061027 05:00PM';

 

You can change appointments and working hours as long as appointments still fit, for example:

 

UPDATE Data.Appointments SET StartedAt = '20061025 08:30AM'
WHERE WorkingHoursID = 1;
UPDATE Data.WorkingHours SET StartedAt = '20061027 09:31AM'
WHERE WorkingHoursID = 3;
UPDATE Data.WorkingHours SET FinishedAt = '20061026 04:31PM'
WHERE WorkingHoursID = 2;

 

 

Note that the changes in the WorkingHours table were propagated into the child table automatically:

 

SELECT StartedAt,
       FinishedAt,
       WorkingHourStartedAt,
       WorkingHourFinishedAt
FROM Data.Appointments;

 

StartedAt               FinishedAt              WorkingHourStartedAt    WorkingHourFinishedAt
----------------------- ----------------------- ----------------------- -----------------------
2006-10-25 08:30:00.000 2006-10-25 17:00:00.000 2006-10-25 08:00:00.000 2006-10-25 17:00:00.000
2006-10-26 08:30:00.000 2006-10-26 13:00:00.000 2006-10-26 08:30:00.000 2006-10-26 16:31:00.000
2006-10-27 10:30:00.000 2006-10-27 13:00:00.000 2006-10-27 09:31:00.000 2006-10-27 17:00:00.000

 

However, you cannot extend an appointment beyond working hours, nor can you shorten working hours so that some appointment no longer fits:

-- both inserts fail: the appointment is not completely inside working hours
INSERT Data.Appointments VALUES(3, '20061027 7:30AM', '20061027 01:00PM', '20061027 08:30AM', '20061027 05:00PM');
INSERT Data.Appointments VALUES(3, '20061027 10:30AM', '20061027 08:00PM', '20061027 08:30AM', '20061027 05:00PM');
-- the update fails because there is an appointment starting at 10:30
UPDATE Data.WorkingHours SET StartedAt = '20061027 10:31AM' WHERE WorkingHoursID = 3;

Msg 547, Level 16, State 0, Line 2
The INSERT statement conflicted with the CHECK constraint "CHK_TimesInRange". The conflict occurred in database "Test", table "Data.Appointments".
The statement has been terminated.
Msg 547, Level 16, State 0, Line 3
The INSERT statement conflicted with the CHECK constraint "CHK_TimesInRange". The conflict occurred in database "Test", table "Data.Appointments".
The statement has been terminated.
Msg 547, Level 16, State 0, Line 5
The UPDATE statement conflicted with the CHECK constraint "CHK_TimesInRange". The conflict occurred in database "Test", table "Data.Appointments".
The statement has been terminated.
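For readers without a SQL Server instance at hand, the whole pattern can be tried in portable form. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module rather than the exact T-SQL above (DATETIME becomes TEXT and constraint names are dropped, but SQLite also supports composite foreign keys, CHECK constraints, and ON UPDATE CASCADE):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

con.executescript("""
CREATE TABLE WorkingHours (
    WorkingHoursID INTEGER PRIMARY KEY,
    StartedAt      TEXT NOT NULL,
    FinishedAt     TEXT NOT NULL,
    ProfessionalsName TEXT,
    -- the composite key the child table's FK will reference
    UNIQUE (WorkingHoursID, StartedAt, FinishedAt),
    CHECK (StartedAt < FinishedAt)
);
CREATE TABLE Appointments (
    WorkingHoursID        INTEGER NOT NULL,
    StartedAt             TEXT NOT NULL,
    FinishedAt            TEXT NOT NULL,
    WorkingHourStartedAt  TEXT NOT NULL,
    WorkingHourFinishedAt TEXT NOT NULL,
    FOREIGN KEY (WorkingHoursID, WorkingHourStartedAt, WorkingHourFinishedAt)
        REFERENCES WorkingHours (WorkingHoursID, StartedAt, FinishedAt)
        ON UPDATE CASCADE,
    CHECK (StartedAt < FinishedAt),
    CHECK (StartedAt  BETWEEN WorkingHourStartedAt AND WorkingHourFinishedAt
       AND FinishedAt BETWEEN WorkingHourStartedAt AND WorkingHourFinishedAt)
);
INSERT INTO WorkingHours VALUES (1, '2006-10-25 08:00', '2006-10-25 17:00', 'Jane He');
-- fits within working hours, so it succeeds
INSERT INTO Appointments VALUES
    (1, '2006-10-25 09:00', '2006-10-25 17:00', '2006-10-25 08:00', '2006-10-25 17:00');
""")

# An appointment outside working hours violates the range CHECK:
try:
    con.execute("INSERT INTO Appointments VALUES "
                "(1, '2006-10-25 07:30', '2006-10-25 13:00', "
                "'2006-10-25 08:00', '2006-10-25 17:00')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)

# Changing the parent row cascades into the denormalized child columns:
con.execute("UPDATE WorkingHours SET StartedAt = '2006-10-25 08:30' "
            "WHERE WorkingHoursID = 1")
print(con.execute("SELECT WorkingHourStartedAt FROM Appointments").fetchone()[0])
```

The string comparisons work here only because the timestamps are ISO-formatted; the guarantee itself comes from exactly the same FK-plus-CHECK combination the post describes.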

 

In many cases constraints also perform better than triggers, but this is another story…


In the next post I will discuss running totals.
Published Wednesday, January 21, 2009 6:49 PM by Alexander Kuznetsov


Comments

 

Hugo Kornelis said:

Hi Alex,

An interesting technique. Thanks for the post.

However, I don't think I'd actually use it very often. You have to store lots of redundant data, which increases the I/O for most queries and hence hurts overall performance. There is also the issue of maintaining the information - if a doctor changes her working hours, the change has to be propagated to all the appointment rows in order to prevent a foreign key violation. That's a lot of work, along with a lot of locking and possibly blocking. The cascading update doesn't allow me to control the order in which locks are taken, so deadlocks might occur as well. And in this particular case, it makes no sense - if a doctor starts working shorter hours next month, the appointments for the previous month should not suddenly start violating the business rule (but I understand that I should take this as a simplified example).

Another point, one of your arguments to avoid stored procedures or triggers for this is that stored procedures can be avoided and triggers can be temporarily disabled. The latter means your end users have permission to disable a trigger, which I think implies that they have permission to disable a CHECK or FOREIGN KEY constraint as well, so what exactly is the advantage here? Editing data outside of the stored procedure is a somewhat better point, but easily avoidable by giving users only permission to execute the stored procedures and no permissions whatsoever on the tables.

Best, Hugo

January 22, 2009 5:43 AM
 

Jack Corbett said:

This is definitely an interesting technique.  

I agree with Hugo that I likely would not use it for the same reasons.  Although I think he missed the point that the WorkingHours table would have a row for each day worked, so historical rows would not be affected unless you changed a past WorkingHours row.  

Of course, there is always someone that has rights to issue an update and that is the DBA.

January 22, 2009 9:19 AM
 

Alexander Kuznetsov said:

Hi Hugo,

1. Working hours are stored per every business day, and changes are propagated only for the appointments for that day. For example, the following update:

UPDATE Data.WorkingHours SET StartedAt = '20061027 09:31AM' WHERE WorkingHoursID = 3;

means that Jane He begins work at 9:31 only on 2006-10-27. Note that this change was propagated to only one appointment, the one on that day. Similarly, the change from the other update:

UPDATE Data.WorkingHours SET FinishedAt = '20061026 04:31PM'
WHERE WorkingHoursID = 2;

was also propagated to only one appointment.

2. The difference between triggers and constraints is simple: you cannot have a trusted constraint if some data violates it. I referred to your own post about that in mine. IMO it is quite common that although regular users execute stored procedures with well-defined permissions, sometimes there are data cleanup scripts which are run just once by DBAs. Also, triggers can be bypassed without disabling them. Example here:

http://sqlserver-tips.blogspot.com/2006/12/bypassing-trigger.html

Cheers,

AK

January 22, 2009 9:27 AM
 

Alexander Kuznetsov said:

Hugo,

Also, I am not quite following your argument about a lot of locking. I think that with business rules like this you need exclusive locks on rows in both tables anyway, even with triggers - otherwise two concurrent updates under snapshot isolation might end up violating business rules. And you can control the order in which locks are acquired if needed - you can begin a transaction, select with UPDLOCK, then update. Please correct me if I am wrong.

January 22, 2009 9:43 AM
 

Alexander Kuznetsov said:

Hugo,

Actually, my approach requires less locking than the conventional one with triggers or stored procedures. Come to think of it, if you modify Appointments, with my approach you do not need locks on WorkingHours, but with triggers you do. If you modify WorkingHours, you need locks on Appointments anyway, otherwise your implementation of the business rules has loopholes.

January 22, 2009 10:47 AM
 

Michael Swart said:

You might get around the denormalization by putting a unique index on an indexed view based on the query:

SELECT 1 one
FROM Data.Appointments a
JOIN Data.WorkingHours wh
  ON a.WorkingHoursId = wh.WorkingHoursId
WHERE NOT a.StartedAt BETWEEN wh.StartedAt AND wh.FinishedAt
   OR NOT a.FinishedAt BETWEEN wh.StartedAt AND wh.FinishedAt

This puts a restriction in place that allows at most one row to violate the constraint. The view could be modified to allow at most zero violating rows.

http://dbwhisperer.blogspot.com/2008/11/adding-check-constraint-to-view.html

January 22, 2009 11:34 AM
 

Alexander Kuznetsov said:

Michael,

Your approach is quite interesting.

However I think that creating an indexed view is also denormalization under very thin disguise, with the same consequences - lock contention and additional storage.

Can you demonstrate how you would make sure that there are "at most zero rows that violate the constraint"?

January 22, 2009 1:12 PM
 

Michael Swart said:

The view could include one row specifically picked to be the one-and-only allowed row (very messy, I know). For that reason and others, I would recommend your approach, or simply triggers, over this one.

SELECT 1 one
FROM Data.Appointments a
JOIN Data.WorkingHours wh
  ON a.WorkingHoursId = wh.WorkingHoursId
WHERE NOT a.StartedAt BETWEEN wh.StartedAt AND wh.FinishedAt
   OR NOT a.FinishedAt BETWEEN wh.StartedAt AND wh.FinishedAt
   OR wh.WorkingHoursId = 1 -- placeholder condition

Storage wouldn't be a problem, though: the view stores only one row, and so only one extra page (8 KB) of storage space.

It's very possible that locks would be a problem, I haven't thought about it.

January 22, 2009 4:31 PM
 

Hugo Kornelis said:

Hi Alex,

Your first point is right. I had missed the day being part of the working hours table. My apologies - I had only glanced over the code to get a general idea of the technique and not studied it in enough detail.

re Trusted constraints - true. Although it's quite rare to get a plan optimization because of a trusted constraint, just checking the is_not_trusted column of sys.check_constraints and sys.foreign_keys does provide a very convenient way to see at a glance where possible business rule violations might exist.

re Bypassing triggers - you presented this point in your post as an argument for constraints and against triggers. That would be valid if someone without permission to disable a constraint could find a way to bypass a trigger. So far, I've only seen ways to bypass a trigger that require the same or even more privileges than bypassing a constraint.

re Locking - if I use triggers to verify the business rule, I don't need the denormalized columns in the appointments table. So for a new or changed appointment, I'd only need to get shared locks on working hours to check the validity of the appointment time. And for a change in working hours, I'd get shared locks on the appointments table to see if any existing appointment falls outside the new working hours. (I could even check first to see if the new working hours are shorter or longer - in the latter case, no check for appointments is required at all!)

re Snapshot isolation - good point. I don't use snapshot isolation much. If I had to implement a schema such as this and it had to support snapshot isolation, I'd have to give it a long and hard think. My first idea is to get an UPDLOCK on the appropriate row in WorkingHours when checking a new or changed appointment, and not do any special locking when checking a changed or deleted WorkingHours row. I think that completely eliminates the chance of violations with minimal extra locking and no chance of deadlocks. But as I said, if I had to do it for real I'd give it more thought than I did just now.

re Order of locking - yes, I *think* that selecting with UPDLOCK first allows full control over the locking order. But as I said before, not duplicating the data at all is a far better way to reduce locking problems.

I think I addressed everything. Let me know if I missed anything. Good discussion!

Best, Hugo

January 23, 2009 5:03 AM
 

 

Alexander Kuznetsov said:

Hi Hugo,

It was a great discussion, thanks! The only one thing left out was performance, but it is quite complex to benchmark. Maybe next time.

January 23, 2009 3:25 PM
 

Great Scott said:

I have personally had to rescue a situation where a trigger was disabled by a DBA for maintenance work, who then forgot to re-enable it. The mistake was not noticed for a couple of weeks, by which time relational integrity was shot to pieces, as the design (not mine) used triggers rather than FKs to enforce RI.

I have used the technique above in a design recently and it seems to be working pretty well. No locking problems yet, and performance is good.

January 24, 2009 7:56 AM
 

TroyK said:

Hi Alex;

>> Clearly storing two columns from the parent table in the child one is denormalization.

Really? What normal form, in particular, does the design violate? What are the exact business rules at play that lead to said violation?

Ignoring the question of the use of the term "denormalization" to describe this technique, it is a useful design approach to have in your toolbox -- another application of the multi-table constraint enforcement I describe in my article here: http://www.sqlservercentral.com/articles/Data+Modeling/61528/.

TroyK

January 27, 2009 2:07 PM
 

Alexander Kuznetsov said:

Hi Troy,

I believe the Appointments table violates 2NF. I don't think it is a big deal though, as the FK with CASCADE clause guarantees integrity.

That's a very interesting write-up about pets and vets that you did.

January 27, 2009 2:42 PM
 

 

Alex said:

> Business rules

There are a lot of situations that cannot be directly modelled by a correctly normalized design.

Although it would be nice if a relational database could enforce all business/logical constraints, it's nigh impossible in some situations.

Also, in our current trend of objects and applications being the ultimate record of information, the database is being relegated to mere storage.

Even if your relational database is worth more than object storage, there's no way it can encompass all your business rules.

> Cascade update/delete

This is not a good way of doing things.

> Triggers can be disabled

Constraints can be disabled.

This is not a good argument.

No?

August 27, 2009 9:43 AM
 

Alexander Kuznetsov said:

Alex,

"in our current trend of objects and applications being the ultimate record of information, the database is being relegated to mere storage."

Databases typically outlive several such trends. Also, you do not have to follow them blindly - you can make your own decisions based on your unique situation.

"> Cascade update/delete

This is not a good way of doing things."

Can you elaborate why you think it is bad?

"> Triggers can be disabled

Constraints can be disabled."

There is a huge difference - when I re-enable my constraint and it is trusted, I immediately know that all my data is clean. When you re-enable your trigger, it does not re-run against existing data and you do not know if your data is clean or not.

August 27, 2009 10:59 AM


About Alexander Kuznetsov

Alex Kuznetsov has been working with object oriented languages, mostly C# and C++, as well as with databases for more than a decade. He has worked with Sybase, SQL Server, Oracle and DB2. He regularly blogs on sqlblog.com, mostly about database unit testing, defensive programming, and query optimization. Alex has written a book entitled "Defensive Database Programming with Transact-SQL" and several articles on simple-talk.com and devx.com. Currently he works at DRW Trading Group in Chicago, where he leads a team of developers, practicing agile development, defensive programming, TDD, and database unit testing.
