THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Allen White

Keep Your Data Clean

In working with client data you start to notice trends that raise flags in your mind. In my case I'm looking at data and seeing a lot of columns that have NULL in the column where a NULL doesn't mean "I don't know", it means "I was too lazy to put something here".

In the stored procedures to process the data and produce management level reports, almost every query begins with

SELECT DISTINCT

And then most of the amount columns are referenced in the query like this:

ISNULL(t.some_amount, 0) AS some_amount

Now, when your tables have a small number of rows, it really won't hurt you, from a performance perspective. But as a business grows these practices are the start of major performance problems, as the query processor has to do a lot of extra work to clean the data as it processes each time it's needed.

Wouldn't it make more sense to keep it clean all the time, and eliminate these costly processes?

Allen

Published Friday, March 05, 2010 11:44 AM by AllenMWhite

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Alexander Kuznetsov said:

Allen,

From a startup perspective, usually it makes sense to clean up and otherwise invest in better quality only after the business started growing, only when you know that it is likely to succeed. If the project fails, then all the money spent on it is lost. If you invest more money to do it right too early, then you can just lose more money.

March 5, 2010 1:13 PM
 

James Luetkehoelter said:

Hey, how about having to search for "no data" with this:

WHERE CAST(FIELD AS INT)=0 or FIELD IN ('',' ','zero','n/a','NULL') or FIELD IS NULL

March 5, 2010 1:22 PM
 

Paul Nielsen said:

Hi Allen, I wholeheartedly agree. I often build a rule-based engine into my databases to continually find questionable or interesting data.

-Paul

March 5, 2010 1:27 PM
 

Bob Probst said:

@ Alexander K

I agree with your thoughts to an extent.  There's an argument to be made for applying YAGNI principles to some of your design but I don't know if data cleansing is one of those.  Poor data validation often requires that more work be put into using that data and offers more opportunities for unexpected (and sometimes unnoticed) errors to creep in.  Issues like this can cause a startup to fail.  Furthermore, should your business take off, you're hopefully going to be busy adding features and growing the services -- maybe too busy to go back and fix those broken windows you left for later.

Software that relies on a stored data solution should have data cleanliness as its foundation.

March 5, 2010 1:53 PM
 

Alexander Kuznetsov said:

@Bob

It is very common that a startup saves money on not fully implementing data integrity, then succeeds. This very post, and dozen of other similar ones, prove that many such startups succeed and later can afford to hire data professionals for clean up.

Those that fail do not get on the radar of data professionals, they just disappear. So our statistics are skewed - we do not include those who "have data cleanliness as its foundation" as you suggest, and fail.

March 5, 2010 2:20 PM
 

Armando Prato said:

I have to wholeheartedly agree with Bob as I've worked in this scenario first hand.  When you fail to properly store data in your database you end up with relics. This makes adding features later that much more difficult.  How many shops really allocate money and talent towards data cleanup vs application features?  I suspect not many since its the features that pay the bills.  

March 5, 2010 3:38 PM
 

Glenn Berry said:

I don't see the relationship between having at least a half-way decent data model (by restricting or not allowing null values) and whether a startup business is going to fail. Even if you cannot afford a DBA, it does not take that much extra work to set a column's nullability and give it a default value. Even a cave man (or a web developer) can probably do that.

March 5, 2010 6:57 PM
 

ALZDBA said:

Once again, correct you are, Obi-Wan  ;-)

IMO dbms constructors should change this default behaviour " nulls allowed " to Nulls Not allowed.

And if that would cause a to huge cultural shift, provide a "sys.configurations" option, so each company can set the default itself.

Cheers,

Johan

March 6, 2010 6:52 AM

Leave a Comment

(required) 
(required) 
Submit

About AllenMWhite

Allen White is a consultant and mentor for Upsearch Technology Services in Northeast Ohio. He has worked as a Database Administrator, Architect and Developer for over 30 years, supporting both the Sybase and Microsoft SQL Server platforms over that period.

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement