
Kalen Delaney

Did You Know? I have a question for you on DBA Blunders!

A good friend of mine is putting together a presentation on the top DBA blunders from the viewpoint of a SQL Server database admin, not a developer. He would love to hear what other people consider to be on the big-mistake list.

Of course, you don't have to admit that these were your own blunders; they can be ones you heard about, or that 'a friend' is guilty of. :-)

Thanks for your input!

~Kalen

Published Thursday, May 15, 2008 9:29 PM by Kalen Delaney


Comments

 

Dave said:

Most common ones:

- Executing an UPDATE or DELETE statement without including a WHERE clause (a defensive sketch follows at the end of this list)

- Thinking you're connected to the DEV database, but in reality it's PROD

- Initiating an index / page rebuild without disabling Mirroring

- Adding objects in the wrong database.  Even better when there are lots of objects in a single script and there are USE statements in the middle, so it jumps context halfway through, making fixing it a pain in the butt

- spending 10 hours restoring a copy of the production database to the development server and discovering you left the WITH RECOVERY flag off the end.
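
One habit that guards against the first two items on this list (a sketch only; the table name and filter are hypothetical): run the statement inside an explicit transaction, check the row count and the connection, and only commit once everything looks right.

BEGIN TRAN;

DELETE FROM dbo.Orders            -- hypothetical table
WHERE OrderDate < '20070101';     -- hypothetical filter

-- sanity check before making it permanent
SELECT @@ROWCOUNT AS rows_affected, @@SERVERNAME AS server_name, DB_NAME() AS database_name;

-- COMMIT TRAN;    -- only if the count, server, and database look right
-- ROLLBACK TRAN;  -- otherwise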

May 16, 2008 12:34 AM
 

Kolya said:

Using automatically generated names for constraints /* create table T1 (a int primary key) */ and then having fun getting the names of those constraints on different environments
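
A minimal sketch of the fix, naming the constraint explicitly so every environment ends up with the same name (the table and constraint names here are just examples):

-- name the constraint yourself instead of letting SQL Server generate one
CREATE TABLE dbo.T1
(
    a int NOT NULL,
    CONSTRAINT PK_T1 PRIMARY KEY CLUSTERED (a)
);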

May 16, 2008 2:23 AM
 

Markus Bohse said:

I once came to a customer where they had a custom procedure for making backups. The job ran fine every day, but when I searched for a backup file, I couldn't find any.

Turned out that they never replaced the PRINT statement from testing with the real EXEC sp_executesql statement.

May 16, 2008 3:47 AM
 

Andy said:

As Dave mentions above, we've seen a DELETE without a WHERE clause, although that was technically an application fault, not a DBA one.

Muddling up the test and live systems

Also:

Testing the DR procedure on the live system only to discover that he did not have a backup of the database.

Forgetting to change the backup tapes in the server on a Monday morning. This caused the tape to fill up, which in turn stopped the backup script from deleting old backup files, so the database log would grow until the disk was full. This then manifested itself several days later to the users as a "cannot allocate space in syslogs" error message whenever they tried to update some data. They were normally very puzzled when we asked them to go visit the server room and eject the tape, as that was the only thing we could not fix remotely.

May 16, 2008 5:21 AM
 

Alejandro Mesa said:

Hi Dave,

> - spending 10 hours restoring a copy of the production database to the development server and discovering you left the WITH RECOVERY flag off the end.

Why do you consider this a blunder?

1 - WITH RECOVERY is the default.

2 - You can recover it in a subsequent RESTORE statement.

Example:

use master
go
backup database Northwind
    to disk = 'C:\Temp\Northwind.bak'
go
use Northwind
go
drop table dbo.[order details]
go
use master
go
restore database Northwind
    from disk = 'C:\Temp\Northwind.bak'
    with norecovery
go
restore database Northwind
    with recovery
go

AMB

May 16, 2008 7:50 AM
 

alphatross said:

How's this one: a "friend" (OK, it was me) had to run rebuildm to fix an incorrect collation on a prod server. Two DBs matched the new collation, so just backup and restore was necessary. But one DB had the old collation and we wanted it changed to match tempdb, so he (me!) needed to BCP the data OUT, then BCP it back IN (luckily, it was quite a small DB). Anyway, a few days later the users of the app (similar to an online marketplace) reported weird behaviour, like they'd search for cars and see an image of shoes! Guess who didn't count on identity columns! Over time, gaps had been introduced in the identity columns by deletes (i.e. they were not sequential), and his (my!) BCP IN had nicely "fixed" this and resequenced them all, which basically corrupted all referential integrity. Luckily I had a backup, and it was a very small DB - only a few MB!

May 16, 2008 7:55 AM
 

Handsome DBA said:

Moving objects between servers (from dev to prod) with DTS 2000 and not unchecking the default which is to move related objects and thereby blowing away numerous production tables.  On a Sunday.  Doh!

May 16, 2008 7:56 AM
 

Kevin3NF said:

Handsome DBA said:

Moving objects between servers (from dev to prod) with DTS 2000 and not unchecking the default which is to move related objects and thereby blowing away numerous production tables.  On a Sunday.  Doh!

+1.  replaced a million row user table with the 10K row user table from Dev.  14 hours later it was fixed.  Darn the MS GUI default in SQL 7.  

May 16, 2008 8:33 AM
 

Mo said:

My former boss knocked the cable (it wasn't screwed in tight) out of the external RAID array and corrupted the database while doing some server maintenance in the same rack. It took 3 hours to restore the DB, during which the 24/7 site was down.

May 16, 2008 9:00 AM
 

Michael Swart said:

Cancelling a long-running operation because it was hogging resources, only to realize afterwards that the rollback operation was going to take about the same amount of time.

May 16, 2008 9:33 AM
 

aspiringgeek said:

1. Sr. S/W Architect at my former day job created a multi-million row table variable *w/o an index* and joined it to a busy production table.  It was associated with a job which ran every four hours, bringing the production db to a halt for 15-30 minutes.  Yet he denied it was he or his process!  DBCC INPUTBUFFER & fn_get_sql() documented the perpetrating batch & statement.

2. Our Director of Infrastructure started changing power sources for the SAN for our multi-terabyte DB without first allowing the production server to power down completely.  Unfortunately, someone tripped over the redundant power supply, unplugging it.  As a consequence, dozens of pages in our DB became corrupted; it took four days to get back up-&-at-'em.  Coincidentally, that day the CEO shook hands with a well-known .com on an acquisition agreement; however, the company backed out of the deal as a direct consequence of this incident (& we're still waiting to cash in our stock options).

May 16, 2008 9:39 AM
 

Denis Gobo said:

I am guilty of this one

Scripting out a proc, checking "drop dependent objects", and running it on production

and of course forgetting to highlight the where clause

Here are some of my coworkers'/friends' blunders:

We created this order entry system which interacted with Great Plains. If a customer did not have enough money to pay for the order, an email would be sent to the manager so he could look up the order and approve or disapprove it in the case of special clients.

This email went to about 15 people at a NYC company; all was good.

One day we decided to do a stress test and disapprove 8000 orders, so 120000 emails went out and brought the whole network (on one T1) to a standstill.

Needless to say, the trigger was disabled very quickly after that and an order status screen was created instead.

Scripting out a table to create a history table, changing the name in the CREATE statement but not in the DROP statement.

A billion-row table: you want to transfer the last 2 weeks to the backup server every week, but somehow someone doesn't know the difference between < and >, so the whole table except for the last 2 weeks gets transferred every week.

There was a person in the newsgroups who did a restore from a 1-year-old backup (??), which was also the most recent backup... someone suggested updating your resume as the solution.

May 16, 2008 10:24 AM
 

Rajan P said:

Reinstalling SQL Server when you forget the password of the SA login.
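
A reinstall is rarely necessary: as long as some other sysadmin login can still connect, the sa password can simply be reset. A sketch (the password is a placeholder; the ALTER LOGIN form is SQL Server 2005 and later):

-- run from any login with sysadmin rights
ALTER LOGIN sa WITH PASSWORD = 'N3w$trongPassword!';
-- on SQL Server 2000: EXEC sp_password NULL, 'N3w$trongPassword!', 'sa'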

May 16, 2008 10:30 AM
 

Brian S said:

Working in DEV to optimize some queries.  Got phone call from CFO, he wanted some custom numbers from database.  Switch to PROD, run queries.  Switch back to DEV.  Work for two minutes.  Another call from CFO.  Repeat for forty five minutes.  Ends with (as you guessed) me executing the command TRUNCATE TABLE Checks to remove sample data from the DEV environment.  Except I was in Production.  15 million checks gone.  Regulated industry.  That's bad, right?

It got worse.  Our backup hadn't been functional for several months.  The company had not been willing to pay the cost for a new backup (we had moved to a SAN and our storage requirements had grown dramatically).  We had, thankfully, been telling them EVERY DAY of the risk we were in without a backup or I think I'd have been crucified.

Our oldest backup was 3 months old.  That left us with about 1.5 million checks missing.  We regenerated 1.25 million programmatically.  Then we ran reports, found the files that didn't balance, and hand entered the remaining checks from the paper reports.

=====

We had an application that pulled data from the bank and inserted it into our local database.  The application informed our disbursers when funds were available and checks could be cut.

We moved our database to a new server and the application stopped working.  Frantically, I tried to identify what was going wrong.  Nothing in the code or database had changed, but for some odd reason I simply didn't think about the database server changing.

In the end the default database on the old server was the proper database, so the developer had never issued USE statements and didn't specify the database in the connection string.  On the new server the default database was the master database.  You could argue if this was DBA error or developer error but ultimately -- it should've been caught immediately.
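
For what it's worth, the default-database part of that is a one-line fix; a sketch with a made-up login and database name (sp_defaultdb on SQL 2000, ALTER LOGIN on 2005 and later):

-- SQL Server 2000
EXEC sp_defaultdb 'app_login', 'AppDb';

-- SQL Server 2005 and later
ALTER LOGIN app_login WITH DEFAULT_DATABASE = AppDb;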

=====

Created a set of stored procedures that took a "snapshot" of our escrow information and stored it in a new database.  The stored procedures executed a ton of dynamic SQL to create a new database based on the current schema, then copied the data to the new database.  After copying the data, the stored procedures would create a set of stored procedures used for reporting inside the new database.

Unfortunately the stored procedures created didn't want to work.  They worked if I copied and pasted their contents into a query window and executed them as a separate SQL statement, but when executed as a stored procedure nothing would happen - even Profiler would only show the first statement.

It turns out that when I programmatically created the stored procedures I forgot the carriage return.  (I was inserting a linefeed only; it might've been reversed.)  When the T-SQL parser was parsing the input, this threw it off.  When I pasted into Query Analyzer, QA would correct/interpret this deficiency properly.

I spent nearly a week, off and on, trying to figure it out.  Then I called Microsoft Support, who spent a little time on it then had me ship them one of the databases I created.  They found it shortly thereafter.  :\

====

I could go on.  I've learned a lot (at the cost of real DBAs :)) but really the most important thing I've learned is that set theory is good.  SQL Server can do amazing things if you can phrase your request properly.  I've cut queries down from three to four hours to milliseconds, active and frequently used processes from ten minutes to a second or two.  

May 16, 2008 10:35 AM
 

Denis Gobo said:

A friend of mine wrote a little post on how to deal with the F5-without-selecting-everything blunder:

http://squaredthoughts.blogspot.com/2008/05/some-pain-is-avoidable-so-spare.html

May 16, 2008 10:49 AM
 

AaronBertrand said:

- setting a database to single_user (e.g. to make a filegroup read-only, or to enable broker) and forgetting to set it back (a sketch of the set-it-back step follows this list).  Similar for ad hoc begin tran and forgetting to commit/rollback.

- disabling jobs to perform changes, and forgetting to re-enable them.

- accidentally picking and modifying a job schedule that was in use by other jobs.

- leaving the defaults for autogrowth events when configuring a LARGE database.

- running ad hoc deletes / updates *with* a where clause, but without a begin tran... there is no ctrl + z in sql server.

- shutting down sql server and deleting the log file, assuming that sql server will just create a new one on restart.

- not testing backups / DR routines until it is actually needed.

- creating a backup routine but not enabling it.

- letting job history / backup history tables grow forever.

- giving developers a sysadmin or db_owner account for accessing the database, because it's "easier that way."

- writing triggers that send e-mail, write to disk, call external applications, or rely on single-row DML operations.  (This is borderline developer but usually the DBA is responsible for auditing type operations like this.)
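
For the first item on this list, the "set it back" step is easy to script right next to the change itself; a sketch with a hypothetical database name:

ALTER DATABASE MyDb SET SINGLE_USER WITH ROLLBACK IMMEDIATE;   -- hypothetical database
-- ... make the filegroup read-only, enable broker, etc. ...
ALTER DATABASE MyDb SET MULTI_USER;                            -- the part that gets forgotten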

May 16, 2008 3:05 PM
 

Denis Gobo said:

Also, store your backups off site, and I don't mean in WTC building 1 when you are located in WTC building 2. I wrote about that here: http://sqlblog.com/blogs/denis_gobo/archive/2007/07/14/1727.aspx

May 16, 2008 3:17 PM
 

Prasanna said:

Overwriting the same database backup on a daily basis without realizing that it is the ONLY backup you have; in case something goes wrong with the PROD DB or the backup, you are dead :)
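
One simple way out of the single-overwritten-backup trap is to stamp the date into the file name so each day's backup is a separate file; a sketch with hypothetical names and paths:

DECLARE @file nvarchar(260);
SET @file = N'D:\Backups\ProdDb_' + CONVERT(nvarchar(8), GETDATE(), 112) + N'.bak';   -- e.g. ProdDb_20080517.bak

BACKUP DATABASE ProdDb      -- hypothetical database
TO DISK = @file
WITH INIT;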

May 17, 2008 7:42 PM
 

Cooper said:

Restarting SQL services for maintenance and forgetting to start the Agent back up

May 19, 2008 1:15 PM
 

Ed Q said:

Detaching a very large production database with more than 16 data files using Enterprise Manager.  EM detaches it fine but doesn't allow you to re-attach a database with more than 16 files.  Learning this the first time is very painful, and writing the SQL to manually attach the database with many managers in your cube is even more painful.
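
For the record, the T-SQL that works where the GUI and sp_attach_db's 16-file limit do not is CREATE DATABASE ... FOR ATTACH; a sketch with made-up names and paths, assuming the secondary files are still in their original locations (they are then read from the primary file's metadata):

CREATE DATABASE BigDb
ON (FILENAME = 'E:\Data\BigDb.mdf')   -- primary file
FOR ATTACH;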

May 19, 2008 6:54 PM
 

Denis Gobo said:

Last week Kalen Delaney wrote Did You Know? I have a question for you on DBA Blunders! I thought the comments

May 21, 2008 12:52 PM
 

talatnauman said:

Using Enterprise Manager to alter a column of a heavily used production table :)!!

May 22, 2008 10:49 AM
 

Cindy Gross said:

Note that I included coding issues because I feel a DBA should review and correct bad coding practices. Which leads to the first blunder of not taking control of the database. If you're responsible for it, fight for it!

Putting all the data and log files on a single, overtaxed drive then complaining about performance.

Making everyone a sysadmin, either by giving out the SA password or putting many people in the sysadmin role.

Not testing a change before putting it into production.

Running update statistics and rebuilding indexes too often or not often enough.

Leaving systems unpatched.

Not monitoring the event logs and SQL errorlogs and addressing warnings and errors.

Defining duplicate indexes (I see this more often than you would think; a detection sketch follows this list).

Allowing cursors to be used when set based operations would be better.

Using NOLOCK as the default option when troubleshooting blocking.

Using index or join hints without first trying to tune indexes and queries.

Using select * instead of using a column list.

Not having a well-chosen clustered index on every table.

Putting the DTC resource in one of the SQL cluster groups.
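
A rough sketch for spotting the duplicate-index item above, using the SQL Server 2005+ catalog views; sort or group on the key column list to find tables with identical index keys:

SELECT
    OBJECT_SCHEMA_NAME(i.object_id) AS schema_name,
    OBJECT_NAME(i.object_id)        AS table_name,
    i.name                          AS index_name,
    STUFF((SELECT ',' + c.name
           FROM sys.index_columns ic
           JOIN sys.columns c
             ON c.object_id = ic.object_id AND c.column_id = ic.column_id
           WHERE ic.object_id = i.object_id
             AND ic.index_id = i.index_id
             AND ic.is_included_column = 0
           ORDER BY ic.key_ordinal
           FOR XML PATH('')), 1, 1, '') AS key_columns
FROM sys.indexes i
WHERE i.type IN (1, 2)                                  -- clustered and nonclustered
  AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1    -- user tables only
ORDER BY table_name, key_columns;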

May 23, 2008 3:40 PM
 

Marc said:

Making backups in a 7-bit character set on Oracle for textual data that uses special 8-bit characters. Apparently there were warnings in the backup log, but the issue wasn't noticed until the company lost their data and had to revert to backups.

May 24, 2008 6:28 AM
 

ghawkes said:

How ironic!  My first post was blown away by a SQL "transaction log is full" error :).

- Grant

May 27, 2008 12:52 AM
 

Kalen Delaney said:

Thank you so much for all the responses to the DBA Blunders post! As I mentioned, however, that question

May 30, 2008 11:48 AM
 


Rootman said:

Wanted to move tempdb to a different drive.  Wrote the script and pasted in the path for the new file.  Tried to restart the instance.  No go!!  FORGOT to put the filename onto the pasted pathname in the script (a sketch of the correct statement follows below)!!

Then discovered master had not been backed up since forever!!

It was a test box, but my blunder caused 15 testers to have a very quiet day!!
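
The move itself only needs the logical file name plus the full path including the physical file name (the part that went missing); a sketch with a hypothetical drive, taking effect at the next restart:

ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'T:\SQLData\tempdb.mdf');   -- note the file name on the end
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'T:\SQLData\templog.ldf');
-- takes effect after the instance is restarted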

June 2, 2008 9:22 AM
 

Maninder Singh said:

Restoring a database from 2000 to 2005 and forgetting to change the compatibility level to 90, and also forgetting to change the filename specs using ALTER DATABASE to point to the new location (the drives had changed and so had the data folder locations), so the database went offline after every restart. Finally figured out what the issue was after a couple of restarts.

Quick solution: detach the DB and attach it using the default filename from the data folder.

Also, after moving databases (2000 to 2005), forgetting to check for orphaned users in the DB.

Quick fix: sp_change_users_login 'Report' and then sp_change_users_login 'Update_One', 'user1', 'user1'

Forgetting to put databases into the maintenance plan, until I figured out why a database was not getting backed up in production.

Index issue on SAN storage: our database was very slow and we checked every nut and bolt and all the settings for the server and the database itself, until I figured out the issue was the hypothetical indexes.

Quick solution: we dropped the hypothetical indexes and everything was back to normal.

One of our custom apps (Maximo) was locked down because it doesn't like statistics created by SQL Server and only works with its internal indexing strategy.

Quick fix: you got it right, delete the custom statistics and indexes (the ones that I created).

June 3, 2008 12:06 PM
 

MANINDER SINGH said:

Restoring a database without scripting out the permissions for the current users of the DB being overwritten in PROD.

Restoring over a LIVE production database by mistake, when a manager sent an email to a Jr. DBA to restore on the DEV server.

How? Both server names are kinda similar... BUI_203T (Test) and BUI_203P (Prod). The manager meant BUI_203T and not BUI_203P.

That guy was trying and trying, then came to me after an hour of research and asked me why it was not going through... when I saw what he was trying to do, I was all guns at him. (Thanks to those active connections, which somehow prevented him from performing the restore...) The manager is very careful now to include the TEST and PROD keywords in the email.

A VLDB (JDE) of 255 GB was in place long before I joined here. There is only one filegroup and one DB file in the primary FG. The previous DBA had set this up with a FULL backup at 2:00 AM and transaction log backups every 30 minutes daily. Then there were jobs on BUI_203P that pulled data from this DB (using SELECT * from certain tables) into warehouse DBs several times a day, including at 2:00 AM. No doubt, the backup of the VLDB never finished on time and ran past 9 AM daily, sometimes more, and had to be killed, so users were always waiting till 1 AM to start their daily reports. What a bunch of gurus! I changed the plans and schedules and now the internal users are unhappy because they have to be online at 6 AM per the management orders. :)

June 4, 2008 9:18 AM
 

Who Needs Ram? said:

I was reassigning the max memory used by SQL Server and meant to change the value to 16000 MB, but typed 16 instead. The server got angry and didn't want to function or restart.
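
If the instance will still start at all, the setting can be put straight back with sp_configure; if it won't, starting it with the -f (minimal configuration) startup parameter usually gets you far enough to run the same commands. A sketch, using the intended 16000 MB from the story:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 16000;
RECONFIGURE;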

June 17, 2008 11:50 PM
 

I Didnt Do It said:

My favorite (not done by me but by a predecessor): installing SQL 2000 Personal Edition instead of Standard Edition and then not discovering the problem until you've been in production nearly 6 months and wonder why performance is so bad on your 4-proc 4GB server - until a coworker asks you to SELECT @@VERSION.

June 24, 2008 2:13 PM
 

lawsonta said:

This was me, and it is a good one...

To set up the scenario (so it doesn't sound that bad)... I had set up some batch files to compress backup files on disk daily prior to the tape backup (we are charged per GB of backup space we use).  In the beginning stages of doing this, I would have a few backup files where the compression step failed.  When this happened, the file was considered still in use and could not be deleted.  To fix it, I had to find the SPID and kill it on the OS.

Around the same time, I had to move some of the data files around to different drives, so I was not used to seeing certain data files in certain places...

Well... I connected to the server one day and saw one of these "locked" compressed backup files.  I wanted to delete it, so I killed the process that was using it.  Long story short, it was the actual database log file I was looking at!!!! And, YES, I killed the process using it, and deleted the LOG file!!!

Thankfully, I quickly restored it from the recycle bin and I think I stopped/started SQL and all was well! Whew....

Bad day!

August 13, 2008 7:49 PM
 

Denis Gobo said:

Wow, it has been already a year since I wrote A year in review, The 21 + 1 best blog posts on SQLBlog

December 31, 2008 10:37 AM
 

Clive Strong said:

The biggest blunder I saw was by Support for a telecoms company I worked for.  Support had to update a customer phone number.  There was no GUI, so it was a typical UPDATE statement - I think you can guess what's coming?  UPDATE statement, no explicit transaction - a lot of pain for the on-call DBA, and it just had to happen when I was on call!

Anyways, all customers now had the same phone number!

Fortunately, we had a hacked log shipping process that was used as part of a reporting solution.  I was called about 5 minutes before the logs were pushed to the other database.  Two minutes of sweating and I decided to disable the log ship process and use the other database to undo the change - there were one or two other changes since the last log backup which had to be identified, but it was a lot quicker than restoring a 1 TB database and all its logs!

Needless to say, these tasks were pushed back to the DBA group!

June 26, 2009 11:36 AM
