I do a fair amount of operations work in my DBA role, so obviously automation is one of my favorite topics. In fact, I get a little twitchy when I see people plowing through repetitive, rote, manual processes because, to me, those people could instead be doing more creative, satisfying and useful work. Still, that doesn’t mean we ought to stumble ahead attempting to automate everything.
This post is for T-SQL Tuesday #050: Automation, how much of it is the same? hosted by SqlChow ( http://sqlchow.wordpress.com/ | @SqlChow ).
From his blog:
“T-SQL Tuesday is a monthly blog party hosted by a different blogger each month. This blog party was started by Adam Machanic ( blog | @adammachanic ). You can take part by posting your own participating post that fits the topic of the month and follows the requirements below. Additionally, if you are interested in hosting a future T-SQL Tuesday, contact Adam Machanic on his blog.”
While I like automation, I love well thought-out, clear, reliable automation. I always ask myself a few questions when trying to create an automated solution - but I’ve never formalized them into a list that could be shared. That got me thinking: What is it that makes for really good automated solutions? Is there some checklist that could be applied? Here’s what goes through my head.
First ask, “Should we do this?” It seems obvious, but there are good and bad candidates for automation. Next, assuming the process at hand is conducive, spend a little quality time on “How should we do this?” There are myriad ways to automate different systems, and I’ve seen some pretty spectacular hammer-the-screw solutions, as I’m sure you have. In real teams there is often a variety of skills and preferences about tools, and it’s important to be objective about picking the right tool over the simply expedient or familiar one. Pick a technology that fits both the task and the organization. Do something expected, so you don’t set a booby trap for others.
Next, once the solution is coded up, it’s worth looking at with a critical eye. Is this any good? Does it really fit the original problem? Will it run reliably?
Lastly, ask whether your teammates or even those who come after you will appreciate that you created a helpful solution, or suffer under it as it breaks down or causes problems.
One theme underlying all these small decisions is to create causation – make things happen for a clear reason and make one action follow the next in a clear cause/effect, predictable way. This goes for the “meta” decisions about whether and how to automate, and also for the details about how your solution is coded.
So, here’s a short worksheet with some talking points that you can use individually or with your team to improve the effectiveness of automation.
The first question to ask is whether it’s worthwhile to automate a process. I find that by the time you are even asking this question the answer is probably “yes,” but it’s worth pausing to consider. Is it in the business’s interest to automate? Among the thousands of things you could work on, does this one help the organization? A simple example: If I have to restore one database in an ad-hoc fashion, I don’t automate it. I might even (gasp) use the GUI. Why? Because there’s no benefit from the time invested. On the other hand, if my team has to regularly restore from backups for testing, then absolutely, we automate that.
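To make the “regularly restore from backups for testing” case concrete, here’s a minimal sketch – in Python for illustration, though PowerShell or plain T-SQL would be the natural fit in a Windows shop. The database name, backup path, and logical file names are all hypothetical; adjust them to your environment:

```python
def build_restore_sql(database: str, backup_file: str, data_dir: str) -> str:
    """Generate the T-SQL for a scripted test restore.

    All names and paths here are placeholders -- the MOVE clauses in
    particular must match your database's actual logical file names.
    """
    return (
        f"RESTORE DATABASE [{database}]\n"
        f"FROM DISK = N'{backup_file}'\n"
        f"WITH REPLACE,\n"
        f"  MOVE N'{database}_Data' TO N'{data_dir}\\{database}.mdf',\n"
        f"  MOVE N'{database}_Log'  TO N'{data_dir}\\{database}_log.ldf';"
    )

# Each test cycle, generate and hand the statement to sqlcmd/Invoke-Sqlcmd.
print(build_restore_sql("SalesTest", r"\\backups\sales\full.bak", r"D:\Data"))
```

Generating the statement from parameters, instead of hand-editing a saved script each time, is exactly the kind of repetitive step that pays back the automation effort.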
Is the manual process conducive to automation?
Is the process repetitive?
How much brain power is involved in performing the existing task? Is there risk of errors from the inconsistency of people performing the work, or, conversely, are there so many choices in the process that automation would not provide the same quality of outcome?
Will it pay off to invest the extra time to write something production-ready? Anyone can dash off a script in a few minutes, but that’s not really production-ready code that is good enough to drop into an automated solution.
Will having a machine do this work free up a person to do more useful work?
First, an anecdote: I once worked with a guy who was a *nix admin, and we had some automation we needed for basic OS tasks, like move a file from here to there, etc. He didn’t know PowerShell, so he went to the Windows server involved, installed Cygwin and wrote a shell script. This is a smart guy – few people could have pulled that off – but it was the wrong answer. Why? Because we were a 99% Windows shop, and he created something he personally would have to support forever. It would be just as wrong to go into a Linux shop and use PowerShell to script stuff. It’s not because the technology is better or worse; it has to do with the context of available people, skills, systems and the task.
Have you picked a tool that fits the task? I find that the more complicated the code has to be, and the more you struggle to write it, the more likely it is that the tool is the wrong one. It could also be that you need to learn the tool better, but assuming basic skills, in today’s world most things we face day to day are common problems and so the right tool will feel like an easy fit.
Are you reinventing the wheel? If you are working really hard, pause and think, “There has to be a code library for this.” Use the available frameworks to plug together existing functionality, because almost everything has been done already by someone else.
Is this method supportable in my organization? Is it something the other staff will expect as a reasonable solution? Or will they say, “I don’t understand that / I’ve never seen this before / This tool is totally unfamiliar,” or, worst of all, “You did what?” Consider whether the design follows naturally from the problem, or is forced and bent to solve it.
What about Qualtiy?*
Once we’ve coded something up that we think fits, it’s still worthwhile to look at it with a critical eye. This can be tough, psychologically, for some folks because, naturally, we all have some pride in the things we produce. When I was in architecture school, one of the most painful but also most useful activities was constantly having to hang our work up on the wall and have people critique it, sometimes ruthlessly. That process was humbling, but it definitely created a mindset where you can be objective about your work.
Is it finished? The first draft of one of these solutions probably isn’t the real answer. Think carefully before pronouncing the design “done.”
Is it brittle? Will a small change in the environment – a renamed server, a moved file share, a changed service account – quietly break it?
Is it a landmine? A landmine is something that will sit there working away for some time, but then explode, out of nowhere, all over staff that doesn’t know it’s there or how it works.
Is it a Rube Goldberg Machine? Are there too many moving parts? I like to keep a Rube Goldberg Machine illustration at my desk, just as a reminder.
Pretend you are a person who has never seen this before. Did the design use a tool others would expect, in the way that tool was meant to be used?
List the ways this automation will fail, and what happens for each type of failure. What on the list is preventable?
Are there race conditions? Invalid assumptions? One of my favorites: “At the start of this script we assume that file ‘x’ is available in this location at 10:00.” That might be true sometimes – but where does said file come from? What happens on the day it’s not there? Late? Is it possible, or better, to create a cause and effect relationship where after a source system successfully delivers a file, the next step in the process is kicked off?
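When true cause and effect isn’t possible and you must start from a schedule, at minimum verify the assumption instead of trusting it. A small sketch (Python for illustration; the path and timeout are hypothetical) of polling for the expected file with a deadline, so the failure mode is a clear alert rather than a mysterious downstream error:

```python
import time
from pathlib import Path

def wait_for_file(path: str, timeout_s: float, poll_s: float = 1.0) -> bool:
    """Poll for an expected input file instead of assuming it exists.

    Returns True as soon as the file appears, False once the deadline
    passes -- letting the caller alert someone instead of plowing ahead.
    """
    deadline = time.monotonic() + timeout_s
    target = Path(path)
    while time.monotonic() < deadline:
        if target.exists():
            return True
        time.sleep(poll_s)
    return target.exists()
```

Better still, as described above, is to have the source system trigger the next step after a successful delivery, so there is no window to guess about at all.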
Did you test it? “Happy path” testing, or real testing?
Will the People After Us Hate Our Memory and Curse Us Forever?
Lastly, I find that it’s incredibly useful to consider what people will think of this solution later. While you can’t predict the future, and not every choice can be perfect, if there’s an automated system that is constantly breaking down, or that nobody understands, it’s worth paying attention. Why is it so awful? Beyond the specific details of what is broken, are there patterns and practices that were not followed that would have prevented the situation?
Can your automation recover from an error? How big does the error have to be to require manual intervention?
Does it tell anyone when there’s a failure? I have tried to work with a commercial file transfer system that didn’t notify anyone or retry when a file failed to copy from one location to another. You can imagine how that goes, in the middle of the night, on call, etc.
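The two questions above – recovering from small errors, escalating big ones – boil down to a pattern: retry transient failures, and notify loudly when retries run out. A sketch in Python; the `notify` callable stands in for whatever alerting your shop actually uses (email, paging, a monitoring hook), and the attempt counts and delays are placeholders:

```python
import time

def run_with_retry(task, attempts: int = 3, delay_s: float = 2.0, notify=print):
    """Run a zero-argument task, retrying transient failures.

    On each failure before the last, notify and retry after a delay.
    On the final failure, notify loudly and re-raise so the error is
    never swallowed silently.
    """
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception as exc:
            if attempt == attempts:
                notify(f"FAILED after {attempts} attempts: {exc}")
                raise
            notify(f"attempt {attempt} failed ({exc}); retrying in {delay_s}s")
            time.sleep(delay_s)
```

The key design choice is the last branch: when manual intervention really is required, the automation says so explicitly instead of failing in the dark at 3 a.m.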
Finally, does the system plug neatly into a modular, enterprise pattern of scheduling and/or monitoring and notification? No one in a large organization is happy with tens or hundreds of little scripts: one that sends an email, one in VBScript hiding in Windows Scheduled Tasks, one that logs to a file, one that logs to the event log, one in SQL Agent, one in Orchestrator, one in SSIS, one in Oracle Enterprise Manager. Over time, standardize how automation is driven, with a rational scheduling system, and how notification works in case of failures. In the real world perfection is probably unrealistic here, but having standard patterns helps tremendously.
True story: we had a SQL Agent job that called a VBScript to purge files from a share, making up for a badly designed web service that needlessly cached thousands of files to disk. This turned out to be a classic landmine: it worked, but nobody expected it to be there, doing what it was doing, because it just didn’t make sense. After a server move, with the purge job forgotten, the files filled the cache and brought the associated system down. Poor automation.
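If a purge job like that has to exist at all, it should at least announce itself every time it runs, so the next admin can find it. A hypothetical sketch (Python; the directory and retention period are illustrative, and the real thing was VBScript) where logging and a dry-run switch are what keep the job from becoming a silent landmine:

```python
import logging
import time
from pathlib import Path

log = logging.getLogger("cache_purge")

def purge_old_files(cache_dir: str, max_age_days: float, dry_run: bool = False) -> int:
    """Delete cache files older than max_age_days, logging every action.

    The log trail is the point: a future server move should turn up
    this job's footprints, not a full disk.
    """
    cutoff = time.time() - max_age_days * 86400
    removed = 0
    for f in Path(cache_dir).iterdir():
        if f.is_file() and f.stat().st_mtime < cutoff:
            log.info("purging %s", f)
            if not dry_run:
                f.unlink()
            removed += 1
    log.info("purge of %s complete: %d file(s) removed", cache_dir, removed)
    return removed
```

Wiring that log output into the same notification channel as everything else (per the previous point) is what makes the job discoverable instead of surprising.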
One More Thing
I’ve been careful not to point out any specific language or system or tech in this post, because I think these questions apply across all Ops-type technologies. Whatever environment you work in can enable high quality solutions if you pay attention to these issues.
That said, Hemanth asked in the T-SQL Tuesday invitation what tools we like for automation, and I have to say PowerShell is the best thing that has happened to Windows in the Enterprise in a long, long time. It makes practically everything on this list better, at least in a Windows world. PowerShell FTW!
* Yes, misspelled.