Does Your Group Spend Too Much or Too Little?
I haven’t been blogging much this winter, as I’ve been buried under a fairly complicated and ugly upgrade project, which, I’m happy to say, is behind me now. One notion kept coming back to me during that project that I thought I’d share. It seems obvious, but many IT groups seem to have difficulty discussing it or making intentional, informed decisions about it: the idea of “good enough.” If an IT system is a key part of your business, it’s important to form an accurate notion of what level of quality you really need, and then base purchasing and staffing decisions on it. Consider this (admittedly crude) diagram:
To the left on the bottom axis we have something approaching the perfectly engineered system: no flaws, perfect redundancy, perfect uptime, infinitely expensive. Near here is where I would expect NASA to want to be for manned spaceflight. Perhaps the engineers of an atomic clock, or, sadly, an atomic bomb. This land is spendy, and hopefully the systems created here are near perfect.
To the right we have junk-ola. Systems that need a huge amount of hand-holding by IT staff, that are prone to break down in both routine and new, unexpected ways. These are the systems that page people all the time (or, worse, don’t page people and should.) These certainly were cheap to put together. Someone probably thought they would save money.
On the y-axis we have cost, higher at the top. The curve represents the planned, expected and visible cost of a project or a piece of IT infrastructure. Approaching perfection gets to be expensive, and the closer you get to perfect the faster the cost rises. This is why, for example, going from zero nines to two nines in uptime is expensive, but going from there to three nines and then to four forces cost up dramatically. As we go to the right, we start to buy or build junk, and things look cheaper. Obviously clever people can bend this curve to get better systems for less, which is great, but as a rule this pattern holds true. Fairly obvious, I think.
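To put numbers on the “nines,” here is a minimal Python sketch of how much downtime each level allows per year. It assumes the usual convention that N nines means an availability of 1 - 10^-N (two nines is 99%, three is 99.9%, and so on) and a 365-day year; the function name is just for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (an assumption)


def allowed_downtime_hours(nines: int) -> float:
    """Hours of downtime per year permitted at an availability of `nines` nines."""
    unavailability = 10 ** -nines  # e.g. 2 nines -> 0.01, meaning 99% uptime
    return HOURS_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} nines: {allowed_downtime_hours(n):.3f} hours of downtime per year")
```

Each extra nine cuts the downtime budget by a factor of ten: roughly 87.6 hours a year at two nines, 8.76 at three, and under an hour at four. That shrinking budget is what the steep part of the curve is paying for.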
The Hidden Cost of Junk
Here’s the detail that I find true in many shops, which is much harder for people to talk about:
As you move to the left on the graph, attaining high quality generally means things like planning, testing, development, schedules, manpower, fixes, careful deployment – all of it planned into the project with a cost assigned. The cost goes into budgets and estimates and gets scrutiny and conversation. In some places that also means it gets canceled, trimmed, cut back, edited, sent back to “sharpen the pencil.” Such is the nature of “visible cost.”
As we move to that left boundary, the likelihood that things will entail a lot of operational handholding and unexpected expense goes down. NASA, for example, would not launch the shuttle (back when NASA did that) and then, once in orbit, have to run out and unexpectedly hire a bunch of expensive consultants to see if they can figure out how to land it. One assumes they had worked that out ahead of time, and spent planned dollars to do so. The dashed line for hidden/unexpected cost should go way down as we build highly-engineered systems.
Over on the right, though, is a familiar place for a lot of businesses: a company buys some system, service or software on the cheap, hopeful about the original price tag, and then finds itself trapped in a vortex of unexpected costs: to fix it, to keep it running, to manually execute tedious monkey work required due to missing features. In short, they spend a huge amount, whether in man-hours, opportunity cost while the system is down, customer good will or consulting dollars, trying to repair something with an artificially low up-front cost. Most of the time there is also a strange psychology in play where they keep hoping against hope that they will still save money and be able to stop spending on repairs. Often the idea of changing out the system seems prohibitively expensive and difficult.
So, where should we be? In the happy place, if possible:
The happy place is where we don’t spend too much on extravagant, gold-plated systems that overreach what the company really needs – like a five-nines, always-on system for workers who use it from 8 to 5. The happy place is also where we don’t spend thousands of hours (time is money) trying to force some piece of junk to keep running when it isn’t up to the task, and in the process leak dollars day and night, paying people to do non-productive work that doesn’t move the organization forward.
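One way to picture the happy place is as the low point of the total cost, visible plus hidden. The Python sketch below is a toy model only: the curve shapes, the parameters `a` and `b`, and the function names are all assumptions made up for illustration, not measurements of anything. Quality runs from 0 (junk, the right edge of the graph) to 1 (perfect, the left edge).

```python
def visible_cost(quality: float, a: float = 1.0) -> float:
    """Planned, budgeted cost: assumed to blow up as quality approaches perfect (1.0)."""
    return a / (1.0 - quality)


def hidden_cost(quality: float, b: float = 1.0) -> float:
    """Unplanned cost (firefighting, downtime): assumed to blow up as quality approaches junk (0.0)."""
    return b / quality


def total_cost(quality: float, a: float = 1.0, b: float = 1.0) -> float:
    return visible_cost(quality, a) + hidden_cost(quality, b)


def happy_place(a: float = 1.0, b: float = 1.0, steps: int = 999) -> float:
    """Grid-search the quality level strictly between 0 and 1 with the lowest total cost."""
    candidates = [i / (steps + 1) for i in range(1, steps + 1)]
    return min(candidates, key=lambda q: total_cost(q, a, b))


if __name__ == "__main__":
    print("balanced shop:", happy_place())
    print("shop where failures are 4x as costly:", happy_place(b=4.0))
```

With these made-up curves, raising `b` (how painful failures are for your business) moves the minimum toward higher quality, which is the sense in which the happy place sits in a different spot for different shops.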
The happy place is in a different spot based on your business. You might really need to have 24x7 always-on redundant systems. Or perhaps not. But it’s key that the leadership in your team:
- Knows about this graph
- Does not have some strange fantasy about where you really are on this graph
- Has the right strategic vision to put you in the right place on this graph
I am very lucky to have a great job in a place that I think understands this – though every team, I think, has disagreements or times when this is hard to articulate.
Entropy Pushes to the Right
Another powerful sense I had in the last two months is that in IT, when you are an FTE inside a business, systems naturally, relentlessly, slide to the right on this graph. As a DBA or a system administrator, I find that one of the most important strategic roles I can play is always to push things to the left, just to counteract their natural slide to the right.
Here’s what happens: a company buys a system that is sort of OK, and kind of meets their needs. But Jim in department X finds this little piece unworkable, and customizes, and Sue in department Y would like a little adjustment over here, then Dave in Ops makes an ad-hoc job for Sheila the VP, and on and on. The intern in HR learns a little Access Reporting Services and makes a bunch of “useful” widgets. Each small band-aid looks great in isolation, but over time they accrete into this Rube Goldberg machinery around the system. This is practically an irresistible force.
So, what to do? Those people need those changes to be productive. It can really help their jobs and mental state to work around flaws with the software (which are there, really). The best I can come up with is to try to always make the little tactical moves add up to a strategy that counteracts the inevitable rightward slide to Junk-dom.
- Always attempt to apply best practices, even in apparently small situations.
- Realize that everything we make is probably permanent. People very rarely go back and fix things, really. If something really does end up temporary, then rejoice in the exception.
- Try to make the tactical moves fit into some larger architecture that has some staying power, and won’t just become a pain to maintain later on.
How about you? When budgeting decisions look baffling, or you can’t get approval to purchase something you need, is it really because your team is moving across this chart? Going the right direction or the wrong direction?