At my office we’re about 90 days into our implementation of System Center Operations Manager for Windows Server and SQL Server monitoring. All in all it’s been a good experience, and I’m really excited to have access to this tool. I’ve logged a fair number of years as a DBA on products like Idera’s SQL Diagnostic Manager and Quest Spotlight on SQL Server Enterprise (and “roll-your-own” solutions) in smaller environments, and liked them, but they always, in my experience, struggled with really large or complex server environments. So far, SCOM shines in that scenario. I have also recently come out of a bad experience with a big enterprise tool (that shall remain nameless) which I am quite happy to see shrinking in my rear-view mirror.
I’m planning to put a few posts here with semi-organized thoughts and comments about starting out with SCOM and building an implementation for anyone else starting down this road. It’s been really fun to dig into.
Required Question: Why buy a monitoring solution?
This is a controversial topic with some DBAs. I respect those who like to create their own solutions, and those who feel that building them is a big part of a DBA’s job. I especially respect the need to know what you are looking at regardless of tools: monitoring software is not a substitute for expertise; it helps us apply expertise. But ultimately my take on this question comes down to opportunity cost. If you write your own monitoring solution, you are robbing time and energy from other, more specialized activities that would directly improve your business, because you’re rewriting software that can be purchased off the shelf.
The monitoring software from third-party vendors might not be perfect. Yes, we absolutely need to know what we are doing with SQL Server. And yes, you need to have that uncomfortable conversation with your boss about purchasing something with hard dollars instead of spending time (soft dollars) on development. However, the odds that an individual DBA can write a whole monitoring solution from the ground up, with features comparable to a purchased product, for less true, net cost than a smart software purchase, seem slim to me. You’ll pay anyway, in hours or FTEs.
System Center Operations Manager (SCOM or OpsMan) is Microsoft’s enterprise server monitoring system. It’s a modular, distributed platform that can be configured to monitor practically every Microsoft server product, from the OS to SQL Server to IIS, Exchange, and so on, and it can be extended to monitor other systems with purchased or home-grown plug-ins or modules (“management packs,” or “MPs,” in SCOM terms). See http://www.microsoft.com/systemcenter/en/us/operations-manager.aspx for details.
Two things pushed us toward SCOM at my workplace, one an ambitious goal and the other a hard lesson:
- To have a “single pane of glass” operations monitoring solution, where everyone responsible for servers could see the same events and use a common tool, while not compromising functionality for each specialty. That is, as a tool it should be good at everything from OS to SQL Server to SharePoint or Exchange.
- The failure of the previous software we’d attempted to use for that function.
Ninety days is about the minimum amount of time needed to see if a product like this can really walk the walk, and I have to say I am quite pleased with it. Here is what has worked well:
- One “pane of glass” for almost everyone in IT really works, especially if you’re a Microsoft-centric shop. I don’t have enough real information to speak to monitoring of non-Microsoft systems, but I can see that Redmond is committed to this tool for their server products.
- Distributed application modeling is fantastic.
- Discovery is excellent. It finds all components of SQL Server, not just the engine, and returns pretty complete information about edition, version, service pack, and other server or instance properties.
- If set up correctly, the management of change (adding servers or instances, removing SQL Server components from servers) is great: the discovery piece can automatically find new servers and start monitoring them, and stop monitoring retired ones, pretty seamlessly. In our department it takes basically zero time to set up monitoring for a new SQL Server.
- Scalability is good. We have a mid-sized environment with about 700 Windows servers in three domains, of which about 200 have some SQL component installed. We have roughly 175 SQL Server Engine instances hosting 2500 databases. SCOM handles this quantity without so much as a blink, and we have much better coverage than we ever had in the past. Pre-SCOM, we had to set up monitoring manually for each server, and we could only afford (in licenses and in the scalability of the monitoring software) to monitor production. Now we have visibility into every box in both of our datacenters.
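For anyone wondering what “discovery-driven” change management amounts to, the idea behind the discovery bullet above can be sketched in a few lines of Python. To be clear, this is not SCOM’s API or how its discovery is actually implemented; the `SqlInstance` type, the `reconcile` function, and the server names are all hypothetical, made up purely for illustration. It only shows the general pattern: compare what discovery finds against what is currently monitored, then start or stop monitoring based on the difference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SqlInstance:
    """Hypothetical identity of a SQL Server instance (not a SCOM type)."""
    server: str
    instance: str


def reconcile(discovered, monitored):
    """Compare discovered instances with monitored ones.

    Returns (to_start, to_stop): instances that were discovered but are not
    yet monitored, and instances that are monitored but no longer discovered.
    """
    discovered = set(discovered)
    monitored = set(monitored)
    to_start = discovered - monitored   # new servers or instances
    to_stop = monitored - discovered    # retired servers or removed components
    return to_start, to_stop


# Illustrative inventory; every name here is invented.
discovered = [
    SqlInstance("SQLPROD01", "MSSQLSERVER"),
    SqlInstance("SQLDEV07", "MSSQLSERVER"),   # newly built box
]
monitored = [
    SqlInstance("SQLPROD01", "MSSQLSERVER"),
    SqlInstance("SQLOLD03", "REPORTING"),     # since retired
]

to_start, to_stop = reconcile(discovered, monitored)
for inst in sorted(to_start, key=lambda i: (i.server, i.instance)):
    print(f"start monitoring {inst.server}\\{inst.instance}")
for inst in sorted(to_stop, key=lambda i: (i.server, i.instance)):
    print(f"stop monitoring {inst.server}\\{inst.instance}")
```

The point of the pattern is that nobody maintains the monitored list by hand: when discovery runs on a schedule, the list simply follows the environment, which is why a new SQL Server costs us essentially no setup time.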
As you can tell, I’m pretty happy, but as with anything there are a handful of issues:
- It’s large and fairly complex, using multiple servers. It’s probably not a quick win for a small shop. Our implementation took two people and a consultant, each working about half-time, a couple of months to do the OS and SQL pieces; other applications like IIS, Exchange, and SharePoint will require more hours to set up. This should be expected, though, for a tool like this. Having the consultant turned out to be a very, very good idea. (I have his and his company’s contact information if you want it – email me at merrillaldrich (a) gmail (.) com.) The investment of time was worth it, and gave us a lot more value than the equivalent time spent building something.
- The system itself requires a pretty beefy SQL Server instance to keep up with the load placed on it by a complicated environment.
- The out-of-the-box GUI for things like a “dashboard” showing the overall performance of a particular SQL Server isn’t great. It’s fixable with some time invested in creating custom views, and it’s just a GUI/presentation problem, not an issue with the underlying data. Essentially, where another (probably smaller and less enterprisey) SQL monitoring tool will give you a really targeted and well-designed performance dashboard with two clicks of setup.exe, this one needs some time to create such a view, and there is a learning curve involved.
Next up – details about how we customized the SQL MP to get to good quality, low noise alerting and performance dashboards.
P.S. If you are a SCOM user and a DBA, I’d love to talk with you at SQLSaturday #65 in Vancouver. Please drop me a line – merrillaldrich (a) gmail (.) com.