Monitoring Cisco UCS

I am sure I’m opening up a huge can of worms with this post, but all those who know me will know that I am never one to shy away from controversy or from encouraging debate.

In my role as Subject Matter Expert for Cisco UCS and Integrated Systems, I am often asked by customers for my views on how best to monitor them and the applications that run on them. My historical response was something like “What is it you use now? If you’re happy with it, we’ll look to integrate Cisco UCS into your existing solution”, or perhaps “Just use UCS Manager for the UCS components and use something else for workload and application monitoring.”

The fact is that Cisco UCS, and other converged offerings for that matter, have heralded a new age in how workloads are delivered, operated and managed within a Data Center, across Data Centers and across Cloud infrastructures, whether Private, Public or Hybrid.

And if this is a new age of converged and Cloud offerings, surely we need a new age of monitoring solution for them. Just because a customer or vendor has always done something a particular way does not necessarily mean that they should just carry on in that fashion. Customers deserve, and indeed are demanding, better!

There’s a lot to be said for having the view of “What good is it being alerted that I have a CPU running hot, or errors increasing on a particular DIMM indicating a possible imminent failure? I want to know what impact that is having, or could have, on my application or service.”
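
As a rough illustration of that view, the Python sketch below takes a hardware fault and works out which services sit on the affected blade. Everything in it is an assumption made up for the example: the blade names, the service profile names, the services and the shape of the fault record are all hypothetical, and none of it is a UCS Manager or UCS Central API call.

```python
# Hypothetical inventory, invented for illustration only:
# which service profile is associated with which blade...
BLADE_TO_PROFILE = {
    "chassis-1/blade-3": "esx-host-01",
    "chassis-1/blade-4": "esx-host-02",
}

# ...and which application services depend on each service profile.
PROFILE_TO_SERVICES = {
    "esx-host-01": ["email", "intranet"],
    "esx-host-02": ["intranet"],
}


def impacted_services(fault):
    """Return the sorted list of services that depend on the faulty blade."""
    profile = BLADE_TO_PROFILE.get(fault["blade"])
    return sorted(PROFILE_TO_SERVICES.get(profile, []))


# A made-up fault record for a DIMM showing increasing errors.
fault = {"blade": "chassis-1/blade-3", "component": "DIMM A1", "severity": "warning"}
print(impacted_services(fault))
```

In a real deployment the two lookup tables would have to come from somewhere, for example an inventory export or a CMDB, and keeping them current as workloads move is exactly the hard part a monitoring product would need to solve.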

This mind-set is ever more relevant now that we are in a world of statelessness and Cloud, where workloads can be mobile and running across infrastructure that may or may not be under your own control.

Having this granular and detailed visibility into the Cloud is essential in this new age of ever-increasing demands to reduce cost and increase efficiency through consolidation and multi-tenancy.

My intention with this post is to have a comparison summary of different monitoring solutions I have installed, tested or played around with.

Now, each monitoring product will get a blog post in its own right, but as I test a new one I will add it to this comparison summary for a nice, quick single point of reference.

Comparing monitoring solutions is always difficult, as there are few instances where there can be a genuine “apples with apples” comparison. Each generally has its own strengths, weaknesses and focus. Some may monitor the hardware, some won’t; some may only monitor the hosts and not the guests; some may focus more on the application, and so on.

Rather than have numerous sub-categories, I will list all the solutions I have tested in the same comparison table and score each accordingly, which should make it pretty obvious which solutions fit where.

It is certain that not all of these solutions compete with each other, but there are undoubtedly overlaps in many cases. Some may complement each other and integrate nicely; others may not. Whichever way you slice it, it may well be that you require a combination of products to get the full solution you want.

So, all that said, the first “yardstick” I will be looking at is what these products give me over and above what I get with UCS Manager and UCS Central, which, as I’m sure you will agree, are great at monitoring the Cisco UCS hardware, components and configuration.

So, as a starter for 10, I have listed my own view of where UCS Manager (including the added functionality of UCS Central) sits as the first column, against which all future solutions can be “compared and contrasted”.

The second is how much functionality I get “out of the box”. Now, I’m no developer, script or API guru, so while I have seen some immense bespoke monitoring solutions fronted by cool bespoke apps, I have neither the time nor the skill to go to that level with my testing. I want to click install and then start getting some cool, useful information. OK, perhaps that’s a bit unrealistic, but you get the point.

I will also look at cost and licensing. For me it’s a simple equation: cost should equal value. If I get a lot of value from a product over that of UCSM and UCS Central, and the cost is reasonable for that value, then in my book that’s a viable product.

I’m sure most of these products will come with reams of info on how they reduce TCO and the ROI they promise, usually by reducing troubleshooting time or by identifying an issue before it becomes an issue (proaction rather than reaction). The engineer in me has never cared much for marketing, only for the facts and tangible results.

I will also look at how these products may aid with compliance with either internal or regulatory policies and standards, such as PCI DSS.

So stay tuned; I am hoping to write up the review of the first product over the next week or so.

I have posted the below spreadsheet with the solutions and categories that have come to mind for testing and scoring, so that the community has a chance to give me their comments on them and suggest additions / alterations prior to the testing.

Not sure on the timescales on this as I am kept very busy with my day job, but every so often I get some Lab time to do some testing, or even better a booking to design / install one of the products I am evaluating.

So I see this as an ongoing project.

And as always, this is just my view and my testing. No vendors have sponsored any of these tests or will influence any of my opinions, and I will try to minimise any harm to animals during my research.

UCS Monitoring Comparisons

About ucsguru

Principal Consultant and Data Center Subject Matter Expert. I do not work or speak for Cisco or any other vendor.
This entry was posted in Monitoring.

17 Responses to Monitoring Cisco UCS

  1. Dmitri K says:

    Do you know much about BladeLogic? Would be very interested to see how that stacks up, too.

    • ucsguru says:

      Hi Dmitri
      Yes, BMC is actually our primary method for monitoring our managed service customers, although I personally come across BladeLogic more for its Automation / Orchestration capabilities.

      And Automation / Orchestration solutions will certainly be a future blog topic for me; I am just waiting to see what merged “Super Product” comes out of the amalgamation of Cloupia and Cisco IAC.

      But I’ll have a chat with our BladeLogic / Patrol guys and see if adding it to this list would be a quick win.

      Thanks for getting involved

      Regards
      Colin

  2. Jason Daniels says:

    Hi Colin … BladeLogic is indeed BMC’s Server & Network automation toolset; it is NOT a monitoring tool. BMC Patrol, or ProactiveNet, is BMC’s monitoring toolset, used in conjunction with BMC Event Manager and Service Impact Manager. BMC has identified converged infrastructure as an up-and-coming technology and has a dedicated set of monitoring knowledge modules available which give Patrol the ability to monitor not just a device as a device or a server as a server … but to monitor and report as a converged service.

    As you can appreciate, monitoring is a complex beast … at a technology level it tells us if we have a problem with server x … but more importantly it needs to tell us if the customer’s service is impacted, and to what level. We need to move away from technology events and into service impact events, which the technology events will feed into. Of course we care that a node in my cluster is moaning about something, but we care even more if that moaning node has just taken offline a whole service that a customer pays for, and the only intelligent way of doing this is via service impact management and streamlined, noise-reduced technology event management.

    Good luck with the monitoring testing! I would say you need to add BMC Patrol to your list, and maybe look somehow at how your Cisco kit plays into the service monitoring .. so x + y down will give you a working but degraded service, but x + y + g, for example, will take you totally offline. I appreciate you’re testing the technical bit at the moment, but I believe there’s a lot of gold in the service impact element. (I’m not a Cisco expert, so apologies, but I am an enterprise toolset expert. I came from Logica Outsourcing Services, where I headed up Enterprise Tools & Automation, and now work for Computacenter consultancy.) – Jas

    • ucsguru says:

      Hi Jason
      Thanks for clarifying. I’ll certainly give you a call to discuss testing a BMC solution, as I’m sure we must have a tonne of collateral already.
      Great to know we have people like you on board; I’ll certainly be in touch.
      Colin

  3. ghusson says:

    Hello,
    Maybe you can consider Zabbix. It is powerful but takes some time to learn.

  4. Bob H. says:

    Colin,
    A very interesting topic for me, and kind of sad that I am just seeing this now. Very rarely will you ever find ‘one tool to monitor it all’. Cisco is great at exposing all sorts of SNMP OIDs for their equipment for polling or traps. This is especially true for UCS. But sorting through and stitching together all the ‘link down’ and ‘virtual interface’ messages to systems and applications, as well as other traps, drives me crazy(er). Also, you can never get any one group to agree on ‘one monitor to rule them all’. You end up with a set of element tools: a storage resource monitor, a hardware infrastructure monitor, an application monitor (if you are lucky) and so on. No one of these tools is the best (or should be). Monitoring tools should also be able to utilize (ingest or perform) performance thresholds. All of these tools take an enormous investment of time. Many of the tools you listed above fail at the sheer volume of events generated from the scope you listed above, especially when you try to make sense of the data. Anyway, to put my rambling to an end, please take a look at the BRKCDN-5044 session from Cisco Live London (2013). We have used Netcool for many years now to break down Cisco events. Tight integration between Cisco and IBM with the NcKL (Netcool/OMNIbus Knowledge Library) really helps make sense of all of the data/metrics and the rest of your Cisco environment.

    • ucsguru says:

      Thanks for the comment Bob
      And I totally agree, there is very rarely a one-size-fits-all solution. My aim with this endeavour is twofold. One: to try and show where the strengths, weaknesses and overlaps of these solutions are. And two: to try and find a Private Cloud monitoring layer for the Converged Infrastructure and Integrated Systems I design. I fully expect this layer will not consist of a single product but more likely a combination of two.
      I have just finished all the CLEUR sessions I was not able to attend in person on Cisco Live 365, but I will certainly have a look at BRKCDN-5044.

      As mentioned, this will be a bit of an ongoing project, as finding the time to put these products through their paces is always a challenge. I have almost finished my first review (ScienceLogic) and hope to have the results up in the next week or two.

      Thanks for the input.
      Regards
      Colin

  5. Craig says:

    Colin, have you made any progress with your analysis? The chart I see only has UCSM results.

    • ucsguru says:

      Hi Craig
      Thanks for kicking my butt on this 🙂
      I have been so busy on project work that I have had very little time to progress this. That said, I have evaluated ScienceLogic and have the results pretty much written up now, so they should be up soon.

      Regards
      Colin

  6. Charlie says:

    One thing missing from your list is MS System Center Operations Manager. There’s a specific management pack for Cisco UCS, which we’ve been using productively for a few months.

    It’s worth a look…..

  7. Make sure you get a demo of Zenoss Enterprise with the UCS integrations.

  8. friea says:

    Hiya Colin – would be very interested in chatting at some point and get your top-of-mind thoughts on Splunk; might be able to streamline your process by putting you in touch with the UCS app developer for a demo & direct questions if useful.

    • ucsguru says:

      Hi Friea
      Hope you are well. Splunk certainly is on my list of products to evaluate; we’ve used it very successfully for troubleshooting a non-Cisco UCS issue in the past.

      I’ve spoken to Hal Rottenberg about getting something together, I just haven’t had the time as yet. But I’m hoping to have some soon.

      Colin

  9. Dani says:

    Hello Colin,
    Will you update the table soon? I only see UCSM tests….Thanks!
