Partner Article

Avoiding the downtime blame game

*By Mike Kelly, Chief Technology Officer, Blue Medora and General Manager, SelectStar*

No matter which sector an organisation finds itself in, one key aim of the IT team will always be the same – maintain uptime. Over the years, technology has taken over more of what businesses need to survive. According to a study by analyst firm IHS Markit, the financial impact of downtime is $700 billion per year. Without access to business-critical applications, there is no business.

However, no matter how hard they try, IT teams cannot always keep systems up and running. One of the most notable examples of downtime occurred in 2016 when United Airlines grounded all of its U.S. flights after an IT error. Outages such as this inflict massive damage on a company’s finances and reputation. The longer they last, the worse things get.

To fix the problem, you first have to find it, which can often be difficult for IT teams. Larger enterprises often operate with an array of disparate hardware and software, each carrying its own vendor-specific tooling for monitoring and management. This patchwork of tools creates additional silos in the IT stack. It takes multiple IT teams to manage these tools, creating an environment where technicians step over each other to solve the same problem. Conversely, a single technician might be assigned to unravel a number of disparate reports sent from several internal and external support teams. This is the moment when the finger-pointing begins: blame is assigned to different tools managed by different teams.

To end this confusion and the downtime that follows, IT teams must move to a centralised environment with a single set of tools and a single reporting process. This allows them to view information in one place, so that everyone sings from the same hymn sheet. This centralised approach provides several advantages.

• Greater visibility – Having the full stack visible in a single pane of glass helps teams monitor the entire IT environment, from storage to compute to virtualisation and more. This enhanced visibility makes it easier and faster to identify and resolve issues and avoid downtime.

• Early warning signs – A single solution presents analytics and alerts from the entire IT environment, offering a better alternative to multiple tools that lack the context of the wider stack. This can alert teams to issues before they escalate.

• Risk analysis – A centralised view lets teams calculate risk across IT layers for specific downtime scenarios, such as a datastore running out of storage and causing an application or VM outage.

The Internet of Things will continue to generate massive amounts of data, making databases and IT infrastructures significantly more complex and harder to manage. With this in mind, it is important for IT teams to centralise and consolidate IT management tools and teams to proactively avoid preventable issues. Implementing a comprehensive solution and eliminating disparate systems and tools will provide the bird's-eye view that is crucial to ending the downtime blame game that occurs in so many organisations.

This was posted in Bdaily's Members' News section by Blue Medora.