Troubleshooting Is A Team Sport Automation That Promotes…
Sep 10, 2019
Ed. note: The following transcript has been drawn from the on-demand recording — no registration needed or form to fill out — of NetBrain’s Just in Time Automation for IT Operations webinar. Jason Baudreau, NetBrain VP of Marketing, is your host.
Let’s talk about how automation can help as network teams collaborate with incident response.
When troubleshooting or escalation is necessary, there are a lot of inefficiencies and a lack of tools geared towards collaboration. Engineers are often duplicating efforts because they’re just not on the same page or they’re not aware of what the other team member is doing. And the other thing we see is finger-pointing between teams. That’s not uncommon at all — whether it’s the application team and the network team, the security team, the server team — there’s always a lot of finger-pointing about whose problem it is.
If you look at mean time to repair (MTTR), it’s really the result of two things: what we call MTTI, which I think is the bulk of the challenge, and the repair itself, which is less time-intensive. When I say MTTI, I’m actually referring to two things:
The mean time to identify a problem. In the context of collaboration, I’m going to talk about this in terms of escalation and handoff.
But also MTTI can be mean time to innocence. We know the network is guilty until proven innocent. Unfortunately, this challenge falls on the network team to prove that innocence to other teams – the app teams, server teams, for example. And that’s not always easy.
When I talk to engineers, this is a familiar challenge. . . . Do other teams assume that every application slowness issue is really a network problem? What percentage of the time is it really the network?
Let’s look at how automation can address these two challenges. The answer we came up with is to automatically document user activities inside a runbook. The runbook is embedded within the map URL so that everyone can see what their colleagues are doing as they troubleshoot alongside them. Perhaps this is used for escalation, so that the Tier-2 engineer can see what has already been performed by Tier 1. Basically, a network map can help everyone understand who did what, when, and what the result was. Again, it gets everybody on the same page.
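To make the "who did what, when, and what was the result" idea concrete, here is a minimal sketch of a shared activity log. It is an illustration only, not NetBrain's API or data model: the names `SharedRunbook`, `RunbookEntry`, `record`, `already_tried` and `handoff_summary`, the placeholder map URL, and the sample entries are all assumptions made up for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class RunbookEntry:
    """One troubleshooting step: who did what, when, and what came of it."""
    engineer: str
    tier: int
    action: str
    result: str
    timestamp: datetime


@dataclass
class SharedRunbook:
    """Hypothetical shared log tied to one network map (not a real product API)."""
    map_url: str
    entries: List[RunbookEntry] = field(default_factory=list)

    def record(self, engineer: str, tier: int, action: str, result: str,
               timestamp: Optional[datetime] = None) -> RunbookEntry:
        """Append a step; default the timestamp to the current UTC time."""
        entry = RunbookEntry(engineer, tier, action, result,
                             timestamp or datetime.now(timezone.utc))
        self.entries.append(entry)
        return entry

    def already_tried(self, action: str) -> bool:
        """Let a colleague check for duplicated effort before repeating a step."""
        return any(e.action == action for e in self.entries)

    def handoff_summary(self, below_tier: int) -> List[str]:
        """Describe every step taken by tiers below `below_tier`, oldest first."""
        steps = sorted((e for e in self.entries if e.tier < below_tier),
                       key=lambda e: e.timestamp)
        return [f"{e.timestamp:%H:%M} {e.engineer} (Tier {e.tier}): "
                f"{e.action} -> {e.result}" for e in steps]


# Made-up sample data, purely for illustration.
runbook = SharedRunbook(map_url="https://example.invalid/maps/incident")
runbook.record("alice", 1, "ping core-sw1", "100% reply",
               datetime(2019, 9, 10, 14, 5, tzinfo=timezone.utc))
runbook.record("bob", 2, "traceroute to app server", "loop at hop 4",
               datetime(2019, 9, 10, 14, 20, tzinfo=timezone.utc))

# What a Tier-2 engineer would see from Tier 1 at escalation time.
for line in runbook.handoff_summary(below_tier=2):
    print(line)
```

The point of the sketch is the shape of the record rather than the storage: each step carries the engineer, the action, the time and the result, so an escalation can start from what Tier 1 already ruled out, and the same log can be shown to other teams when the network needs to prove its innocence.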