The Ghosts of Network Operations Past, Present and Yet to Come

In A Christmas Carol, the protagonist is visited by three ghosts that show him the error of his ways, while also offering an opportunity to course-correct.

Network cybersecurity teams might find themselves in a similar pattern of reflection as the year winds down and we look ahead.

After the year that was, some may well be haunted by the ghosts of past outages, as well as by the ghost of present-state architectural affairs.

In that context, a visit by the Ghost of Network Operations Yet to Come could be a welcome one, particularly where it crystallizes a path forward to improve performance and resiliency.

Over to you, ghosts.

The Ghost of Network Operations Past

The first ghostly visit channels flickering memories: of network failures with cascading impacts, and manual interventions.

Ops teams need not cast their minds back too far to recall issues caused by a lack of resilience in network routes, where a single link failure caused a domino effect for anyone downstream or with some sort of interconnection or reliance on that connectivity. Engineers are likely haunted by these types of incidents: the blast radius was well beyond any conceivable comfort threshold.

Mike Hicks, Principal Solutions Analyst at Cisco ThousandEyes.

Another memory flickers into view: of the manual intervention required by affected organizations to respond. When systems went down and the organization lost connectivity, the troubleshooting immediately kicked in: what was the root cause? How could connectivity be revived? Should we manually switch over to a backup link, re-advertise our routes and start diverting traffic to get back online, knowing that could take many hours?

It’s a pertinent reminder of where providers and customer organizations alike would not want to find themselves again.

The Ghost of Network Operations Present

The arrival of the Ghost of Network Operations Present takes engineers to view the good places some organizations have reached today, with in-built resilience, redundant routes and some automation enabled by the shift to software-defined networks.

Observing other organizations, it’s clear that software-defined networking is making teams happier. There’s no physical shutdown of a network device for upgrades or maintenance; changes are tested non-destructively beforehand to understand what effect they will have before being applied to the production environment. Teams are dealing with noticeably fewer ‘unknowns’ in the upgrade process and they have a clearer understanding of the environment they’re going to impact by making a change.

But it’s quickly apparent that this visibility has a limitation, stretching only as far as what is hosted within their own domain. On that note, the Ghost of Network Operations Present serves one of its purposes - reminding us that the present is constantly moving, and that with enough passage of time, the present becomes the past.

At this time, there’s a natural inflection point in the ghostly encounter, and the question forms in the mind of the network ops engineer: what does the future hold?

The view then cuts to an organization that relies heavily on hosting environments outside of its immediate controlled domain, and therefore beyond the reach of its software-defined capabilities. Applications are instead hosted in clouds, and while the cloud infrastructure is resilient, the applications themselves appear brittle and prone to performance degradations that lead to critical functions suddenly becoming unavailable.

Cut to an image of frantic engineers as payments suddenly stop completing; then of workplace communications channels failing; and of workers asking applications for data and receiving nothing back.

Make it stop.

The Ghost of Network Operations Yet to Come

If the future direction isn’t clear, the third apparition makes it so. There is a need to address the root cause of application outages and instability, by extending visibility out from the organization’s own domain of control to cover all domains that host components of the end-to-end application architecture.

There needs to be oversight of the complex orchestration of components that enable an application to function. It’s only by mapping out that full service delivery chain that single points of dependence can be identified - the same single points that are causing different functionality within the application to become temporarily inaccessible.
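As a rough illustration of that mapping exercise, the Python sketch below treats each application function as a set of alternative delivery paths and flags any component that appears on every path. The function names, component names and paths are all hypothetical, invented for the example; a real service delivery chain would be discovered from monitoring data rather than written by hand.

```python
# Hypothetical delivery paths: for each application function, the
# alternative chains of components that can serve it end to end.
DELIVERY_PATHS = {
    "checkout": [
        ["isp-a", "dns-resolver", "cdn", "payments-api"],
        ["isp-b", "dns-resolver", "cdn", "payments-api"],
    ],
    "chat": [
        ["isp-a", "dns-resolver", "saas-chat-east"],
        ["isp-b", "dns-resolver", "saas-chat-west"],
    ],
}


def single_points_of_dependence(paths):
    """Return the components that appear on every alternative path.

    If such a component fails, no remaining path can serve the function.
    """
    if not paths:
        return set()
    return set.intersection(*(set(path) for path in paths))


def affected_functions(component, delivery_paths):
    """List the functions that become unavailable if `component` fails."""
    return sorted(
        name
        for name, paths in delivery_paths.items()
        if component in single_points_of_dependence(paths)
    )


if __name__ == "__main__":
    for name, paths in DELIVERY_PATHS.items():
        print(name, sorted(single_points_of_dependence(paths)))
    print(affected_functions("dns-resolver", DELIVERY_PATHS))
```

In this made-up topology the two ISPs are redundant for both functions, but the DNS resolver sits on every path, so it is the kind of shared dependency that would take several functions down at once.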

With that clarity, a path forward emerges: Network Assurance. This focuses on the end-to-end “network” of interconnected private environments, service providers, and services that form a user’s experience of an application or service, including Internet service providers (ISPs), public cloud, SaaS, and more. It provides a holistic view of digital experience by showing each connected element, such as a router, DNS resolver, or web server, and its impact in relation to other elements and performance as a whole.

But end-to-end visibility alone is not enough: there is also a requirement to be able to use telemetry and insights as the basis to initiate affirmative - automated - action. Being able to understand what parts of that monitoring and recovery process can and can’t be automated is important to continuing to be future-focused. It also brings closed-loop remediation into view: the idea that systems exist that recognize the makings of an application outage and automatically initiate an escalation and fix, without human intervention.
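To make the closed-loop idea a little more concrete, here is a minimal Python sketch of the decision step: telemetry comes in, a sustained degradation is recognized, and the system either triggers a pre-approved automated fix or escalates to a person. The thresholds, target names and the fix name are assumptions made up for illustration, not values or APIs from any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One telemetry measurement for a monitored target (hypothetical shape)."""
    latency_ms: float
    loss_pct: float


# Illustrative thresholds; real values depend on the application.
LATENCY_LIMIT_MS = 300.0
LOSS_LIMIT_PCT = 2.0
CONSECUTIVE_BAD = 3

# Hypothetical runbook: targets for which a fix is considered safe to automate.
AUTOMATED_FIXES = {"dns-resolver": "switch_to_secondary_resolver"}


def is_degraded(samples):
    """True when the last CONSECUTIVE_BAD samples all breach a threshold."""
    recent = samples[-CONSECUTIVE_BAD:]
    return len(recent) == CONSECUTIVE_BAD and all(
        s.latency_ms > LATENCY_LIMIT_MS or s.loss_pct > LOSS_LIMIT_PCT
        for s in recent
    )


def decide(target, samples):
    """Return (action, detail): do nothing, run an automated fix, or escalate."""
    if not is_degraded(samples):
        return ("none", None)
    fix = AUTOMATED_FIXES.get(target)
    if fix is not None:
        return ("remediate", fix)
    return ("escalate", f"page on-call for {target}")


if __name__ == "__main__":
    bad = [Sample(450.0, 0.0), Sample(120.0, 5.0), Sample(500.0, 3.0)]
    print(decide("dns-resolver", bad))
    print(decide("payments-api", bad))
```

Keeping the list of automatable fixes explicit reflects the point about knowing what can and can't be automated. A fuller loop would also re-check telemetry after the fix and escalate if the degradation persists.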

This article was produced as part of TechRadarPro's Expert Insights channel where we feature the best and brightest minds in the technology industry today. The views expressed here are those of the author and are not necessarily those of TechRadarPro or Future plc. If you are interested in contributing find out more here: https://www.techradar.com/news/submit-your-story-to-techradar-pro

Mike Hicks is Principal Solutions Analyst for Cisco ThousandEyes.