Scaling the infrastructure in times of COVID-19

The COVID-19 crisis has brought many challenges and changes for IT management teams. There are the immediate business challenges of keeping employees connected and productive, for example supporting remote working over a VPN designed to handle only a tiny fraction of the workforce.

Then there are the wider commercial challenges of serving customers digitally at an unprecedented scale. Supermarkets, for example, are facing exponential growth in online ordering. If there was ever a time when you needed a real-time view of store inventory to support click-and-collect, now is that time.

Finally, there are the challenges of quickly delivering new digital experiences to help navigate the crisis itself: governments and healthcare teams need a real-time view of ICU bed and ventilator availability, as well as contact tracing to assess cases and lift shielding orders, while financial institutions must support a range of customers through the economic fallout.

Scaling existing systems is hard. Scaling existing systems while also introducing new applications that must scale nationwide almost immediately can feel impossible. Cloud-native patterns, practices, and technologies can help, but for most organizations those patterns, practices, and technologies aren't pervasive yet.

So, what can you do in this moment of crisis? After all, architectures can't make a U-turn overnight. But as we've seen throughout history, human behaviors can, especially in times of crisis. To scale and get to production faster, developers and ops managers each need to contribute to even closer and more efficient cooperation, now and in the future.

Hurdles that often stand in the way of closer cooperation

Before exploring ways developers and operators can help each other, we need to understand what currently gets in the way of scaling and speed to market. The largest, most systemic reasons often come down to two mismatches: one between the goals and objectives of dev and ops, and one between development and production environments.

Operations teams are often aligned with, and measured on, cost and stability. While important, these priorities don't mesh well with change, such as scaling or developing new features and applications. Developers, on the other hand, are often more closely aligned with business goals and measure their progress by velocity. In other words, they are all about change, which can leave stability as an afterthought.

Moreover, when different IT infrastructure and tools are used in production and in development, you can easily end up with a language barrier between dev and ops. This slows the transition into production, because new issues crop up in the new environment, and it can also slow troubleshooting during incidents, leading to a longer mean time to resolution (MTTR).

So, how can developers help ops teams?

Ops teams carry the responsibility for uptime, and have historically spent a lot of time making sure that infrastructure is available. In a cloud-native world, this gets turned on its head, as infrastructure is assumed to be unreliable. But ops teams can't do much to the code itself. Developers, however, can make their code more uptime friendly and scalable.

One way they can do so is by ensuring the "observability" of the code. Instrumenting your code from the get-go pays off in the long run: the faster the causes of downtime are identified, the higher the uptime. Spring has made this relatively easy with Spring Boot Actuator and Micrometer. Using common tooling for monitoring and observability also helps bridge the dev/prod mismatch.
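
To make the instrumentation idea concrete, here is a minimal, stdlib-only Python sketch of what a metrics library does conceptually: count calls and failures and time each call, so that operators can see when and why a service degrades. It is not the Micrometer or Spring Boot Actuator API; the `MetricsRegistry` class, the `instrumented` decorator, the `lookup_stock` function and the metric names are all hypothetical, chosen for illustration.

```python
import time
from collections import defaultdict
from functools import wraps


class MetricsRegistry:
    """Hypothetical in-process registry holding counters and cumulative timings."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.total_seconds = defaultdict(float)

    def increment(self, name, amount=1):
        self.counters[name] += amount

    def record_time(self, name, seconds):
        # Keep the total time and the number of timed calls for this metric.
        self.total_seconds[name] += seconds
        self.counters[name + ".count"] += 1


registry = MetricsRegistry()


def instrumented(name):
    """Decorator: time every call and count failures of the wrapped function."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception:
                # Failures get their own counter so error rates are visible.
                registry.increment(name + ".errors")
                raise
            finally:
                # Runs for successful and failed calls alike.
                registry.record_time(name, time.perf_counter() - start)
        return wrapper
    return decorator


@instrumented("inventory.lookup")
def lookup_stock(store_id, sku):
    # Hypothetical business function standing in for a real inventory query.
    if sku is None:
        raise ValueError("missing SKU")
    return {"store": store_id, "sku": sku, "available": 3}


lookup_stock(1, "ABC")
try:
    lookup_stock(1, None)
except ValueError:
    pass  # the failure has already been counted by the decorator

# registry.counters now records 2 calls and 1 error for "inventory.lookup"
print(dict(registry.counters))
```

In a real deployment these numbers would be exported to a shared monitoring system rather than kept in a dictionary, which is what lets developers and operators look at the same data during an incident.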

Moreover, the more easily and rapidly applications can be restarted, the more resilient ops teams can be to failures, and the more likely they are to be able to take advantage of powerful automation. This is a fundamental assumption in using a system like Kubernetes, which automates starts and restarts in a continuously running reconciliation loop.
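
The reconciliation idea can be sketched in a few lines: a controller repeatedly compares the desired state (how many instances should run) with the observed state and starts or stops instances until the two match. This is a toy Python model, not Kubernetes code; the `Cluster` class, its methods and the `reconcile` function are hypothetical, and a real controller would watch the cluster's API rather than inspect an in-memory set. It also shows why fast, safe restarts matter: the loop assumes any instance can die and be replaced at any time.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical observed state: the IDs of instances currently running."""
    running: set = field(default_factory=set)
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1), repr=False)

    def start_instance(self):
        instance_id = f"app-{next(self._ids)}"
        self.running.add(instance_id)
        return instance_id

    def crash(self, instance_id):
        # Simulates an unreliable node or process dying.
        self.running.discard(instance_id)


def reconcile(cluster, desired_replicas):
    """One pass of the loop: move the observed state toward the desired state."""
    actions = []
    # Start replacements for anything missing (crashes or a scale-up).
    while len(cluster.running) < desired_replicas:
        actions.append(("start", cluster.start_instance()))
    # Stop surplus instances after a scale-down; picks the last ID in sorted
    # order, which is arbitrary but deterministic.
    while len(cluster.running) > desired_replicas:
        victim = sorted(cluster.running)[-1]
        cluster.running.discard(victim)
        actions.append(("stop", victim))
    return actions


cluster = Cluster()
reconcile(cluster, desired_replicas=3)         # starts app-1, app-2 and app-3
cluster.crash("app-2")                         # an instance fails
print(reconcile(cluster, desired_replicas=3))  # one replacement is started
```

A real controller would call something like `reconcile` over and over, so a pass that finds nothing to do simply returns an empty list; the design only works if restarting an instance is cheap and safe.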

By thinking about how to make code more uptime friendly and scalable, developers help ops teams meet their goals and objectives. This helps bridge the mismatch that often occurs between the two groups. It also helps deliver a better customer experience, particularly during this crisis, when customers depend more heavily on digital services.

And how can ops teams help developers?

If developers can help ops teams by taking on some of their concerns, the same goes for operators helping developers. By aiming to speed up the path to production, ops teams can look to shorten the wait that follows once a problem has been identified and raised, so that the necessary code can go into production faster.

Additionally, ops teams can get a better sense of what would help developers by spending more time with those teams. Scheduling office hours, dropping in on dev team meetings, and asking developers for feedback are some quick changes to start conversations that can lead to more effective collaboration.

Finally, relationships can be boosted by finding ways to say "yes" to developer teams. While "no" is an easier word to say, rooted in a need to deal with chaos and complexity, it burns bridges faster than fire. When developer requests come in, ops teams should aim to slow down and try to find a way to say yes. Getting to the root of why something is being asked for is a helpful way to find a path to "yes."

Closer cooperation for faster, more agile responses

For companies that have been on the path to cloud-native, the pandemic has vindicated the difficult changes they've been investing in. From launching new applications in days to scaling 10x without breaking a sweat, these patterns, practices, and technologies are paying off. They can't be turned on overnight, but there are small steps that individuals can take to make scaling, uptime, and speed to market more seamless. COVID-19's challenges act as a catalyst for what is already required and will be even more required in the future: closer cooperation between developers and ops managers.

  • Ed Hoppitt, Director EMEA, Apps and Cloud Native Platforms, VMware.

As Director EMEA and a member of VMware's Technical Services leadership team, Ed Hoppitt focuses on driving the DevOps agenda to streamline how applications are evolved and managed.