Azure isn't just Microsoft's cloud – it's the company's Cloud OS

With Azure, you get cloud-scale – and tools that are cloud-scale

Azure vice president Jason Zander

Azure supports Docker, runs key applications such as Oracle and SAP, and offers a host of services and SDKs that cover everything from backup and search to API management, mobile notifications and analytics, and even running your own streaming media service. Increasingly, it looks like more than the average public cloud.

Not only does it mix features from both Platform as a Service (PaaS) and Infrastructure as a Service (IaaS) cloud architectures, you can also see it as a platform in its own right. Tools and services that software vendors would once have built only for Windows Server are now showing up on Azure.

Cloud OS

That, as much as the ability to build your own private cloud, is why Microsoft talks about its 'cloud OS', Azure vice president Jason Zander explained to us. "That's the original idea around the whole moniker of Cloud OS. Some of this is conceptual ideas we used originally, like an OS has a storage subsystem and it has a compute side for executing applications – and the cloud has all those same sorts of pieces as well."

But running applications in the cloud makes its own demands. Applications work best when they're built as a service or, increasingly, as microservices: instead of a single giant service that runs (and fails) all together, you have a collection of smaller services connected by APIs, so you can change, update or restart the separate pieces. That means the tools you need to use this 'cloud OS' are very much like the tools Microsoft itself developed to build and run Azure, and more and more of the new services coming out on Azure offer those same tools to customers.

"Some of the technical pieces are getting consistency right," Zander explains. "Load balancing, where are my trade-offs around consistency, availability, all the normal pieces that are required. And if you can write some software that can maintain high consistency across that high availability and get decent levels of performance, that's pretty amazing. Some of the stuff we've got in the core [of Azure] does do that.

"The next version of the control plane we have in Azure is built on some of this technology we've been building for a very long time. It's been at the base of things like our Azure DB, our database as a service; that is the underlying support system that drives that and keeps it highly available."
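Zander doesn't say how Azure's core strikes that balance. One widely used way to make the consistency/availability trade-off explicit, shown here purely as an illustration and not as Azure's method, is the tunable quorum popularised by Dynamo-style stores: with N replicas, each write goes to W of them and each read consults R. If R + W > N, every read set overlaps every write set, so a read always reaches at least one replica holding the latest write. The Python sketch below checks that rule against brute-force enumeration; the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical illustration of Dynamo-style tunable quorums; not Azure's method.


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """The R + W > N rule: every read quorum must share a replica with every write quorum."""
    return r + w > n


def overlap_by_enumeration(n: int, r: int, w: int) -> bool:
    """Brute-force check: does every R-replica subset intersect every W-replica subset?"""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)  # an empty intersection is falsy
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


for n, r, w in [(3, 2, 2), (3, 1, 1), (5, 3, 3), (5, 2, 3)]:
    # The arithmetic rule and the exhaustive check agree in every case.
    assert quorums_overlap(n, r, w) == overlap_by_enumeration(n, r, w)
    print(f"N={n} R={r} W={w}: reads always see the latest write = {quorums_overlap(n, r, w)}")
```

Lowering R or W lets reads or writes succeed with fewer replicas reachable, which helps availability and latency, but once R + W ≤ N a read can miss the newest write.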

Growing sophistication

"The sophistication level of what people are trying to solve is getting higher and higher," says Zander. Businesses might start out by putting virtual machines in the cloud, but they realise they need more. "That's great but it doesn't really cut it for scale-out software. Now you have to start doing it at scale with availability. You need the underlying infrastructure to give you that and then a component on top to orchestrate, especially microservices for scale-out. Trying to figure out how do you keep a highly available environment with regional reliability – and at that point you're starting to get into the distributed algorithms."

That's a lot more than the high availability you might be used to with databases, he explains. "A database typically will have some kind of failover replication, like the witness idea. But when we run our control plane [for Azure], we'll run seven to nine instances of critical software." Those instances all run on different hardware inside Azure.

He continues: "So that even if I have hardware failure or power failure I can keep it up and running. And then you start thinking about 'How do I replicate state in such a complicated environment, how do I maintain quorum?' Then the next question is 'How do you handle upgrades?' If I want to move new versions of the software through, how does that actually work?"
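Zander doesn't describe the algorithm behind Azure's control plane. As an assumption for illustration, the sketch below uses simple majority quorum, the rule behind consensus protocols such as Paxos and Raft: an operation proceeds only while more than half the replicas are reachable. Under that rule seven replicas survive three failures and nine survive four. The same arithmetic governs upgrades: a rolling upgrade that takes one replica offline at a time is safe only while the replicas already down, plus the one being upgraded, stay within that tolerance. `ReplicaSet`, `majority` and `tolerated_failures` are hypothetical names, not an Azure API.

```python
from dataclasses import dataclass, field

# Hypothetical sketch assuming simple majority quorum; not Azure's control plane.


def majority(n: int) -> int:
    """Smallest number of replicas that forms a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many replicas can be lost while a majority still survives."""
    return n - majority(n)


@dataclass
class ReplicaSet:
    versions: list[str]                          # software version on each replica
    down: set[int] = field(default_factory=set)  # indices currently offline

    def has_quorum(self) -> bool:
        live = len(self.versions) - len(self.down)
        return live >= majority(len(self.versions))

    def rolling_upgrade(self, new_version: str) -> None:
        """Upgrade replicas one at a time, refusing any step that would lose quorum."""
        for i in range(len(self.versions)):
            if i in self.down:
                continue  # already failed; leave it for repair
            self.down.add(i)  # take replica i offline to upgrade it
            if not self.has_quorum():
                self.down.discard(i)  # put it back and stop
                raise RuntimeError(f"upgrading replica {i} would lose quorum")
            self.versions[i] = new_version
            self.down.discard(i)  # replica i rejoins on the new version


for n in (3, 5, 7, 9):
    print(n, "replicas: majority", majority(n), "tolerates", tolerated_failures(n))
```

Majority quorum also explains the preference for odd counts: an eighth replica raises the majority to five but still tolerates only three failures, no better than seven.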

Those are problems Microsoft has already solved for Azure and can now offer to customers. "We did Azure Batch in particular, because we found people trying to go in and write these kinds of engines and the truth is they're difficult to get right, especially by the time you factor in some fault tolerance and the ability to restart and review merges.

"Conceptually it seems simple – it's just the devil is in the details. We took some of the same things we use – we use it for grid encoding for Azure Media Services, we have customers in the insurance and actuarial business that use it as well – and turned them into a service."
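Azure Batch's own API isn't covered here, so the sketch below is not it. It is a minimal, hypothetical illustration of the kind of engine Zander says customers struggled to write themselves: independent tasks are retried a bounded number of times, each finished result is checkpointed to disk so a restarted run skips completed work, and the per-task results are merged at the end. `run_batch`, the JSON checkpoint file and the retry limit are all assumptions.

```python
import json
import os
import tempfile
from typing import Callable, Iterable

# Hypothetical batch engine for illustration; not the Azure Batch API.


def run_batch(
    tasks: Iterable[int],
    work: Callable[[int], int],
    checkpoint_path: str,
    max_attempts: int = 3,
) -> dict[int, int]:
    """Run independent tasks with retries, checkpointing after each one so a
    restarted run skips work that already finished."""
    done: dict[int, int] = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            # JSON object keys are strings; convert them back to task ids
            done = {int(k): v for k, v in json.load(f).items()}

    for task in tasks:
        if task in done:
            continue  # finished in an earlier run
        for attempt in range(1, max_attempts + 1):
            try:
                done[task] = work(task)
                break
            except Exception:
                if attempt == max_attempts:
                    raise  # give up; earlier results remain checkpointed
        # Write to a temp file, then rename, so the checkpoint is never half-written
        tmp = checkpoint_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(done, f)
        os.replace(tmp, checkpoint_path)
    return done


# Usage: a worker that fails once on task 2, then succeeds on retry.
failures = {2: 1}


def flaky_square(n: int) -> int:
    if failures.get(n, 0) > 0:
        failures[n] -= 1
        raise RuntimeError(f"transient failure on task {n}")
    return n * n


with tempfile.TemporaryDirectory() as d:
    results = run_batch(range(5), flaky_square, os.path.join(d, "ckpt.json"))
    total = sum(results.values())  # the "merge" step: combine per-task results
    print(results, total)
```

Even this toy shows where the devil in the details lies: on POSIX systems `os.replace` renames atomically, so a crash mid-write cannot leave a half-written checkpoint, and a real service would still have to deal with tasks that hang rather than fail, workers that die, and scheduling across many machines.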