Don't drown in data: how Microsoft's new Azure data services can help

Elastic data pools and Azure Data Lake

The first new service is elastic data pools, which let you share resources between multiple databases. You can build your own cloud service on top of a pool, or use one to support multiple departments or subsidiaries (imagine creating a service that helps people manage the franchises your business sells).

"Apps that are developed for the cloud, based on scale out, have large numbers of databases that are information repositories with some partition scheme by which people's data is getting sharded across databases," Rangarajan explains.

"One thing we're seeing again and again is that these different databases and shards see different levels of activity and resource requirements. And that changes over time – sometimes that change is predictable and seasonal, sometimes it's unpredictable and spiky with hotspots of usage. People who run cloud apps are finding themselves in situations where the individual databases are getting unpredictable and harder to manage."

An elastic pool can cope with that, because resources can be automatically reallocated to the databases that need them. "You get a predictable business model even though you have unpredictable databases," Rangarajan explains. "It's the mutual fund principle – it enables them to fund a large number of databases and the pricing structure allows them to use them any way they want."
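The "mutual fund principle" can be illustrated with a toy calculation. The sketch below is not Azure's allocation logic; the database names and demand figures are invented, and it only compares sizing each database for its own peak against sizing one shared pool for the combined peak.

```python
# Toy model: hourly resource demand (arbitrary units) for three databases.
# All names and numbers are invented for illustration.
demand = {
    "franchise_a": [5, 5, 40, 5, 5, 5],
    "franchise_b": [5, 40, 5, 5, 5, 5],
    "franchise_c": [5, 5, 5, 5, 40, 5],
}


def standalone_capacity(demand):
    """Capacity needed if each database is sized for its own peak."""
    return sum(max(series) for series in demand.values())


def pooled_capacity(demand):
    """Capacity needed if all databases draw on one shared pool."""
    hourly_totals = [sum(hour) for hour in zip(*demand.values())]
    return max(hourly_totals)


print(standalone_capacity(demand))  # 120
print(pooled_capacity(demand))      # 50
```

Because the spikes in this example land in different hours, the shared pool needs far less headroom than three separately provisioned databases. If every database peaked at once, the two figures would be the same.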

The elastic data warehouse is a cloud version of a familiar data tool. "It's built on decades of innovation in SQL Server; that's the core engine on which the node engine is running. It has an in-memory column store, all the fancy optimisers we've perfected over years – all the good stuff you expect in an industrial-strength database. We're taking those technologies and combining them with the fundamental elasticity of the cloud. Azure SQL Database is the basis of the service and, with the Sterling v12 database engine, that has full compatibility with SQL Server."

That means it's the same powerful data warehouse you could set up on your own servers, only much faster to create and to expand. You can also pause it when you don't need to run it, so you don't have to keep paying for it. And as well as running your own analysis models, you can connect it to other Microsoft services to look for more insights. "It integrates with Azure Machine Learning and with Power BI for insights," says Rangarajan. "Power BI works directly against it."

Deep data waters

The other new service is Azure Data Lake, which Rangarajan calls a "global scale, exabyte-scale data store optimised for analytical workloads." It's ideal for companies that have large amounts of data in different places. "We have data location so they can find out where their data is and move the query to where the data is rather than spending time and money to move the data. You don't need to move data around the world to bring it into one place where you do processing; instead you can shift the queries to the data."
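The idea of shifting the query rather than the data can be sketched in a few lines. This is a generic illustration, not the Data Lake API; the region names, readings and function names are all hypothetical. Each region runs the query locally and returns only a small summary, which is then combined centrally.

```python
# Hypothetical per-region stores; in practice each would sit in a different
# data centre, and the raw readings would never leave it.
regional_data = {
    "europe": [12.0, 15.5, 11.0],
    "asia": [20.0, 18.5],
    "americas": [9.5, 14.0, 16.0, 10.5],
}


def local_query(readings):
    """Runs where the data lives and returns only a small summary."""
    return {"total": sum(readings), "count": len(readings)}


def combine(partials):
    """Runs centrally on the summaries, never on the raw readings."""
    total = sum(p["total"] for p in partials)
    count = sum(p["count"] for p in partials)
    return total / count


partials = [local_query(rows) for rows in regional_data.values()]
global_mean = combine(partials)
print(round(global_mean, 2))
```

Only two numbers per region cross the network, however many readings each region holds. That works for queries that can be split into partial results, such as sums, counts and averages; other kinds of analysis may need more than one round trip.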

The Data Lake is based on the Hadoop Distributed File System (HDFS), so it works with Microsoft's HDInsight service (as well as with any standard Hadoop workload for analysis). "HDInsight will benefit enormously from this; it will just blow through some of the limits that are there in Azure. It's a simple, tactical advantage."

You can store files that are petabytes in size, "but it can also handle large numbers of small writes so you can take sensor data and quickly make it available for real-time analysis," says Rangarajan. "There are times when you will pay more money to use more resources to get the same answer faster," he points out, "and then the elasticity of cloud is valuable. I need to borrow a thousand cores and crunch this data and do predictive analysis about engine failure before the plane leaves on its next flight."

He thinks the Data Lake will also appeal to companies that have requirements for data sovereignty. "Sometimes data movement is restricted because of constraints other than technology – how can you still get the job done when the data is restricted? You can only do it if you are able to move the query. This is ideal for multinational companies with data silos that cannot leave the country.

"Our intention with this is that your desire to use your data, to get value out of data, should be limited only by the economic concern that the value you get is good for what you spend, not by technological concerns."

Contributor

Mary started her career at Future Publishing, saw the AOL meltdown first hand the first time around when she ran the AOL UK computing channel, and she's been a freelance tech writer for over a decade. She's used every version of Windows and Office released, and every smartphone too, but she's still looking for the perfect tablet. Yes, she really does have USB earrings.