Spark meets Cassandra: what it means for big data analytics

DataStax explains new partnership

Transactions analytics and making the customer experience more dynamic

Big data analytics is now a fundamental part of the modern enterprise's business strategy, and the opportunity to uncover previously inaccessible data has become too valuable to pass up.

As such, new developments in the analytics industry are worth watching. So with DataStax announcing a new partnership with Databricks, marrying the popular Spark and Cassandra platforms, we wanted to find out what it means for the relevant user communities.

DataStax VP John Glendenning filled us in.

TechRadar Pro: So what are the details behind DataStax's recent announcement?

John Glendenning: We've announced a partnership with Databricks - the company founded by the creators of Apache Spark. It's the database industry's first partnership to integrate Spark and Cassandra.

Between us, we will deliver significantly faster analytics to users of both open source technologies and enable today's most progressive businesses to deliver highly personalised online customer experiences.

The developers we have spoken to are excited about the opportunities they see around these two tools. Developers are always looking for new and exciting things they can do, and this partnership will help them explore that cutting edge.

TRP: How does this impact the operational database industry?

JG: Through this partnership, we are driving the operational database industry toward a better approach that allows companies to ingest user data at a very fast rate, and then analyse the results within the same distributed database.

Responsiveness to customer needs is critical for successful online businesses, and by decreasing their "time to insights", companies can create highly personalised experiences for their customers.

Ooyala, a video analytics provider, has been running Spark in parallel with Cassandra for some time.

TRP: What will be the biggest benefits of this partnership?

JG: Let's go into Cassandra and Spark in a bit more detail. Apache Cassandra is a fully distributed, highly scalable database that allows users to create online applications that are "always on" and can process large amounts of data in real time.

Apache Spark is a processing engine that enables applications in Hadoop clusters to run up to 100 times faster than Hadoop MapReduce in memory, and up to 10 times faster when running on disk. Spark also provides SQL, streaming data, machine learning, and graph computation functionality out of the box.

This makes it easier to build end-to-end analytic workflows. Together, these technologies can significantly boost analytics performance in a transactional database and allow companies to act quicker when serving their customers' needs.

When you are looking at providing recommendations to users or personalising an online experience for each customer, using Spark and Cassandra together makes a lot of sense.
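As a rough illustration of the recommendation use case (our own sketch, not DataStax or Databricks code), the Python below keeps view events grouped by user, the way a partition key would group rows in Cassandra, and then runs a small co-viewing analysis of the kind a Spark job might perform. The event data, the function names and the scoring rule are all hypothetical, and it runs in plain Python rather than against either platform.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical view events: (user, item). Invented purely for illustration.
EVENTS = [
    ("alice", "video_a"),
    ("alice", "video_b"),
    ("bob", "video_a"),
    ("bob", "video_b"),
    ("bob", "video_c"),
    ("carol", "video_b"),
    ("carol", "video_c"),
]


def group_by_user(events):
    """Operational side: collect each user's items, one 'partition' per user."""
    partitions = defaultdict(set)
    for user, item in events:
        partitions[user].add(item)
    return partitions


def co_view_counts(partitions):
    """Analytics side: count how often two items were viewed by the same user."""
    counts = defaultdict(Counter)
    for items in partitions.values():
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(user, partitions, counts, limit=3):
    """Rank items the user has not seen by co-occurrence with items they have."""
    seen = partitions.get(user, set())
    scores = Counter()
    for item in seen:
        for other, n in counts[item].items():
            if other not in seen:
                scores[other] += n
    # Sort by score (descending), then by name, so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:limit]]


partitions = group_by_user(EVENTS)
counts = co_view_counts(partitions)
```

Calling `recommend("alice", partitions, counts)` ranks the videos Alice has not yet watched by how often other users watched them alongside her own. In a real deployment the grouping would live in the database and the counting would run as a distributed job; the point here is only that both steps work on the same data.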

TRP: Are you seeing a change in how companies value and approach data in 2014?

JG: Not so much of a change, but more of an increase. Modern enterprises use data as a strategic asset to compete. Companies are moving towards more "near term" analytics that can provide data insights in real time, so they can respond faster to opportunities.

Because of this, online applications that interact with customers and collect data must remain online all the time. They must be capable of reaching and interacting with customer data no matter where they are located. When companies are looking at improving the customer experience with personalisation, this is a key use case.

TRP: Cloudera, a major Hadoop player, has partnered with Databricks, and now DataStax has as well. So are you guys also a Hadoop vendor?

JG: DataStax is not a Hadoop vendor, but instead we are focused on serving the database requirements of modern online applications.

These applications need to run both analytics and search on their online data (line-of-business systems and data warehouses), so that functionality needs to be present on the NoSQL side of the house as well as the Hadoop data warehouse side.

We allow for that by integrating analytics and search technologies that function across a distributed, shared-nothing architecture such as ours.

TRP: What does this mean for the Cassandra Community?

JG: The Cassandra community is growing quickly, with global user meetups increasing 400 percent over the past year. Spark has been coming up as a frequent topic of discussion.

DataStax employees already account for the majority of contributions to the Cassandra open source code. By working closely with Databricks engineers, we will now contribute to the Spark community as well.

The partnership will help spread adoption of both technologies while creating greater cohesiveness among users.

TRP: Can you see either party forming any similar partnerships, e.g. with Hadoop?

JG: I don't have any details from a DataStax perspective, but will be sure to keep you updated on any developments in the future.