
Phantom vs other Cassandra drivers

This document aims to offer you an insight into the available tooling for Scala/Cassandra integrations and hopefully help you pick the right tool for the job.

The available offerings are the Datastax Java Driver, Phantom, Quill, and the Spark Connector.

Feature comparison table

Let’s first compare the basic qualities of these drivers across a range of features, from how deeply they integrate with Cassandra and the support they offer, to how active and up to date they are.

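| Driver          | Language | Commercial | Type-safe | Schema Safe | Spark Support | Streams | DSL  | Cassandra | Latest | Activity | Since |
|-----------------|----------|------------|-----------|-------------|---------------|---------|------|-----------|--------|----------|-------|
| Java Driver     | Java     | [x]        | [-]       | [-]         | [-]           | [-]     | EDSL | Latest    | 3.1.0  | High     | 2012  |
| Phantom         | Scala    | [x]        | [x]       | [x]         | [x]           | [x]     | EDSL | Latest    | 3.1.0  | High     | 2013  |
| Quill           | Scala    | [-]        | [x]       | [-]         | [x]           | [-]     | QDSL | Latest    | 3.8.0  | High     | 2015  |
| Spark Connector | Scala    | [x]        | [x]       | [x]         | [-]           | [-]     | EDSL | 3.0       | 3.0.0  | High     | 2014  |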

An overview of the various drivers and using them from Scala

Datastax Java Driver

Created by Datastax, the commercial company behind Cassandra, this is the underlying engine of all other drivers. Phantom, Quill and the Spark connector all use it under the hood to connect and execute queries. Phantom and Quill add a Scala-friendly face to the driver, while the Spark connector does what it says on the tin: it focuses on Cassandra-Spark integration.

So why not just use the Datastax driver for Scala? We would like to start by saying it’s a very good tool with some seriously good engineering behind it, but by sole virtue of being aimed at a Java audience, it misses out on many of the powerful things you can do with the Scala compiler.

Cons of using the Java driver from Scala

Overall, if you wanted to, you could naturally use the Java driver inside a Scala application, but you probably wouldn’t want to. Again, this has very little to do with the quality of the driver itself and much more to do with the fact that the API is Java and therefore aimed at a Java audience.
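To give a feel for the friction involved, here is a minimal sketch of the glue code a Scala caller tends to write around a Java-style API. `JavaStyleRow` and `RecipeSummary` are hypothetical stand-ins invented for this illustration, not the driver's actual classes. The sketch assumes Scala 2.13 or later for `scala.jdk.CollectionConverters`.

```scala
import java.{util => ju}
import scala.jdk.CollectionConverters._

object JavaInterop {
  // Hypothetical stand-in for a Java-style row: values are looked up by
  // column name, may be null, and collections are java.util types.
  final class JavaStyleRow(values: ju.Map[String, AnyRef]) {
    def getString(column: String): String =
      values.get(column).asInstanceOf[String]
    def getList(column: String): ju.List[String] =
      values.get(column).asInstanceOf[ju.List[String]]
  }

  final case class RecipeSummary(
    url: String,
    description: Option[String],
    ingredients: List[String]
  )

  // The glue written by hand: wrap possible nulls in Option and convert
  // Java collections into immutable Scala ones.
  def toSummary(row: JavaStyleRow): RecipeSummary =
    RecipeSummary(
      url = row.getString("url"),
      description = Option(row.getString("description")),
      ingredients = Option(row.getList("ingredients")).map(_.asScala.toList).getOrElse(Nil)
    )
}
```

None of this is difficult, but it has to be repeated for every table, and a misspelt column name only shows up at runtime.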

Libraries such as phantom and quill are designed to abstract away many of the internals of the Java driver and make it more appealing to a Scala audience. Phantom goes a long way to mask the underlying complexities and provide advanced support for using Cassandra as a database layer.

Phantom

Phantom is designed to be a one stop destination. It is built exclusively to cater to the app level integration of Scala and Cassandra, meaning it is a lot more than a simple driver.

It is built on top of the Datastax Java driver, and uses all the default connection strategies and so on under the hood, without exposing any of the Java-esque APIs, but at the same time offering a multitude of features that are not available in the Java driver or in any other driver.

Quill

Quill is a compile-time, macro-based DSL that is capable of generating queries directly from a case class. It differs from phantom in several ways, chiefly in that Quill aims to build a leaky-abstraction-style QDSL, meaning a one-size-fits-all driver for a number of different databases.

In the abstract sense it’s the most like for like tool and probably the only one worth comparing to phantom, since the other variant is pure Java and the Spark connector is obviously for, well, Spark. There are other drivers out there, but all discontinued or Thrift based, some of which include Cassie from Twitter, Cascal, Astyanax from Netflix, and so on.

We would be the first to credit the engineering virtue behind it: it’s an excellently designed tool and a powerful example of just how far meta-programming can take you. That being said, there are a number of things that make Quill a less suitable tool for application-layer Cassandra.

The paper that initially inspired Quill is a strong suggestion that the fundamental approach of having complex database level mappings and lightweight entities is wrong, and that the focus should be on the domain entities and on letting them drive the show. In a sense, phantom follows this same principle of augmenting entities, although it is true that unlike Quill there is an extra layer of indirection through the mapping DSL.

It introduces new terminology

One of the great perks of phantom is the fact that you don’t really need to learn a new syntax. If you are familiar with Scala and you are familiar with SQL and CQL, you will feel right at home from the very first go, which means close to instant productivity.

It is true that phantom introduces other application level abstractions, such as the Database and the modelling DSL, all of which you need to be aware of, but there will never be a time when you read a phantom query and wonder what the final result looks like.

It doesn’t account for the CQL schema

Probably one of the most powerful features of phantom is the ability to be schema aware at compile time and the fact that you never have to deal with manual CQL or to manually initialise schemas and various other bits. Gone are the days where you are copy pasting CQL from one place to another in your build scripts or loading up schemas from *.cql files in cqlsh.

Phantom has a powerful mechanism to perform what we call schema auto-generation, which means with a single method call it is capable of automatically initialising all the tables in your database on the fly against any keyspace of your choosing. It can also account for more advanced indexing scenarios, user defined types and more, all on the fly.
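To make the idea concrete, here is a toy model of what generating a schema from column definitions involves. It is a sketch for illustration only and not phantom's implementation: `ColumnDef`, `createTable` and the `cookbook` keyspace are names made up for this example.

```scala
object SchemaGen {
  // Hypothetical, simplified column description; phantom's real column
  // types carry far more information than this.
  final case class ColumnDef(name: String, cqlType: String, partitionKey: Boolean = false)

  // Render a CREATE TABLE statement from the column definitions, so the
  // CQL never has to be written or copied around by hand.
  def createTable(keyspace: String, table: String, columns: Seq[ColumnDef]): String = {
    val keys = columns.filter(_.partitionKey).map(_.name)
    require(keys.nonEmpty, "a Cassandra table needs at least one partition key column")
    val columnList = columns.map(c => s"${c.name} ${c.cqlType}").mkString(", ")
    val keyList = keys.mkString(", ")
    s"CREATE TABLE IF NOT EXISTS $keyspace.$table ($columnList, PRIMARY KEY (($keyList)))"
  }

  val recipesDdl: String = createTable(
    "cookbook",
    "recipes",
    Seq(ColumnDef("url", "text", partitionKey = true), ColumnDef("servings", "int"))
  )
}
```

Because the statement is derived from the same definitions the queries use, the schema and the code cannot drift apart the way a hand-maintained *.cql file can.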

Let’s have a look at a basic example, for the basic Recipe case class, in this instance indexed by url.


import com.outworkers.phantom.dsl._

case class Recipe(
  url: String,
  description: Option[String],
  ingredients: List[String],
  servings: Option[Int],
  lastCheckedAt: DateTime,
  props: Map[String, String],
  side_id: UUID
)

abstract class Recipes extends Table[Recipes, Recipe] {

  // Partition key: the column this table is indexed by.
  object url extends StringColumn with PartitionKey

  object description extends OptionalStringColumn

  // Collection columns need their element type spelled out.
  object ingredients extends ListColumn[String]

  object servings extends OptionalIntColumn

  object lastcheckedat extends DateTimeColumn

  object props extends MapColumn[String, String]

  // No index is defined for this column.
  object side_id extends UUIDColumn
}

It’s marginally more boilerplate than something like Quill; however, this simple DSL can help us do great things:

The following query is invalid, because we have not defined any index for the side_id column, and phantom rejects it at compile time. This is currently not possible in either Quill or the Java driver, because they do not operate in a schema-safe way.

database.recipes.select.where(_.side_id eqs someid)

Quill, based on our current understanding, will however happily compile and generate the query, since it has no way to know what you wanted to do with the side_id column.
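The general technique behind this kind of check can be sketched in a few lines of plain Scala. This is a simplified model under our own assumptions, not phantom's actual implementation: `Column`, `Indexed`, `WhereClause` and `IndexedOps` are hypothetical names, and the real DSL is considerably richer.

```scala
object SchemaSafety {
  // Hypothetical, simplified column model.
  abstract class Column[T](val name: String)

  // Marker mixed into columns that may appear in a WHERE clause.
  trait Indexed

  final case class WhereClause(cql: String)

  // `eqs` is only available on columns that carry the Indexed marker, so
  // the compiler rejects a where clause on any other column.
  implicit class IndexedOps[T](col: Column[T] with Indexed) {
    def eqs(value: T): WhereClause = WhereClause(s"${col.name} = '$value'")
  }

  object url extends Column[String]("url") with Indexed
  object side_id extends Column[String]("side_id") // no index declared

  val query: WhereClause = url eqs "https://example.com/soup"

  // Uncommenting the next line fails compilation, because `eqs` is not
  // defined for a column without the Indexed marker:
  // val invalid = side_id eqs "42"
}
```

The index information lives in the type of each column, which is why a tool that only sees a case class has nothing to check the query against.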

It doesn’t account for protocol version/Cassandra version dependent behaviour

Numerous features or bugs are version dependent in Cassandra, and the only way to provide the user with a consistent experience across them is to be version aware. In certain contexts the final query is therefore only generated when the protocol version of a given cluster is known.

Naturally this “knowing” can only happen at runtime, and it affects things as simple as setting the consistency level of a query. For older protocol versions, the consistency level was part of the USING clause of the query; in more recent versions of the protocol it is no longer part of the CQL string and has to be specified separately. This is an optimisation that allows the nodes to perform the necessary coordination to achieve the desired consistency level without having to first parse the query.

So CQL went from:

UPDATE keyspace.table USING CONSISTENCY QUORUM SET a = 'b' WHERE id = 'some_id';

// to (pseudocode: the consistency level is now set outside the CQL string)

session.use(ConsistencyLevel.QUORUM)
UPDATE keyspace.table SET a = 'b' WHERE id = 'some_id';

Any client library needs to handle this change in CQL protocol details transparently. However, this is impossible without knowing the version in advance, which means querying the cluster, which in turn implies runtime.
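A minimal sketch of what deferring query generation until the version is known can look like. Everything here is hypothetical and for illustration only: `Rendered`, `renderUpdate` and the `legacyProtocol` flag are invented names, and how the flag is obtained from the cluster is deliberately left out.

```scala
object VersionAware {
  // A rendered CQL string plus, where applicable, a consistency level
  // that must be set through the driver instead of in the CQL text.
  final case class Rendered(cql: String, driverConsistency: Option[String])

  // Assumption: `legacyProtocol` reflects what the client learned from the
  // cluster at runtime. The final CQL depends on it.
  def renderUpdate(
    table: String,
    assignment: String,
    condition: String,
    consistency: String,
    legacyProtocol: Boolean
  ): Rendered =
    if (legacyProtocol) {
      // Older style: the consistency level travels inside the query string.
      Rendered(s"UPDATE $table USING CONSISTENCY $consistency SET $assignment WHERE $condition;", None)
    } else {
      // Newer style: the query string carries no consistency level.
      Rendered(s"UPDATE $table SET $assignment WHERE $condition;", Some(consistency))
    }
}
```

The same logical update therefore produces two different strings, and only the runtime knows which one is right for the cluster at hand.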

Extensibility

Quill is likely easier to extend than Phantom, as infix notation and arbitrary string generation are easier to work with than more complex, tightly coupled EDSL structures. But in an ideal world you wouldn’t be trying to extend the native driver at all; you would instead be welcomed by a wide range of supported features.

In this category, both tools are imperfect and incomplete, and phantom has its own shortcomings. However, the Quill comparison simply states: “You could extend Phantom by extending the DSL to add new features, although it might not be a straightforward process.”, which is a bit inaccurate.

Being a very new player in the game, Quill is a nice toy when it comes to Cassandra feature support, and you will often find yourself needing to add features. Phantom has its gaps without a doubt, but it’s a far more mature alternative, and the occasions when extension is required are significantly rarer.

A few things to remember:

Dependencies

One of the common pains in modern development is of course the number of dependencies that are brought in. The Quill authors make the somewhat misleading argument that phantom introduces more dependencies and that each third party dependency will bring in more and more modules.

Documentation and commercial support

Both tools can do a lot better in this category, but phantom is probably doing a little better, since we have a plethora of tests, blog posts, and resources on how to do things in phantom. This is not yet necessarily true of Quill, and we know very well just how challenging the ramp up process to stability can be.

In terms of commercial support, phantom wins. We don’t mean to start a debate on the virtues of open source, and we are aware most of the development community strongly favours OSS licenses and finds the word “commercial” unpleasant. However, we are constrained by the economic reality of having to pay the people competent enough to write this software for the benefit of us all and make sure they get enough spare time to focus on these things, which is a lot less fun.

Add in a never ending stream of support messages, emails, chats, feature requests, and bug reports, and you would soon learn the true nature and responsibility of keeping a project like this alive. We know we’re not really competing with Twitter on amount of OSS released, but on an impact/staff member ratio we would happily compete.

Phantom-pro co-exists alongside the default OSS version to offer you more advanced support and a more interesting feature set helping you develop and integrate Cassandra even faster. Spark support with an advanced compile time mapper and more are made possible in phantom-pro, as well as automated table migrations, DSE Graph support, and some other really cool toys such as auto-tables, which will be in some respect similar to Quill as the mapping DSL will not be necessary anymore, but at the same time retain the powerful embedded query EDSL.

Conclusion

Let’s sum up the points that we tried to make here in two key paragraphs.