not only SQL

In the past it was relatively easy to select a database system. If you wanted an Open Source solution you selected MySQL. If you were a Microsoft shop, you selected their SQL Server and if you were a UNIX shop you selected Oracle. Now, a large number of NoSQL database systems have emerged.

This has mainly happened as result of the difficulty scaling traditional SQL database products for Cloud. There is still huge debate around this topic, particularly from the SQL traditionalists but one cannot ignore that most recent Cloud design wins appear to have gone for noSQL. New technologies often go in waves with high initial expectations so we are more cautious about being swept up by this enthusiasm.

We prefer to define noSQL as not only SQLsince we believe there is still a strong case of using SQL for the right applications, just that now it should be a consious decision rather than an automatic default.

To achieve their scalability, NoSQL implementations make sacrifices. According to Eric Brewer's CAP theory, you can pick any two of Consistency, Availability or Partitionability. SQL databases are typically optimized for Consistency and Availability. Partitionability is critical to achieving the horizontal scalability needed in the cloud. Most noSQL implementations sacrifice consistency in order to be partitionable. These adopt the BASE approach which forfeits the ACID properties of consistency and isolation in favour of "availability, graceful degradation, and performance" The acronym BASE is composed of the following characteristics:

  • "Basically available
  • "Soft-state
  • "Eventual consistency

However, this seems to ignore SQL itself. The power of SQL is that in a fully normalized form, any data item only needs to be stored once. Views are then used to combine these into relevant forms. This requires complex queries. The advantage of this approach is flexibility. The data model can easily change during the lifetime of the system. Indexes can be added or removed anywhere to improve performance. The system can guarantee referential and transactional integrity. The downside is the compute cost and bootleneck of the Query processing to support these SQL queries, the limited scalability, the slow performance (dictated by disk storage speeds) and the lack of online maintenance.

Most noSQL implementations are based on a Key-Value or Documents. Some have added limited indexes and allow the Value to be a tuple, but only support simple data models and fix that model for the lifetime of the system. As a consequence only simple and static data models are appropriate. As indexes are limited, many queries result in full table scans. Views in particular are very onerous and are best avoided completely. Many take advantage of the mapreduce design pattern to parallelize queries. Maintaining referential integrity usually becomes a responsibility of the application code.

An eventually consistent approach adds the complication of conflict resolution. This is usually achieved with timestamping but it's not infallible and generally some level of conflict reolution will get exposed to the application which must then deal with it.

Many Cloud apps have a simple, predictable data model which should remain static. Many of the noSQL solutions have been designed to support Cloud deployments, whereas the SQL Databases were designed for Enterprise usage. However, the noSQL databases are new and subject to frequent code changes which impacts stability and creates risk. The SQL databases are proven and more stable but may not be sufficiently performant or scalable to meet our requirements.

Database choices

There are many noSQL database systems available, each making different CAP and performance tradeoffs, consequently there is no one right system for all applications. We describe this new world as fat applications, thin databases, since a lot of the responsibility around referential integrity and data organization has been pushed up to the application, in exchange it gets a faster, leaner database engine. This is not a bad thing, since the application is often the best place to make these choices. Too often with traditional SQL engines we've ended up fighting them to get them do what we want.

We've been designing Cloud apps for many years. We are finding that given the choices now available, noSQL is usually the best choice. However, we find there is no one best noSQL solution. We've found it usually takes more than one noSQL solution. This is because of the closer relationship between the application code and the database. We've been using Riak, Couchbase and Erlang's own Mnesia system.

We are also able to help existing systems to achieve their performance objectives by refactoring critical parts of the design to noSQL and to help people chose the right noSQL for their requirements through a combination of requirments analysis and POC testing.

See our Big Data page where we talk more about optimizing the storage infrastructure to support noSQL.

Technology watchlist