The Big Data Landscape per Clustrix





Database Landscape Categories

Many of the databases above are complementary; which one you use depends on the workload you’re running. Let’s look at each category in detail.

In-memory Row Stores

These databases hold all of their data in memory (RAM) to support fast transactions. The data may also be distributed and replicated across multiple nodes. This type of database is designed for smaller data sets (< 1 TB). These databases deliver very fast point reads and writes, which suits workloads that require low latency (for example, < 10 ms). However, they have very limited support for analytic queries.

In-memory Column Stores

These databases are also held entirely in memory (RAM), but the data is stored by column for fast analytics. So far, the only available product in this category is SAP HANA.

Single Node Row Stores

This group contains all the primary databases that are good at both transactions and analytics, but do not scale beyond a single server. Some developers use sharding to scale them beyond a single node, but at the considerable cost of additional development and administration overhead. These databases were architected many years ago to work on a single node.
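To make the sharding overhead concrete, here is a minimal sketch of the routing logic an application has to carry once a single-node database is split by key. The shard names and the `shard_for` and `group_by_shard` helpers are hypothetical, chosen only for illustration; they are not taken from any particular product.

```python
import hashlib

# Hypothetical shard names; a real deployment would keep connection handles here.
SHARDS = ["db-node-0", "db-node-1", "db-node-2", "db-node-3"]

def shard_for(key, shards=SHARDS):
    """Map a row key to exactly one shard using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]

def group_by_shard(keys, shards=SHARDS):
    """Split a multi-key request into one batch per shard."""
    batches = {}
    for key in keys:
        batches.setdefault(shard_for(key, shards), []).append(key)
    return batches

if __name__ == "__main__":
    # A lookup that was one query on a single node now fans out to several shards.
    print(group_by_shard(["user:1", "user:2", "user:3", "user:4", "user:5"]))
```

Even in this small sketch, the application has to know the shard list, and changing the number of shards remaps most keys. Cross-shard joins and transactions are not handled at all, which is where much of the extra development and administration effort goes.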

NoSQL Write Databases

These databases are designed for unstructured data and non-relational workloads. However, they are not designed for concurrency or complex write loads, and they do not support complex analytic queries. Most do not even support basic constructs such as joins.

Shared Data Row Stores

These have multiple query processing nodes but a single data node. The query processing nodes pull data to process queries and push back updated data. These solutions deliver high availability but do not work well with high concurrency, since multiple nodes can write to the same table at the same time. Also, each query is processed by a single node, which hurts the performance of analytic queries.

Shared Nothing Row Stores

Clustrix is the only database in this category. Databases of this type are able to scale transactions and support real-time analytics. Row orientation allows for scalable transaction performance, while Massively Parallel Processing (MPP) lets them run fast real-time analytics. Data sets can range up to 100 terabytes. Beyond that, for an analytics workload, columnar storage and compression become critical.
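The MPP idea can be sketched as a scatter-gather aggregate: each node computes a partial result over the rows it owns, and a coordinator merges the partials. This is only an illustration under simplifying assumptions and does not describe how Clustrix is implemented. The partitions are plain in-process lists, threads stand in for nodes, and `partial_aggregate` and `parallel_average` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions: each list stands in for the rows held by one node.
PARTITIONS = [
    [12.0, 8.0, 30.0],
    [5.0, 45.0],
    [10.0, 10.0, 20.0, 60.0],
]

def partial_aggregate(partition):
    """Work done locally on one node: a partial sum and a row count."""
    return sum(partition), len(partition)

def parallel_average(partitions):
    """Scatter the aggregate to every partition, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(partial_aggregate, partitions))
    total = sum(part_sum for part_sum, _ in partials)
    count = sum(part_count for _, part_count in partials)
    return total / count

if __name__ == "__main__":
    print(parallel_average(PARTITIONS))
```

Only the small partial results travel to the coordinator, so the work scales with the number of nodes. This contrasts with the shared data design, where a single node processes the whole query.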

Shared Nothing Column Stores

These databases are designed for offline analytics. With columnar compression and by reading only the columns a query requires, they are able to scale from hundreds of terabytes to petabytes of data. However, due to their columnar orientation, these databases are not able to support transactions or fast writes (some of them allow fast appends, but not fast updates). They usually rely on ETL from primary transactional databases.
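The trade-off between row and column orientation can be illustrated with a toy in-memory layout. This is a sketch with made-up data and hypothetical helper names, not a model of any real storage engine: an aggregate over one column touches a single list in the columnar layout, while updating one record touches every column.

```python
# Row layout: each record is stored together.
rows = [
    {"order_id": 1, "customer": "ann", "amount": 20.0},
    {"order_id": 2, "customer": "bob", "amount": 35.5},
    {"order_id": 3, "customer": "ann", "amount": 12.25},
]

def to_columns(rows):
    """Pivot row records into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

def total_amount_row_store(rows):
    """Analytic query on rows: every full record is visited."""
    return sum(row["amount"] for row in rows)

def total_amount_column_store(columns):
    """Analytic query on columns: only the 'amount' list is read."""
    return sum(columns["amount"])

def update_column_store(columns, index, new_row):
    """Updating one record means one write per column; returns the write count."""
    writes = 0
    for name, value in new_row.items():
        columns[name][index] = value
        writes += 1
    return writes

if __name__ == "__main__":
    columns = to_columns(rows)
    print(total_amount_row_store(rows), total_amount_column_store(columns))
    print(update_column_store(columns, 0, {"order_id": 1, "customer": "ann", "amount": 25.0}))
```

In a real column store each column is also compressed, which makes in-place updates more expensive still; that cost is not modeled here.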



Source: Clustrix

For other Big Data Landscapes, see here and here