The Big Data Landscape per Clustrix

By Gil Press
May 7th, 2013

Database Landscape Categories

Many of the databases in this landscape are complementary, and which one you use depends on the workload you're running. Let's look at each category in detail.

In-memory Row Stores	These databases are held entirely in memory (RAM) to support fast transactions. The data may also be distributed and replicated across multiple nodes. This type of database is designed for smaller data sets (under 1 TB). These databases excel at very fast point reads and writes, which suits workloads that require low latency (for example, under 10 ms). However, they have very limited support for analytic queries.
In-memory Column Stores	These databases are held entirely in memory (RAM), with the data stored as columns for fast analytics. So far, the only available product in this category is SAP HANA.
Single Node Row Stores	This group contains the mainstream primary databases, which handle both transactions and analytics well but do not scale beyond a single server. Some developers use sharding to scale them beyond a single node, but at the considerable cost of additional development and administration overhead. These databases were architected many years ago to run on a single node.
NoSQL Write Databases	These databases are designed for unstructured data and non-relational workloads. However, they are not designed for concurrency or complex write loads, and they do not support complex analytic queries. Most do not even support basic constructs such as joins.
Shared Data Row Stores	These have multiple query processing nodes but a single data node. The query processing nodes pull data to process queries and push updated data back. These solutions deliver high availability but do not handle high concurrency well, because multiple nodes can write to the same table at the same time. Also, each query is processed on a single node, which hurts the performance of analytic queries.
Shared Nothing Row Stores	Clustrix is the only database in this category. Databases in this category can scale transactions and support real-time analytics. Row orientation allows for scalable transaction performance, and Massively Parallel Processing (MPP) lets them run fast real-time analytics. Data sets can range up to 100 terabytes. Beyond that, for analytics workloads, columnar storage and compression become critical.
Shared Nothing Column Stores	These databases are designed for offline analytics. With columnar compression and by reading only the columns a query requires, they can scale from hundreds of terabytes to petabytes of data. However, because of their columnar orientation, these databases cannot support transactions or fast writes (some of them allow fast appends, but not fast updates). They usually rely on ETL from primary transactional databases.

Source: Clustrix
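The row-versus-column distinction that runs through these categories can be illustrated with a toy sketch. It is not how Clustrix, SAP HANA, or any other product actually stores data; the table, the data, and the function names (row_point_read, column_sum, run_length_encode) are hypothetical. A row layout keeps each record together, which suits point reads of whole records; a column layout keeps each attribute together, so an aggregate touches only the column it needs, and repeated values compress well with run-length encoding.

```python
from itertools import groupby

# Hypothetical three-column table (id, region, amount); data invented for illustration.
rows = [
    (1, "east", 10),
    (2, "east", 20),
    (3, "west", 5),
    (4, "west", 7),
    (5, "west", 3),
]

def row_point_read(rows, key):
    """Row layout: return the whole record whose id equals key, or None.
    (A real row store would use an index rather than this linear scan.)"""
    for record in rows:
        if record[0] == key:
            return record
    return None

# Column layout: each attribute is stored as its own list.
columns = {
    "id": [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}

def column_sum(columns, name):
    """Aggregate a single column without reading any of the others."""
    return sum(columns[name])

def run_length_encode(values):
    """Compress a column into (value, run_length) pairs; sorted or
    low-cardinality columns produce few, long runs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

print(row_point_read(rows, 3))               # (3, 'west', 5)
print(column_sum(columns, "amount"))         # 45
print(run_length_encode(columns["region"]))  # [('east', 2), ('west', 3)]
```

The same five records serve both access patterns, but the cost differs: the point read wants every field of one record, while the sum wants one field of every record, which is why the table above pairs row stores with transactions and column stores with analytics.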
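The shared nothing and MPP idea behind the Shared Nothing Row Stores category can be sketched as hash partitioning plus scatter-gather aggregation. This is a minimal single-process illustration under stated assumptions, not Clustrix's implementation: the "nodes" are plain Python lists, threads stand in for separate servers, and partition, local_totals, and mpp_sum_by_customer are hypothetical names. Each row lives on exactly one node, each node computes a partial SUM ... GROUP BY over only its own slice, and a coordinator merges the partial results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical order rows: (order_id, customer, amount).
ORDERS = [
    (101, "alice", 30),
    (102, "bob", 15),
    (103, "alice", 20),
    (104, "carol", 50),
    (105, "bob", 5),
    (106, "alice", 10),
]

def partition(rows, num_nodes):
    """Shared nothing: assign each row to exactly one node by its primary key."""
    slices = [[] for _ in range(num_nodes)]
    for row in rows:
        order_id = row[0]
        # Integer keys, so a simple modulo serves as the hash function.
        slices[order_id % num_nodes].append(row)
    return slices

def local_totals(node_rows):
    """Runs 'on one node': partial SUM(amount) GROUP BY customer over its slice only."""
    totals = Counter()
    for _, customer, amount in node_rows:
        totals[customer] += amount
    return totals

def mpp_sum_by_customer(rows, num_nodes=3):
    """Scatter the partial aggregation to all nodes in parallel, then gather and merge."""
    slices = partition(rows, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_totals, slices))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts rather than replacing them
    return dict(merged)

print(sorted(mpp_sum_by_customer(ORDERS).items()))
# [('alice', 60), ('bob', 20), ('carol', 50)]
```

Because each node owns a disjoint slice of the rows, no two nodes write the same data, and the coordinator only merges small partial results. Queries that need rows from several partitions at once, such as joins on a non-partition key, would require moving data between nodes, which this sketch does not cover.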


Last updated on May 7th, 2013.
