ClickHouse is an open-source, column-oriented database for online analytical processing (OLAP). The technology was originally developed about 10 years ago at Yandex, Russia’s largest technology company. It stores data in its own columnar format and supports Apache Arrow as a format for exchanging data with other systems.
ClickHouse was designed with a single objective: to filter and aggregate as much data as possible as quickly as possible. It is often used for large-scale aggregations in real time (for example, analyzing server logs to determine error rates and response times). It does not support transactional workloads, and it applies updates and deletes asynchronously, which can leave data inconsistent and leads some users to look for a ClickHouse alternative. Because it is open-source, several SaaS offerings are built on ClickHouse, including Firebolt, Tinybird, and Altinity.
While ClickHouse is DevOps-friendly and almost unbeatable for complex grouping and aggregation queries, it does have weaknesses, most notably updates and deletes. If your workloads depend on those operations, these weaknesses can become detrimental, and you should weigh them when deciding whether to switch to a ClickHouse alternative.
FeatureBase is a real-time database built on bitmaps. It is designed primarily for speed and horizontal scalability, and is particularly well-suited for workloads that require many real-time updates, inserts, and deletes on massive datasets.
FeatureBase ingests data continuously so that analytical workloads can run in real time for the front lines of your business. Ingest millions of events per second with ACID compliance while simultaneously analyzing, transforming, and aggregating billions of rows of data at greater than 100x the price-performance of traditional columnar databases.
As a ClickHouse alternative, FeatureBase has key technical differences you should consider, including data ingestion, query capabilities, data modeling, and the data format. Let’s look at each.
ClickHouse is not designed for real-time, record-by-record data ingestion. Instead, it recommends inserting data in batches of at least 1000 records, or issuing no more than one insert per second. Because of this, it’s essential to configure your ingestion pipeline to maximize the number of records per insert. Depending on your use case and the throughput of your input data, even this tuning may not be enough to optimize writes into ClickHouse.
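The batching guidance above can be sketched as a small client-side buffer. This is a minimal illustration, not ClickHouse client code: the `BatchBuffer` class, its parameters, and the `flush` callback are hypothetical names, and the callback stands in for whatever insert call your client library provides.

```python
import time
from typing import Callable, Dict, List

Row = Dict[str, object]


class BatchBuffer:
    """Collects rows and hands them to `flush` in large, infrequent batches.

    Hypothetical helper: a batch is sent when it holds at least `min_rows`
    rows, or when `max_wait_s` seconds have passed since the last send.
    """

    def __init__(
        self,
        flush: Callable[[List[Row]], None],
        min_rows: int = 1000,
        max_wait_s: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._flush = flush          # assumed to perform the actual insert
        self._min_rows = min_rows
        self._max_wait_s = max_wait_s
        self._clock = clock          # injectable so the policy can be tested
        self._rows: List[Row] = []
        self._last_flush = clock()

    def add(self, row: Row) -> None:
        """Buffer one row and send the batch if either threshold is met."""
        self._rows.append(row)
        waited = self._clock() - self._last_flush
        if len(self._rows) >= self._min_rows or waited >= self._max_wait_s:
            self.close()

    def close(self) -> None:
        """Send whatever is buffered (call once more at shutdown)."""
        if self._rows:
            self._flush(self._rows)
            self._rows = []
        self._last_flush = self._clock()
```

Note that the time check only runs when a row arrives, so a quiet stream would need a timer or a final `close()` call to send its last partial batch.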
FeatureBase handles the ingestion of massive-scale streaming data while simultaneously allowing real-time inserts and updates to existing data schemas. While FeatureBase is not explicitly optimized for writes and also ingests data in microbatches, it scales out horizontally and employs several optimizations (like write-ahead logs) to support the required throughput. Additionally, FeatureBase can do much of its preprocessing on the client side, so users can not only scale out the database servers but also offload much of the computation to ingest servers. These ingest servers can be ephemeral and exist only while there’s load. Essentially, they are elastic and further improve ingest efficiency.
ClickHouse is a column-oriented OLAP database, which means that it excels at analytical workloads, not transactional workloads. It is designed to analyze immutable data (e.g., logs, events, and metrics). ClickHouse is particularly good at aggregations and filters, but because it was intended for immutable data, it does not do well with update or delete queries.
FeatureBase also excels at analytical workloads, but it is built on bitmaps, which allow for a few step-function improvements over column-oriented databases (we’ll dive into this in more detail later). As a result, it is extremely good at supporting live updates while maintaining low-latency queries. FeatureBase’s novel approach to data minimizes I/O on queries by allowing the database engine to read and write exactly the data it needs and intelligently compress that data in memory.
ClickHouse is a column-oriented database that uses indexing structures and materialized views to speed up query execution when a filter is provided. In ClickHouse, batch deletes and updates occur asynchronously. This might seem trivial, but it can significantly affect materialized views because the server cannot automatically update multiple tables at once. As a result, data can be inconsistent and unreliable.
FeatureBase models data in a novel way. Tables are typically modeled around entities (customers, patients, unique IDs, etc.) or events (transactions, etc.). In addition, tables can have multiple sources (batch and streaming) and update records or add new fields in real time. Mapping relational tables to FeatureBase can be as simple as a one-to-one mapping, but significant performance improvements can be made through the mapping and feature table structure depending on:
ClickHouse, as mentioned above, is a column-oriented database. Its native storage format has been tuned for particular performance optimizations. In its technical documentation, ClickHouse states that it is extremely fast because of a few key benefits that result from its optimized column-oriented format, including data compression, vectorized query execution, and scalability.
FeatureBase is built entirely on bitmaps. The beauty of the format is that it takes all of the benefits that column-oriented formats provide over row-oriented formats (as listed above in ClickHouse’s technical documentation) and optimizes them even further, resulting in a 10-100x improvement in cost/performance.
For example, while column-oriented databases scan only the columns relevant to a query (instead of reading every column of every row), FeatureBase takes that even further, requiring only the specific value to be scanned.
As another example, ClickHouse optimizes compression by storing different values of the same column together. FeatureBase stores data as compressed bits set within a bitmap. It can optimize that compression and storage even further by utilizing three different types of encodings that it intelligently adjusts based on the individual dataset.
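To make the idea of adaptive encodings concrete, here is a minimal sketch of picking the smallest of three encodings for a chunk of set bits. The article does not name the three encodings, so this assumes Roaring-style containers (a sorted array of positions, a fixed-size bitmap, and run-length pairs). The size formulas, the 65,536-bit chunk size, and the function names are illustrative assumptions, not FeatureBase internals.

```python
from typing import List, Tuple

CONTAINER_BITS = 65536  # assumed chunk size: positions 0..65535


def count_runs(sorted_positions: List[int]) -> int:
    """Count maximal runs of consecutive positions in a sorted list."""
    runs = 0
    prev = None
    for pos in sorted_positions:
        if prev is None or pos != prev + 1:
            runs += 1
        prev = pos
    return runs


def choose_encoding(positions: List[int]) -> Tuple[str, int]:
    """Return the (assumed) cheapest encoding and its estimated size in bytes."""
    ordered = sorted(set(positions))
    sizes = {
        "array": 2 * len(ordered),        # one 16-bit integer per set bit
        "bitmap": CONTAINER_BITS // 8,    # one bit per possible position
        "run": 4 * count_runs(ordered),   # 16-bit start + 16-bit length per run
    }
    name = min(sizes, key=sizes.get)
    return name, sizes[name]
```

Under these assumptions, a handful of scattered bits is cheapest as an array, one long consecutive range is cheapest as a run, and dense but irregular data is cheapest as a plain bitmap.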
If this is a bit confusing (it’s a new concept, after all!), it can be easiest to understand with an example. Let’s say we’re trying to count the number of people wearing green shirts.
In FeatureBase, the database can go directly to the value “Green Shirt Color” to count the bits set in a compressed bitmap for that value. This means the database only has to deal with the data that tells it whether a person is wearing a green shirt or not. It can ignore everything else (including other shirt color options!).
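The green-shirt count can be sketched in a few lines. This is a toy model, not FeatureBase code: it uses one uncompressed Python integer per value as the bitmap, and the function names and sample data are made up for illustration.

```python
from typing import Dict, List


def build_bitmaps(values: List[str]) -> Dict[str, int]:
    """Build one bitmap per distinct value; bit i is set if record i has it."""
    bitmaps: Dict[str, int] = {}
    for record_id, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << record_id)
    return bitmaps


def count_bits(bitmap: int) -> int:
    """Count the set bits, i.e. the number of matching records."""
    return bin(bitmap).count("1")


# Hypothetical sample data: the shirt color of six people.
shirt_colors = ["green", "red", "green", "blue", "green", "red"]
bitmaps = build_bitmaps(shirt_colors)

# Only the "green" bitmap is touched; red and blue are never read.
green_count = count_bits(bitmaps.get("green", 0))
```

The query reads a single bitmap and counts its bits. A real engine would keep those bitmaps compressed, but the access pattern is the same.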
FeatureBase is a real-time database built on bitmaps. FeatureBase continuously extracts and updates features from streaming technologies like Kafka and other data sources without the need for staging or preaggregation. This is a crucial differentiator when deciding whether your organization would be better suited to use a ClickHouse alternative such as FeatureBase.
If your organization needs the flexibility to run ad-hoc queries on the freshest data as soon as it hits your database, or if you’re highly dependent on updates and deletes, you will struggle with ClickHouse. Lastly, if you need to filter on time ranges, FeatureBase excels, while ClickHouse falls short in all but a few use cases.
docker run -p 10101:10101 featurebasedb/featurebase
git clone https://github.com/FeatureBaseDB/featurebase-examples.git
docker network create fbnet
docker compose up