For Data Engineers

The Power of SET (No…Not Set, the Egyptian God of Chaos…just SET)

There are many ways to pack data into FeatureBase, but there are some neat tricks that unlock new ways to process data. To start, very early design choices that our engineering team made allow us to tackle data modeling from a new perspective. One of our largest customers needed to store billions of relationships for a data platform without suffering the traditional downsides of maintaining this dense dataset. Enter the Set Field. 

Normally, it’s challenging to keep a living record for a single entity that gets constant updates. In row- or column-oriented solutions, there’s an inherent tradeoff between ingest performance and run-time query performance. We see this tradeoff in other structures too, particularly on upserts. For example, Elasticsearch stores data as documents. Once a document is indexed, it is well organized for querying, but updating millions or billions of these records requires appending many documents, which can hurt data freshness.

FeatureBase has a field type known as a SET field. Unique to FeatureBase, this field type is excellent for representing data where multiple traits or parameters are logically independent. Let’s say I need to store several strings on a record with a unique ID. Going back to our exercise, we have the following data to add to one unique record:

Instead of requiring multiple rows, as a traditional columnar database would, our SET fields pack all of the possible values into a single row. In other words, I have a UUID that is the record key, and I create one field called Segment to house all the various string traits displayed in Figure 1.

Figure 1. In FeatureBase, a bitmap is created for each value, and a bit is set in it to mark each record that has that value
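
The idea in Figure 1 can be sketched in a few lines of Python. This is a toy model rather than FeatureBase's actual storage code: the SetField class, its method names, and the second record key are hypothetical, and each bitmap is held in a plain Python integer purely for illustration.

```python
from collections import defaultdict


class SetField:
    """Toy model of a SET field: one bitmap per distinct value (illustrative only)."""

    def __init__(self):
        self.position_of = {}            # record key -> bit position
        self.keys = []                   # bit position -> record key
        self.bitmaps = defaultdict(int)  # value -> bitmap held in a Python int

    def add(self, key, value):
        """Mark the record identified by `key` as a member of `value`."""
        pos = self.position_of.get(key)
        if pos is None:
            # First time this key is seen: give it the next free bit position.
            pos = len(self.keys)
            self.position_of[key] = pos
            self.keys.append(key)
        self.bitmaps[value] |= 1 << pos

    def records_with(self, value):
        """Return the keys of every record whose bit is set for `value`."""
        bitmap = self.bitmaps.get(value, 0)
        return [key for pos, key in enumerate(self.keys) if (bitmap >> pos) & 1]


field = SetField()
record = "aec5f7f7-fca3-47b8-8034-e71bef68580f"
other = "hypothetical-second-record"  # made up for this sketch

field.add(record, "Equestrian")
field.add(record, "$25,001-$65,000")
field.add(other, "Equestrian")

print(field.records_with("Equestrian"))
print(field.records_with("$25,001-$65,000"))
```

However many values a record carries, it occupies a single bit position; each value it has costs one set bit in that value's bitmap.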

In FeatureBase, I can now query:

SELECT UUID FROM {Table} WHERE Segment = 'Equestrian' AND Segment = '$25,001-$65,000';

This returns all records that have both values; in this case, UUID = aec5f7f7-fca3-47b8-8034-e71bef68580f would be returned. In effect, you get what can be thought of as an array of values on a single record, and it makes queries faster rather than slower. As a bonus, let’s say a new record comes in that needs to add data to this record:

Upon ingestion, the key is automatically matched to the existing record, and the new value is inserted (or upserted) into it without any extra specification. The updated table would look like Figure 2:

Figure 2. New trait added to the existing record

Since the operation can add this value by simply creating a bitmap and flipping a single bit, it can happen very quickly (data is super fresh) and without heavy overhead (needing extra resources to keep it fast). 
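
To make the cost of these operations concrete, here is a minimal Python sketch under the same toy assumptions: bitmaps are plain integers, and the upsert and query_all helpers, the second record key, and the "Hypothetical New Trait" value are invented for illustration and are not FeatureBase APIs. The upsert touches one bit in one bitmap, and a query that requires several values is a bitwise AND of their bitmaps.

```python
# Toy state: each record key owns a bit position; each value owns a bitmap (a Python int).
positions = {
    "aec5f7f7-fca3-47b8-8034-e71bef68580f": 0,
    "hypothetical-record-b": 1,  # made up for this sketch
}
bitmaps = {
    "Equestrian": 0b11,       # both records have this trait
    "$25,001-$65,000": 0b01,  # only the first record has this trait
}


def upsert(key, value):
    """Add `value` to the record for `key`, creating the record or bitmap if needed."""
    if key not in positions:
        positions[key] = len(positions)
    # The whole update is a single bit set in a single bitmap.
    bitmaps[value] = bitmaps.get(value, 0) | (1 << positions[key])


def query_all(*values):
    """Return the keys of records that have every one of `values`."""
    result = (1 << len(positions)) - 1  # start with all records selected
    for value in values:
        result &= bitmaps.get(value, 0)  # intersect with that value's bitmap
    return [key for key, pos in positions.items() if (result >> pos) & 1]


before = query_all("Equestrian", "$25,001-$65,000")
upsert("aec5f7f7-fca3-47b8-8034-e71bef68580f", "Hypothetical New Trait")
after = query_all("Equestrian", "Hypothetical New Trait")

print(before)
print(after)
```

In this sketch no existing data is rewritten when the new trait arrives, which is the property that keeps updates cheap.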

This is a quick overview of how SET fields work in FeatureBase (and the powerful things they allow you to do). Get a better understanding of how SET fields work by following along with this example using FeatureBase and Anvil to create a real-time dashboard.