Case Studies

Real-Time Customer Segmentation: 350 Million Customer Records in Milliseconds

Customer data footprint reduced from 700GB to <70GB.
350 million records queried in 9 milliseconds.

How to Achieve Real-Time Customer Segmentation at ANY Scale:

A large ad-tech company that provides a one-stop customer data platform for internal stakeholders struggled to gain valuable insights about its customers. A primary cause was that its data was split across several time-series and reference datasets. Despite having a treasure trove of actionable data, along with multiple data warehouses and pipelining tools, the company could not match profiles across datasets, so it missed opportunities to better segment and analyze its data. Even when it could match profiles and segment customers, the queries cost a fortune and took so long to run that the results were no longer relevant by the time they returned.

Implementing FeatureBase within their infrastructure let the company run complex queries over more than 350 million customer records in milliseconds. Let’s walk through how:

The Existing Schema and Opportunity

With a strong emphasis on regulatory compliance and privacy, the company shared data with emails obfuscated via SHA-256 hashing. With the pre-existing data schema, ingestion could be completed in batch or via streaming through a common carrier such as Apache Kafka. Sample dataset fields included:

  • Geographic ZIP information
  • Advertisement IDs
  • Device type
  • Browser type
  • URL visits
  • Demographic data consisting of 60+ fields

The company believed stitching this data would inform platform users of advertisements viewed by target segments. They also wanted to refine targeting by geographic region, which has traditionally proven challenging.
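To make the obfuscation and stitching steps concrete, here is a minimal Python sketch. It is an illustration only, not the customer's pipeline: the record layouts, the `hash_email` and `stitch` helpers, and the rule of trimming and lowercasing emails before hashing are all assumptions made for this example.

```python
import hashlib

def hash_email(email: str) -> str:
    """Return the SHA-256 hex digest of an email address.

    Trimming and lowercasing first is an assumption of this sketch;
    the source does not say how emails were normalized.
    """
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Hypothetical rows from two separate datasets. Raw emails appear here
# only so the example can show the hashing step.
ad_events = [
    {"email": "Ana@example.com", "ad_id": "ad-17", "device_type": "mobile"},
    {"email": "bo@example.com", "ad_id": "ad-42", "device_type": "desktop"},
]
demographics = [
    {"email": "ana@example.com ", "zip": "78701"},
    {"email": "cy@example.com", "zip": "10001"},
]

def stitch(events, profiles):
    """Match event rows to profile rows using only the hashed email."""
    by_hash = {hash_email(p["email"]): p for p in profiles}
    matched = []
    for event in events:
        key = hash_email(event["email"])
        if key in by_hash:
            matched.append({
                "email_sha256": key,
                "ad_id": event["ad_id"],
                "zip": by_hash[key]["zip"],
            })
    return matched

matches = stitch(ad_events, demographics)
for row in matches:
    print(row["email_sha256"][:12], row["ad_id"], row["zip"])
```

Because both datasets carry the same digest for the same address, profiles can be matched without either side exchanging the raw email.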

Updated Data Schema:

Using FeatureBase to Localize and Segment JOINs Based on Time

The ad-tech company was impressed that FeatureBase could not only link unique IDs and factual information about individuals and companies across datasets, but also validate those links by associating each ID with a specific timestamp. With FeatureBase’s multi-layered, cross-linking functionality, the customer can localize and segment specific JOINs based on time. For example, a record can be associated with a specific timestamp, which makes it possible to search billions of records within granular windows of time. Because such a query involves a live JOIN, it traditionally requires preaggregating the time-window portion of the data. With FeatureBase’s highly performant, low-latency format, the JOIN is instead performed directly within the query.
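As a rough illustration of what a time-bounded JOIN means, the Python sketch below narrows timestamped records to a window and then joins them to reference data by ID at query time. It says nothing about how FeatureBase stores or executes this. The sample data, the `records_in_window` helper, and the half-open window (start included, end excluded) are assumptions for the example.

```python
from bisect import bisect_left
from datetime import datetime

# Hypothetical time-series data: (timestamp, record_id), sorted by timestamp.
views = sorted([
    (datetime(2021, 3, 1, 12, 0), 101),
    (datetime(2021, 3, 5, 9, 30), 102),
    (datetime(2021, 3, 20, 18, 45), 103),
    (datetime(2021, 3, 25, 8, 0), 104),
])
timestamps = [ts for ts, _ in views]

# Hypothetical reference data keyed by the same record ID.
companies = {101: "Acme", 102: "Globex", 103: "Acme", 104: "Initech"}

def records_in_window(start: datetime, end: datetime) -> set:
    """Return record IDs whose timestamp satisfies start <= ts < end."""
    lo = bisect_left(timestamps, start)
    hi = bisect_left(timestamps, end)
    return {record_id for _, record_id in views[lo:hi]}

# Narrow by time first, then join to the reference data inside the query,
# with no preaggregated per-window table.
window = records_in_window(datetime(2021, 3, 2, 3, 0), datetime(2021, 3, 23, 3, 0))
joined = {record_id: companies[record_id] for record_id in window}
print(joined)
```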

For a more detailed look, here are a couple of sample queries the customer found valuable:

[Audience Facts] Groupby(Company), filter=Distinct(Row(Device Platform='iOS Mobile',
             from='2021-03-02T03:00', to='2021-03-23T03:00'),
             field=ip address, table=Ad Serving)

The above query performs a Groupby of companies from the “Audience Facts” table and filters on the separate “Ad Serving” table, searching for a specific device platform within a specified time range. More specifically, the platform in this query was “iOS Mobile,” and the goal was to associate the IP ranges of those devices with particular companies of interest in real time.
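For readers who want to see the logic of that query spelled out, here is a plain-Python sketch of the same idea: collect the distinct IP addresses of matching rows in one table, then group rows of the other table by company. The row layouts, sample values, and the `companies_by_platform` helper are hypothetical, and the code illustrates the query's meaning rather than how FeatureBase runs it.

```python
from collections import Counter
from datetime import datetime

# Hypothetical "Ad Serving" rows.
ad_serving = [
    {"ip": "10.0.0.1", "platform": "iOS Mobile", "ts": datetime(2021, 3, 5, 10, 0)},
    {"ip": "10.0.0.1", "platform": "iOS Mobile", "ts": datetime(2021, 3, 6, 11, 0)},  # repeat IP
    {"ip": "10.0.0.2", "platform": "Android", "ts": datetime(2021, 3, 7, 9, 0)},      # wrong platform
    {"ip": "10.0.0.3", "platform": "iOS Mobile", "ts": datetime(2021, 3, 30, 9, 0)},  # outside window
    {"ip": "10.0.0.4", "platform": "iOS Mobile", "ts": datetime(2021, 3, 10, 9, 0)},
]

# Hypothetical "Audience Facts" rows.
audience_facts = [
    {"ip": "10.0.0.1", "company": "Acme"},
    {"ip": "10.0.0.2", "company": "Acme"},
    {"ip": "10.0.0.3", "company": "Globex"},
    {"ip": "10.0.0.4", "company": "Globex"},
]

def companies_by_platform(platform: str, start: datetime, end: datetime) -> Counter:
    """Count Audience Facts rows per company, keeping only IPs seen on the
    given platform in Ad Serving with start <= ts < end."""
    # Distinct step: a set keeps each matching IP once.
    matching_ips = {
        row["ip"]
        for row in ad_serving
        if row["platform"] == platform and start <= row["ts"] < end
    }
    # Groupby step: count the filtered rows per company.
    return Counter(
        row["company"] for row in audience_facts if row["ip"] in matching_ips
    )

counts = companies_by_platform(
    "iOS Mobile", datetime(2021, 3, 2, 3, 0), datetime(2021, 3, 23, 3, 0)
)
print(dict(counts))
```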

[Ad Serving] TopK(Row(Ad Place=Ad URL))

The above query is a TopK, an ordering operation that returns the top ad URLs across all records. Note: with FeatureBase, this query is scalable and performant even across billions of records! These queries are excellent examples of how the customer reduced preaggregation and used FeatureBase to discover and validate customer segments in milliseconds.
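In plain terms, a TopK counts how often each value occurs and returns the most frequent ones. The short Python sketch below shows that behavior on a handful of made-up ad URLs. The `top_k` helper and the sample values are assumptions for illustration and do not reflect FeatureBase's implementation.

```python
from collections import Counter

# Hypothetical "Ad Place" values, one per ad-serving record.
ad_place_values = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/a",
    "https://example.com/c",
    "https://example.com/a",
    "https://example.com/b",
]

def top_k(values, k):
    """Return the k most frequent values with their counts, highest first."""
    return Counter(values).most_common(k)

top_two = top_k(ad_place_values, 2)
print(top_two)
```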

Watch our on-demand real-time customer segmentation example demo to learn more.

How to Reduce Data Footprint and Compute

With over 700GB of compressed data, the customer initially struggled to extract value. Data cardinality and scale caused the original cloud data warehouse and pipelining tool to be slow and expensive per query. FeatureBase enabled them to search across datasets, count intersections, and reduce preaggregation, all at ultra-low latency!
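Counting intersections is the core of segment sizing: how many records satisfy several conditions at once? The sketch below shows the idea with ordinary Python sets of record IDs. The attribute names echo the sample fields listed earlier, but the IDs, the `segments` mapping, and the `segment_size` helper are invented for this example and are not a description of FeatureBase's storage format.

```python
# Hypothetical mapping from (field, value) to the set of record IDs
# that have that value.
segments = {
    ("device_type", "mobile"): {1, 2, 3, 5, 8},
    ("browser_type", "safari"): {2, 3, 5, 7},
    ("zip", "78701"): {3, 5, 9},
}

def segment_size(*keys) -> int:
    """Count the records that match every (field, value) pair in keys."""
    matching = set.intersection(*(segments[key] for key in keys))
    return len(matching)

mobile_safari = segment_size(("device_type", "mobile"), ("browser_type", "safari"))
mobile_safari_zip = segment_size(
    ("device_type", "mobile"), ("browser_type", "safari"), ("zip", "78701")
)
print(mobile_safari, mobile_safari_zip)
```

Each additional condition is just one more intersection, so a segment can be refined at query time instead of being preaggregated for every combination.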

With FeatureBase, the customer reduced their storage requirements by 10X and minimized compute consumption, all while accomplishing their primary objective: use the data to solve a real-time customer segmentation challenge.

Experience FeatureBase for yourself by compiling or downloading our open source offering or starting a free Cloud trial now.


Open Source install commands are included below.


git clone
cd featurebase-examples/docker-example

docker-compose -f docker-compose.yml up -d

# TIP: If needed, disable Docker Compose V2 under Settings > General in Docker Desktop.
