Low Cost Customer Data Platforms

Host your own open-source CDP to replace expensive software like Segment or Fivetran.

Customer data platforms like segment.com are valuable tools for online businesses, but the cost can be prohibitive for small-to-medium-sized businesses. I see many smaller organizations struggle with the problems a CDP can solve, so I often help them explore the alternatives and understand that it is in fact possible to set up a powerful, robust CDP that rivals many of the expensive software options available.

Almost all of these clients are extremely happy with the results, and many didn’t realize it was even possible to have something like this within their budget. I also see mid-to-large businesses looking to reduce their spend as their use of a platform grows, since costs can climb steeply with usage.

What is a CDP?

A Customer Data Platform (CDP) is a software tool that helps businesses create a unified view of their customers by bringing together data from various sources. This data can include things like a customer’s purchase history, website interactions, and email engagement. By unifying this data, CDPs help businesses gain a deeper understanding of their customers and personalize their marketing campaigns.

The way I typically describe the concept to my clients is that a CDP is a tool that allows you to easily, reliably and transparently connect and unify your analytics and advertising data at rapid speeds. It replaces custom-coded solutions, makes the setup easy for team members to manage and understand, and much more. But it’s usually expensive.

Segment.com is a leading CDP provider

Self-hosted, Open-source Software

The concept of a CDP has been around long enough that it has reached what I’d call market saturation. As a result, there are now multiple open-source options available that can be self-hosted, which can provide a few significant business advantages for the right organizations.

Lowering costs

A self-hosted, open-source CDP can deliver much of the same functionality as the paid platforms at a drastically reduced cost. It requires an initial investment up front, and possibly some maintenance work down the road for upgrades and debugging, but overall it’s a significant savings. This is particularly true for any organization that processes a large volume of data.

How is it cheaper? Open-source software is free and can be self-hosted, very similar to the way WordPress.org works. The same software is usually also available for a monthly fee as a cloud-hosted tool, which is the equivalent of WordPress.com. When self-hosted, the only recurring cost you’ll have is the AWS or Google Cloud compute instance you set up to run it.
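To make the trade-off between an up-front setup cost and a lower monthly bill concrete, here is a minimal sketch of a break-even calculation. The only figure taken from this article is the $36.11/month instance price from the hosting section; the setup cost and vendor fee in the example call are hypothetical placeholders, not quotes from any provider, and the function name is my own.

```python
import math

# Instance price quoted in the hosting section of this article.
INSTANCE_MONTHLY_USD = 36.11


def months_to_break_even(setup_cost_usd, vendor_monthly_usd,
                         self_hosted_monthly_usd=INSTANCE_MONTHLY_USD):
    """Return the number of whole months until self-hosting pays back its setup cost.

    Returns None when the vendor plan is not more expensive per month
    than self-hosting, because the setup cost is then never recovered.
    """
    monthly_saving = vendor_monthly_usd - self_hosted_monthly_usd
    if monthly_saving <= 0:
        return None
    # Round up: a partially elapsed month still has to be paid for in full.
    return math.ceil(setup_cost_usd / monthly_saving)


if __name__ == "__main__":
    # Hypothetical inputs: $2,000 of setup work vs. a $500/month vendor plan.
    print(months_to_break_even(setup_cost_usd=2000, vendor_monthly_usd=500))
```

Plug in your own vendor invoice and a realistic estimate of setup and maintenance time to see whether the switch is worthwhile for your data volume.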

Airbyte

Airbyte is an open-source data integration platform that can be used to easily move data to and from hundreds of sources and destinations. Take a look at the full list of supported connections and you’ll find that it can handle just as much as, and in many cases more than, premium paid products like Segment.com, Customer.io or Tealium.

Airbyte Self-hosted ETL Platform's Connector Catalog

RudderStack

If you’re currently using Segment.com and are looking to reduce costs, then RudderStack is a great open-source, self-hosted alternative worth looking at. The UI is modeled on Segment’s, and the core functionality is the same (excluding some of the advanced segmentation/business tools). Depending on how complex your Segment.com installation is, you should be able to replicate 100% of its functionality at a fraction of the cost.

RudderStack: A Self-hosted Open-source Segment.com Alternative

Hosting Recommendations & Costs

I typically recommend a Google Cloud Compute e2-highcpu-2 for US-based clients, which costs $36.11/month. Once you set up and configure one, there is no limit on the number of records/users you track each month. You’re only limited by the compute resources you need. You may need or want to upgrade to a more powerful machine if you end up processing a large amount of data concurrently, but if you’re mostly running daily batches you can use this machine type to transfer very large amounts by staggering your scheduled sync times. This gives each transfer the maximum compute and processing power while it runs, so you’ll still have very performant ETL processes.
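As a rough illustration of staggering, the sketch below assigns each daily sync its own start time with a fixed gap between them. The sync names, the 01:00 first start and the 45-minute gap are hypothetical; the function is a planning aid and does not call Airbyte or RudderStack. You would enter the resulting times into whatever scheduling option your tool provides.

```python
from datetime import datetime, timedelta


def stagger_syncs(sync_names, first_start="01:00", gap_minutes=45):
    """Give each daily sync its own start time so that no two begin together.

    Returns a dict mapping sync name to an "HH:MM" start time.
    Times wrap past midnight if the list is long enough.
    """
    start = datetime.strptime(first_start, "%H:%M")
    schedule = {}
    for position, name in enumerate(sync_names):
        slot = start + timedelta(minutes=position * gap_minutes)
        schedule[name] = slot.strftime("%H:%M")
    return schedule


if __name__ == "__main__":
    # Hypothetical sync names, for illustration only.
    for name, start_time in stagger_syncs(["orders", "email_events", "ad_spend"]).items():
        print(f"{start_time}  {name}")
```

The gap only prevents overlap if it is longer than your slowest sync, so check actual run times after the first few days and widen it if needed.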

Both Airbyte and RudderStack have straightforward installation processes that are well described in their software documentation.

You will need to work with an experienced team member or consultant with a technical background in basic systems administration and/or server setup, but the added up-front cost and time are well worth the investment in almost all cases.

What software can these open-source options replace?

Here’s a list of the common closed-source CDPs and related data tools I see in use. If you have one of these and are experiencing high costs you’d like to reduce, then exploring a self-hosted, open-source alternative like Airbyte or RudderStack may be a great option.

  • Segment.com
  • Fivetran
  • Zapier
  • Customer.io
  • Snowflake
  • Stitch
  • Azure Data Factory
  • Databricks
  • Tealium
  • mParticle
  • Treasure Data
  • Adobe Experience Platform
  • BlueConic
  • Zeta Global
  • Salesforce Customer 360
  • Insider
  • ActionIQ

For a business of the right size, many of these products can be the right choice, depending on the specific requirements. Choosing to self-host is an important business decision, but in my experience, I have never worked with a client who was displeased with it once they made the switch.

Meet the Author

Kevin Leary, Custom WordPress Developer & BigQuery Consultant

I'm a custom WordPress developer and BigQuery consultant based in Boston, MA. I've been an independent freelance contractor for the last 16 years, helping businesses build, grow and maintain product websites, web applications and analytics systems. See real-world examples of my work, or contact me about your next project if you're in need of a good freelance developer.