The Evolution of Customer Data Platforms: Data Pipelines

Customer Data Platforms: The Evolution

I recently wrote an article on Customer Data Platforms (CDPs) that summarized the capabilities, solutions and realities of building or implementing a CDP. A CDP is a platform that consolidates all customer and touchpoint data, creates profiles, models and insights, and then uses those insights in targeting and optimization efforts on all available channels. CDPs, and the underlying technologies, have evolved significantly since their inception in the mid-2010s. One of the most impactful evolutions is the emergence of data pipeline solutions.


Customer Data Platforms: Data Pipelines

A data pipeline is the process of moving data from one system to another, and data pipeline solutions are a relatively new category of tools that manage that process. They streamline the setup, automate the data flow, and monitor it themselves. The result is a more seamless data stream from source to CDP and then back to the activation channels.

Sourcing can include campaign and promotional data from marketing channels such as Facebook and Google Ads, sales solutions such as Salesforce, and finance systems such as PeopleSoft. Data can also be sourced from internal solutions such as operational databases and marketing automation systems. Data pipeline solutions streamline the connection to those systems and transfer the data on the schedule the business needs (real-time, hourly, daily, and so on). They also make it more efficient to load and transform the data into a database. In the past, the process required API programming followed by manual testing and loading of the data. These solutions handle the heavy lifting, which minimizes the coding.
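To make the sourcing step concrete, here is a minimal sketch of the extract-and-load work that a pipeline solution automates. It is an illustration only: fetch_campaign_rows is a hypothetical stand-in for a source connector, the raw_campaigns table and its columns are assumed names, and SQLite stands in for the CDP's underlying database.

```python
import sqlite3
from typing import Dict, Iterable, List


def fetch_campaign_rows() -> List[Dict]:
    """Hypothetical stand-in for a source connector (for example an ads API).

    A real connector would call the source system; the rows here are made up.
    """
    return [
        {"campaign_id": "c1", "channel": "facebook_ads", "spend": 120.5},
        {"campaign_id": "c2", "channel": "google_ads", "spend": 80.0},
    ]


def load_rows(conn: sqlite3.Connection, rows: Iterable[Dict]) -> int:
    """Load rows into a raw table and return the number of rows submitted.

    INSERT OR REPLACE keyed on campaign_id keeps the load idempotent, so a
    scheduled re-run (hourly, daily, ...) does not create duplicates.
    """
    rows = list(rows)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_campaigns ("
        "campaign_id TEXT PRIMARY KEY, channel TEXT, spend REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO raw_campaigns "
        "VALUES (:campaign_id, :channel, :spend)",
        rows,
    )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(load_rows(connection, fetch_campaign_rows()), "rows loaded")
```

In a hosted or open source pipeline product, the connector, the schedule and the retry logic would be configured rather than hand-written, which is the coding the paragraph above says these solutions minimize.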

Activation is the process where data is pushed back to the channels, both internal and external, in order to automate the enhanced targeting and tracking that a CDP provides. Once the analyses and models are complete, and it is known which customer should be targeted with which offer or product, the data pipeline solutions automate the process of pushing all relevant data back to the systems (LinkedIn, Twitter, Facebook, Salesforce, Mailchimp, and others).
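The activation flow can be sketched in the same spirit. This is a simplified illustration under stated assumptions, not how any particular product works: the customer_scores table, its columns and the score threshold are hypothetical, and the send callable is a placeholder for whatever channel API (an ad platform, an email tool) would receive the audience.

```python
import sqlite3
from typing import Callable, Dict, List


def select_audience(conn: sqlite3.Connection, min_score: float) -> List[Dict]:
    """Pick the customers whose model score meets the threshold.

    Assumes a hypothetical customer_scores(email, offer, score) table that
    holds the output of the CDP's models.
    """
    rows = conn.execute(
        "SELECT email, offer FROM customer_scores "
        "WHERE score >= ? ORDER BY email",
        (min_score,),
    ).fetchall()
    return [{"email": email, "offer": offer} for email, offer in rows]


def push_audience(
    audience: List[Dict],
    send: Callable[[List[Dict]], None],
    batch_size: int = 100,
) -> int:
    """Send the audience to a channel in batches and return the batch count.

    `send` is a placeholder for a real channel client supplied by the caller.
    """
    batches = 0
    for start in range(0, len(audience), batch_size):
        send(audience[start:start + batch_size])
        batches += 1
    return batches
```

Passing the channel client in as a function keeps the selection logic independent of any one destination, which mirrors how a pipeline solution lets the same audience be pushed to several channels.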

Data Investigation & Ad-Hoc Analyses: Another advantage of a data pipeline solution is that analytics teams and business analysts can use it with little to no assistance from the technical teams. This allows the business analysts to source data from various systems themselves, scan the available data, and use the data in ad-hoc analyses. This saves significant time compared to the past, when the technical team needed to create a manual data flow process before the business teams could evaluate the potential value of the data.


Data Pipelines: Considerations

What should be considered when selecting a data pipeline solution? This will vary based on your business and technical needs, but the primary considerations include:

Data Privacy: Many of the solutions are hosted, and some can be installed in your on-premise environment. Your data privacy needs, and the types of data the pipeline solution will access, determine which option fits. If data privacy is not a significant factor, then both options (hosted or on-premise) may be viable. If keeping data within your environment is a requirement, then an on-premise solution may be your best option. Compliance with data protection regulations such as GDPR and HIPAA should be considered as well.

Integrations: The solution should integrate with all the primary systems you plan to source data from and should be able to load data into your CDP’s underlying database. Consider the solution’s integration roadmap as well, since your data sources and destinations will evolve over time. Sources and destinations can include:

  • External / Cloud Solutions: Facebook Ads, Twitter Ads, Google Ads, Bing Ads, Google Analytics, Marketo, Mailchimp, HubSpot, Salesforce, LinkedIn Ads, Adobe Analytics, PeopleSoft, PayPal
  • Databases / Warehouses: MySQL, Teradata, Snowflake, Google BigQuery, Amazon Redshift & S3, Azure Table Storage, Oracle, SQL Server, MariaDB, MongoDB
  • Internal Solutions:  Internal operational databases and installed custom solutions like a marketing automation product. 

Technical: Your technical needs, the technical bandwidth available, and your appetite for open source will all be considerations. Implementing an open source solution will take time and require ongoing maintenance.

Cost: Cost varies, and pricing for some of the hosted solutions is driven by data volumes. Open source software carries no license fee, but the time required to implement the solution and provide ongoing support should be a consideration.

Data Transformations: Data pipeline solutions typically have built-in data transformation capabilities. These may be useful when real-time data is needed and only basic transformations are required, but in general databases tend to be better environments for data transformation and processing.
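As an illustration of transforming inside the database rather than inside the pipeline, the sketch below aggregates spend by channel with plain SQL after the raw data has been loaded. The raw_campaigns and channel_spend table names and their columns are assumptions made for the example, and SQLite stands in for whatever warehouse sits under the CDP.

```python
import sqlite3
from typing import List, Tuple


def build_channel_spend(conn: sqlite3.Connection) -> List[Tuple[str, float]]:
    """Rebuild a summary table from already-loaded raw data using SQL.

    Assumes a raw_campaigns(campaign_id, channel, spend) table exists.
    The pipeline only loads the raw rows; the database does the aggregation.
    """
    conn.execute("DROP TABLE IF EXISTS channel_spend")
    conn.execute(
        "CREATE TABLE channel_spend AS "
        "SELECT channel, SUM(spend) AS total_spend "
        "FROM raw_campaigns GROUP BY channel"
    )
    conn.commit()
    return conn.execute(
        "SELECT channel, total_spend FROM channel_spend ORDER BY channel"
    ).fetchall()
```

Because the summary is rebuilt from the raw table each time, the transformation can be changed later without re-extracting anything from the source systems.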


Data Pipelines: Solutions Available

The purpose here is not to provide an exhaustive list of available data pipeline solutions and their capabilities, but I will touch on a few below as examples. There are several good resources available to help you determine which solutions best fit your needs.

  • Segment: The big player in this space, with a well-evolved solution set used by many firms.
  • Airbyte: A newer data pipeline solution that is growing quickly, with plenty of marketing integrations. Available as a cloud-based hosted solution and as an on-premise open source version.
  • Fivetran: A larger hosted player in the market. The solution offers a large list of integrations, plenty of data transformation options, and well-evolved security features such as VPN and SSH connectivity. HIPAA compliant.
  • RudderStack: RudderStack Enterprise Cloud (hosted) has a solid list of integrations and is known to be well integrated with internal data systems, with strong reverse-ETL capabilities. An open source version is available as well.