The modern data stack (MDS) is essential for digital disruptors. Consider Netflix. Netflix has pioneered a new business model around video as a service, but much of their success relies on real-time streaming data.
They are using analytics to send highly relevant recommendations to their viewers. They are monitoring data in real time to maintain constant visibility into network performance. They are synchronizing their movie and show database with Elasticsearch to allow users to find what they are looking for quickly and easily.
All of this must happen in real time and be 100% accurate. Old-school batch extract, transform, load (ETL) is just too slow. To meet this need, Netflix created a change data capture (CDC) tool called DBLog.
Netflix required high availability and real-time synchronization. It also needed to minimize the impact on operational databases. CDC keys off database logs and replicates changes to target databases in the order they occur, so you can capture changes as they happen without locking records or otherwise bogging down the source database.
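To make the log-based mechanism concrete, here is a minimal Python sketch. It is not DBLog's or any product's implementation. It assumes each change event carries a monotonically increasing log sequence number (LSN), an operation type, the row's primary key and the new row image; `ChangeEvent` and `apply_changes` are hypothetical names chosen for illustration. Events are applied strictly in log order, and remembering the last applied LSN lets a restarted consumer skip events it has already processed.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change event as read from a source database's log."""
    lsn: int                              # position in the transaction log
    op: str                               # "insert", "update" or "delete"
    key: Any                              # primary key of the changed row
    row: Optional[Dict[str, Any]] = None  # new row image; None for deletes


def apply_changes(target: Dict[Any, Dict[str, Any]],
                  events: Iterable[ChangeEvent],
                  last_lsn: int = 0) -> int:
    """Replay events onto `target` in LSN order; return the highest LSN applied."""
    # A real CDC reader consumes the log sequentially; sorting here simply
    # simulates that ordering for an in-memory list of events.
    for ev in sorted(events, key=lambda e: e.lsn):
        if ev.lsn <= last_lsn:
            continue  # already applied, so replaying the log is harmless
        if ev.op in ("insert", "update"):
            if ev.row is None:
                raise ValueError(f"{ev.op} at LSN {ev.lsn} has no row image")
            target[ev.key] = dict(ev.row)
        elif ev.op == "delete":
            target.pop(ev.key, None)
        else:
            raise ValueError(f"unknown op: {ev.op}")
        last_lsn = ev.lsn
    return last_lsn


if __name__ == "__main__":
    replica: Dict[Any, Dict[str, Any]] = {}
    log = [
        ChangeEvent(2, "update", 1, {"id": 1, "title": "Dark (S2)"}),
        ChangeEvent(1, "insert", 1, {"id": 1, "title": "Dark"}),
        ChangeEvent(3, "insert", 2, {"id": 2, "title": "Ozark"}),
        ChangeEvent(4, "delete", 2),
    ]
    position = apply_changes(replica, log)
    print(position, replica)
```

Note that the source database is never queried: the consumer only reads the change stream, which is why log-based CDC avoids locking source records.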
Data is key to what Netflix does, but they’re not alone in this regard. Companies like Uber, Amazon, Airbnb and Facebook are thriving because they really know how to make data work for their benefit. Data management and data analytics are strategic pillars for these organizations, and CDC technology plays a central role in their ability to perform their core missions.
The same can be said of nearly all companies operating at the top of their game in today’s business environment. If you want your business to operate as an A-player, you need to modernize and master your data. Your competitors are already doing this.
Sub-second integration is the new standard at Airbnb and Uber
In today’s world, a strong customer experience requires real-time data streams. Airbnb recognized the value of CDC technology in creating a great CX for its guests and hosts. It, too, has built its own CDC platform, which it calls SpinalTap. Airbnb’s dynamic pricing, listing availability and booking status require flawless accuracy and consistency across all systems. When an Airbnb customer books a stay, they expect the workflow to be very fast and 100% accurate.
For Uber, immediacy is arguably even more important. Whether a customer is waiting for a ride to the airport or ordering a food delivery, timing matters a lot. Like Netflix and Airbnb, Uber has developed its own CDC platform to synchronize data between multiple data stores in real time. Once again, a common set of requirements emerged. Uber needed its solution to be extremely fast and fault tolerant, with zero data loss. It also needed a solution that would not degrade the performance of the source databases.
Change data capture for the rest of us
Once again, CDC fits the bill. In the past, nightly batch ETL might have been adequate for daily executive updates or operational reports. Today, real time is increasingly the norm. If information is power, instant access to information is turbo power.
That’s why CDC is fast becoming a core requirement for the modern data stack. It’s all well and good that big companies like Netflix, Airbnb and Uber have the resources to build custom CDC platforms, but what about everyone else?
Out-of-the-box CDC solutions are filling this gap, offering the same low-latency, high-quality streaming pipelines without the need to build them from scratch.
Unfortunately, not all of these solutions are equal. Most businesses run a collection of systems that handle ERP, CRM or specialized operational functions such as procurement or human resources. These run on different database platforms, with inconsistent data models. A company that operates mainframe systems is likely dealing with arcane data structures that don’t fit easily into modern relational models.
This makes heterogeneous integration particularly important. It requires connecting to multiple data sources and destinations, including transactional systems such as SAP, Oracle, DB2 and Salesforce. It also means delivering live streaming data to platforms such as Databricks, Kafka, Snowflake, Amazon DocumentDB and Azure Synapse.
To drive artificial intelligence (AI) and advanced analytics, companies need to send their data to a common MDS platform. This means capturing information from a variety of sources, transforming it to fit into a unified model for analytics, and delivering it to a modern cloud-based data platform.
Change data capture technology serves as a critical link in the data-driven value chain: it first automates the acquisition of data from source systems, then transforms that data on the fly, and finally delivers it to a cloud data platform. Real-time CDC automation ensures the right information arrives at the right place, instantly.
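The acquire, transform and deliver stages can be sketched as a chain of Python generators. This is an illustration under stated assumptions, not any vendor's API: the source names `erp` and `crm`, the `FIELD_MAPS` table and the functions `capture`, `transform` and `deliver` are all hypothetical, and a plain Python list stands in for a cloud data platform such as Snowflake or Databricks.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Record = Dict[str, Any]

# Hypothetical per-source field mappings into one unified analytics model.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "erp": {"CUST_ID": "customer_id", "AMT": "amount"},
    "crm": {"customerId": "customer_id", "dealValue": "amount"},
}


def capture(source: str, changes: Iterable[Record]) -> Iterator[Record]:
    """Acquire changes from a source, tagging each with its origin."""
    for change in changes:
        yield {"_source": source, **change}


def transform(changes: Iterable[Record]) -> Iterator[Record]:
    """Rename source-specific fields to the unified model while in flight."""
    for change in changes:
        mapping = FIELD_MAPS[change["_source"]]
        # Unmapped keys (including "_source", kept for lineage) pass through.
        yield {mapping.get(k, k): v for k, v in change.items()}


def deliver(changes: Iterable[Record], sink: Callable[[Record], None]) -> int:
    """Push each transformed change to the destination; return the count."""
    count = 0
    for change in changes:
        sink(change)
        count += 1
    return count


if __name__ == "__main__":
    warehouse: List[Record] = []  # stand-in for a cloud data platform
    n = deliver(transform(capture("erp", [{"CUST_ID": 7, "AMT": 120.0}])),
                warehouse.append)
    n += deliver(transform(capture("crm", [{"customerId": 7, "dealValue": 80.0}])),
                 warehouse.append)
    print(n, warehouse)
```

Because the stages are generators, each change flows through all three steps as soon as it is captured rather than waiting for a batch to accumulate, which is the property that distinguishes streaming CDC from nightly ETL.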
Because they focus only on data that has changed, streaming CDC pipelines offer huge efficiency gains over batch mode operations of the past. The best CDC solutions can deliver over 100 terabytes of data from source to destination in less than 30 minutes, without any data loss.
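As a rough sense of scale for that figure (assuming decimal units, where 1 TB is 10^12 bytes), moving 100 TB in 30 minutes implies a sustained aggregate throughput of roughly 55 GB per second:

$$
\frac{100\ \text{TB}}{30\ \text{min}} = \frac{100 \times 10^{12}\ \text{B}}{1800\ \text{s}} \approx 5.56 \times 10^{10}\ \text{B/s} \approx 55.6\ \text{GB/s}
$$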
The transition to cloud computing is well underway. Cloud analytics, in particular, offers clear benefits to companies that truly understand the transformative role of data. Leading companies in every industry are aligning their strategies around data analytics. They are digitizing their interactions with customers and using algorithms to study data, extract insights and take action. Artificial intelligence and machine learning systems are ingesting huge amounts of information, uncovering correlations and identifying anomalies.
Whether you’re pioneering the digital revolution or just trying to keep up with the pack, CDC technology will play a vital role in making the modern data stack a reality and opening the door to digital transformation.
As first published in VentureBeat.
Gary Hagmueller is the CEO of Arcion, the world’s only cloud-native, CDC-based data replication platform. Gary is an established leader who has created more than $7.5 billion in business value through two IPOs and four M&A exits over more than 20 years in the technology industry. Gary holds an MBA from the Marshall School of Business at the University of Southern California, where he was named a Sheth Fellow, and a BA in Business Administration from Arizona State University. As the father of twins, he also has extensive hands-on experience in project management and negotiation. For more information on Arcion, visit www.arcion.io and follow the company on LinkedIn, YouTube and @ArcionLabs.