VENDORiQ: Data Replication Goes Serverless with Google Datastream
26 May 2021: Google has introduced Datastream, which the vendor defines as a “change data capture and replication service”. In short, the service allows changes in one data source to be replicated to other data sources in near real time. It currently connects with Oracle and MySQL databases and a range of Google Cloud services, including BigQuery, Cloud SQL, Cloud Storage and Spanner.
Uses for such a service include updating a data lake or similar repository as data is added to a production database, keeping disparate databases of different types in sync, and consolidating information from across a global organisation into a central repository.
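The core idea of change data capture can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the ChangeEvent shape, its operation names and the apply_changes helper are hypothetical and do not reflect Datastream's actual event schema or API. The sketch shows changes captured from a source database's log being replayed, in log order, against a target copy. That replay is what keeps the replica in near real-time sync.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass
class ChangeEvent:
    # Hypothetical event shape for illustration only; not Datastream's format.
    op: str                                # "INSERT", "UPDATE" or "DELETE"
    key: Any                               # primary key of the changed row
    row: Optional[Dict[str, Any]] = None   # full row after the change (None for DELETE)
    position: int = 0                      # increasing position in the source log


def apply_changes(target: Dict[Any, Dict[str, Any]],
                  events: Iterable[ChangeEvent]) -> int:
    """Replay change events, in log order, against an in-memory target table.

    Returns the log position of the last event applied, so a caller could
    resume from that point after an interruption.
    """
    last = 0
    for ev in sorted(events, key=lambda e: e.position):
        if ev.op in ("INSERT", "UPDATE"):
            # Upsert the full row image: re-applying the same events
            # yields the same result.
            target[ev.key] = dict(ev.row or {})
        elif ev.op == "DELETE":
            target.pop(ev.key, None)
        else:
            raise ValueError(f"unknown operation: {ev.op}")
        last = ev.position
    return last


# Usage: replicate four source changes into an empty replica.
replica: Dict[Any, Dict[str, Any]] = {}
log = [
    ChangeEvent("INSERT", 1, {"id": 1, "status": "new"}, position=1),
    ChangeEvent("INSERT", 2, {"id": 2, "status": "new"}, position=2),
    ChangeEvent("UPDATE", 1, {"id": 1, "status": "shipped"}, position=3),
    ChangeEvent("DELETE", 2, position=4),
]
checkpoint = apply_changes(replica, log)
print(replica)     # {1: {'id': 1, 'status': 'shipped'}}
print(checkpoint)  # 4
```

Recording the last applied position is what would let a replication process resume after an interruption without re-reading the whole source. A production service must also deal with schema changes, ordering across partitions and delivery failures, none of which this sketch attempts.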
Datastream is built on a serverless (cloud functions) architecture. This is significant, as it allows integration to scale with data volumes without the organisation having to provision or manage the underlying infrastructure.
Why it’s Important
Ingesting data at scale into Cloud-based data lakes is challenging and can be costly. Even simple ingestion, where data requires little transformation, can be expensive when run through a full ETL service. By leveraging serverless functions, Datastream has the potential to significantly lower the cost, and improve the performance, of bringing large volumes of rapidly changing data into a data lake (or an SQL database being used as a pseudo data lake).
Using serverless to improve the performance and economics of large-scale data ingestion is not a new approach. In 2017, IBRS interviewed the architect of a major global streaming service about how it moved from an integration platform to AWS Kinesis data pipelines and hand-coded serverless functions to achieve more or less what Google Datastream now provides.
As organisations migrate to Cloud analytics, the need to rapidly replicate large data sets will grow, and serverless architecture will emerge as an important pattern for meeting it.
Who Needs to Know
- Analytics architecture leads
- Integration teams
- Enterprise architecture teams
Next Steps
Become familiar with the potential to use serverless (cloud) functions as ‘glue’ within your organisation’s Cloud architecture.
Look for opportunities to leverage serverless when designing your organisation’s next analytics platform.
Related IBRS Advisory