Why it Matters
These announcements are further evidence of Google’s plans for an integrated and open approach to data management, data science, and data governance. The emphasis on Apache Iceberg as a core component for BigLake indicates a commitment to open table formats, which reduces vendor lock-in and provides data portability across different platforms and engines. This aligns with a broader industry trend towards hybrid architectures that combine aspects of data lakes and data warehouses.
Unifying engines on an open data foundation simplifies data pipelines, reduces duplication, and enables real-time data access for all workloads, which in turn streamlines data science workflows. AI-powered metadata management improves data discovery, and central policies enforced via BigLake provide more robust data governance.
These announcements also come as Microsoft Purview is gaining traction, pulling organisations into the Azure data ecosystem. IBRS sees advantages in both the Microsoft ‘better together’ unified vendor approach and Google’s more ‘open source’ approach. While the two approaches are not mutually exclusive, organisations should settle on a preferred approach to reduce technical complexity and, significantly, the breadth of skills required.
Dataplex Universal Catalog
The integration of AI-powered metadata management capabilities within Dataplex Universal Catalog is a notable development. By aiding data discovery, curation, and semantic search, these capabilities mitigate the complexity of managing diverse data assets at scale. This is an area where AI will make a significant impact on data teams. The ability to enforce central policies across various data and AI sources via BigLake also contributes to a more consistent governance framework and highlights the need for these policies to be developed and agreed by the business.
The developer tooling improvements, such as AI-native notebooks and code generation, are designed to enhance productivity for data professionals, potentially lowering the barrier to entry for complex data tasks and accelerating development cycles.
Who’s Impacted
- Chief Data Officers (CDOs): Keep abreast of how AI can supplement skills in data science and data governance, and begin exploring how specific data roles may change as a result. Ensure that open platforms are considered when planning enterprise data platforms, and weigh the pros and cons, including the availability of skills, prior investments in existing platforms, and potential migration costs. Consider the total investment and returns over ten years, rather than over a single enterprise contract life (typically three years), to account for the transition costs and skills development associated with moving to new architectures.
- Data Scientists, Architects/Engineers: Consider how existing data lakehouse architectures, data pipelines and related data management tools can be optimised with, or enhanced by, emerging AI services.
- Compliance and Legal Teams: Explore how new and rapidly evolving data governance techniques, increasingly powered by AI services, could strengthen governance capabilities and the implementation of existing policies. All major vendors are implementing such capabilities, so it is essential to stay up to date with them regardless of the platform your organisation uses.
Next Steps
- Evaluate the implications of adopting open table formats like Apache Iceberg on existing data infrastructure and future data strategy.
- If not already in place, assess the potential for unifying operational and analytical workloads on a single data foundation to reduce data redundancy and improve real-time insights.
- Investigate your data solution’s roadmap for enhanced metadata management and AI-powered governance features to determine their effectiveness in improving data discoverability and compliance.
- Consider the impact of new developer tooling on data team productivity and skill requirements. What new skills will be required to maximise the benefits of the new tools while maintaining quality? How will such new skills impact hiring considerations?
- Review existing data governance policies and update as required before any proof of concept (PoC).