VENDORiQ: Google’s AI-Enhanced Open Data Governance Shows the Future of Data Management

Google's AI-enhanced Dataplex and BigLake updates, leveraging Apache Iceberg, champion open, integrated data management and governance, contrasting with Microsoft's unified approach.

The Latest

Google Cloud has announced updates to its lakehouse data platform. IBRS sees Google’s enhancements in open data governance as particularly notable. The Dataplex Universal Catalogue extends AI-powered metadata discovery and organisation across various data sources, including BigLake Iceberg, BigQuery, Spanner, and Vertex AI models, with AI-driven curation and semantic search capabilities. Developer tooling is also being enhanced with AI-native notebooks and code generation features.

Other developments include BigLake tables for Apache Iceberg and a preview of BigLake metastore with a new REST Catalogue API, which uses Google Cloud Storage to manage Iceberg data. The company is also unifying its operational and analytical engines, allowing them to interoperate on the same Iceberg data foundation: BigQuery for analytics and AlloyDB for PostgreSQL for operational workloads.

Why it Matters

These announcements are further evidence of Google’s plans for an integrated and open approach to data management, data science, and data governance. The emphasis on Apache Iceberg as a core component for BigLake indicates a commitment to open table formats, which reduces vendor lock-in and provides data portability across different platforms and engines. This aligns with a broader industry trend towards hybrid architectures that combine aspects of data lakes and data warehouses. 

Unifying engines on an open data foundation simplifies data pipelines and reduces duplication, enabling real-time data access for all workloads. It also streamlines data science workflows, while AI-powered metadata management improves data discovery and governance. Central policies enforced via BigLake provide more robust data governance.

These announcements also come as Microsoft Purview is gaining traction, pulling organisations into the Azure data ecosystem. IBRS sees advantages in both the Microsoft ‘better together’ unified vendor approach and Google’s more ‘open source’ approach. While the two approaches are not mutually exclusive, organisations should consider their preferred option to reduce technical complexity and, significantly, the breadth of skills required.

Dataplex Universal Catalogue 

The integration of AI-powered capabilities within Dataplex Universal Catalogue for metadata management represents a notable development. By aiding data discovery, curation, and semantic search, these capabilities mitigate the complexity of managing diverse data assets at scale. This is an area where AI will make a significant impact on data teams. The ability to enforce central policies across various data and AI sources via BigLake also contributes to a more consistent governance framework and highlights the need for these policies to be developed and agreed by the business.

The developer tooling improvements, such as AI-native notebooks and code generation, are designed to enhance productivity for data professionals, potentially lowering the barrier to entry for complex data tasks and accelerating development cycles.

Who’s Impacted

  • Chief Data Officers (CDOs): Keep abreast of how AI can supplement skills in data science and data governance, and begin exploring how specific data roles may change as a result. Ensure that open platforms are considered when planning enterprise data platforms, and weigh the pros and cons, including the availability of skills, prior investments in existing platforms, and potential migration costs. Consider the total investments and returns over ten years, rather than a single enterprise contract life (typically 3 years), to address the transition costs and skills development associated with moving to new architectures.
  • Data Scientists, Architects/Engineers: Consider how existing data lakehouse architectures, data pipelines and related data management tools can be optimised with, or enhanced by, emerging AI services.
  • Compliance and Legal Teams: Explore how new and rapidly evolving data governance techniques, increasingly powered by AI services, could be leveraged to enhance governance capabilities and implement existing policies. All major vendors are implementing such capabilities, so regardless of the platform your organisation uses, it is essential to stay up to date as they evolve.

Next Steps

  • Evaluate the implications of adopting open table formats like Apache Iceberg on existing data infrastructure and future data strategy.
  • If not already in place, assess the potential for unifying operational and analytical workloads on a single data foundation to reduce data redundancy and improve real-time insights.
  • Investigate your data solution’s roadmap for enhanced metadata management and AI-powered governance features to determine their effectiveness in improving data discoverability and compliance.
  • Consider the impact of new developer tooling on data team productivity and skill requirements. What new skills will be required to maximise the benefits of the new tools while maintaining quality? How will such new skills impact hiring considerations?
  • Review existing data governance policies and update them as required before any proof of concept (PoC).
