Data Mesh Governance by Example
Curated examples for Data Mesh guiding values, an operating model, and global policies to support a federated governance group.
We want this to be an open source collection of policy examples, driven by the community. Contribute by submitting a pull request on the GitHub repository.
The data mesh governance group consists of representatives from the domain teams and the data platform team.
They are temporarily supported by subject-matter experts to address special issues, e.g. concerning legal, compliance, and security.
Together, they make sure that data products in the mesh are interoperable and can be used securely. For this, they agree on a few architectural decisions and global policies. To make it easy for domain teams to implement the policies, they specify the requirements for the data platform to automate the policies as much as possible.
Guiding Values
Guiding values are the fundamental beliefs we agree on when implementing data mesh governance. They guide us to make the right choices and give justification for our decisions.
- Promote the usage of data products
- Optimize experience for generalist majority
- Standardize for interoperability
- Enforce consistent security
- Design for automation
Operating Model
The operating model defines the structure and processes of the data mesh governance group. Once the group and its members are in place, the first meeting should decide on the collaboration mode, the communication channels, and a policy repository.
Members
Collaboration Mode
- Regular online meetings
- Local Data Groups
- Asynchronous collaboration (no meetings)
Decision Making
- Consent
- Consensus
- Democratic
Communication Channels
- Microsoft Teams Channels
- Slack Channels
- Email Lists
Policy Repository
- Data Mesh Manager
- Confluence
- Git
Policies
Definitions
Interoperability
- Data Product Specification
- Data Contract Specification
- Address scheme
- File Format
- Partitioning Keys
- Timestamp as ISO-8601 Strings
- Money amounts in cents as integers
- Common IDs
- Well-known Field Names
- Bitemporal Timestamp Fields
- Naming Conventions (environment, database, table, column, file, bucket, ...)
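Two of the interoperability policies above, timestamps as ISO-8601 strings and money amounts as integer cents, can be illustrated with a minimal sketch. The helper names (`to_iso8601`, `to_cents`), the record fields, and the choice to normalize to UTC and round half up are assumptions made for this example, not part of any published policy.

```python
from datetime import datetime, timezone
from decimal import Decimal, ROUND_HALF_UP


def to_iso8601(ts: datetime) -> str:
    """Serialize a timezone-aware datetime as an ISO-8601 string in UTC."""
    if ts.tzinfo is None:
        # Assumption: naive timestamps are rejected rather than guessed at.
        raise ValueError("timestamp must be timezone-aware")
    return ts.astimezone(timezone.utc).isoformat()


def to_cents(amount: str) -> int:
    """Convert a decimal amount given as a string into integer cents."""
    # Decimal avoids binary floating-point rounding errors such as 19.99 * 100.
    cents = (Decimal(amount) * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(cents)


# Hypothetical record of a data product output port.
record = {
    "order_id": "o-1001",
    "order_timestamp": to_iso8601(datetime(2023, 5, 17, 14, 30, tzinfo=timezone.utc)),
    "total_amount_cents": to_cents("19.99"),
}
```

Storing amounts as integers keeps sums exact across query engines, and a single timestamp format lets consumers join data products without per-source parsing.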
Isolation
- Project structure
- Environments
- Production only
- Multiple Isolated Environments
- Central Governance Account
- Separate Account per Domain Team
- Separate Database per Domain Team
- Separate Schema per Domain Team
Discoverability
- Data Product Inventory
  - Confluence Wiki Page
  - Data Mesh Manager
  - Backstage
  - LeanIX
  - Custom Web-Application
- Data Catalog
  - AWS Glue Data Catalog
  - GCP Dataplex
  - Azure Purview
  - Databricks Unity
  - Collibra
  - Atlan
- Tagging Tables as Data Products
- Mandatory Ownership Information
- Mandatory Tags
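Policies such as mandatory ownership information and mandatory tags lend themselves to automated checks. The following is a minimal sketch of such a check. The metadata layout, the `REQUIRED_TAGS` set, and the function name `validate_metadata` are hypothetical and would need to match whatever inventory or catalog the group actually chooses.

```python
# Assumption: the governance group has agreed on these tag keys.
REQUIRED_TAGS = {"data-product", "domain"}


def validate_metadata(metadata: dict) -> list[str]:
    """Return a list of policy violations for one table's metadata."""
    problems = []
    owner = metadata.get("owner") or ""
    if not owner.strip():
        problems.append("missing owner")
    # Tags are assumed to be a mapping of tag key to tag value.
    missing = REQUIRED_TAGS - set(metadata.get("tags", {}))
    for tag in sorted(missing):
        problems.append(f"missing tag: {tag}")
    return problems


# Hypothetical metadata entries as they might be read from a catalog export.
compliant = {"owner": "team-checkout", "tags": {"data-product": "orders", "domain": "checkout"}}
incomplete = {"owner": "", "tags": {"domain": "checkout"}}

print(validate_metadata(compliant))   # no violations
print(validate_metadata(incomplete))  # owner and one tag are missing
```

A check like this could run in a deployment pipeline so that domain teams get feedback before a data product is published.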
Quality
- Retire unused data products after 6 months
- Minimum quality level of a data product
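The retirement policy above can be monitored automatically if the platform records when each data product was last accessed. A minimal sketch follows. It assumes that "6 months" is approximated as 183 days and that last-access dates are available as a simple mapping. Both are assumptions for illustration, as is the function name `retirement_candidates`.

```python
from datetime import date, timedelta

# Assumption: six months is approximated as 183 days.
RETIREMENT_THRESHOLD = timedelta(days=183)


def retirement_candidates(last_access: dict[str, date], today: date) -> list[str]:
    """Return the names of data products not accessed within the threshold."""
    return sorted(
        name
        for name, accessed in last_access.items()
        if today - accessed > RETIREMENT_THRESHOLD
    )


# Hypothetical last-access dates, e.g. derived from query logs.
last_access = {
    "orders": date(2024, 6, 1),
    "legacy_shipments": date(2023, 12, 1),
}
candidates = retirement_candidates(last_access, today=date(2024, 7, 1))
```

The result is a list of candidates only. Whether a data product is actually retired would remain a decision for its owning domain team.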
Documentation
- Documentation of data products
  - Wiki
  - Data Catalog
- Mandatory Fields for Data Products
- Schema Format
Access Control
- Access Request
  - Ticket with manual steps
  - Decentralized self-service via Pull Requests
  - Central self-service app with decentralized handlers
- Access granted through AWS IAM Policies
- ACLs managed by domain teams
- Reassess after x months
Consent Management
- One domain publishes consents as a data product
Privacy & Compliance
- Data Classification
- PII data separation
- PII Anonymization
- Data Stored in Customer's Business Region
- PHI (protected health info)
- Data Retention Periods
- Right to be Forgotten By Tombstone Events
- Politically exposed person (PEP)
- People in witness protection program
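The "Right to be Forgotten by Tombstone Events" policy can be read in several ways. One common interpretation, borrowed from event-streaming systems, is that a deletion is published as an event with the subject's key and an empty payload, and every consumer removes its copy on receipt. The sketch below assumes that interpretation. The event shape (key, value) and the function name `apply_events` are hypothetical.

```python
from typing import Optional


def apply_events(events: list[tuple[str, Optional[dict]]]) -> dict[str, dict]:
    """Build a consumer-side view from events; a None value is a tombstone."""
    state: dict[str, dict] = {}
    for key, value in events:
        if value is None:
            # Tombstone: forget everything stored for this key.
            state.pop(key, None)
        else:
            state[key] = value
    return state


# Hypothetical event stream of a customer data product.
events = [
    ("customer-1", {"name": "Alice"}),
    ("customer-2", {"name": "Bob"}),
    ("customer-1", None),  # right-to-be-forgotten request for customer-1
]
view = apply_events(events)
```

In this sketch the producing domain only has to emit one tombstone, and each consuming data product stays responsible for honoring it in its own storage.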
Security
- Encryption at Rest
- Encryption in Transit
- VPC
Monitoring
- Observability Metrics
- Cost reporting
Self-service
- Data Product Creation
- Self-service app (Backstage.io)
- Tutorials/guides
Ownership
- Ownership for New Data Products
- Ownership for Legacy Data Products
Architecture Decisions
While it is not the federated governance group's job to define the architecture of the data platform, decisions about the platform have consequences for global policies and vice versa, e.g. for policy automation and monitoring. The governance group therefore needs to keep track of the decisions made about the data platform.
Data Platform
- AWS S3 as Storage for Data Products
- AWS Athena as Query-Engine
- AWS Redshift as Data Platform
- GCP BigQuery as Data Platform
- GCP Cloud Storage as Storage for Data Products
- Azure Synapse Analytics as Data Platform
- Azure ADLS as Storage for Data Products
- Snowflake as Data Platform
- Databricks as Data Platform
- Presto as On-Premise Query-Engine
- MinIO as On-Premise Storage for Data Products