Data Mesh

The purpose of this directory is to serve as a template for implementing a Data Mesh platform. Initially, we'll be taking a single-repo approach, so that we can easily scale up the various implementations.

When it is relevant and useful, some of these tools may move to their own repositories to serve other purposes. For example, NiFiKop is relevant to our platform but also useful beyond it; since it can serve a wider community, we might as well release it as a standalone tool.

Code organization

Our aim here is to implement a Data Mesh platform, so we follow the organization that this approach prescribes.

This gives the following folder organization:

Domains

A domains folder containing all the domains of the Data Mesh, divided into three types:

  • Source aligned domain: Analytical data reflecting the business facts generated by the operational systems. This is also called a native data product. These domains are responsible for providing the source of truth for their business domain, published as source-aligned domain data.
  • Aggregate domain: Analytical data that is an aggregate of multiple upstream domains.
  • Consumer aligned domain: Analytical data transformed to fit the needs of one or multiple specific use cases. This is also called fit-for-purpose domain data.
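
As an illustration only, the three domain types could be modelled as below. This is a minimal sketch, not part of the platform: the `DomainType`, `Domain` and `check_domain` names and the `upstream` field are hypothetical, and the only rule encoded is the one stated above, that an aggregate domain combines multiple upstream domains.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class DomainType(Enum):
    """The three domain types described above."""
    SOURCE_ALIGNED = "source-aligned"
    AGGREGATE = "aggregate"
    CONSUMER_ALIGNED = "consumer-aligned"


@dataclass(frozen=True)
class Domain:
    """Hypothetical description of a domain; field names are assumptions."""
    name: str
    type: DomainType
    upstream: Tuple[str, ...] = ()  # names of the domains this one reads from


def check_domain(domain: Domain) -> List[str]:
    """Return a list of problems found; an empty list means none."""
    problems = []
    # An aggregate domain is defined as an aggregate of multiple upstream domains.
    if domain.type is DomainType.AGGREGATE and len(domain.upstream) < 2:
        problems.append(
            f"{domain.name}: an aggregate domain needs at least two upstream domains"
        )
    return problems
```

For example, `check_domain(Domain("customer_360", DomainType.AGGREGATE, ("orders",)))` reports one problem, because only a single upstream domain is listed.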

Platform

A platform folder containing all the implementations of our Data Mesh platform, subdivided into three planes:

  • Infrastructure utility plane: manages the low-level infrastructure resources essential to building and running the mesh, such as storage, compute and identity systems. We will use cloud provider operators (such as GCP Config Connector and AWS Controllers for Kubernetes), implement our own where necessary, and rely on some Terraform modules.
  • Product experience plane: a higher-level abstraction to build, maintain, and consume Data Products. Built on top of the Infrastructure utility plane, it interfaces directly with Data Products. It primarily defines the interface between the data product teams and the infrastructure, through a folder organization and a manifest that defines the data product.
  • Mesh experience plane: abstracts the mesh-level capabilities operating on multiple Data Products.
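
The manifest mentioned for the Product experience plane is not specified in this README. As a rough sketch of what loading and validating one might look like, assuming hypothetical fields (`name`, `domain`, `owner`, `output_ports`) that are not taken from the platform:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical required keys; the real manifest schema is not defined here.
REQUIRED_KEYS = ("name", "domain", "owner")


@dataclass
class DataProductManifest:
    """Illustrative manifest for a data product; all fields are assumptions."""
    name: str
    domain: str
    owner: str
    output_ports: List[str] = field(default_factory=list)


def load_manifest(raw: Dict[str, Any]) -> DataProductManifest:
    """Build a manifest from an already-parsed mapping (e.g. from YAML or JSON)."""
    missing = [key for key in REQUIRED_KEYS if not raw.get(key)]
    if missing:
        raise ValueError(f"manifest is missing required keys: {', '.join(missing)}")
    return DataProductManifest(
        name=raw["name"],
        domain=raw["domain"],
        owner=raw["owner"],
        output_ports=list(raw.get("output_ports", [])),
    )
```

Keeping the manifest as a small typed object is one way to give data product teams a single, checkable contract with the infrastructure; the actual format may well differ.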

Each of these planes has:

  • A set of products that are packaged to be used either internally or by their clients:
    • Helm packages
    • Terraform modules
    • Kubernetes operators
    • Python libraries
  • A set of infrastructure stages (<stage_layer>_<stage_name>) that are provisioned and managed by the platform teams. These stages provision the components required to enable domain teams to deploy their Data Products. Some examples:
    • Account networking configuration
    • Kubernetes cluster
    • Kubernetes Operator deployment
    • CI/CD solution
    • Central storage layer (object storage solution, Redshift cluster, etc.)
    • Mesh experience tools, such as a Data Catalog
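
The `<stage_layer>_<stage_name>` convention can be checked mechanically. The sketch below is only an assumption about how it might be parsed: it splits on the first underscore, so it presumes the layer part contains no underscore itself, and the `Stage` and `parse_stage` names and the example directory name are hypothetical.

```python
from typing import NamedTuple


class Stage(NamedTuple):
    """The two parts of a stage directory name."""
    layer: str
    name: str


def parse_stage(dirname: str) -> Stage:
    """Split '<stage_layer>_<stage_name>' on the first underscore.

    Assumption: the layer never contains an underscore, while the
    stage name may contain several.
    """
    layer, separator, name = dirname.partition("_")
    if not separator or not layer or not name:
        raise ValueError(
            f"{dirname!r} does not match <stage_layer>_<stage_name>"
        )
    return Stage(layer=layer, name=name)
```

With this rule, a hypothetical directory called `00_kubernetes_cluster` would be read as layer `00` and stage name `kubernetes_cluster`.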