Vijay

Project case study

Self-service data platform and governance architecture

Governed self-service data platform for discovery, query access, transformation, and BI.

Created a governed self-service data path where teams could discover data, transform it, query it, and consume BI outputs without bypassing access control or central metadata.

Apache Kyuubi, Trino, Apache Ranger, DataHub, Metabase, Jupyter, dbt, Spark, Airflow

Context

The problem

Teams needed self-service access to shared data assets, but the platform still had to preserve authorization, secrets, metadata, transformation ownership, and a usable BI layer.

The platform used Apache Kyuubi as the governed access layer with custom changes for engine selection, RBAC, and secrets management; Trino as a query engine; dbt for transformation; DataHub as the central metadata base; and Metabase as the BI layer.

System trace

How the work moved through the system

A high-level operating path: where the request starts, how the system shapes it, and how other teams consume the result.

  1. Kyuubi served as the governed access gateway, with custom platform changes for engine selection, RBAC, and secrets management.

  2. Trino handled interactive query execution, while Spark supported heavier processing paths.

  3. dbt provided transformation structure, DataHub served as the central metadata base, and notebooks supported exploration.

Platform model

Governed self-serve

The platform connected access control, query engines, metadata, transformations, notebooks, and BI instead of treating them as disconnected tools.

Access gateway

Kyuubi customizations

Custom changes covered engine behavior, RBAC, and secrets management around shared query access.
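
To make the engine-selection idea concrete, here is a minimal sketch of how a gateway-side rule might choose between Trino and Spark. It is an illustration under stated assumptions, not the platform's actual logic: `QueryRequest`, `HEAVY_SCAN_GB`, `select_engine`, and `session_conf` are hypothetical names, and the 500 GB cutoff is invented for the example. `kyuubi.engine.type` is a setting described in the Kyuubi documentation; the accepted values should be checked against the deployed version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryRequest:
    user: str
    interactive: bool          # True for ad hoc or BI queries
    estimated_scan_gb: float   # hypothetical planner estimate


# Hypothetical cutoff; a real platform would tune or replace this rule.
HEAVY_SCAN_GB = 500.0


def select_engine(req: QueryRequest) -> str:
    """Pick an engine type: Trino for small interactive work, Spark otherwise."""
    if req.interactive and req.estimated_scan_gb < HEAVY_SCAN_GB:
        return "TRINO"
    return "SPARK_SQL"


def session_conf(req: QueryRequest) -> dict[str, str]:
    """Build the session settings a client would pass when opening a session."""
    return {"kyuubi.engine.type": select_engine(req)}
```

Keeping the rule in one function means the routing decision sits at the gateway rather than in each client, which is the point of centralizing engine behavior.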

Architecture

System shape

  1. Kyuubi served as the governed access gateway, with custom platform changes for engine selection, RBAC, and secrets management.
  2. Trino handled interactive query execution, while Spark supported heavier processing paths.
  3. dbt provided transformation structure, DataHub served as the central metadata base, and notebooks supported exploration.
  4. Metabase provided the BI layer for consuming curated outputs.
  5. Airflow supported recurring platform workflows without being the headline capability.

Ownership

What I handled

  1. Extended Kyuubi behavior for engine routing and platform access needs.
  2. Worked on RBAC and secrets-management pieces around governed query access.
  3. Connected Trino, dbt, DataHub, notebooks, and Metabase into a more coherent user path.
  4. Improved discoverability and reusable access patterns for shared data assets.

Lessons

What carried forward

  1. Self-service data platforms need access control, metadata, transformation, and BI to be designed together.
  2. The most useful platform work often happens in the boundaries between open-source systems, not inside one tool alone.

Engineering decisions

Use Kyuubi as the governed access layer

Kyuubi was the right place to centralize access behavior because it sits close to users, engines, identity, and workload routing.
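
One way to picture access behavior centralized at the gateway is a deny-by-default check that runs before any engine credential is resolved. The sketch below is a simplified, hypothetical model, not Apache Ranger's policy engine and not the platform's implementation: `Policy`, `is_allowed`, `open_governed_session`, the policy entries, and the secret reference are all invented for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Policy:
    role: str
    resource: str  # glob over "schema.table", e.g. "sales.*"
    action: str    # e.g. "select"


def is_allowed(roles: set[str], resource: str, action: str,
               policies: list[Policy]) -> bool:
    """Deny by default; allow only when a policy matches role, action, and resource."""
    return any(
        p.role in roles and p.action == action and fnmatchcase(resource, p.resource)
        for p in policies
    )


def open_governed_session(roles, resource, action, policies, secret_store, secret_ref):
    """Authorize first, then resolve the engine credential on the gateway side."""
    if not is_allowed(roles, resource, action, policies):
        raise PermissionError(f"{action} on {resource} is not permitted")
    # The session refers to a secret by name; the user never supplies the value.
    return {"resource": resource, "credential": secret_store[secret_ref]}


# Hypothetical policy set and secret store used only for this example.
POLICIES = [Policy("analyst", "sales.*", "select")]
SECRETS = {"trino/service-account": "placeholder-token"}
```

Because authorization runs before the secret lookup, a denied request never touches the credential store, which keeps the two concerns ordered in a single place.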

Keep metadata central

DataHub became the shared metadata base so discovery and governance did not depend on scattered project knowledge.

Separate transformation from consumption

dbt and Metabase served different parts of the workflow: transformation ownership and BI consumption.

What can be shown

Public evidence without internal names

The internal systems stay private. This section keeps the public parts: my role, system boundaries, technology context, scale, decisions, constraints, and what I learned.

Internal enterprise system, High-level architecture, Open-source reference

Platform shape

Governed self-serve

The shareable story is the operating model: query access, authorization, secrets, metadata, transformation, and BI connected into one platform path.

Access layer

Kyuubi + Trino

Kyuubi handled governed entry points and engine behavior while Trino provided interactive query execution.

Architecture shape

  • Kyuubi acts as a governed access gateway, with custom behavior for engine selection, RBAC, and secrets management.
  • Trino provides query execution, while Spark supports heavier processing workloads behind the platform.
  • dbt owns transformation patterns, DataHub acts as the central metadata base, and Metabase provides the BI consumption layer.
  • Airflow remains the scheduling/orchestration layer used across platform workflows rather than the main product story.

Responsibilities

  • Built and extended Kyuubi platform behavior for engine selection, RBAC, and secrets management.
  • Connected Trino query access, dbt transformations, DataHub metadata, and Metabase BI into a clearer self-service data path.
  • Improved data discovery, governed access, and reusable analytics patterns across platform consumers.

Constraints

  • Internal datasets, access policies, tenancy boundaries, and deployment topology are confidential.
  • Site notes avoid screenshots or examples that reveal enterprise data assets.

Supporting context

High-level architecture

Governed self-service data platform topology

The topology can be shown as a Kyuubi access gateway with custom engine routing, RBAC, and secrets management, alongside Trino query execution, dbt transformations, DataHub metadata, Metabase BI, notebooks, and scheduled platform workflows.

Open-source reference

Open-source technology references

The case study can cite the named open-source projects as technology context without implying ownership of enterprise deployment details.
