Vijay Work Resume Blog Contact

Project case study

Automatic engine selection for Kyuubi

Kyuubi engine routing improvement based on user access patterns.

Made shared compute usage more practical by routing users toward the engine mode that best matched their access pattern.

Apache Kyuubi, Java, Spark, Trino

Context

The problem

Users of shared compute needed reliable routing between interactive and batch engine modes, so that manual engine selection would not become a recurring source of friction.

I worked directly in the Kyuubi codebase to improve engine selection behavior across Spark- and Trino-backed use cases.

System trace

How the work moved through the system

A high-level operating path: where the request starts, how the system shapes it, and how other teams consume the result.

  1. User group context informed the engine selection path.

  2. Interactive and batch modes remained separate execution patterns behind a cleaner routing layer.

  3. Spark- and Trino-backed use cases remained part of the platform context.

Routing model

AD-group aware

Engine selection used user group context to route workloads toward the right execution mode.
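
To make the routing idea concrete, here is a minimal sketch of a group-to-mode lookup. It is not the actual patch: the class name, the group names, and the choice of INTERACTIVE as the fallback are all assumptions made for illustration, since the real groups and rules are confidential.

```java
import java.util.Map;

class GroupAwareRouter {
    /** The two execution patterns named in the case study. */
    public enum Mode { INTERACTIVE, BATCH }

    // Hypothetical group names; the real AD groups are confidential.
    private static final Map<String, Mode> GROUP_TO_MODE = Map.of(
            "data-analysts", Mode.INTERACTIVE,
            "etl-schedulers", Mode.BATCH);

    /** Looks up the mode for a group; unknown groups fall back to INTERACTIVE (an assumption). */
    public static Mode route(String group) {
        return GROUP_TO_MODE.getOrDefault(group, Mode.INTERACTIVE);
    }

    public static void main(String[] args) {
        System.out.println(route("etl-schedulers"));  // BATCH
        System.out.println(route("data-analysts"));   // INTERACTIVE
        System.out.println(route("unlisted-group"));  // INTERACTIVE (fallback)
    }
}
```

The point of the sketch is that the user never names an engine mode; the mode follows from group context that already exists.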

Primary outcome

Less manual routing

Reduced friction for shared compute users by aligning engine behavior with access pattern.

Architecture

System shape

  1. User group context informed the engine selection path.
  2. Interactive and batch modes remained separate execution patterns behind a cleaner routing layer.
  3. Spark- and Trino-backed use cases remained part of the platform context.

Ownership

What I handled

  1. Read and modified Kyuubi internals instead of solving the problem only at the wrapper layer.
  2. Mapped access pattern requirements into deterministic engine selection behavior.
  3. Validated the change against shared platform usage expectations.
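
One way to read "deterministic" here: a user who belongs to several groups should always land on the same mode, regardless of the order in which the groups are reported. The sketch below shows that property with an ordered rule list where the first matching rule wins. The rule order, group names, and class name are hypothetical and do not describe the internal change.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DeterministicSelector {
    public enum Mode { INTERACTIVE, BATCH }

    // Hypothetical rules in priority order; LinkedHashMap preserves insertion order.
    private static final Map<String, Mode> RULES = new LinkedHashMap<>();
    static {
        RULES.put("batch-operators", Mode.BATCH);
        RULES.put("data-analysts", Mode.INTERACTIVE);
    }

    /** Walks the rules in priority order, so the result does not depend on group order. */
    public static Mode select(Set<String> userGroups, Mode fallback) {
        for (Map.Entry<String, Mode> rule : RULES.entrySet()) {
            if (userGroups.contains(rule.getKey())) {
                return rule.getValue();
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Same membership, reported in two different orders.
        Set<String> first = new LinkedHashSet<>(List.of("data-analysts", "batch-operators"));
        Set<String> second = new LinkedHashSet<>(List.of("batch-operators", "data-analysts"));
        System.out.println(select(first, Mode.INTERACTIVE));   // BATCH
        System.out.println(select(second, Mode.INTERACTIVE));  // BATCH
    }
}
```

Iterating over the rules rather than over the user's groups is the design choice that removes the order dependence.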

Lessons

What carried forward

  1. Platform ergonomics often improve most when defaults match real user behavior.
  2. Small routing decisions can have large operational impact in shared compute environments.

Engineering decisions

Patch the platform layer

The change belonged close to engine selection, where routing could stay consistent across users.

Use existing identity context

AD group membership was already meaningful operational data, so it became part of routing logic.

What can be shown

Public evidence without internal names

The internal systems stay private. This section keeps the public parts: my role, system boundaries, technology context, scale, decisions, constraints, and what I learned.

Internal enterprise system, High-level architecture, Open-source reference

Routing basis

Group-aware

The shareable signal is the routing pattern, not the specific enterprise group names or policies.

Architecture shape

  • Identity or group context informs engine routing decisions.
  • Interactive and batch execution modes remain separated behind the routing layer.
  • Spark and Trino-backed workloads remain part of the shared compute environment.

Responsibilities

  • Modified Kyuubi engine-selection behavior close to the platform layer.
  • Mapped access patterns into deterministic routing behavior.
  • Kept enterprise identity details out of the site description.

Constraints

  • AD group names, private repository references, internal patches, and deployment rules are confidential.
  • Site notes describe only the generalized routing pattern.

Supporting context

High-level architecture

High-level routing diagram

The design can be shown as user context feeding an engine selector that routes each request toward an interactive or a batch engine.

Open-source reference

Kyuubi platform context

Kyuubi is a public open-source system; this employer-specific change stays described at the pattern level unless the patch becomes public.
