Vijay

Project case study

Telecom network data lake

Petabyte-scale network data lake and multi-level analytics for mobile tower events.

Created the analytics foundation for network data products, reporting, and large-scale downstream consumption.

Flink, Kafka, Java, Elastic Stack, Influx, Airflow, Apache Iceberg, Apache Hudi

Context

The problem

Network data from mobile towers needed an analytics foundation that could support reporting, downstream data products, and very high event volume.

Worked across ingestion, storage, multi-level aggregation, processing, and operational concerns for very large mobile tower network-event datasets.

System trace

How the work moved through the system

A high-level operating path: where tower events enter the platform, how the system shapes them, and how other teams consume the result.

  1. Streaming and batch ingestion paths moved mobile tower network events across the platform.
  2. Flink and Java supported streaming and processing workloads.
  3. Multi-level aggregation paths reduced raw tower-event volume into analytical outputs.

Event scale

~5T events/day

The mobile tower network-event platform handled petabyte-scale data and about five trillion events per day.
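For a rough sense of what the daily figure implies, it can be converted to a sustained average by dividing by the number of seconds in a day. This is a back-of-the-envelope sketch only: it assumes a perfectly uniform rate, says nothing about real peaks, and the class and method names are illustrative.

```java
// Back-of-the-envelope conversion from a daily event count to an average per-second rate.
// Assumes a uniform rate across the day; real traffic peaks are not part of the public figures.
final class EventRate {
    static final long SECONDS_PER_DAY = 24L * 60L * 60L; // 86,400

    // Integer division: the result is rounded down to whole events per second.
    static long averagePerSecond(long eventsPerDay) {
        return eventsPerDay / SECONDS_PER_DAY;
    }
}
```

Under that assumption, `EventRate.averagePerSecond(5_000_000_000_000L)` returns 57,870,370, or roughly 58 million events per second on average.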

Data scale

Petabyte-scale

The system supported very large telecom datasets across ingestion, storage, and downstream consumption.

Architecture

System shape

  1. Streaming and batch ingestion paths moved mobile tower network events across the platform.
  2. Flink and Java supported streaming and processing workloads.
  3. Multi-level aggregation paths reduced raw tower-event volume into analytical outputs.
  4. Table-format experience around Iceberg and Hudi supported the broader data lake skill set.
  5. Airflow, Elastic Stack, and Influx supported orchestration and operational visibility.
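The multi-level aggregation idea in the list above can be sketched in plain Java. This is a minimal illustration, not the platform's actual code: the event shape, the minute and hour levels, and every name here are assumptions, since the real schema and aggregation levels are confidential. The point it shows is that each level is built from the level below it, so the hourly rollup never rereads raw events.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MultiLevelRollup {
    // Hypothetical event shape; the real schema and tower identifiers are confidential.
    record TowerEvent(String towerId, long epochSeconds) {}

    // One aggregation bucket: a tower plus the start of a time window, in epoch seconds.
    record Bucket(String towerId, long windowStart) {}

    // Rounds a timestamp down to the start of the window that contains it.
    static long floorToWindow(long epochSeconds, long windowSeconds) {
        return Math.floorDiv(epochSeconds, windowSeconds) * windowSeconds;
    }

    // Level 1: count raw events per tower per minute.
    static Map<Bucket, Long> perMinute(List<TowerEvent> events) {
        Map<Bucket, Long> counts = new HashMap<>();
        for (TowerEvent event : events) {
            long start = floorToWindow(event.epochSeconds(), 60L);
            counts.merge(new Bucket(event.towerId(), start), 1L, Long::sum);
        }
        return counts;
    }

    // Level 2: roll minute counts up into hourly counts without touching raw events.
    static Map<Bucket, Long> perHour(Map<Bucket, Long> minuteCounts) {
        Map<Bucket, Long> counts = new HashMap<>();
        for (Map.Entry<Bucket, Long> entry : minuteCounts.entrySet()) {
            Bucket minute = entry.getKey();
            long start = floorToWindow(minute.windowStart(), 3600L);
            counts.merge(new Bucket(minute.towerId(), start), entry.getValue(), Long::sum);
        }
        return counts;
    }
}
```

Counts are used because sums compose cleanly across levels. Measures such as distinct counts or percentiles would need a mergeable intermediate representation instead of a plain number.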

Ownership

What I handled

  1. Designed data flows for very high-volume mobile tower network events.
  2. Worked on petabyte-scale tower-event data and multi-level aggregation paths.
  3. Implemented platform pieces across ingestion, processing, and operational reliability.
  4. Created a foundation for downstream reporting and data product consumption.

Lessons

What carried forward

  1. Large data systems fail at the boundaries unless ownership and observability are explicit.
  2. Scale claims are only useful when tied to the operational systems that made them sustainable.

Engineering decisions

Treat operations as a first-class requirement

At this scale, pipeline correctness and visibility mattered as much as raw throughput.
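One way to make correctness visible, sketched here under assumptions, is to count what enters and leaves each stage and check that nothing goes missing in between. The class below is hypothetical and not taken from the real platform. It only fits stages that map one input record to at most one output record; aggregation stages change cardinality and would need a different check.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical per-stage counters; names are illustrative, not from the real platform.
final class StageCounters {
    // LongAdder keeps increments cheap when many threads update the same counter.
    private final LongAdder received = new LongAdder();
    private final LongAdder emitted = new LongAdder();
    private final LongAdder rejected = new LongAdder();

    void recordReceived() { received.increment(); }
    void recordEmitted() { emitted.increment(); }
    void recordRejected() { rejected.increment(); }

    // Records that entered the stage but are not yet accounted for as emitted or rejected.
    long unaccounted() {
        return received.sum() - emitted.sum() - rejected.sum();
    }

    // True when every received record was either emitted or explicitly rejected.
    boolean reconciles() {
        return unaccounted() == 0L;
    }
}
```

In practice the three sums would be exported as metrics and reconciled after a stage drains, since in-flight records make `unaccounted()` nonzero while the stage is running.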

Separate ingestion, processing, and consumption concerns

Clear system boundaries helped the platform support downstream analytics and products.
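The separation can be pictured as three narrow interfaces, each owned by a different concern. This is a simplified sketch with hypothetical names (`Source`, `Processor`, `Sink`, `runOnce`), not the platform's real contracts, and it handles a single in-memory batch where the real system would stream.

```java
import java.util.List;

final class LayeredPipeline {
    // Ingestion: only knows how to produce the next batch of raw records.
    interface Source<I> {
        List<I> poll();
    }

    // Processing: turns a batch of raw records into derived records.
    interface Processor<I, O> {
        List<O> process(List<I> batch);
    }

    // Consumption: only knows how to accept finished records.
    interface Sink<O> {
        void accept(List<O> batch);
    }

    // Wires the three layers together for one batch and returns how many records were delivered.
    static <I, O> int runOnce(Source<I> source, Processor<I, O> processor, Sink<O> sink) {
        List<O> output = processor.process(source.poll());
        sink.accept(output);
        return output.size();
    }
}
```

Because each layer depends only on the record types crossing its boundary, a source or a sink can be swapped without touching the processing logic.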

What can be shown

Public evidence without internal names

The internal systems stay private. This section keeps the public parts: my role, system boundaries, technology context, scale, decisions, constraints, and what I learned.

Internal enterprise system, High-level architecture, Scale signal

Data scale

Petabyte-scale

The shareable description stays at the scale-category level and avoids internal capacity details.

Architecture shape

  • Mobile tower network events enter ingestion paths through streaming and batch movement.
  • Flink and Java services process high-volume streams and derived datasets.
  • Multi-level aggregation paths reduce raw tower-event volume into analytical outputs before downstream consumers use the data.
  • Data lake table-format experience includes Iceberg/Hudi-style foundations for large analytical datasets.
  • Airflow, Elastic Stack, and Influx support orchestration and operational visibility.

Responsibilities

  • Designed and implemented ingestion, processing, storage, and reliability pieces.
  • Worked across pipeline boundaries needed for downstream reporting and data products.
  • Kept network topology and operational thresholds confidential.

Constraints

  • Network topology, tower identifiers, vendor details, and operational thresholds are confidential.
  • Site notes present scale as rounded signals only.

Supporting context

High-level architecture

High-level telecom data lake shape

Can be shown as ingestion, streaming, storage, orchestration, observability, and downstream consumption layers.

Scale signal

"About five trillion events per day" and "petabyte-scale" are shareable indicators that reveal no internal topology.
