ETL – Page 2 – C4: Container, Code, Cloud & Context

Data Pipelines for LLM Training: Building Production ETL Systems

Posted on November 8, 2025 by Nithin Mohan TK 13 min read

Building production ETL pipelines for LLM training is complex. After building pipelines processing 100TB+ of data, I’ve learned what works. Here’s the complete guide to building production data pipelines for LLM training.

Figure 1: LLM Training Data Pipeline Architecture

Why Production ETL Matters for LLM Training

LLM training requires massive amounts of clean, processed data: […]

Tips and Tricks – Use Span<T> for Zero-Allocation String Parsing

Posted on October 20, 2025 by Nithin Mohan TK 10 min read

Eliminate heap allocations when parsing strings by using Span<T> for memory-efficient operations.

Building the Modern Data Stack: How Spark, Kafka, and dbt Transformed Data Engineering

Posted on June 22, 2025 by Nithin Mohan TK 6 min read

The data engineering landscape has undergone a fundamental transformation over the past decade. What once required massive Hadoop clusters has evolved into a sophisticated ecosystem of specialized tools: Kafka for ingestion, Spark for processing, and dbt for transformation.

Figure: Modern Data Stack Architecture

The Paradigm Shift: Monolithic → Modular

The old approach centered around monolithic platforms […]

Azure Data Factory: A Solutions Architect’s Guide to Enterprise Data Integration

Posted on February 16, 2025 by Nithin Mohan TK 6 min read

Enterprise data integration has evolved from simple ETL batch jobs to sophisticated orchestration platforms that handle diverse data sources, complex transformations, and real-time processing requirements. Azure Data Factory represents Microsoft’s cloud-native answer to these challenges, providing a fully managed data integration service that scales from simple copy operations to enterprise-grade data pipelines. Having designed and […]

Tips and Tricks – Implement Domain Events for Loose Coupling

Posted on November 14, 2024 by Nithin Mohan TK 10 min read

Use domain events to decouple components and enable reactive architectures.

Tips and Tricks – Apply Strangler Fig Pattern for Legacy Migration

Posted on March 15, 2011 by Nithin Mohan TK 11 min read

Gradually replace legacy systems by routing traffic to new implementations incrementally.
