ETL – C4: Container, Code, Cloud & Context

Data Quality for AI: Ensuring High-Quality Training Data

Posted on December 5, 2025 by Nithin Mohan TK 13 min read

Data quality determines AI model performance. After managing data quality for 100+ AI projects, I’ve learned what matters. Here’s the complete guide to ensuring high-quality training data.

Figure 1: Data Quality Framework

Why Data Quality Matters

Data quality directly impacts model performance:

Accuracy: Poor data leads to poor predictions
Bias: Biased data creates biased models […]

Building the Modern Data Stack: How Spark, Kafka, and dbt Transformed Data Engineering

Posted on June 22, 2025 by Nithin Mohan TK 6 min read

The data engineering landscape has undergone a fundamental transformation over the past decade. What once required massive Hadoop clusters has evolved into a sophisticated ecosystem of specialized tools: Kafka for ingestion, Spark for processing, and dbt for transformation.

Modern Data Stack Architecture

The Paradigm Shift: Monolithic → Modular

The old approach centered around monolithic platforms […]

Tips and Tricks – Apply Strangler Fig Pattern for Legacy Migration

Posted on March 15, 2011 by Nithin Mohan TK 11 min read

Gradually replace legacy systems by routing traffic to new implementations incrementally.
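
As a rough illustration of incremental routing, here is a minimal sketch in Python. It is not taken from the post itself: the handler names, the `rollout_percent` parameter, and the hash-based bucketing are all assumptions chosen for the example. The idea is that a thin routing layer sends a fixed share of requests to the new implementation and the rest to the legacy one, and the share is raised over time.

```python
import hashlib


def legacy_handler(request_id: str) -> str:
    # Hypothetical stand-in for the existing legacy code path.
    return f"legacy:{request_id}"


def new_handler(request_id: str) -> str:
    # Hypothetical stand-in for the replacement implementation.
    return f"new:{request_id}"


def bucket(request_id: str) -> int:
    # Map a request id to a stable bucket in [0, 100) so the same id
    # always lands on the same side for a given rollout percentage.
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def route(request_id: str, rollout_percent: int) -> str:
    # Send roughly `rollout_percent` percent of ids to the new handler
    # and everything else to the legacy handler.
    if bucket(request_id) < rollout_percent:
        return new_handler(request_id)
    return legacy_handler(request_id)
```

With `rollout_percent` at 0 every request stays on the legacy path, and at 100 the legacy path receives no traffic and can be retired. Because the bucket depends only on the request id, raising the percentage never moves an id back from the new path to the old one.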
