January 2025 – Page 3 – C4: Container, Code, Cloud & Context

Edge AI with ONNX Runtime: Running Models On-Device

Posted on January 10, 2025 by Nithin Mohan TK 6 min read

Last year, I deployed an AI model to a mobile device. The first attempt failed—the model was too large, inference was too slow, and battery drain was unacceptable. After optimizing 15+ models for edge deployment using ONNX Runtime, I’ve learned what works. Here’s the complete guide to running AI models on-device with ONNX Runtime. Figure […]

Vector Database Comparison: Pinecone vs Weaviate vs Qdrant vs Chroma – Choosing the Right One for Your RAG Application

Posted on January 9, 2025 by Nithin Mohan TK 4 min read

Last March, a 3 AM alert changed everything. Our Pinecone bill had tripled overnight, and I spent the next three months migrating between vector databases, learning hard lessons about what actually matters. Let me share what I discovered, and what I wish someone had told me.

Figure 1: Comprehensive comparison of vector database options

The Night Everything […]

LLM Response Streaming: Building Real-Time AI Experiences

Posted on January 8, 2025 by Nithin Mohan TK 13 min read

Introduction: Streaming LLM responses transforms the user experience from waiting for complete responses to seeing text appear in real-time, dramatically improving perceived latency. Instead of staring at a loading spinner for 5-10 seconds, users see the first tokens within milliseconds and can start reading while generation continues. But implementing streaming properly involves more than just […]

Embracing the DevSecOps Landscape in Azure: A Comprehensive Guide

Posted on January 7, 2025 by Nithin Mohan TK 5 min read

Introduction: The world of software development is continuously evolving, and one of the key drivers of this evolution is the need for speed, agility, and security. The DevSecOps approach is gaining traction, as it integrates security practices into the DevOps pipeline, ensuring that applications are developed and deployed in a secure and compliant manner. Microsoft […]

Azure Functions and Serverless Architecture: A Solutions Architect’s Guide to Event-Driven Computing

Posted on January 5, 2025 by Nithin Mohan TK 5 min read

After two decades of building enterprise applications, I’ve witnessed the evolution from monolithic deployments to microservices, and now to serverless architectures. Azure Functions represents a fundamental shift in how we think about compute—moving from “always-on” infrastructure to truly event-driven, pay-per-execution models. This transformation isn’t just about cost savings; it’s about building systems that scale automatically […]

LLM Fallback Strategies: Building Reliable AI Applications (Part 2 of 2)

Posted on January 3, 2025 by Nithin Mohan TK 13 min read

Introduction: LLM APIs fail. Rate limits hit, services go down, models return errors, and responses sometimes don’t meet quality thresholds. Building reliable AI applications requires robust fallback strategies that gracefully handle these failures without degrading user experience. A well-designed fallback system tries alternative models, implements retry logic with exponential backoff, caches successful responses, and provides […]
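
The pattern this excerpt names (alternative models, retries with exponential backoff, cached successful responses) might be sketched as follows. This is a minimal illustration and not the article's implementation: `call_with_fallback`, the `providers` list of plain callables, and the default retry values are all hypothetical stand-ins for real model clients and tuned settings.

```python
import random
import time
from typing import Callable, Dict, Optional, Sequence


def call_with_fallback(
    prompt: str,
    providers: Sequence[Callable[[str], str]],
    cache: Optional[Dict[str, str]] = None,
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each provider in order, retrying each with exponential backoff.

    `providers` are hypothetical callables that take a prompt and return
    text, raising an exception on failure (rate limit, outage, bad output).
    """
    # Serve a previously successful response if one is cached.
    if cache is not None and prompt in cache:
        return cache[prompt]

    last_error: Optional[Exception] = None
    for provider in providers:
        for attempt in range(max_retries):
            try:
                result = provider(prompt)
            except Exception as exc:  # a real client would catch narrower errors
                last_error = exc
                # Back off before the next retry; skip the wait after the
                # final attempt and move straight on to the next provider.
                if attempt < max_retries - 1:
                    delay = base_delay * (2 ** attempt)
                    sleep(delay + random.uniform(0, 0.1))  # small jitter
            else:
                if cache is not None:
                    cache[prompt] = result
                return result

    raise RuntimeError("all providers failed") from last_error
```

Passing `sleep` as a parameter is a design choice made here so the backoff schedule can be observed or skipped in tests; production code would more likely rely on the default `time.sleep`.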
