Deep Dives
-

Building an Evaluation Harness for Production AI Agents: A 12-Metric Framework From 100+ Deployments
Agentic AI
A 12-metric evaluation framework for production AI agents, covering retrieval, generation, agent behavior, and…
19 min read -

When semantic search isn’t enough for the RAG
16 min read -

“Should we process our data in batches or in real-time?” It’s not batch vs. stream:…
14 min read -

A practitioner’s argument that meeting summarizers fail in the same way regressions fail when you…
15 min read -

Three weeks into testing, a learner told me my AI tutor gave her the wrong…
24 min read -

A practitioner’s guide to causal attribution when two churn drivers arrive at once.
14 min read -

A practical guide to modern type annotations in Python for data science
18 min read -

Exploring the inner workings of a decoder-only Transformer foundation model
14 min read -

Part 1: The basics — discretization of time, censoring and the life table
11 min read -

Part 2. Building scale-invariant agents that seamlessly change contexts
11 min read