The Eval Harness I Use to Ship AI Agents Without Fear
Updated for 2026. The eval harness I use to ship AI agents — a graded test set, an LLM judge, and a regression gate so a prompt change can't break prod.
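The three pieces named above (a graded test set, an LLM judge, and a regression gate) can be sketched roughly as follows. This is a minimal illustration, not the harness from the post: every type and function name here (`GradedCase`, `Agent`, `Judge`, `runHarness`, `evaluateGate`) is hypothetical, and the 0.02 tolerance and 0.5 per-case floor are assumed defaults. The judge is passed in as a plain async function so any model call can sit behind it.

```typescript
// All names below are hypothetical; none are taken from the post.
interface GradedCase { id: string; input: string; rubric: string; }
interface CaseScore { id: string; score: number; } // judge score in [0, 1]
interface GateResult { meanScore: number; passed: boolean; lowScoringIds: string[]; }

// The agent under test and the LLM judge are injected, so this sketch
// makes no assumption about which model or SDK sits behind them.
type Agent = (input: string) => Promise<string>;
type Judge = (testCase: GradedCase, output: string) => Promise<number>;

// Run every graded case through the agent, then ask the judge to score it.
async function runHarness(
  agent: Agent,
  judge: Judge,
  cases: GradedCase[],
): Promise<CaseScore[]> {
  const scores: CaseScore[] = [];
  for (const testCase of cases) {
    const output = await agent(testCase.input);
    scores.push({ id: testCase.id, score: await judge(testCase, output) });
  }
  return scores;
}

// Regression gate: pass only if the mean score has not dropped more than
// `tolerance` below the stored baseline. Cases under `minCaseScore` are
// reported for review but do not fail the gate on their own (assumed policy).
function evaluateGate(
  scores: CaseScore[],
  baselineMean: number,
  tolerance = 0.02,
  minCaseScore = 0.5,
): GateResult {
  if (scores.length === 0) throw new Error("graded test set is empty");
  const meanScore =
    scores.reduce((sum, s) => sum + s.score, 0) / scores.length;
  const lowScoringIds = scores
    .filter((s) => s.score < minCaseScore)
    .map((s) => s.id);
  return {
    meanScore,
    passed: meanScore >= baselineMean - tolerance,
    lowScoringIds,
  };
}
```

In a CI job, one way to use this would be to call `runHarness` against the candidate prompt, feed the scores to `evaluateGate` with the last released baseline, and exit non-zero when `passed` is false so the change cannot merge.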
367 posts on growth, marketing, sales, processes, business, SEO, GEO, and AI agents — written from the operator's seat.
Updated for 2026. I automated my entire newsletter pipeline — idea to send — with a Claude agent and the Kit API. No more staring at a blank editor on send day. Here's the exact setup.
Updated for 2026. You probably already have the content AI engines want to cite. The work is restructuring, not writing. Here's the exact process I use to get existing posts cited by ChatGPT and Perplexity.
Updated for 2026. I replaced my ads manager with a Claude skill that reads performance data, rewrites copy, and creates new ad sets — all from a single command. Here's the exact code.
Updated for 2026. Not a roundup of every AI tool. These are the 5 I use every day to run a consulting brand and a pickleball facility — with specific use cases and honest caveats.
Updated for 2026. Most AI agents that work in demos fail in production for the same five reasons. Here's how to diagnose each failure mode and what I've done to fix them across 30+ production agents.
Updated for 2026. No framework, no Python, no course required. Here's the exact TypeScript code and Cloudflare deployment steps to ship your first working AI agent today.
Updated for 2026. Event-triggered agents fire on webhooks for latency-sensitive work. Scheduled agents run on cron for batch jobs. Here's how to pick, with real Cloudflare Worker examples.
Updated for 2026. How brick-and-mortar businesses get cited by ChatGPT and Perplexity. Google Business Profile, LocalBusiness schema, NAP consistency, and review velocity — explained with Pickleland as a live example.
Updated for 2026. Most operators never evaluate their AI agents; they just assume they work. Here's the practical eval framework I use across 30+ production agents: golden sets, pass/fail criteria, and weekly spot-checks.
Updated for 2026. ChatGPT citations aren't random — they follow patterns. Here's the operator playbook for getting your brand surfaced in AI-generated answers.
Updated for 2026. One TypeScript agent fans out a single English post to 12 locales using the Claude API. Learn the system prompt, cost breakdown, and real SEO traffic results.