The Agent Stack I Use to Run 30+ Production Agents (No Python)

Alejandro Rioja

June 20, 2026 6 min read

TL;DR

I run 30+ production AI agents using TypeScript, Cloudflare Workers/Queues/KV, and Claude models — no Python, no agent framework. The stack is boring on purpose: Workers handle scheduling and queuing, KV stores state, and the Anthropic SDK drives the model calls directly. The constraint that matters is not the AI layer — it's the infrastructure around it.

Table of contents

Why no Python
The core infrastructure: three Cloudflare primitives
The model layer: Anthropic SDK, two models
A real agent: the content pipeline
A real agent: the event promoter
How I manage 30+ agents without losing my mind
What I’d change if I were starting today
The operator’s bottom line

Why no Python

The honest answer: I write TypeScript every day for my website and product work. Adding a second language for agents means two runtimes, two dependency trees, two deploy pipelines. The productivity cost isn’t theoretical — I’ve paid it on past projects and decided not to again.

The second reason is Cloudflare. Workers run JavaScript at the edge (wrangler compiles TypeScript at deploy time), with Queues, KV, Durable Objects, and Cron Triggers built in. The entire agent infrastructure I need — scheduling, state, async job processing — is one wrangler deploy away. Cloudflare does offer Python Workers, but I haven’t found a Python setup with the same operational surface area.

The third reason is that most “Python-is-better-for-AI” arguments are really “Python has more ML libraries.” I don’t train models. I call APIs. The Anthropic SDK is first-class TypeScript. LangChain and its cousins are complexity I don’t want. When you’re shipping agents, not researching them, simplicity wins.

The core infrastructure: three Cloudflare primitives

Every agent I run touches at least one of these three:

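Cloudflare Workers — the compute layer. A Worker is the agent’s runtime: it receives a trigger (cron, queue message, HTTP), runs the model call(s), and writes outputs somewhere. Cold start is under 5ms. CPU time is capped at 10 ms per invocation on the free plan. On paid plans the cap defaults to 30 seconds and can be raised to 5 minutes, and Cron Triggers and Queue consumers can run for up to 15 minutes of wall-clock time. Time spent waiting on a model API response counts as wall-clock time, not CPU time. Almost everything I build fits in 30 seconds; the ones that don’t use Queues to fan out.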
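A minimal sketch of that three-trigger shape, assuming a hypothetical `runTask` function that holds the agent logic. The `*Like` interfaces are local stand-ins for the Workers runtime types (a real project would use `@cloudflare/workers-types`). The real handlers also receive `env` and `ctx` arguments, which are omitted here so the example runs outside Cloudflare.

```typescript
// Local stand-ins for the Workers runtime types, so the sketch runs anywhere.
interface ScheduledControllerLike {
  cron: string;
  scheduledTime: number;
}
interface QueueMessageLike<T> {
  body: T;
  ack(): void;
}
interface MessageBatchLike<T> {
  messages: readonly QueueMessageLike<T>[];
}

type Trigger =
  | { kind: "http"; payload: unknown }
  | { kind: "cron"; cron: string }
  | { kind: "queue"; payload: unknown };

// Hypothetical agent body: every trigger funnels into this one function,
// which would make the model call(s) and write outputs.
async function runTask(trigger: Trigger): Promise<string> {
  return `handled ${trigger.kind}`;
}

const worker = {
  // HTTP trigger: the request body is the task payload.
  async fetch(request: Request): Promise<Response> {
    const result = await runTask({ kind: "http", payload: await request.json() });
    return new Response(JSON.stringify({ result }), {
      headers: { "content-type": "application/json" },
    });
  },
  // Cron trigger: the controller carries the cron expression that fired.
  async scheduled(controller: ScheduledControllerLike): Promise<void> {
    await runTask({ kind: "cron", cron: controller.cron });
  },
  // Queue trigger: handle each message, then acknowledge it.
  async queue(batch: MessageBatchLike<unknown>): Promise<void> {
    for (const message of batch.messages) {
      await runTask({ kind: "queue", payload: message.body });
      message.ack();
    }
  },
};

export default worker;
```

Keeping one entry point behind all three handlers means an agent can be switched from a cron schedule to a queue consumer without touching its logic.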

Cloudflare Queues — async job processing. When a task might take longer than a request, or when I need to fan out (generate 12 translations in parallel), I push messages onto a Queue and let bound consumers process them independently. No polling, no setTimeout hacks.
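A minimal sketch of the producer side of a fan-out. It assumes a Queue binding with Cloudflare’s `sendBatch` method, which takes a list of `{ body }` objects. At the time of writing each call is capped at 100 messages (plus a total-size cap), so check the current limits. `QueueLike`, `MAX_BATCH_MESSAGES`, and `fanOut` are illustrative names.

```typescript
// Minimal subset of a Cloudflare Queue producer binding.
interface QueueLike<T> {
  sendBatch(messages: Iterable<{ body: T }>): Promise<void>;
}

// Assumed per-call cap; verify against Cloudflare's current Queues limits.
const MAX_BATCH_MESSAGES = 100;

// Push every job onto the queue, chunked so no single call exceeds the cap.
// Each message is then processed independently by the bound consumer.
async function fanOut<T>(queue: QueueLike<T>, jobs: readonly T[]): Promise<number> {
  let calls = 0;
  for (let i = 0; i < jobs.length; i += MAX_BATCH_MESSAGES) {
    const chunk = jobs.slice(i, i + MAX_BATCH_MESSAGES).map((body) => ({ body }));
    await queue.sendBatch(chunk);
    calls++;
  }
  return calls;
}
```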

Cloudflare KV — lightweight state. Agent run history, last-processed timestamps, cached API responses. KV is eventually consistent, which is fine for agents — I’m not running transactions. It gives me a dead-simple key-value store I can read/write from any Worker without spinning up a database.
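A minimal sketch of the "last-processed timestamp" pattern. It assumes a KV binding with the standard `get(key)` and `put(key, value)` methods. `KVLike`, `AgentState`, and the `state:` key prefix are illustrative. KV is eventually consistent, so a write can take up to a minute or more to appear in other locations. That makes this pattern suitable as a cursor, not as a lock.

```typescript
// Minimal subset of a Cloudflare KV namespace binding.
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

interface AgentState {
  lastProcessedAt: string; // ISO timestamp of the newest item handled
  runs: number;
}

const stateKey = (agent: string) => `state:${agent}`;

async function loadState(kv: KVLike, agent: string): Promise<AgentState> {
  const raw = await kv.get(stateKey(agent));
  // First run: start from the epoch with no history.
  return raw
    ? (JSON.parse(raw) as AgentState)
    : { lastProcessedAt: new Date(0).toISOString(), runs: 0 };
}

async function saveState(kv: KVLike, agent: string, lastProcessedAt: Date): Promise<AgentState> {
  const prev = await loadState(kv, agent);
  const next: AgentState = { lastProcessedAt: lastProcessedAt.toISOString(), runs: prev.runs + 1 };
  await kv.put(stateKey(agent), JSON.stringify(next));
  return next;
}
```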

The model layer: Anthropic SDK, two models

I use exactly two Claude models:

claude-sonnet-4-6 — for tasks that need real reasoning: writing blog posts, analyzing event data, generating social copy, planning sequences
claude-haiku-4-5 — for fast/cheap classification, routing decisions, short extractions where full reasoning is overkill
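One way to keep that split consistent is a single routing function that every agent calls, so swapping a model becomes a one-line edit. A minimal sketch; the task categories are hypothetical labels, not a fixed list.

```typescript
const MODELS = {
  reasoning: "claude-sonnet-4-6",
  fast: "claude-haiku-4-5",
} as const;

type ModelId = (typeof MODELS)[keyof typeof MODELS];

// Hypothetical task categories mirroring the split above.
type TaskKind =
  | "write_post"
  | "analyze_events"
  | "social_copy"
  | "plan_sequence"
  | "classify"
  | "route"
  | "extract";

const FAST_TASKS: ReadonlySet<TaskKind> = new Set<TaskKind>(["classify", "route", "extract"]);

// Cheap, short tasks go to Haiku; anything needing real reasoning goes to Sonnet.
function modelFor(task: TaskKind): ModelId {
  return FAST_TASKS.has(task) ? MODELS.fast : MODELS.reasoning;
}
```

A model call would then pass `model: modelFor("classify")` instead of a hard-coded string.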

The Anthropic SDK in TypeScript is straightforward. Here’s the pattern I use for every model call:

typescript

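import Anthropic from "@anthropic-ai/sdk";

interface Env {
  ANTHROPIC_API_KEY: string;
}

// In a module Worker, `env` arrives as a handler argument, so the client is
// built from the env passed in rather than at module scope.
async function runAgent(env: Env, prompt: string, systemPrompt: string): Promise<string> {
  const client = new Anthropic({ apiKey: env.ANTHROPIC_API_KEY });
  const message = await client.messages.create({
    model: "claude-sonnet-4-6",
    max_tokens: 2048,
    system: systemPrompt,
    messages: [{ role: "user", content: prompt }],
  });

  // Return the first text block; other block types (e.g. tool_use) are ignored here.
  const block = message.content.find((b) => b.type === "text");
  if (!block || block.type !== "text") throw new Error("No text block in response");
  return block.text;
}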

That’s the whole model interface. No abstractions on top. When I need tool use, I add a tools array. When I need streaming, I swap messages.create for messages.stream. There is no framework managing this for me — and I don’t want one.
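Adding a `tools` array changes what comes back. When Claude decides to call a tool, the response has `stop_reason: "tool_use"` and contains `tool_use` blocks, each with an `id`, `name`, and `input`. The next request must answer each one with a `tool_result` block that references that `id`. Below is a minimal sketch of that step. The local types mirror the Messages API block shapes, and the `lookup_event` handler is hypothetical.

```typescript
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: unknown };
type ContentBlock = TextBlock | ToolUseBlock;

type ToolResultBlock = {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error?: boolean;
};

type ToolHandler = (input: unknown) => Promise<string>;

// Run every tool call in an assistant turn and build the user turn that answers it.
async function answerToolCalls(
  content: readonly ContentBlock[],
  handlers: Record<string, ToolHandler>,
): Promise<{ role: "user"; content: ToolResultBlock[] }> {
  const results: ToolResultBlock[] = [];
  for (const block of content) {
    if (block.type !== "tool_use") continue;
    const handler = handlers[block.name];
    if (!handler) {
      // Report unknown tools back to the model instead of throwing.
      results.push({
        type: "tool_result",
        tool_use_id: block.id,
        content: `Unknown tool: ${block.name}`,
        is_error: true,
      });
      continue;
    }
    results.push({ type: "tool_result", tool_use_id: block.id, content: await handler(block.input) });
  }
  return { role: "user", content: results };
}
```

Append the assistant message and this user turn to `messages`, call `messages.create` again, and repeat until `stop_reason` is no longer `"tool_use"`.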

A real agent: the content pipeline

The most complex agent I run is the content pipeline. It generates blog posts, translates them into 12 languages, renders OG card SVGs, and drafts LinkedIn promos — all as drafts, gated behind my review before anything publishes.

The Worker entry point looks like this:

typescript

// src/workers/content-pipeline.ts
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { topic, slug } = await request.json<{ topic: string; slug: string }>();

    // Step 1: generate EN post
    const enPost = await generatePost(topic, env);
    await env.CONTENT_KV.put(`draft:${slug}:en`, enPost);

    // Step 2: fan out translations via Queue
    const locales = ["ar", "de", "es", "fr", "hi", "it", "ja", "ko", "nl", "pt", "ru", "zh"];
    for (const locale of locales) {
      await env.TRANSLATION_QUEUE.send({ slug, locale, content: enPost });
    }

    return Response.json({ status: "queued", slug });
  },
};
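One gap worth closing in an HTTP-triggered agent is that the type parameter on `request.json<...>()` is only a compile-time assertion, so a malformed body reaches `generatePost` unchecked. A minimal runtime guard follows. It assumes a hypothetical slug convention of lowercase letters, digits, and single hyphens.

```typescript
interface PipelineRequest {
  topic: string;
  slug: string;
}

// Assumed slug convention; adjust to whatever the site actually uses.
const SLUG_PATTERN = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

// Runtime check for the body the content pipeline expects; null means "reject".
function parsePipelineRequest(body: unknown): PipelineRequest | null {
  if (typeof body !== "object" || body === null) return null;
  const { topic, slug } = body as Record<string, unknown>;
  if (typeof topic !== "string" || topic.trim() === "") return null;
  if (typeof slug !== "string" || !SLUG_PATTERN.test(slug)) return null;
  return { topic: topic.trim(), slug };
}
```

In the handler, `const input = parsePipelineRequest(await request.json())` followed by a 400 response when `input` is null keeps bad requests from spending tokens. `request.json()` itself still throws on bodies that aren't valid JSON.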

The Queue consumer handles each translation independently:

typescript

// src/workers/translation-consumer.ts
export default {
  async queue(batch: MessageBatch<TranslationJob>, env: Env): Promise<void> {
    for (const message of batch.messages) {
      const { slug, locale, content } = message.body;
      try {
        const translated = await translatePost(content, locale, env);
        await env.CONTENT_KV.put(`draft:${slug}:${locale}`, translated);
        message.ack();
      } catch (err) {
        // Retry just this message; the rest of the batch still gets processed.
        console.error(`Translation failed for ${slug}:${locale}`, err);
        message.retry();
      }
    }
  },
};

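Queues deliver messages to the consumer in batches (up to 10 per batch by default), so one invocation can work through several translations in turn. Setting max_batch_size to 1 in the consumer config gives each translation its own invocation and lets Cloudflare spread them across concurrent consumers. If one fails, the Queue retries it automatically (up to 3 times by default). The Anthropic SDK already retries rate-limited requests with backoff, so I don’t manage threads, promises, or retry loops myself.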
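Queues hand the consumer a batch, and the loop above works through it one message at a time. To run the translations in a batch concurrently while still acknowledging or retrying each message individually, here is a minimal sketch. It uses a local stand-in for the Queue message type and a hypothetical `translate` callback in place of `translatePost` plus the KV write.

```typescript
interface MessageLike<T> {
  body: T;
  ack(): void;
  retry(): void;
}

interface TranslationJob {
  slug: string;
  locale: string;
  content: string;
}

// Process every message in the batch at once; one failure only retries that message.
async function processBatch(
  messages: readonly MessageLike<TranslationJob>[],
  translate: (job: TranslationJob) => Promise<void>,
): Promise<{ acked: number; retried: number }> {
  const outcomes = await Promise.allSettled(messages.map((m) => translate(m.body)));
  let acked = 0;
  let retried = 0;
  outcomes.forEach((outcome, i) => {
    if (outcome.status === "fulfilled") {
      messages[i].ack();
      acked++;
    } else {
      messages[i].retry();
      retried++;
    }
  });
  return { acked, retried };
}
```

The trade-off is that a full batch now hits the model API all at once, so larger batches mean more concurrent requests against your rate limits.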

A real agent: the event promoter

Pickleland runs pickleball events. I built an agent that scans the booking platform for events in the next 4 days, drafts Facebook group posts per event, and surfaces them for my review before anything goes out.

The agent calls a scraping Worker, passes the event list to Claude with a structured prompt, and writes the draft posts to KV. The prompt is explicit about tone (community-focused, not salesy) and format (one post per event, under 150 words, include the booking link).

typescript

const systemPrompt = `You are a community manager for a pickleball facility.
Write Facebook group posts for upcoming events.
Rules:
- Max 150 words per post
- Lead with what's fun about the event, not the price
- Include the booking URL exactly as provided
- Do not use exclamation marks more than once per post
- Tone: friendly, local, not corporate`;
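The user message that goes alongside that system prompt can be assembled deterministically before the model sees anything. Here is a minimal sketch, assuming a hypothetical `ScrapedEvent` shape from the scraping Worker. It keeps events starting within the next 4 days and numbers them so Claude can write one post per event, with each booking URL passed through verbatim.

```typescript
// Hypothetical shape returned by the scraping Worker.
interface ScrapedEvent {
  title: string;
  startsAt: string; // ISO timestamp
  bookingUrl: string;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Keep events that start between now and now + `days`, soonest first.
function upcomingEvents(events: readonly ScrapedEvent[], now: Date, days = 4): ScrapedEvent[] {
  const start = now.getTime();
  const end = start + days * DAY_MS;
  return events
    .filter((e) => {
      const t = Date.parse(e.startsAt);
      return t >= start && t <= end;
    })
    .sort((a, b) => Date.parse(a.startsAt) - Date.parse(b.startsAt));
}

// Build the user message: one numbered entry per event.
function buildUserPrompt(events: readonly ScrapedEvent[]): string {
  const lines = events.map(
    (e, i) => `${i + 1}. ${e.title}\n   Starts: ${e.startsAt}\n   Booking URL: ${e.bookingUrl}`,
  );
  return `Write one post for each of these ${events.length} events:\n\n${lines.join("\n\n")}`;
}
```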

The constraint that matters here isn’t the model — it’s the workflow. The agent runs on a cron trigger at 8am daily. The draft posts land in a review queue. I approve or edit, then a separate publish Worker fires. No event gets posted without a human seeing it first.
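The review gate can be enforced in code rather than by convention: each draft carries a status, and the publish step refuses anything a human hasn’t approved. Below is a minimal sketch with hypothetical status names; in this setup the draft records would live in KV. One detail worth knowing: Cron Triggers run on UTC, so a daily schedule such as `0 8 * * *` fires at 08:00 UTC unless the hour is shifted for the local time zone.

```typescript
type DraftStatus = "pending_review" | "approved" | "rejected" | "published";

interface Draft {
  id: string;
  body: string;
  status: DraftStatus;
}

// Allowed transitions: only a human action moves a draft out of pending_review,
// and only approved drafts can be published.
const TRANSITIONS: Record<DraftStatus, readonly DraftStatus[]> = {
  pending_review: ["approved", "rejected"],
  approved: ["published"],
  rejected: [],
  published: [],
};

function transition(draft: Draft, to: DraftStatus): Draft {
  if (!TRANSITIONS[draft.status].includes(to)) {
    throw new Error(`Draft ${draft.id}: cannot go from ${draft.status} to ${to}`);
  }
  return { ...draft, status: to };
}

// The publish Worker would call this before posting; the actual posting call follows it.
function publish(draft: Draft): Draft {
  return transition(draft, "published");
}
```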

How I manage 30+ agents without losing my mind

The honest answer: Cloudflare’s dashboard is my control plane. Every Worker shows me invocation count, error rate, and CPU time. Every Queue shows message throughput and failures. KV shows storage usage.

Beyond that:

Every agent logs a structured JSON object at the end of each run: { agent, status, durationMs, inputTokens, outputTokens, costUsd }
I track cumulative spend per agent per month in a simple Airtable base
Agents that exceed a cost threshold get flagged for review — usually means a prompt is too verbose or I’m using Sonnet where Haiku would do
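A minimal sketch of that end-of-run record and the threshold check. Token counts come from the `usage.input_tokens` and `usage.output_tokens` fields on a Messages API response. Per-million-token prices are passed in as configuration, so look up current per-model pricing rather than hard-coding it. `buildRunLog` and `overBudget` are illustrative names.

```typescript
interface Pricing {
  inputPerMTok: number; // USD per million input tokens
  outputPerMTok: number; // USD per million output tokens
}

interface RunLog {
  agent: string;
  status: "ok" | "error";
  durationMs: number;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

function costUsd(inputTokens: number, outputTokens: number, p: Pricing): number {
  const raw = (inputTokens * p.inputPerMTok + outputTokens * p.outputPerMTok) / 1_000_000;
  return Math.round(raw * 1e6) / 1e6; // keep micro-dollar precision in logs
}

function buildRunLog(
  agent: string,
  status: RunLog["status"],
  startedAt: number,
  endedAt: number,
  usage: { input_tokens: number; output_tokens: number },
  pricing: Pricing,
): RunLog {
  return {
    agent,
    status,
    durationMs: endedAt - startedAt,
    inputTokens: usage.input_tokens,
    outputTokens: usage.output_tokens,
    costUsd: costUsd(usage.input_tokens, usage.output_tokens, pricing),
  };
}

// Flag an agent whose month-to-date spend crosses its threshold.
function overBudget(monthToDateUsd: number, thresholdUsd: number): boolean {
  return monthToDateUsd > thresholdUsd;
}
```

Each run would end with `console.log(JSON.stringify(buildRunLog(...)))`. The `costUsd` field is what a per-agent monthly total would sum.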

The discipline isn’t technical. It’s deciding what an agent is allowed to do autonomously versus what needs my sign-off. Content drafts: autonomous. Anything that touches a customer: human review. Anything that sends money: not an agent job.
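That policy can also live in code: an agent declares what kind of action it is about to take, and the check happens in one place. A minimal sketch; the three action categories mirror the rules above, and all names are hypothetical.

```typescript
type ActionKind = "content_draft" | "customer_facing" | "payment";
type Autonomy = "autonomous" | "human_review" | "forbidden";

// One table for the whole fleet: drafts run on their own, anything customer-facing
// waits for sign-off, and moving money is never an agent's job.
const POLICY: Record<ActionKind, Autonomy> = {
  content_draft: "autonomous",
  customer_facing: "human_review",
  payment: "forbidden",
};

type Decision = { proceed: true } | { proceed: false; reason: string };

function checkAction(agent: string, action: ActionKind, approvedByHuman: boolean): Decision {
  switch (POLICY[action]) {
    case "autonomous":
      return { proceed: true };
    case "human_review":
      return approvedByHuman
        ? { proceed: true }
        : { proceed: false, reason: `${agent}: ${action} needs human approval` };
    case "forbidden":
      return { proceed: false, reason: `${agent}: ${action} is not an agent job` };
  }
}
```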

What I’d change if I were starting today

One thing: I’d set up structured outputs from day one instead of retrofitting them onto agents that already shipped. Parsing free-text Claude output is a tax. Define a Zod schema, hand Claude the equivalent JSON Schema as the expected response shape, and validate the reply against the Zod schema. You get typed data back, and your downstream Workers don’t have to guess.

typescript

import { z } from "zod";

const EventPostSchema = z.object({
  headline: z.string().max(80),
  body: z.string().max(600),
  bookingUrl: z.string().url(),
  suggestedPostTime: z.enum(["morning", "afternoon", "evening"]),
});

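Then I pass the schema to Claude as the input schema of a format_post tool, set tool_choice: { type: "tool", name: "format_post" }, and get a tool_use block back every time. Running its input through EventPostSchema.parse() catches the rare reply that still doesn’t match the schema. No regex, no “sometimes Claude adds a preamble” bugs.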
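A minimal, dependency-free sketch of that setup. The tool’s `input_schema` is a hand-written JSON Schema mirroring `EventPostSchema`; recent Zod versions can generate it with `z.toJSONSchema` instead. `tool_choice` forces the `format_post` call, and `extractToolInput` pulls that call’s input out of the response content. The request object follows the shape `client.messages.create` accepts, but the event text and helper names are illustrative.

```typescript
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

// JSON Schema equivalent of EventPostSchema, used as the tool's input shape.
const formatPostTool = {
  name: "format_post",
  description: "Return the event post in structured form.",
  input_schema: {
    type: "object" as const,
    properties: {
      headline: { type: "string", maxLength: 80 },
      body: { type: "string", maxLength: 600 },
      bookingUrl: { type: "string", format: "uri" },
      suggestedPostTime: { type: "string", enum: ["morning", "afternoon", "evening"] },
    },
    required: ["headline", "body", "bookingUrl", "suggestedPostTime"],
  },
};

// Request body for messages.create; forcing tool_choice means the reply is a tool_use block.
function buildFormatPostRequest(eventDetails: string) {
  return {
    model: "claude-sonnet-4-6",
    max_tokens: 1024,
    tools: [formatPostTool],
    tool_choice: { type: "tool" as const, name: "format_post" },
    messages: [{ role: "user" as const, content: `Write a post for this event:\n${eventDetails}` }],
  };
}

// Pull the forced tool call's input out of the response content.
function extractToolInput(content: readonly ContentBlock[], toolName: string): unknown {
  const call = content.find((b) => b.type === "tool_use" && b.name === toolName);
  if (!call || call.type !== "tool_use") throw new Error(`No ${toolName} call in response`);
  return call.input;
}
```

From there, `EventPostSchema.parse(extractToolInput(message.content, "format_post"))` either returns typed data or throws with the validation errors.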

The operator’s bottom line

The agent stack that works in production is the one you can debug at 10pm when something breaks. For me, that’s TypeScript + Cloudflare + Anthropic SDK — not because it’s the flashiest combination, but because every layer is observable, deployable, and replaceable independently. Frameworks are bets on abstractions. I’d rather own the plumbing.

Want to run AI agents in your business? Get in touch — I design and deploy production agent stacks for operator teams.
