Agent-driven content at scale: lessons from 12 months of production
We've been running AI agent-driven content production in client environments for 12 months. Here's what actually works — and what we got wrong early.
Twelve months ago, we started running AI agents in production for client content programmes. Not experiments — actual client work, under SLAs, with brand standards to maintain.
The results have been good. Output is consistently 8–12× what a same-sized human team produces. Brand voice compliance, measured by a separate model, runs at 94% or above for all clients.
But we got some things wrong. Here's an honest account of the lessons.
What we got wrong: prompting for quality
Early on, we treated the AI agent like a powerful search engine — give it a topic, get an article. The output was technically correct and completely generic. It read like it had been written to pass a Turing test, not to change a buyer's mind.
The fix was adding a brand voice model to the pipeline. We trained a small classifier on each client's best-performing content and used it to score and filter AI-generated drafts before they reached human editors. This step added 40 minutes to the workflow and improved output quality by roughly 3× on brand-relevant dimensions.
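To make the shape of that gate concrete, here is a minimal sketch. It assumes a TF-IDF plus logistic-regression classifier and a fixed score threshold; the class name, the threshold value, and the model choice are illustrative, not the production system.

```python
# Hypothetical sketch of the brand voice gate: a small classifier trained on a
# client's best-performing content, used to score drafts before human editing.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


class BrandVoiceGate:
    def __init__(self, threshold: float = 0.9):
        # Threshold is illustrative; in practice it is tuned per client.
        self.threshold = threshold
        self.model = make_pipeline(
            TfidfVectorizer(ngram_range=(1, 2), min_df=2),
            LogisticRegression(max_iter=1000),
        )

    def fit(self, texts: list[str], labels: list[int]) -> None:
        # labels: 1 = on-brand (client's best-performing pieces), 0 = generic.
        self.model.fit(texts, labels)

    def score(self, draft: str) -> float:
        # Probability that a draft reads as on-brand.
        return float(self.model.predict_proba([draft])[0][1])

    def filter(self, drafts: list[str]) -> list[str]:
        # Only drafts above the threshold reach human editors.
        return [d for d in drafts if self.score(d) >= self.threshold]
```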
What we got wrong: human review timing
Our first workflow had human editors reviewing full drafts. The problem: by the time a draft was complete, the structural problems — wrong angle, wrong audience, wrong intent level — were expensive to fix.
We moved human review to the brief stage, not the draft stage. Editors now spend 20 minutes reviewing and improving the brief before the agent drafts. The drafts come back cleaner, and the editing step drops from 3 hours to 45 minutes.
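For concreteness, here is one possible shape for the brief that gets that 20-minute editor pass. The field names and the approval flag are hypothetical; the point they capture is that angle, audience, and intent get fixed at the stage where changing them is still cheap.

```python
from dataclasses import dataclass, field


@dataclass
class ContentBrief:
    topic: str
    strategic_angle: str           # the point of view the piece argues for
    target_audience: str           # who the piece is written to persuade
    intent_level: str              # e.g. "awareness", "evaluation", "decision"
    key_points: list[str] = field(default_factory=list)
    editor_approved: bool = False  # the agent only drafts once this is True


def ready_to_draft(brief: ContentBrief) -> bool:
    # The agent is blocked until an editor has signed off on the brief.
    return brief.editor_approved and bool(brief.strategic_angle)
```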
What works: the separation of concerns
The pattern that consistently works is a clear separation of what the AI does and what humans do. AI handles: research synthesis, initial draft, formatting, internal link suggestions, metadata. Humans handle: strategic angle, unique insight, brand voice refinement, final judgment.
Neither side tries to do the other's job. The AI isn't making strategic calls. The human isn't formatting H2 tags. Both are working in the territory where they're actually good.
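One way to picture that separation is as an ordered pipeline with an explicit owner per stage. The stage names and ordering below are an illustrative sketch of the division described above, not a literal dump of our tooling.

```python
from typing import Literal, NamedTuple


class Stage(NamedTuple):
    name: str
    owner: Literal["agent", "human"]


# The division of labour described above, written out as an ordered pipeline.
PIPELINE = [
    Stage("research_synthesis", "agent"),
    Stage("brief_review", "human"),            # the 20-minute editor pass
    Stage("initial_draft", "agent"),
    Stage("brand_voice_scoring", "agent"),     # the classifier gate described earlier
    Stage("brand_voice_refinement", "human"),
    Stage("formatting_and_metadata", "agent"),
    Stage("internal_link_suggestions", "agent"),
    Stage("final_judgment", "human"),
]


def stages_for(owner: str) -> list[str]:
    # Handy when checking that neither side is doing the other's job.
    return [s.name for s in PIPELINE if s.owner == owner]
```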
The 12-month result
Clients running the full agent-driven programme are producing 40+ pieces per month with teams of 3–4 people. Organic traffic is up an average of 180% across those programmes. More importantly, the content is actually generating pipeline — because it's structured around buyer intent, not keyword clusters.