The Multi-Agent Coding "Todd Chavez" Problem, and how to fix it (hint: Agile)
TL;DR: Multi-agent AI coding has the same coordination problems as human teams. The solution? Treat your AI agents like developers—give them sprint planning, standup logs, and TDD. I've been building a production app this way, and it's roughly 2x faster with zero agent conflicts.
AI agents and the red hooded slacker
If you've watched BoJack Horseman, you know Todd Chavez. Lovable guy. Full of ideas. Absolutely zero follow-through or coordination with anyone else. He starts projects, forgets about them, starts new ones, and occasionally creates chaos by not knowing what anyone else is doing.
That's multi-agent AI development without structure.
Look, I need to tell you what I've been learning while building an app with three AI agents, and how I discovered that coordinating artificial intelligence is exactly—and I mean exactly—like trying to get Todd Chavez to work on a group project. Except there are three Todds, they all have computer science degrees, and none of them sleep.
The Problem: We're Relearning Software Engineering's Hardest Lesson (Again)
So Claude Code launched this multi-agent feature (here's the docs if you want to read something that won't make you question your career choices), and everyone got really excited. "Multiple AI agents!" they said. "They'll work together!" they said. "It'll be fine!" they said.
Narrator: It was not fine.
Here's what actually happened: I fired up multiple agents thinking I was about to experience some kind of Silicon Valley fever dream where code just... writes itself in perfect harmony. Instead, I got what can only be described as a very expensive demonstration of why we invented Agile methodology in the first place.
Agent A would implement authentication. Agent B would also implement authentication, but differently, because Agent B didn't know Agent A existed. Agent C would break both implementations because it was working on something else entirely and had no idea what A and B were doing. It was like watching three very smart people try to assemble IKEA furniture in separate rooms while blindfolded.
And here's the thing that really got me: This is exactly why we have standup meetings.
I'm not kidding. The reason your manager makes you say "yesterday I did X, today I'm doing Y" isn't because management is bored—it's because without that forcing function, developers (human or artificial) will absolutely, 100%, without fail, step on each other's toes until someone rage-quits or the codebase catches fire.
We've been here before. We solved this problem. With humans. And now we're doing it again with AI, except the AI never complains about the standup meeting being too early, which is somehow both refreshing and deeply unsettling.
My Journey: Multiple Sprints and a Realization
So there I was, building this app—doesn't matter what it does, what matters is it's complex enough to require multiple specialized domains—and I kept hitting the same problem.
Sprint 1: Agents working great individually, complete disaster collectively.
Sprint 2: "Maybe I just need better prompts?"
Sprint 3: "Okay so better prompts didn't work, maybe I need MORE agents?"
Sprint 4: [screaming internally]
You know that moment in every software engineering career where you realize you're fighting against fundamental laws of the universe? Like when you first learn about the CAP theorem and realize you can't have your cake and eat it too, even though that's literally the entire point of cake?
This was that moment, but for coordination.
And then—and I know this sounds like I'm setting up some kind of thought leadership flex, but I promise I'm not that organized—I had what alcoholics call a "moment of clarity" and what engineers call "reading the documentation we should have read six months ago."
The documentation in question? Every Agile and Scrum book ever written.
Because here's the thing: We already know how to coordinate multiple intelligent agents. We've been doing it with humans since at least the 1990s, possibly earlier if you count ancient Roman construction projects, which I do because those aqueducts didn't build themselves.
So I did something crazy: I treated my AI agents like a software engineering team.
Sprint planning. Task breakdowns. Standup logs. Acceptance criteria. The whole nine yards. Everything I'd do if I were managing three human developers, except I didn't have to worry about PTO requests or someone microwaving fish in the breakroom.
Result?
- ~2x velocity improvement over my previous chaos
- Way better test coverage (like, actually caring about it now)
- Zero duplicated work
- Zero conflicting architectural decisions
- Zero agents rage-quitting (they can't, which is nice)
The lesson? Software engineering principles still matter. Even when the engineers are transformers. The kind made of attention mechanisms, not the kind that turn into trucks.
The Solution: Agile for AI Agents (Because History Doesn't Repeat, But It Does Rhyme)
Okay so here's the structure I landed on, and I'm going to explain it like you're a human who's ever worked on a software team, because you are, and you have:
The Three-Layer Architecture
Layer 1: The Specification Layer
This is your sprint backlog. It's like your issue-tracker board, but in markdown, because, like the rest of us, AI hates Jira. These documents become your "oh god we need to ship something" planning intel:
- Product Requirements Document (PRD)
- Implementation Plan (14 sprints, because apparently I'm an optimist)
- Individual sprint specs with acceptance criteria
- Task files that break down exactly what needs to happen
It's the same thing you'd create for human developers. In fact, if you have spare cycles, you can ask your agents to weigh in and critique your plan (trust me, they're just as opinionated as humans).
I start every project by creating a spec folder in my root directory and working with GPT to define a PRD with explicit scope—what we will and won't build. This PRD becomes a constant reference point throughout development. Anytime an agent or developer considers a new feature or direction, they check against the PRD's scope section first.
This prevents scope creep and keeps everyone from drifting like Todd from BoJack Horseman—pivoting from a focused documentary into an unfocused multisensory gift basket experience. The PRD isn't written once and forgotten; it's the North Star that gets referenced continuously to maintain alignment.
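To make that concrete, here's a stripped-down sketch of a PRD scope section. The feature names are placeholders; the structure is the point:

```markdown
# PRD: [App Name]

## Scope

### In scope (v1)
- Authentication via WorkOS (SSO + email)
- Role-based access control
- Core app screens in React Native, data in Convex

### Explicitly out of scope (v1)
- Offline mode
- Per-tenant theming
- Anything an agent "thinks would be cool"

Rule of thumb: if a proposed task can't be traced to an in-scope
item, it goes to the backlog, not the sprint.
```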

Layer 2: The Agent Layer
This is your dev team. I've got three agents:
- UI Developer - React Native, theming, E2E testing. Basically the person who makes things pretty and argues about whether this button should be 16px or 18px.
- Backend Engineer - Convex, data architecture, API design. The person who rolls their eyes at the frontend but secretly enjoys the problem-solving.
- Infrastructure Engineer - CI/CD, monitoring, deployment. The person who knows where all the bodies are buried and gets paged at 3am.
Just like a real team, except they all work 24/7 and never ask for raises.
Layer 3: The Coordination Layer
This is where the magic happens, and by magic I mean "the thing that prevents everything from exploding."
The core innovation? Standup logs.
Stay with me here. Remember daily standup meetings? Where everyone goes around and says what they did yesterday, what they're doing today, and what's blocking them? And how everyone pretends to listen but is actually just waiting for their turn to talk?
Well, AI agents are actually really good at reading the standup logs. Like, suspiciously good. They don't tune out. They don't check their phones. They just... read the whole thing and understand the context.
Every agent session starts with: "Read the standup log. Understand what everyone else is doing. Don't step on their toes. Proceed."
It's a mandatory boot sequence. Like the safety briefing on an airplane, except the agents actually pay attention instead of pretending they know where the emergency exits are.
Real Example: Sprint 02 Authentication (AKA: The One Where It Actually Worked)
Let me show you how this plays out in practice, because I know you're thinking "sure, sure, this is all very theoretical, but does it actually prevent the chaos?"
The Goal: Implement WorkOS authentication with role-based access control. You know, the thing every B2B SaaS app needs but everyone underestimates the complexity of until they're three weeks in and crying about OAuth flows.
Planning Phase: I broke it into 8 acceptance criteria and 6 discrete tasks. Like a normal sprint planning session, except I didn't have to order lunch or pretend to care about team bonding activities.
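For flavor, here's an abbreviated sketch of what that sprint spec looked like (criteria paraphrased and trimmed; the real file has all eight):

```markdown
# Sprint 02: Authentication & RBAC

## Acceptance Criteria
- [ ] Users can authenticate via WorkOS
- [ ] Roles enforce hierarchical scopes (RBAC)
- [ ] Unauthenticated access to protected resources is rejected
- [ ] Auth screens integrate with the backend helpers
...(4 more)

## Tasks
1. Auth helpers + tests (Backend Engineer)
2. RBAC with hierarchical scopes + tests (Backend Engineer)
3. Auth screens + E2E tests (UI Developer)
...(3 more)
```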
Day 1: Backend Engineer
- Reads standup log: "Oh, I'm starting fresh on Sprint 02"
- Implements authentication helpers
- Writes tests (more on this later)
- Updates standup log: "Finished auth helpers, here's how they work, here's what UI Developer needs to know"
Day 2: Backend Engineer (Different Session)
- Reads standup log: "Oh right, I did auth helpers yesterday"
- Builds RBAC with hierarchical scopes
- Makes architectural decision: Direct WorkOS API reads for MVP (no caching)
- Documents decision in standup log with rationale
Day 3: UI Developer
- Reads standup log: "Backend's done! Here are the helpers, here's the API, here are the architectural decisions"
- Builds auth screens that integrate perfectly with backend
- No conflicts. No confusion. No duplicate work.
Result: All 8 acceptance criteria met. Zero conflicts. Seamless handoff.
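That Day 2 entry is the load-bearing part, so here's roughly what it looked like (wording reconstructed, not verbatim):

```markdown
### Day 2: Backend Engineer
**Done:** RBAC with hierarchical scopes, all tests green.
**Decision:** Direct WorkOS API reads for MVP. No caching layer yet;
revisit if latency becomes a problem.
**Handoff to UI Developer:** Auth helpers live in `convex/auth.ts`
(path illustrative). The tests show exactly what the screens can rely on.
**Blockers:** None.
```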
What would have happened without coordination?
Oh man. Okay so:
- Backend Engineer implements auth one way
- UI Developer, not knowing this, makes different assumptions
- Backend Engineer (in a new session) forgets the first implementation and does it differently
- UI Developer's screens break
- Everything catches fire
- I cry
- My users end up with a forever "Coming Soon" page
It's the same chaos you get when you have three developers who don't talk to each other. Because—say it with me now—alignment is hard for everyone, including AI.
The Three Things That Make This Work
1. Boot Sequences (Mandatory Context Loading)
Every agent profile starts with this:
⚠️ BOOT SEQUENCE - Execute Immediately When Invoked
When you @mention me, I will IMMEDIATELY:
1. Read Agent Rules
2. Read Domain-Specific Rules
3. Read Current Sprint Standup Log
4. Orient: Status, incomplete criteria, next actions
5. Proceed
Is this overkill? Maybe. Does it prevent agents from starting fresh every session like some kind of AI-powered Memento protagonist? Absolutely.
It's like making your team check Jira before starting work. Except the AI actually does it, which is already better than half my previous teams. (Sorry, previous teams. You know it's true.)
2. Test-Driven Development (The Contract Between Agents)
Here's where I'm going to sound like a zealot, but I promise this is important:
No code without tests. No failing tests allowed. Ever.
Why? Because tests are how Agent A tells Agent B "this is what I built, this is how it works, please don't break it."
When UI Developer writes tests for the auth screen, Backend Engineer can see exactly what the UI expects. When Backend Engineer writes tests for the RBAC system, UI Developer knows the contract.
It's like documentation, except it's enforced by CI/CD, which means people (and agents) actually keep it updated.
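To make "tests as contract" concrete, here's a minimal sketch (Vitest-style; `hasScope` is an illustrative stand-in for the real RBAC helper, not my actual code):

```typescript
import { describe, expect, it } from "vitest";
// hasScope is an illustrative stand-in for the real RBAC helper:
// does this role grant this permission scope?
import { hasScope } from "./rbac";

describe("RBAC contract: what UI Developer can rely on", () => {
  it("admin inherits every member scope (hierarchical)", () => {
    expect(hasScope("admin", "projects:read")).toBe(true);
    expect(hasScope("admin", "projects:write")).toBe(true);
  });

  it("viewer can read but never write", () => {
    expect(hasScope("viewer", "projects:read")).toBe(true);
    expect(hasScope("viewer", "projects:write")).toBe(false);
  });
});
```

If Backend Engineer changes the hierarchy, this file screams before UI Developer's screens do.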
I'll write a whole article about this later (teaser: "TDD for Multi-Agent Systems: Or How I Learned to Stop Worrying and Love the Test Coverage"), but the short version is: Tests are the peace treaty that prevents agent warfare.
3. Specialization (Clear Role Boundaries)
Remember how I have three agents with specific domains?
- UI Developer: React Native, theming, E2E tests
- Backend Engineer: Convex, data architecture, API design
- Infrastructure Engineer: CI/CD, monitoring, deployment
This isn't just organizational tidiness. This is the same reason your company has a "frontend team" and a "backend team" instead of everyone just doing everything.
Specialization means:
- Agents get really good at their domain
- They don't step on each other's code
- There's clear ownership of problems
- Context stays focused
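In Claude Code, each of these lives as an agent profile (a markdown file with a frontmatter header; check the current docs for the exact format). Here's a trimmed sketch of my Backend Engineer profile, with paths and wording simplified:

```markdown
---
name: backend-engineer
description: Owns Convex functions, data architecture, and API design.
  Invoke for schema changes, queries, mutations, and auth logic.
---

⚠️ BOOT SEQUENCE - Execute Immediately When Invoked
1. Read specs/agent-rules.md
2. Read specs/backend-rules.md
3. Read the current sprint's standup log
4. Orient: status, incomplete criteria, next actions
5. Proceed

## Boundaries
- You own: convex/, data models, API contracts
- You do not touch: app UI code (UI Developer's turf) or CI config
  (Infrastructure Engineer's turf)
```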
Could I just have one mega-agent that does everything? Sure. Could you have one mega-developer who does frontend, backend, infrastructure, design, and also makes the coffee? Technically yes, but they'd burn out in three weeks and you'd deserve it.
What I Learned (The Short Version, Because I Know You're Skimming)
What Works:
- Treating agents like team members, not magic code machines
- Mandatory coordination protocols (boot sequences, standup logs)
- TDD as the contract between agents
- Specialization over "just make the AI figure it out"
What Doesn't Work:
- "Just prompting better" without structure (I tried, it failed)
- Skipping coordination "just this once" (chaos, every time)
- Assuming GPT-5 or whatever will solve coordination problems (it won't)
- Treating standup logs as write-once documentation (they're living documents, read them)
The Meta-Lesson:
Alignment is always hard. Process beats raw capability. Software engineering wisdom applies to AI development.
We're not inventing new solutions—we're applying the ones we already know work. And honestly? That's kind of comforting. At least one thing in the AI space makes sense.
The Bigger Picture: Why This Matters
Here's what really gets me about this whole experience:
Everyone's talking about prompt engineering like it's the skill of the future. And sure, prompting is important. But you know what else is important? The same coordination skills we've needed since the dawn of software engineering.
Think about it:
- We spent decades learning how to coordinate human developers
- We invented Agile, Scrum, Kanban, all these methodologies
- We wrote books about it. Held conferences. Created certifications.
- And now we're doing it again with AI
The problems are identical:
- How do you prevent duplicate work?
- How do you maintain context across sessions?
- How do you handle dependencies?
- How do you ensure everyone's working toward the same goal?
The solutions are also identical:
- Clear specifications
- Regular check-ins (standup logs)
- Well-defined roles
- Test-driven development
- Documentation
Intelligence doesn't solve coordination problems. This is true for humans, and it's true for AI. You can have the smartest people (or models) in the world, but if they don't know what the others are doing, you get chaos.
It's the same reason brilliant researchers still need lab meetings. The same reason genius founders still need co-founder communication. The same reason the Avengers need to coordinate even though they're all literally superheroes.
Coordination is hard. For everyone. Including AI.
And honestly? We're better at this than we think. Humans have been coordinating with other intelligent agents (humans) for millennia. The principles transfer directly. We don't need to reinvent the wheel—we just need to apply what we already know.
When to Use This Approach (And When Not To)
This makes sense when:
- You're building something serious (10+ sprints)
- You need multiple specialized domains (frontend, backend, infrastructure)
- Quality and test coverage matter
- You're tired of agents breaking each other's code
This is overkill when:
- You're writing a script that'll run once
- You're just exploring an idea
- The whole project is like 3 tasks
- You have unclear requirements (figure those out first)
The coordination tax:
- Setup: 1 hour for your first sprint structure
- Ongoing: ~15 minutes per session for standup log updates
- Payoff: Saves 5-10 hours per week in rework and debugging
Is it worth it? Depends on whether coordination complexity exceeds setup cost. For my 14-sprint project? Absolutely. For your weekend prototype? Probably not.
What's Next: The Deep Dive Series
Look, this article is already longer than I planned (I have a problem, I know), but there's so much more to cover:
Future articles I'm thinking about:
- 📊 "The Standup Log Protocol" - How to write logs that actually prevent chaos instead of just creating more documentation no one reads
- 🧪 "TDD for Multi-Agent Systems" - Why tests are the peace treaty that prevents agent warfare
- 🎯 "Designing Agent Profiles" - How to split domains so agents don't fight over territory like developers arguing about microservice boundaries
- 📋 "Sprint Planning for AI" - Breaking down requirements so agents know what "done" means
- 🔄 "Context Management as Daily Reflection" - What happens when agents hit the context limit mid-sprint (and how to recover without crying)
I'll also share templates, examples, and probably more stories about things that went hilariously wrong before I figured this out.
Conclusion: Same Problems, Same Solutions
Here's the thing that keeps me up at night (besides caffeine and imposter syndrome):
We spent decades learning how to coordinate humans. We created entire methodologies. We wrote books. We had conferences. We argued endlessly about whether to use Jira or Linear or that weird thing with the post-it notes.
And now we're doing it all again with AI.
But the lessons are the same. Alignment is hard. Process matters. Structure enables intelligence instead of replacing it.
The irony isn't lost on me: I'm building AI systems using human coordination techniques, and discovering that the human coordination techniques were... actually pretty good? Maybe? At least better than chaos?
So here's my advice: Next time you spin up multiple AI agents, pretend you're managing a small software team. Because you are. They just happen to be made of attention mechanisms instead of anxiety and coffee.
Give them clear roles. Make them coordinate. Require tests. Document decisions. Do standups. (Well, standup logs, but same principle.)
Will it feel weird treating AI like team members? Yes. Will it work better than treating them like magic code generators? Also yes.
Because coordination is hard. For everyone. Including AI. And maybe—just maybe—that's okay. We already know how to do it.
Now if you'll excuse me, I need to go update my standup log.
Try it yourself:
- Start with one sprint, 2-3 agents
- Implement standup logs and boot sequences
- See if the coordination problems feel familiar
- Apply the same solutions you'd use for human teams
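If it helps, here's the rough folder layout I start from (names are just my convention):

```
specs/
  prd.md                   # scope; the North Star
  sprints/
    sprint-01/
      spec.md              # acceptance criteria + tasks
      standup-log.md       # the coordination layer
.claude/
  agents/
    ui-developer.md
    backend-engineer.md
    infrastructure-engineer.md
```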
And let me know how it goes. Seriously. I want to know if I'm the only one who had this weird revelation or if this is like... a thing.
Resources:
- Claude Code Documentation
- My sprint templates and agent profiles
- That one Agile book everyone pretends they read