Agents Need Supervisors, Not Schedulers
Agent orchestration is still an evolving field. It started with chat: a human types, an AI responds, back and forth, with the human in the loop at every turn. Then came semi-autonomous agents, what coding assistants now call "agent mode": you hand over a goal, and the AI breaks it into steps and executes them. Then sub-agents: a master agent delegates pieces to specialized workers, collects results, and synthesizes. More parallel, more scalable. The pattern keeps evolving.
But there is a structural flaw that persists across all of these. The agent doing the work is also the agent judging the work. Whether it is a single agent in a loop or an orchestrator managing sub-agents, creation and evaluation are bundled into the same actor. And that is the weak point.
We know this from everywhere else. Code review exists because the person who wrote the code is the worst person to find its bugs. Separation of duties is a foundational principle in security, finance, and operations. You do not let the same actor create and approve. The blind spots are too predictable. An agent evaluating its own output is biased in the same way a developer is biased toward their own code. It generated the thing. Of course it thinks the thing is good.
The pattern that is missing is not a smarter orchestrator. It is a supervisor. Not a scheduler that assigns work, but an independent agent whose only job is to verify what another agent produced. Two agents, both running autonomously, but with fundamentally different roles. One generates. The other reviews. They do not need to coordinate in real time. The generator produces artifacts. The supervisor inspects them. If the work passes, it moves forward. If it does not, the generator gets feedback and iterates.
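The loop above is simple enough to sketch in a few lines. This is a minimal, runnable illustration of the control flow only: `generate` and `review` are stand-ins for calls to two independent agents, not real model calls.

```python
def generate(goal: str, feedback: str = "") -> str:
    """Generator: produce an artifact for the goal, revising if given feedback."""
    artifact = f"solution for: {goal}"
    if feedback:
        artifact += f" (revised after: {feedback})"
    return artifact


def review(artifact: str) -> tuple[bool, str]:
    """Supervisor: judge the artifact without knowing how it was produced."""
    # Stand-in judgment: approve only artifacts that have been revised once.
    if "revised" in artifact:
        return True, "ok"
    return False, "needs revision"


def supervised_run(goal: str, max_rounds: int = 3) -> str:
    """Generator produces, supervisor inspects; feedback flows back on failure."""
    feedback = ""
    for _ in range(max_rounds):
        artifact = generate(goal, feedback)
        passed, feedback = review(artifact)
        if passed:
            return artifact
    raise RuntimeError("supervisor never approved the artifact")
```

The key property is that the two roles never share state: the only channel between them is the artifact going one way and the judgment coming back.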
This is different from orchestration. An orchestrator has to understand every sub-task well enough to delegate it and every result well enough to judge it. That is a lot of cognitive load for one agent, and it is where multi-agent systems start falling apart. A supervisor does not need to understand how the work was done. It only needs to evaluate the output. Artifacts in, judgments out. The communication surface is tiny.
It also scales differently. Run multiple generators in parallel, each trying a different approach. The supervisor just filters the results. You get the exploration benefits of parallelism without the coordination overhead that kills multi-agent systems.
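The parallel version falls out naturally, because the supervisor never needs to coordinate the generators, only rank their outputs. A sketch, with stand-in generators and a deliberately trivial scoring rule in place of a real judging model:

```python
from concurrent.futures import ThreadPoolExecutor


def make_generator(approach: str):
    # Stand-in for an agent attempting the task with one particular approach.
    return lambda task: f"{approach} attempt at {task}"


def supervisor_score(artifact: str) -> int:
    # Stand-in judgment; a real supervisor would be a separate reviewing agent.
    # Here: prefer the shortest artifact.
    return -len(artifact)


def best_of(task: str, approaches: list[str]) -> str:
    """Run generators in parallel; the supervisor just filters the results."""
    generators = [make_generator(a) for a in approaches]
    with ThreadPoolExecutor() as pool:
        artifacts = list(pool.map(lambda g: g(task), generators))
    return max(artifacts, key=supervisor_score)
```

Note that adding another approach is one more list entry, not another coordination channel: the generators never talk to each other.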
This is not hypothetical. We are already seeing early versions of it work. Automated code review tools that use a separate model to critique AI-generated pull requests are reporting real results. And anecdotally, many developers have adopted a version of this pattern on their own: use one model to write the code, then use a different model with a different prompt to find the bugs. It sounds redundant until you try it. There are always bugs to be found. The generator is confident in its output; a fresh set of weights with a different perspective catches what the first one missed.
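The developer version of the pattern amounts to two calls with different models and different prompts. In the sketch below, `call_model` is a hypothetical placeholder for whatever LLM client you actually use, and the model names are made up; the point is only that the writer and the reviewer are separate.

```python
def call_model(model: str, prompt: str) -> str:
    # Hypothetical stand-in: swap in a real call to your model provider.
    return f"[{model}] response to: {prompt[:40]}"


def write_then_review(task: str) -> dict:
    """One model writes the code; a different model, primed to be
    skeptical, looks for the bugs."""
    code = call_model("writer-model", f"Write code that {task}.")
    critique = call_model(
        "reviewer-model",
        f"Find the bugs in this code. Assume there are some.\n\n{code}",
    )
    return {"code": code, "critique": critique}
```

The reviewer prompt matters: telling it to assume bugs exist counters the default politeness of approving plausible-looking code.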
The industry keeps reaching for deeper hierarchies when the answer might be lateral. Not more layers of control but a clean split between the agents that build and the agents that verify. The pattern is old. We just have not applied it to AI systematically yet.