The sub-agent bubble is quietly deflating

Twelve months ago, the architectural conversation in AI products was almost entirely about orchestration: how to route intent, how to coordinate agents, how to get specialised models talking to each other. That conversation is starting to feel like it was solving the wrong problem.

Sub-agents, as a default design pattern, were a clever solution to a temporary constraint. The constraint was that models weren't capable enough to hold context, route intent, and apply deep domain knowledge all at once. So we solved it by decomposing: a planner agent, a retrieval agent, a calculation agent, a summarisation agent. Each one small, each one scoped, each one dumb enough to stay reliable.

That decomposition made sense. For about eighteen months.

The models got better faster than the frameworks assumed they would. And a different pattern is quietly winning in its place: one that doesn't route work to a separate agent at all. One that bakes the capability directly into the experience, at the moment the user needs it, using something that looks a lot more like a skill than a sub-agent.

The more interesting question isn't whether sub-agents are over. It's what should replace them as the default, and when a sub-agent is still genuinely the right call. Answering that properly requires being honest about what each pattern is actually good for.

What a sub-agent actually is

Let me be precise, because the terminology is a mess.

A sub-agent, in the architecture sense I'm using here, is a model invocation that operates with its own context window, its own tools, its own instructions, and its own decision-making loop. The orchestrator decides something is out of scope and punts it to a specialist. The specialist does its thing and returns a result. The orchestrator carries on.
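A minimal Python sketch of that shape, under loud assumptions: `SubAgent`, `orchestrate`, and the `ModelFn` callable are hypothetical names, the model is a stub rather than any real API, and the decision-making loop is collapsed to a single call for brevity. What it tries to show is the boundary: the specialist sees only the brief it is handed, not the orchestrator's history.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-in for a model call: messages in, text out.
ModelFn = Callable[[list[str]], str]


@dataclass
class SubAgent:
    name: str
    instructions: str                       # its own instructions
    tools: dict[str, Callable[[str], str]]  # its own tools
    model: ModelFn
    context: list[str] = field(default_factory=list)  # its own context window

    def run(self, brief: str) -> str:
        # Fresh context: only the instructions and the brief cross the boundary.
        self.context = [self.instructions, brief]
        return self.model(self.context)


def orchestrate(request: str, history: list[str], specialist: SubAgent) -> str:
    # Whatever is left out of the brief is invisible to the specialist.
    brief = f"Task: {request}"
    result = specialist.run(brief)
    history.append(f"{specialist.name}: {result}")
    return result
```

Everything in `history` that does not make it into `brief` is simply absent from the specialist's view, which is why the brief is such a consequential design decision.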

The appeal is obvious. You get specialisation. You get separation of concerns. You get something that looks like a software engineering org chart but with LLMs instead of engineers.

The problems are also obvious, once you've shipped one.

Latency compounds across hops. Context gets mangled in translation between agents; what the orchestrator passes to the sub-agent is one of the most consequential decisions in the whole system, and it's usually a late design consideration, not an early one. When something goes wrong in a multi-agent pipeline, debugging it is an exercise in tracing provenance across context windows you weren't watching. And coherence suffers. The user's original intent, perfectly understood at the top of the task, gets progressively diluted as it passes through each hand-off. By the time the result comes back, it often completes a slightly different task than the one that was asked for.
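To make the compounding concrete, here is a toy calculation in Python. The numbers are invented for illustration, not measurements, and treating "intent retained" as a per-hop multiplier is a simplifying assumption rather than an established metric.

```python
import math

# Illustrative numbers only, not measurements.
hop_latency_ms = [800, 1200, 900]   # orchestrator -> specialist -> summariser
intent_retained = [0.9, 0.9, 0.9]   # assumed share of intent surviving each hand-off

total_latency_ms = sum(hop_latency_ms)          # sequential hops add up
end_to_end_intent = math.prod(intent_retained)  # per-hop losses multiply

print(total_latency_ms)
print(round(end_to_end_intent, 3))
```

With these made-up inputs, three hops cost 2,900 ms in sequence and only about 73% of the original intent survives, even though each individual hand-off looks 90% faithful.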

I've built these systems. The debugging is not pleasant. The coherence degradation is real.

What a skill actually is

A skill, in the sense I mean here, is not a metaphor.

It's a pre-packaged unit of capability: a knowledge file encoding domain rules, a set of tool definitions, a fragment of system instructions, and grounding data, all assembled contextually into the primary model's context window when a specific intent shows up. No separate model invocation. No hand-off. The capability is just there, loaded, available, in the task context that's already running.
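As a rough sketch of that definition in Python: `Skill`, `Context`, and `load_skill` are hypothetical names chosen for illustration, not the API of any particular framework. The four fields mirror the four parts listed above, and loading is just a merge into the context that is already running.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Skill:
    name: str
    knowledge: str      # domain rules, e.g. the contents of a knowledge file
    instructions: str   # fragment of system instructions
    grounding: str      # grounding data
    tools: dict[str, Callable[..., object]] = field(default_factory=dict)


@dataclass
class Context:
    system: list[str]
    tools: dict[str, Callable[..., object]]
    messages: list[str]  # the conversation so far


def load_skill(ctx: Context, skill: Skill) -> Context:
    """Merge a skill into the running context; no second model call is involved."""
    return Context(
        system=ctx.system + [skill.instructions, skill.knowledge, skill.grounding],
        tools={**ctx.tools, **skill.tools},
        messages=ctx.messages,  # the task history stays intact
    )
```

The design choice worth noticing is that `messages` passes through untouched: the model gains rules and tools without losing anything it already knew about the task.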

Think about what this looks like in practice. A user of a financial product needs their quarterly tax position calculated. Instead of routing that task to a Tax Sub-Agent™, the agent loads a tax domain skill: the relevant legislative rules, the right tools to query transaction records, the logic to identify deductible items. It queries the data, applies the rules, and surfaces the result. One context window. One model. The full picture held together.

Or take a developer agent whose test suite has just failed. Instead of spinning up a framework-specialist sub-agent, the agent loads a framework skill with the relevant documentation, known failure patterns, and the tools to inspect the failing test output and the code it exercises. It traces the failure, proposes the fix, and can run the tests again to verify. No context lost in transit.

The difference in user experience is not subtle. Skills preserve task coherence. Sub-agents introduce seams.

The right capability assembles into a single context window at the moment it's needed. No separate invocation. No hand-off. The model reasons across the full story in one pass.

The agent is the thing that decides

Here's the part that gets lost when people frame this as "skills vs sub-agents": neither pattern exists without an agent to invoke it.

The agent is the reasoning layer. It's the thing that holds the user's intent, reads the context, decides what capability is needed, and either loads a skill or dispatches to a sub-agent. A well-designed agent is not a dumb router. It has judgment. It knows what it can handle directly and what genuinely needs to be delegated.

The shift I'm arguing for is in how we build that agent's capability surface. If the agent's default tool for handling specialised tasks is "spin up a sub-agent", you've moved your complexity problem rather than solved it. If the agent's default tool is "load a skill that makes me capable of this", you've kept the complexity inside a single coherent decision-making loop.

An agent with well-composed skills is not a less powerful agent. It's a more capable one. The skills extend what the model can reason over without fracturing the thing that actually produces quality answers: an unbroken context window that knows the full story.

The agent pattern becomes genuinely compelling exactly when skills are doing the heavy lifting. An agent that can dynamically load the right capability for the right moment (tax rules when it's doing financial work, inventory logic when it's managing a reorder, customer history when it's handling a relationship decision) and act on all of them within the same task context, with full awareness of everything it has already done? That's the agentic experience worth building. Not a dispatcher. A reasoner with access to a growing set of lenses.
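One way to picture that dynamic loading, as a hedged Python sketch: the `SKILLS` registry, its placeholder contents, and the `TaskContext` class are all hypothetical. Intent detection is assumed to have happened already, and the "window" is just a list of strings standing in for what a model call would receive.

```python
from dataclasses import dataclass, field

# Hypothetical registry: intent name -> what the skill contributes to the window.
SKILLS: dict[str, str] = {
    "tax": "[tax rules]",
    "inventory": "[inventory logic]",
    "customer": "[customer history]",
}


@dataclass
class TaskContext:
    loaded: list[str] = field(default_factory=list)  # skills active in this task
    window: list[str] = field(default_factory=list)  # everything the model can see

    def ensure_skill(self, intent: str) -> None:
        # Load each skill at most once, into the same window as everything else.
        if intent in SKILLS and intent not in self.loaded:
            self.loaded.append(intent)
            self.window.append(SKILLS[intent])

    def step(self, intent: str, user_message: str) -> list[str]:
        self.ensure_skill(intent)
        self.window.append(user_message)
        return self.window  # what a model call at this step would receive
```

By the time the inventory step runs, the tax rules and the earlier tax question are still in the window, so the model can reason about the reorder in light of the tax position.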

When sub-agents still make sense

I want to be careful here, because the real argument isn't "sub-agents are bad". It's "sub-agents are over-used as a default". There are cases where they're genuinely the right call, and conflating the two would be sloppy.

Genuine parallelism. If a task requires running three independent workloads at the same time and aggregating the results, parallel sub-agents are correct. Research synthesis across multiple sources, running simultaneous code execution threads, comparing independent analyses: these are genuinely parallel jobs. An agent that serialises them in a single context window to avoid spawning sub-agents is slower for no benefit.
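A small sketch of the parallel case using Python's standard `asyncio`. The `run_sub_agent` coroutine is a placeholder that sleeps instead of calling a model, and the agent and source names are invented for the example.

```python
import asyncio


async def run_sub_agent(name: str, source: str) -> str:
    # Placeholder for an independent model invocation with its own context.
    await asyncio.sleep(0.01)
    return f"{name}: summary of {source}"


async def research(sources: list[str]) -> list[str]:
    # Independent workloads run concurrently; gather returns results in input order.
    jobs = [run_sub_agent(f"agent-{i}", s) for i, s in enumerate(sources)]
    return await asyncio.gather(*jobs)


if __name__ == "__main__":
    print(asyncio.run(research(["paper A", "paper B", "paper C"])))
```

The sub-agents never need each other's context, which is exactly what makes them safe to run side by side and aggregate afterwards.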

Long-horizon tasks that exceed any single context. A multi-day autonomous workflow (writing and running tests, filing a pull request, monitoring a deployment, responding to feedback) can't fit in one context window, even with large-context models. Here you need something that looks like a hand-off: a durable state layer, checkpoints, and the ability to resume. Sub-agents with explicit shared memory are the right architecture. A skill loaded into a single task session isn't.
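A sketch of the durable-state idea in Python, assuming a JSON file as the shared memory. The step names, the file layout, and `run_workflow` are hypothetical, and the real work of each step is replaced by a one-line note.

```python
import json
from pathlib import Path

STEPS = ["write_tests", "run_tests", "open_pull_request", "monitor_deploy"]


def load_state(path: Path) -> dict:
    # Resume from the last checkpoint if one exists.
    if path.exists():
        return json.loads(path.read_text())
    return {"done": [], "notes": {}}


def run_workflow(path: Path, max_steps: int) -> dict:
    """Run up to max_steps pending steps, checkpointing after each one."""
    state = load_state(path)
    pending = [s for s in STEPS if s not in state["done"]]
    for step in pending[:max_steps]:
        # A real system would hand `step` plus state["notes"] to a fresh agent session here.
        state["notes"][step] = f"{step} finished"
        state["done"].append(step)
        path.write_text(json.dumps(state))  # durable checkpoint
    return state
```

Each session can stop after any step, and the next one picks up from the file rather than from anyone's context window.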

Strict isolation requirements. Some tasks need to be sequestered: sensitive data that shouldn't be in a general-purpose context, tool access that should be scoped down for safety reasons, or audit requirements that demand a clean separation of concerns. A sub-agent with tightly controlled permissions is the right answer here. It's not about capability; it's about boundaries.
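A sketch of what "tightly controlled permissions" could look like, in Python. The tool names and the allow-list mechanism are assumptions made up for the example; a production boundary would also need process or credential isolation, which this does not attempt.

```python
from typing import Callable

Tool = Callable[[str], str]

# Hypothetical tools available to the main agent.
ALL_TOOLS: dict[str, Tool] = {
    "read_ledger": lambda arg: f"ledger rows for {arg}",
    "send_email": lambda arg: f"emailed {arg}",
    "delete_records": lambda arg: f"deleted {arg}",
}


def scoped_tools(allowed: set[str]) -> dict[str, Tool]:
    # The sub-agent receives only the tools on its allow-list.
    return {name: fn for name, fn in ALL_TOOLS.items() if name in allowed}


def call_tool(tools: dict[str, Tool], name: str, arg: str) -> str:
    if name not in tools:
        raise PermissionError(f"tool {name!r} is outside this agent's boundary")
    return tools[name](arg)


# An audit sub-agent that can read the ledger and nothing else.
audit_tools = scoped_tools({"read_ledger"})
```

The sub-agent here is no smarter than the main agent. It is simply unable to reach anything it was not explicitly given.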

Recursive decomposition. When the problem itself is fractal, where solving the top-level task requires solving multiple sub-tasks of comparable complexity, which themselves require sub-tasks, hierarchical agent decomposition is a natural fit. Code refactoring at scale. Document generation across a large corpus. Anything where the work genuinely branches.
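The branching shape can be sketched as a plain recursive function in Python. The `refactor` name, the batch size, and the string results are hypothetical; each leaf stands in for one agent working on a batch small enough to handle on its own.

```python
def refactor(files: list[str], max_batch: int = 2, depth: int = 0) -> list[str]:
    """Split the work until a batch fits one agent, then merge the results."""
    if len(files) <= max_batch:
        # Leaf: a real system would run one agent over this batch.
        return [f"depth {depth}: refactored {name}" for name in files]
    mid = len(files) // 2
    left = refactor(files[:mid], max_batch, depth + 1)
    right = refactor(files[mid:], max_batch, depth + 1)
    return left + right
```

The hierarchy follows the shape of the work: uneven inputs produce uneven trees, and nothing is decomposed further than it needs to be.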

The honest test is this: am I spawning a sub-agent because the task genuinely requires it (parallelism, horizon, isolation, recursion), or because I've built my uncertainty about what the model can do into my architecture? If it's the second reason, load a skill instead.
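Written down as code, the test is almost embarrassingly small. This Python sketch uses hypothetical names (`Task`, `choose_pattern`) and treats each of the four reasons as a flag you have to set deliberately.

```python
from dataclasses import dataclass


@dataclass
class Task:
    needs_parallelism: bool = False
    exceeds_one_context: bool = False
    needs_isolation: bool = False
    is_recursive: bool = False


def choose_pattern(task: Task) -> str:
    # Collect the concrete reasons, if any, that justify a sub-agent.
    reasons = [name for name, flag in vars(task).items() if flag]
    if reasons:
        return f"sub-agent ({', '.join(reasons)})"
    return "skill"  # the default when no concrete reason applies
```

Note what is missing from the dataclass: there is no field for "I wasn't sure the model could handle it".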

The uncomfortable truth is that most of the sub-agents being built today fail that test. They exist because someone wasn't sure the main model could handle a domain, and a sub-agent felt like a safer bet than investing in a well-composed skill. That was a reasonable call eighteen months ago. It's a less reasonable call today.

If you can't point to a concrete reason from this list, a well-composed skill will serve the user better than a sub-agent.

The shift is from orchestration to composition

Here's the deeper thing that's changing, and I think this is where the architectural intuition needs to update.

The sub-agent model is fundamentally about orchestration: a system that decides what to do and dispatches work to other systems. The skill model is about composition: a system that assembles the right capabilities and uses them together in a single coherent pass.

Orchestration requires coordination overhead. Every hop is a potential failure mode. Every hand-off is a potential information loss. Every sub-agent boundary is a design decision with downstream consequences for latency, coherence, and debuggability.

Composition doesn't route. It assembles. The model gets the right context, the right tools, the right instructions, and it reasons across all of them in one pass. The complexity is in how you define and load the skills, not in how you coordinate agents that are passing notes to each other.

In Stride MCP, my running coach side project, the version I originally built had something close to the sub-agent smell: separate retrieval logic for Strava data, a separate analysis layer, a separate planning module. When I rebuilt it as an MCP server with well-defined tools that the model could reach for directly, the experience got measurably better. Not because MCP is magic. Because a single context window holding real training data, real intent, and real tools produces a more coherent answer than a pipeline that forgets something in the hand-off.

The coherence problem is the real problem. Sub-agents solve for capability specialisation. Skills solve for coherence while delivering capability. Those are different bets on what matters most to users.

What this actually means for builders

If you're building an AI product right now, the practical question is where to put the investment.

Building another sub-agent orchestration layer is a bet that the routing problem is the hard problem. Building a richer skill library (well-defined domain knowledge, clean tool interfaces, grounded instructions that activate at the right moment) is a bet that the capability problem is the hard problem.

My bet is the second one. Not because orchestration is never needed, but because most of the agentic experiences I've seen that feel genuinely good to users are the ones where the agent clearly knows things and can do things in the moment, not the ones where the agent is impressive at deciding who else to call.

Skills extend an agent's knowing. Sub-agents extend an agent's reach. For most of the experiences people actually want to have with an AI product, knowing matters more than reach.

Build agents that know more. Reach for the sub-agent when the job genuinely can't be done any other way.