
Skills Are an Expensive Form of RAG

Skills, MCP, and RAG all solve the same problem. Skills just do it with the worst context economics. If smaller models are the future of agents, a skills-first architecture is already a dead end.
Petko D. Petkov, on a break from CISO duties, building cbk.ai

Let me play devil's advocate. Skills, MCP tools, and RAG are the same technique with different packaging. An agent is given access to some set of external information; it decides what it needs, pulls it in, and acts on it. The mechanics under the hood are practically identical.

Skills took off because they are trivial to write. A markdown file ships faster than a tool schema and a web service, and faster still than a vector database. That explains adoption.

Look at how skills work at runtime. The names and descriptions of every available skill are preloaded into the context. The agent scans that list and decides which skill file to read. Then the full file is loaded into the context as well. This is CAG, cache-augmented generation, dressed up as a new idea. You pre-commit tokens to an index you may or may not use, and then pay again to pull the content.
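The two-phase flow above is easy to see in a minimal sketch. All names and skill contents here are hypothetical stand-ins, not any vendor's actual runtime:

```python
# Phase 1: a summary of every skill is preloaded into context.
# Phase 2: the agent picks one and the full file is loaded as well.

skills = {
    "pdf-report": "Generate PDF reports from structured data.",
    "sql-query": "Write and run read-only SQL queries.",
    "web-scrape": "Fetch and summarize a web page.",
}

def preload_index(skills):
    # Paid on every turn, whether or not any skill is ever used.
    return "\n".join(f"- {name}: {desc}" for name, desc in skills.items())

def load_skill(name):
    # Paid again once the agent decides it needs the skill body.
    return f"# Skill: {name}\n(full markdown body of {name})"

context = [preload_index(skills)]        # the token pre-commitment
context.append(load_skill("sql-query"))  # the second payment, for the content
```

The index and the chosen body now both sit in context, which is exactly the double payment described above.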

MCP has the same problem. Load twenty servers and your context is half full before the user has typed anything. The common workaround is a meta MCP that searches for the right tool and exposes only that. Which, if you squint, is just RAG with function calls.
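The meta-MCP workaround reduces to retrieval over tool definitions. A sketch, with invented tool names and a deliberately naive lexical matcher standing in for whatever search a real meta server would use:

```python
# "Meta MCP": instead of preloading twenty tool schemas, expose one search
# step that retrieves only the relevant schema -- RAG over tool definitions.

TOOLS = [
    {"name": "create_invoice", "description": "Create a billing invoice"},
    {"name": "send_email", "description": "Send an email to a contact"},
    {"name": "resize_image", "description": "Resize or crop an image"},
]

def search_tools(query, tools=TOOLS, k=1):
    """Naive word-overlap retrieval over tool names and descriptions."""
    words = set(query.lower().split())
    scored = []
    for t in tools:
        text = (t["name"] + " " + t["description"]).lower().replace("_", " ")
        scored.append((len(words & set(text.split())), t))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [t for score, t in scored[:k] if score > 0]

# Only the matching schema enters the context; the other nineteen never do.
matches = search_tools("send an email")
```

Swap the word-overlap scorer for embeddings and this is a textbook RAG pipeline.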

We are running in circles. Each "new" approach rediscovers that we need selective retrieval over an index. That is literally what RAG is.

Now consider the cost. Every iteration of the agentic loop recomputes over whatever is in context. Skill definitions stay there. Tool definitions stay there. Every turn is more expensive than the last, and the cost compounds as the skill library grows. For frontier models with enormous context windows this is tolerable in context terms, though the bill is still eye-watering. For smaller models, which I expect to become the default runtime for most agents, it is fatal. Context is tight, attention is tight, and context bloat directly degrades reasoning.
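The compounding is easy to put numbers on. This is back-of-the-envelope arithmetic with invented figures, not a measurement:

```python
# A preloaded catalog is re-read on every iteration of the agentic loop.

CATALOG_TOKENS = 4_000   # hypothetical: summaries of every skill and tool
TURN_TOKENS = 500        # hypothetical: new tokens added each turn

def tokens_processed(turns, catalog=CATALOG_TOKENS, per_turn=TURN_TOKENS):
    """Total input tokens the model reads across an agentic loop."""
    total = 0
    context = catalog
    for _ in range(turns):
        context += per_turn
        total += context  # the whole context is re-read every turn
    return total
```

With these numbers, one turn reads 4,500 tokens but ten turns read 67,500, because the 4,000-token catalog is paid ten times over. The rent scales with the size of the library, not the size of the task.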

A constrained RAG pipeline on GPT-3.5 Turbo with a 16K context window did the same job years ago. It retrieved the right passage, handed it to the model, and got out of the way. No preloaded catalog of every possible action. No context tax on every turn. Targeted lookup, nothing else.
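That constrained pipeline fits in a few lines. A sketch with made-up passages and a trivial retriever standing in for a real vector lookup:

```python
# Retrieve one passage, build a small prompt, preload nothing else.

PASSAGES = [
    "Refunds are processed within 5 business days.",
    "Invoices can be exported as PDF from the billing page.",
    "API keys are rotated from the settings screen.",
]

def retrieve(query, passages=PASSAGES):
    """Pick the single best passage by naive word overlap."""
    q = set(query.lower().split())
    return max(passages, key=lambda p: len(q & set(p.lower().strip(".").split())))

def build_prompt(query):
    # Only the retrieved passage enters the context -- no catalog, no tax.
    return f"Context:\n{retrieve(query)}\n\nQuestion: {query}"
```

The context cost per turn is one passage plus the question, regardless of how large the corpus grows.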

None of this means skill files should go away. They are an excellent distribution format. A markdown file with an example is easier to move around than a vector database or a running MCP server. The packaging is fine. The runtime model can be much more efficient.

To my mind, the pattern that wins is painfully simple. Keep the shareable, human-readable artifacts. Index them in a small retrieval layer. Retrieve on demand. Do not preload summaries of things the agent may never use. Do not pay context rent on tools that are irrelevant to the current turn.
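One way to sketch that pattern, with hypothetical skill names and inline strings standing in for markdown files read from disk:

```python
# Skill files stay as plain markdown; a tiny index holds only their
# one-line descriptions; a body is read only when retrieval selects it.

SKILL_INDEX = {
    # name -> description (the only thing kept resident)
    "deploy-docs": "Publish the documentation site",
    "rotate-keys": "Rotate expiring API credentials",
}

SKILL_BODIES = {
    # stand-in for reading `skills/<name>.md` on demand
    "deploy-docs": "# deploy-docs\nSteps to publish the docs site...",
    "rotate-keys": "# rotate-keys\nSteps to rotate credentials...",
}

def select_skill(task):
    """Naive retrieval over the index; a real layer might use embeddings."""
    words = set(task.lower().split())
    return max(SKILL_INDEX,
               key=lambda n: len(words & set(SKILL_INDEX[n].lower().split())))

def load_on_demand(task):
    # The only skill body that ever enters the context.
    return SKILL_BODIES[select_skill(task)]
```

The distribution format is untouched; only the runtime changes, from preload-everything to retrieve-one.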

Skills feel like progress because they are easy to write. They are not cheap to run. Once models get smaller and agentic loops tighter, the architecture that wins is the one that keeps the context clean.


A practical note: this is one of the reasons ChatBotKit does not lock you into a single retrieval style. Datasets, skills, and MCP servers can all feed the same agent, so you get to pick the architecture that actually fits the model you are running. A small model on a tight context budget should not pay the same context tax as a frontier model with room to spare, and on ChatBotKit it does not have to.