How People Burn So Many Tokens
I keep reading stories about teams burning through staggering amounts of tokens on coding tasks, and I genuinely don't recognise the problem. We use coding assistants internally every day - 8-12 hours consistently. We use them by hand while programming, and we also run them in our CI/CD pipelines, where they write and fix code on their own overnight while we sleep. The software comes out more solid in the morning. The bills stay small. So I have been trying to work out where the difference comes from, and I have a few ideas I would like to share that hardly seem to be mentioned anywhere else.
The first is the kind of problem you hand the agent. Give it an open-ended task and it will wander, try things, second-guess itself ("thinking"), and the meter runs the whole time. Give it something specific and targeted and it knows what to do. I think that's obvious. Most of the cost is in the wandering, and most of the wandering comes from a vague ask. The vague ask comes from a lack of understanding of the problem, and that is a human problem, not an agent problem. It gets worse because many developers these days do not understand the problem they are trying to solve: they are not writing the code, they are prompting for it. They have no idea how to solve the problem, so they ask the agent to solve it, then ask again when it doesn't get it right the first time, and again after that. The more you understand the problem, the clearer and more specific the task you can give the agent, and the less you pay for wandering.
The second is our agent infrastructure, which is surprisingly thin. Compared to what I see people sharing online, we have very few skills (~10), a tight AGENT.md, and small supporting files. For the size of what we run internally, with many SDKs and moving parts, the layer that explains how things are done is light. The agent spends its budget on the task instead of reading a manual about the task. The agent sees no PRDs, no design docs, no architecture docs, no onboarding docs, no style guides, no coding standards. We have a few of those things, but they are not part of the agent's world. The agent's world is the code and the task. That is where the tokens go. Why would I want to explain the coding style to the agent when it can read the code and imitate it? Why would I want to explain the architecture when the agent can read the code and understand it? Why would I want to explain the design when the agent can read the code and see it? The agent is a reader, not a listener. It works by predicting the most likely next token, so the best thing to give it is existing code whose patterns it can continue. The more you have to explain, the more tokens you burn on explanation instead of work.
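One way to see how heavy that explanatory layer is in a given repository is to measure it. The following is a minimal sketch, not a description of our tooling. It assumes a hypothetical layout (an AGENT.md at the repository root plus markdown files under a skills/ directory) and uses the rough rule of thumb of about four characters per token, which real tokenizers will not match exactly.

```python
from pathlib import Path

# Rough rule of thumb for English text; real tokenizers will give different counts.
CHARS_PER_TOKEN = 4

# Hypothetical layout: one AGENT.md plus markdown files under skills/.
DEFAULT_PATTERNS = ("AGENT.md", "skills/**/*.md")


def estimate_tokens(text: str) -> int:
    """Approximate the token count of a piece of text from its length."""
    return len(text) // CHARS_PER_TOKEN


def instruction_footprint(root: Path, patterns=DEFAULT_PATTERNS) -> dict[str, int]:
    """Map each instruction file under root to its estimated token count."""
    sizes: dict[str, int] = {}
    for pattern in patterns:
        for path in sorted(root.glob(pattern)):
            if path.is_file():
                text = path.read_text(encoding="utf-8")
                sizes[path.relative_to(root).as_posix()] = estimate_tokens(text)
    return sizes


if __name__ == "__main__":
    footprint = instruction_footprint(Path("."))
    for name, tokens in footprint.items():
        print(f"{tokens:>8}  {name}")
    print(f"{sum(footprint.values()):>8}  total (estimated)")
```

Whatever the total comes to, it is paid again in every session that loads those files, so it is worth comparing against the size of a typical task.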
The third is how the code is laid out on disk. A lot of models, GPT-5.5 in particular, love to scatter hundreds of tiny files everywhere. That hurts in two ways. A single change now spans several files, and in most agents one tool call cannot edit across files, so you pay for call after call to make one edit. Then the context for that change has to be pulled back together from all those files, each one read separately. Internally we go the other way. We keep a high concentration of code in single files, some of them well past fifteen thousand lines. People assume that is inefficient. It is the opposite. It is efficient for tokens and efficient for the developer. In fact I chuckle when I see eslint rules that restrict file sizes to a few hundred lines. That comes from a human perspective and it hurts humans and agents alike. The more you can keep together, the more the agent can read in one go, and the more it can change in one go. The more you have to break things apart, the more you have to pay for putting them back together.
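The cost of call after call can be put into rough numbers. The sketch below is a back-of-the-envelope model, not a measurement. It assumes that every tool call is a fresh round trip in which the model re-reads the whole conversation so far, that no prompt caching applies, and that the agent reads only the relevant region of a large file rather than all of it. The function name and every number are hypothetical.

```python
def round_trip_input_tokens(base_context: int, payload_per_call: int, calls: int) -> int:
    """Total input tokens across `calls` sequential tool calls.

    Simplifying assumption: each call re-sends the base context plus every
    earlier tool result, and nothing is cached.
    """
    total = 0
    history = base_context
    for _ in range(calls):
        total += history             # the model re-reads everything so far
        history += payload_per_call  # the tool result is appended to the history
    return total


# Hypothetical change touching six places, 6,000 tokens of code in total.
BASE_CONTEXT = 4_000  # system prompt, instructions and task (assumed)

# Scattered layout: six files, one read and one edit each, 500 tokens per call.
scattered = round_trip_input_tokens(BASE_CONTEXT, payload_per_call=500, calls=12)

# Consolidated layout: one read and one edit of the same code, 3,000 tokens per call.
consolidated = round_trip_input_tokens(BASE_CONTEXT, payload_per_call=3_000, calls=2)

print(f"scattered:    {scattered} input tokens")     # 81000
print(f"consolidated: {consolidated} input tokens")  # 11000
```

Both layouts move the same amount of code through the agent. Under these assumptions the difference comes entirely from how many times the growing history is re-read, which is why the number of calls matters more than the size of any one of them. Prompt caching would narrow the gap, and reading a very large file in full would narrow it further.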
The last one is structure. Once software is stable and has a good shape, new work behaves like a plugin. It slots into what already exists, the conventions decide most of the choices, and there is little left to figure out. A brand new project has none of that. Nothing holds it up, every decision is open, and so every decision costs tokens. The more you build on a foundation, the cheaper each addition gets. Creating an agentic solution on top of the ChatBotKit SDK takes 50 lines of code in your favourite language. Creating it on a blank slate takes millions of lines of code, and they are never written in a straight line. The cost in tokens and salaries is measured in the millions. The more you have to build, the more you have to pay.
None of this is exotic. Make sure you use targeted tasks, lean instructions, code that lives together, and a structure worth building on. Do those and the tokens take care of themselves.