Three Tokens Is Sometimes Enough
Ask any LLM to write something vaguely familiar. For example, ask it to "write a fib". Every model I've tried this with responds with a Fibonacci sequence algorithm, even though to most people outside of programming a "fib" is just a small, unimportant lie. That second, and arguably more common, meaning never even surfaces. The model has been fine-tuned on so much code that the path is already chosen for you before you finish typing.
That is what compression looks like. Three tokens, and the model lands on a specific outcome with surprising confidence. You did not need to say "write me a function in Python that prints the Fibonacci sequence up to N" (14 tokens). The right two or three words were enough. The rest of the explanation would not have improved the result much.
Most of the time when people write longer prompts, the extra length is an expression of doubt. You are not sure the model will land where you want, so you keep explaining. Then you explain the explanation. The model already has its bias, and adding paragraphs around it does not move it much. If you cannot say what you want in plain, short terms, you probably have not figured out what you want yet.
Getting the words right is only part of the story, though. You can write something tight and precise and still watch the agent go sideways. It loops on a failing step or builds on top of an incorrect subtask. A better-written prompt would not have prevented any of it. The agent needed something around it to catch the drift.
That is the harness: deterministic tests that catch bad outputs early, cycle detection that stops a spiral before it compounds, and checkpoints that fail loudly instead of continuing silently. None of it is glamorous, and most of it has nothing to do with any specific model.
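As a rough sketch of what such a harness might look like: the `step` and `check` functions below are hypothetical stand-ins for an agent's work loop and a deterministic test, not any particular framework's API. Cycle detection here is just hashing recent outputs and refusing to see the same one twice; a checkpoint is just a check that raises instead of letting the run continue.

```python
import hashlib
from collections import deque


class CycleDetected(RuntimeError):
    """The agent produced an output it already produced recently."""


class CheckpointFailed(RuntimeError):
    """A deterministic check rejected an output; fail loudly, don't drift."""


def run_with_harness(step, check, max_steps=20, window=5):
    """Run a hypothetical agent `step(state) -> output` under a small harness.

    - `check(output)` is a deterministic test; a False result stops the run.
    - An output repeated within the last `window` steps counts as a cycle.
    """
    seen = deque(maxlen=window)  # rolling window of recent output digests
    state = None
    for i in range(max_steps):
        output = step(state)
        digest = hashlib.sha256(repr(output).encode()).hexdigest()
        if digest in seen:  # cycle detection: the spiral stops here
            raise CycleDetected(f"step {i} repeated a recent output")
        seen.append(digest)
        if not check(output):  # checkpoint: bad output fails immediately
            raise CheckpointFailed(f"step {i} produced a bad output: {output!r}")
        state = output
    return state
```

A looping agent (one that keeps returning the same thing) trips `CycleDetected` on its second step, and an output that fails the check trips `CheckpointFailed` on the spot, instead of becoming the foundation for the next ten steps.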
Three tokens can carry a request. The harness has to carry the rest.