Untrusted Context Belongs in a Tool Call
A security firm found that Bunq's AI assistant could be hijacked through a transaction description. Send a two cent transfer with a crafted message in the reference field, wait for the victim to ask their assistant about recent transactions, and the model reads your text as if it were its own instruction. It then turns the bank's own app into a spearphishing channel, asking the customer to reauthenticate.
The story is interesting. It is also not unusual. This happens because most teams putting these systems together do not yet have the experience to do it safely.
The mistake is almost always the same. Content that came from outside gets placed somewhere the model treats as authoritative. Sometimes it lands in the system prompt. More often it lands in a user message, because that is the easy thing to do.
I see this constantly in tools like n8n, where a workflow glues a few nodes together and the retrieved content drops straight into the prompt. The system prompt has the most pull, so people assume that is the only channel worth worrying about. User messages carry real weight too. They have to. If they did not, the assistant would refuse half of what you ask it, which is no way to behave under normal use.
So you have a model that obeys user messages by design, and a pipe that feeds attacker-controlled text into user messages. The injection does not need to be clever. It does not need to say "ignore previous instructions." It needs to look like a reasonable request sitting in a place where requests get followed.
The real fix is a stack of compounding defences, but it starts with one simple move. You do not have to put external content into a user message or the system prompt at all. You can inject it as a tool call result, framed as if the agent itself went and fetched it.
That one change moves the content into a part of the conversation the model reads as data it retrieved. The transaction description becomes something the assistant looked up, rather than something a user told it to do.
This is how we do it at ChatBotKit. There is no other path for external context to enter a conversation. It never touches a user message and it never touches the system prompt. Everything that comes from the outside world arrives as a tool result, on the agent's own turn.
It is not the whole answer. You still constrain what the agent is allowed to do, you still watch what it does at runtime, you still keep the context small. But the channel you choose for untrusted content is the first decision you make, and most of the breaches I read about were lost right there.