When Exploit Generation Becomes a Commodity
The discourse around LLM-generated exploits typically fixates on the wrong question. We obsess over whether models can find and exploit zero-days, treating this as some kind of threshold moment where security fundamentally changes. But the real shift isn't in the discovery. It's in the economics. Sean Heelan demonstrated LLM agents successfully exploiting a zero-day vulnerability in QuickJS for roughly $50 worth of compute. The significance isn't the zero-day itself but the price tag: exploit generation is becoming a commodity, purchasable by the token rather than by the specialist. Security researchers debate the technical merits while others worry about script kiddies with credit cards, and both camps miss the bigger picture. Framing this as democratizing hacking or lowering barriers to entry misses the point - those barriers were already low for motivated attackers. What's changing is the scale at which security research can operate, and that has implications that cut in both directions, often in unexpected ways.
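For a back-of-envelope sense of what "purchasable by the token" means, compare the cited compute budget against the cost of human expertise. Both the per-token price and the hourly rate below are assumptions for illustration, not figures from Heelan's work.

```python
# Back-of-envelope arithmetic for "purchasable by the token".
# The per-token price and expert rate are illustrative assumptions,
# not figures from Heelan's write-up.
budget = 50.00                    # the ~$50 compute figure cited above
price_per_million_tokens = 10.00  # assumed blended $/1M tokens for a frontier model

tokens = budget / price_per_million_tokens * 1_000_000
print(f"${budget:.0f} buys roughly {tokens:,.0f} tokens of agent reasoning")

expert_rate_per_hour = 300.00     # assumed rate for a senior exploit developer
print(f"The same ${budget:.0f} buys about {budget / expert_rate_per_hour:.2f} hours of expert time")
```

Whatever the exact prices, the ratio is the point: a budget that buys minutes of a specialist's attention buys millions of tokens of automated iteration.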
Industrialization in this context means something specific. A single security researcher with access to frontier LLMs can now run hundreds of parallel exploit attempts across different targets, iterating through mitigation bypasses at a pace that would otherwise require an entire team of human researchers. Defenders get the same capability, but the asymmetry of attack versus defense hasn't changed. An attacker needs one working exploit. A defender needs to patch every vulnerability. When both sides get 100x more productive, the attacker's advantage compounds rather than cancels out. Heelan's hardest challenge demonstrates this perfectly: GPT-5.2 chained seven function calls through glibc's exit handler mechanism to bypass CFI, shadow stacks, and seccomp sandboxing. The sophistication isn't in novel techniques - most of these approaches already exist in the wild - but in the systematic exploration of the solution space. The model didn't need to know the trick beforehand. It just needed enough tokens to try enough combinations.
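To make that scaling concrete, here is a minimal sketch of the orchestration pattern the paragraph describes: many parallel attempts, each judged by an objective success check. The agent and verifier functions are hypothetical placeholders, not Heelan's actual harness, and the target list is invented.

```python
"""Minimal sketch of the 'industrialization' loop: many parallel exploit
attempts, each checked by a binary verifier. The agent, verifier, and
targets below are hypothetical placeholders, not Heelan's tooling."""
from concurrent.futures import ThreadPoolExecutor, as_completed

TARGETS = ["quickjs", "another-interpreter", "some-parser"]  # illustrative
ATTEMPTS_PER_TARGET = 100  # the scale knob is a token budget, not headcount

def run_exploit_agent(target: str, attempt: int) -> str:
    """Hypothetical: drive an LLM agent to produce a candidate exploit for `target`."""
    return f"candidate-exploit-for-{target}-{attempt}"

def achieves_code_execution(candidate: str) -> bool:
    """Hypothetical: run the candidate in a sandbox and check for the success signal.
    The key property is that success is objectively verifiable."""
    return False  # placeholder

def search(target: str) -> str | None:
    with ThreadPoolExecutor(max_workers=16) as pool:
        futures = [pool.submit(run_exploit_agent, target, i)
                   for i in range(ATTEMPTS_PER_TARGET)]
        for fut in as_completed(futures):
            candidate = fut.result()
            if achieves_code_execution(candidate):
                return candidate  # one success is all an attacker needs
    return None

if __name__ == "__main__":
    for target in TARGETS:
        result = search(target)
        print(target, "->", "exploit found" if result else "no success within budget")
```

The loop itself is trivial; what changed is that each iteration of it is now cheap enough to run in bulk.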
Heelan's experiments used QuickJS because it's an order of magnitude simpler than production browser engines, and that simplicity made the results interpretable and reproducible. But this doesn't mean the approach won't scale to complex targets - it means we don't yet know the token budget required. The more interesting question is about the shape of the problem space. Exploit generation against known vulnerability classes, even in complex codebases, is fundamentally a search problem with verifiable solutions. You either get code execution or you don't. The models demonstrated they can navigate this space given enough compute. What remains unclear is whether the relationship between target complexity and required tokens is linear, polynomial, or exponential. If it's linear or polynomial, we're already at the inflection point. If it's exponential, we might have a few more years before this becomes routine against hardened targets like Chrome or the Linux kernel.
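A toy extrapolation shows why the shape of that curve is the whole question. Assume the QuickJS exploit cost about $50 and that a hardened target is roughly 100x more complex; every number below, including the exponential base, is an assumption chosen for illustration rather than a measurement.

```python
# Toy extrapolation of exploit cost versus target complexity.
# Baseline cost, complexity ratio, and the exponential base are
# illustrative assumptions, not measurements.
baseline_cost = 50.0      # ~$50 for the QuickJS exploit (the figure cited above)
complexity_ratio = 100.0  # assume a hardened target is ~100x more complex

linear    = baseline_cost * complexity_ratio       # $5,000
quadratic = baseline_cost * complexity_ratio ** 2  # $500,000
# Exponential growth depends entirely on the (unknown) base; even a mild one explodes.
exponential = baseline_cost * 1.5 ** complexity_ratio

print(f"linear:      ${linear:,.0f}")
print(f"quadratic:   ${quadratic:,.0f}")
print(f"exponential: ${exponential:.2e}")
```

Linear scaling lands around $5,000 and polynomial scaling in the hundreds of thousands - both well within the budgets of serious actors - while exponential scaling puts hardened targets out of reach of brute token spend. Which regime we are actually in is an empirical question nobody has answered yet.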
The automation of exploit generation exposes a fundamental inconsistency in how we think about software security versus AI security. We've spent decades building defense in depth - ASLR, DEP, CFI, shadow stacks, sandboxing - creating layers of mitigation that make exploitation harder but never impossible. Each new protection mechanism gets studied, understood, and eventually bypassed through careful analysis and clever technique combination. LLMs are simply accelerating this natural process. Yet when we talk about AI safety and security, we seem to expect a different paradigm entirely: models that can't be jailbroken, can't generate harmful content, can't be misused - a level of absolute security we never achieved, or even pursued, in traditional software. Both domains share the same fundamental dynamic: sufficiently motivated adversaries with enough resources will find ways through your defenses. What LLM-powered exploit generation teaches us isn't that AI makes everything worse; it's that the game was always about economics and scale, not absolute security.
There's a tempting but misguided conclusion to draw from this work. If LLMs can generate exploits, then maybe all those years of learning assembly, studying memory corruption, and understanding CPU architecture were wasted effort. This gets it exactly backwards. The models aren't replacing expertise. They're compressing the time from insight to implementation. A skilled security researcher who understands the fundamentals can now multiply their effectiveness by using LLMs to handle the tedious parts - setting up environments, writing boilerplate, trying variations. The vulnerability discovery agent that found the QuickJS zero-day didn't replace Heelan. It extended his capability to explore more targets than he could manually audit. The real question isn't whether security expertise becomes obsolete, but whether we're ready for a world where both attackers and defenders operate at 100x their current scale. When exploit generation costs $50 instead of weeks of expert time, what does that mean for the economics of bug bounties, vulnerability disclosure, and security research funding? When defensive tools can fuzz and validate at comparable speeds, do we finally reach an equilibrium, or does the asymmetry just shift to whoever can afford the most compute? These aren't hypothetical concerns. They're the questions we need to answer in the next 12-24 months as these capabilities become table stakes rather than research demonstrations.