Where AI Coding Agents Go to Die
Coding agents are crazy useful for a ton of stuff, but they absolutely faceplant on problems that seem basic but are actually kind of convoluted. Like, they'll help you architect a brand new backend system, but then get stuck in these weird circular loops on what looks like simple UI stuff.
I was trying to figure out a good litmus test for this - something that could reliably expose these failure modes - and I think I found a good one.
Build an Auto-Scrolling Component
You know those chat interfaces that automatically scroll to the bottom when new messages come in? Try getting an AI agent to build one of those properly. I'm talking about a real implementation, not the hacky version. Something that works great on desktop and acceptably on mobile (mobile has its limits).
I've tried this multiple times with different architectures, different prompts, and different models, and every single time the agent gets stuck in circular loops. It'll suggest a fix, realize it breaks something else, suggest another fix, and just... spiral.
The thing is, the problem sounds basic. "Just scroll to the bottom when there's new content, right?" But the actual solution is anything but trivial.
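For reference, here's the naive version the problem statement seems to call for - a minimal sketch, with a hypothetical .chat-log container:

```ts
// Naive auto-scroll: jump to the bottom whenever content changes.
// `.chat-log` is a hypothetical container name, not from any real implementation.
const log = document.querySelector<HTMLElement>('.chat-log')!;

new MutationObserver(() => {
  // Fires on every new message -- including while the user has scrolled up
  // to read older messages, which is exactly where this version breaks.
  log.scrollTop = log.scrollHeight;
}).observe(log, { childList: true, subtree: true });
```

It works right up until a user scrolls up to read something, an image loads and changes the height, or the mobile keyboard resizes the viewport.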
What I was trying to do was basically simulate the overflow-anchor CSS property but extend it further - adding stop positions where the scroll would automatically pause once reached. And this is where it gets complex.
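To make "stop positions" concrete, here's a minimal sketch - the function and element names are mine, invented for illustration:

```ts
// Hypothetical "stop position" logic: instead of always scrolling to the very
// bottom, the auto-scroll pauses once it reaches a marked element.
function autoScroll(log: HTMLElement, stopEl: HTMLElement | null) {
  const bottom = log.scrollHeight - log.clientHeight;
  if (stopEl) {
    // Assumes `log` is the offset parent, so offsetTop is relative to it.
    // Pause with the stop element at the top of the view, but never
    // scroll past the true bottom.
    log.scrollTop = Math.min(stopEl.offsetTop, bottom);
  } else {
    log.scrollTop = bottom;
  }
}
```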
See, while the problem is contained within a single system (the browser), it requires juggling multiple event systems that all fire at different times:
- Scroll handlers
- Intersection observers
- Resize observers
- Content mutation observers
All of these events need to coordinate, and the state management has to stay consistent across all of them. One event fires, changes the state, triggers another event, which needs to check the state, and suddenly you're in this web of timing issues and race conditions.
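Here's a stripped-down sketch of the shape of the problem. Every name in it (.chat-log, .bottom-sentinel, pinned) is made up for illustration, and even this toy version carries the timing hazards described above:

```ts
// Four event sources, one piece of shared state: `pinned` tracks whether
// the user should be auto-scrolled when new content arrives.
const log = document.querySelector<HTMLElement>('.chat-log')!;
const sentinel = document.querySelector<HTMLElement>('.bottom-sentinel')!;

let pinned = true;        // are we following the bottom?
let programmatic = false; // was the last scroll triggered by us, not the user?

function scrollToBottom() {
  const bottom = log.scrollHeight - log.clientHeight;
  if (log.scrollTop === bottom) return; // a no-op scroll fires no event
  programmatic = true;                  // tell the scroll handler to ignore it
  log.scrollTop = bottom;
}

// 1. Scroll handler: the user scrolling away should unpin; our own
//    programmatic scrolls should not count as user intent.
log.addEventListener('scroll', () => {
  if (programmatic) { programmatic = false; return; }
  pinned = log.scrollHeight - log.scrollTop - log.clientHeight < 1;
});

// 2. Intersection observer: re-pin when the bottom sentinel scrolls into view.
new IntersectionObserver(([entry]) => {
  if (entry.isIntersecting) pinned = true;
}, { root: log }).observe(sentinel);

// 3. Resize observer: keep the bottom pinned when the container resizes
//    (e.g. the mobile keyboard appearing).
new ResizeObserver(() => {
  if (pinned) scrollToBottom();
}).observe(log);

// 4. Mutation observer: new messages arriving.
new MutationObserver(() => {
  if (pinned) scrollToBottom();
}).observe(log, { childList: true, subtree: true });
```

Notice how every handler reads or writes pinned, and how the programmatic flag exists purely to stop one event system from misleading another. That flag is itself a race condition waiting to happen, which is precisely the point.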
The Hypothesis
My theory is that AI agents struggle here because they can't easily hold the entire event flow in their "head" at once. They'll solve for one condition, but forget about another. They'll add a fix that works for Case A but breaks Case B. The problem requires thinking in terms of the entire event system simultaneously, not just solving discrete sub-problems.
How It Actually Got Solved
Eventually I got it working with some help from AI - you can actually test it on the chatbotkit.com widget element - but this came after many iterations. Like, way more than you'd expect for something that seems this straightforward.
This auto-scroll test might be a good baseline for evaluating how well AI agents handle problems that require coordinating multiple systems with complex state management. If agents struggle with this (a failure mode that is well documented), they will surely struggle with more complex problems involving many external systems and intricate dependencies.
If you're building or testing AI coding agents, maybe give this a shot and see how yours do. I'd be curious to know if other people see the same failure patterns.
Anyway, that's my brain dump on this. More to come as I keep poking at it.