I've built 12+ AI agent systems across development, DevOps, and data operations. Here's why the current hype around autonomous agents is mathematically impossible and what actually works in production.
The real challenge isn’t AI capabilities, it’s designing tools and feedback systems that agents can actually use effectively.
I don’t understand how and to what extent these agents include the code repository in their context. Like I am assuming that including the whole repository should be a prerequisite to allow the agent to create a proper solution to a non-trivial problem. But what about the dependencies? Do they just take a chance that the LLM is able to guess how to use them properly? Plus so many big companies have a culture of massive monorepos. I suppose there are islands of isolated code within them but still that’s another barrier.
Besides this I think “AI capabilities” are just big a challenge if not bigger than the tooling. AI capabilities are surprisingly good but they still habitually fail in critical scenarios. And it’s not just about them not being intelligent enough. Every improvement in model quality is accompanied by increase in compute costs. Infinite VC funding is softening this blow but this cannot probably go on forever.
Generally, the agent will use the whole repo. The workflow is that you get them to make the changes to satisfy the query, then run tests, and use the feedback from the tests to iterate. They’re getting surprisingly good at this nowadays for common tasks.
I don’t understand how and to what extent these agents include the code repository in their context. Like I am assuming that including the whole repository should be a prerequisite to allow the agent to create a proper solution to a non-trivial problem. But what about the dependencies? Do they just take a chance that the LLM is able to guess how to use them properly? Plus so many big companies have a culture of massive monorepos. I suppose there are islands of isolated code within them but still that’s another barrier.
Besides this I think “AI capabilities” are just big a challenge if not bigger than the tooling. AI capabilities are surprisingly good but they still habitually fail in critical scenarios. And it’s not just about them not being intelligent enough. Every improvement in model quality is accompanied by increase in compute costs. Infinite VC funding is softening this blow but this cannot probably go on forever.
Generally, the agent will use the whole repo. The workflow is that you get them to make the changes to satisfy the query, then run tests, and use the feedback from the tests to iterate. They’re getting surprisingly good at this nowadays for common tasks.