2026-W20May 11, 20264 min read

One-Shot AI Is Giving Way to Multi-Pass Systems

The most useful AI pattern I saw this week was not a model release. It was a workflow change. Builders are getting better results when they stop asking one model to do everything in one shot and start breaking the work into passes. A small model triages. Another one gathers context. The stronger model makes the call. Then the system reviews its own output before it moves on. That is a lot less glamorous than "one prompt and done", but it feels much closer to how I would actually trust AI inside a real product.

AI codingagentsworkflow

One big prompt is starting to look like the expensive option

One of the better Claude Code threads this week described a team that cut spend by moving work away from one always-on premium model. The expensive model was not reading logs or raw failure data anymore. A cheaper model handled triage first, filtered out duplicates, and only escalated what actually needed deeper reasoning. That is such an important mindset shift because it reframes intelligence as something you route carefully instead of spraying across every step.

I saw the same question pop up from the Codex side too. Someone was asking how detailed a plan needs to be for a smaller executor model. That is a practical question, and I think it matters more than another round of model tribalism. If builders are spending time figuring out how to package work for cheaper models, that tells me the stack is maturing. The conversation is moving from "which model wins" to "which task deserves which model."

Cheap models should read. Stronger models should decide

The tiered routing example stuck with me because the rule is so clean. Let the cheaper model do the repetitive reading, sorting, and fetching. Let the stronger model handle judgment, architecture, and tradeoffs. That division does not just save money. It also protects the expensive model from drowning in raw context it never needed to read in the first place.

If I were setting this up for my own work, I would apply that far beyond CI failures. File discovery, log slicing, doc summarization, schema lookup, and test result clustering all feel like cheap-model jobs. The bigger model should show up when the work becomes ambiguous, risky, or architectural. That is where I actually want the stronger reasoning budget.

Waves beat chaos

Another Claude Code post came from a non-developer who used the tool to build a content engine and custom MCP setup. The part worth stealing was not the SEO angle. It was the sequencing. Research agents ran first. Writing came second. Page generation came after that. The builder was very explicit that combining research and writing in the same agent doubled the context cost for no good reason.

That lines up with what I keep seeing in practice. Parallelism helps when the jobs are cleanly separated. It hurts when one agent is trying to gather, decide, and produce at the same time. A wave-based system is slower in theory than firing everything at once, but it often feels faster in reality because less gets repeated and less context gets dragged around unnecessarily.

Self-review is quietly becoming part of the product

The r/singularity thread about GPT-Image-2 reviewing and iterating on its own output before returning the result made this pattern feel even bigger than coding. Different domain, same direction. Quality is increasingly coming from internal loops, not just the first answer. That is a useful builder lesson because it nudges me away from treating latency as the only metric that matters.

Sometimes the right question is not "can this finish in one pass?" but "which outputs are important enough to deserve a second pass?" Hero images probably do. Pricing logic definitely does. Migration scripts absolutely do. Internal review loops add cost and time, so I would not use them everywhere, but for the expensive mistakes they are starting to look like a very sensible trade.

Why this was worth writing about

I wrote this one down because it feels like a real shift in how builders are using AI. The useful setups are getting less theatrical. They look more like small production systems with routing, stages, and quality checks. That is much more interesting to me than another claim that one new model can suddenly do every job.

From where I sit behind Applikeable, this matters because it is actionable now. I do not need to wait for a perfect agent. I can break work into clearer passes, keep the heavy model focused on judgment, and spend more of the budget where mistakes are actually expensive. That feels like the kind of boring rule that keeps paying off after the hype cycle moves on.

Threads behind this post

r/ClaudeCode

Switched from Claude Sonnet to Opus and costs went down - the tiered routing architecture is why

r/ClaudeCode

I'm not a dev. I had Claude Code teach me programmatic SEO, cron, and how to build an MCP. 3 weeks in, my dead blog has 2,400+ AI citations.

r/codex

How granular do the plan have to be for a smaller model?

r/singularity

GPT-Image-2 now reviews its own output and iterates until it is satisfied with the correctness of its output.