Cut Claude Code Tokens 3x by Fixing Backend Context Delivery
After one change, Claude Code used roughly three times fewer tokens:
Before: 10.4M tokens · 10 errors · $9.21
After: 3.7M tokens · 0 errors · $2.81
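The headline ratios follow directly from the numbers above; a quick sketch of the arithmetic (all figures are the ones quoted in the before/after runs):

```python
# Token and cost reduction computed from the before/after runs above.
before = {"tokens": 10_400_000, "errors": 10, "cost_usd": 9.21}
after = {"tokens": 3_700_000, "errors": 0, "cost_usd": 2.81}

token_ratio = before["tokens"] / after["tokens"]      # ~2.8x fewer tokens
cost_ratio = before["cost_usd"] / after["cost_usd"]   # ~3.3x lower cost

print(f"{token_ratio:.1f}x fewer tokens, {cost_ratio:.1f}x lower cost")
```

The cost drops slightly faster than raw token count, since the error-driven retries in the "before" run also burned output tokens, which are priced higher.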
The reason isn’t the model.
The problem is how the backend delivers information to the agent. When the context is incomplete, a more powerful model doesn’t ignore the gap: it spends more tokens reasoning about what’s missing, runs more exploratory queries, and falls back on retries more often. So the lack of context doesn’t go away when you switch to a more powerful model; it just becomes more expensive.
Here’s a breakdown of why backends become token sinks for agents, what an alternative architecture looks like, and what the cost difference looks like on a real project.
*For the context engineering layer, we used Insforge Skills + CLI (locally).*