LLMs keep eating the scaffolding around them.
Scaffolds do not disappear because they were stupid. They disappear because they were right, and this keeps happening in the same pattern: first we discover some awkward but useful wrapper around the model, then people build products and workflows around it, and then the model providers absorb the pattern into the platform.
My bet is simple:
- this will keep happening
- context management is next
The short history of scaffolds
Hi everyone, it’s 2023, we caught lightning in a box and now the box can kind of speak.
Oh wow, it can spit out JSON? Nice. Let’s build parsers and maybe a little startup around structured outputs while we’re at it.
Then we realize, wait, maybe the model should not just emit text, maybe it should interface with code. Beautiful. Function calling. Another scaffold, another layer around the model because the model alone was not enough.
Then we hit another wall. The thing is smart, but also impulsive, half correct with full confidence. So now we beg it to think step by step, breathe deeply, touch grass, and please for the love of god think before answering. Chain-of-thought enters the chat.
Then tools explode and agents are suddenly everywhere. Give the model a browser, a shell, a database, Slack, Linear, Notion, god himself…
Then a reality check: tool definitions bloat the context window.
So the scaffolds evolve: fewer dumb tool calls, less structure, more useful patterns pushed into the model and the serving stack. Enter Claude Code.
Not every scaffold survives in the same shape, but the useful ones get eaten by the models.
Context is king (for now)
Today, models are often very good at local reasoning and much weaker at long-horizon operations.
Give them a clean objective, a tight working set, and a manageable number of tools, and they can move fast. Dump in stale logs, giant tool schemas, noisy retrieval, irrelevant memory, or ten thousand lines of “maybe useful” context, and the whole thing degrades. It gets slower, more expensive, less reliable… let’s hit /compact yet again.
This should not be controversial. Long-context models are getting better, but “having a large context window” is not the same as “using all of that context well”, and relevance is still the hard part.
That is why the best engineers hand-pick the context they give agents today. Deciding what matters right now is hard, and most failures in agent systems are not failures of raw intelligence so much as failures of selection. The system had access to the answer somewhere, but it was buried under junk or drowned in verbose outputs.
This gets even more obvious with tools. Everyone loves tool calling until the model has fifty tools, each with thick schemas, plus giant responses that get shoved straight back into the window, and then the agent spends a meaningful portion of its budget, and frankly your company’s budget, just wandering around.
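One cheap scaffold here is to search tools instead of exposing all of them. A toy sketch of the idea, where keyword overlap stands in for a real ranker (embeddings, a trained scorer) and every tool name below is made up for illustration:

```python
from dataclasses import dataclass

@dataclass
class Tool:
    name: str
    description: str

def select_tools(task: str, tools: list[Tool], k: int = 3) -> list[Tool]:
    """Score each tool by keyword overlap with the task and keep the top k.

    A deliberately crude relevance signal; the point is the shape:
    search first, then expose only the survivors to the model.
    """
    task_words = set(task.lower().split())

    def score(tool: Tool) -> int:
        return len(task_words & set(tool.description.lower().split()))

    return sorted(tools, key=score, reverse=True)[:k]

tools = [
    Tool("query_db", "run a sql query against the analytics database"),
    Tool("send_slack", "post a message to a slack channel"),
    Tool("read_file", "read a file from the repository"),
]
# only the database tool's schema makes it into the context window
picked = select_tools("fix the failing sql query in the database report", tools, k=1)
```

With fifty tools instead of three, this is the difference between one thick schema in the window and fifty.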
So yes, there is a real category here: the model needs the right context.
BTW, MCP is not really a context protocol
MCP is useful, and it solved a real problem, but its name is misleading.
In practice, a lot of MCP usage turned into exposing a huge pile of tools up front, feeding definitions into the context, letting the model pick from the pile, then passing intermediate results back through the model again and again. Anthropic says that “Most MCP clients load all tool definitions upfront directly into context”, and Cloudflare makes the same point from the other side, writing that “every tool added fills the model’s context window”.
So when I say MCP is not really a context protocol, that is what I mean. It is closer to Model Connectivity Protocol than context protocol.
What scaffolds should we build for context?
The obvious answer is not “more retrieval”, it is better selection.
The useful scaffolds are the ones that reduce cognitive clutter for the model:
- rank context before loading it
- search tools before exposing them
- compress aggressively, but only after preserving task-critical facts
- separate durable memory from transient task state
- make intermediate computation happen outside the main model when possible
- pass forward only what the next step actually needs
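The first and last bullets above compose into one small routine: rank candidates, then pack them under a hard budget. This is a minimal sketch, with word counts standing in for a tokenizer and keyword overlap standing in for a trained ranker:

```python
def compose_context(task: str, snippets: list[str], budget: int) -> list[str]:
    """Rank candidate snippets by relevance to the task, then pack
    greedily until the budget is spent. Selection plus a hard cap,
    instead of append-only history.
    """
    task_words = set(task.lower().split())
    ranked = sorted(
        snippets,
        key=lambda s: len(task_words & set(s.lower().split())),
        reverse=True,
    )
    picked, used = [], 0
    for snippet in ranked:
        cost = len(snippet.split())  # stand-in for a real token count
        if used + cost <= budget:
            picked.append(snippet)
            used += cost
    return picked

snippets = [
    "stale log output from a previous run full of noise",
    "the auth bug happens when the token refresh races the logout handler",
    "unrelated notes about the marketing site redesign",
]
# only the relevant snippet fits; the noise never reaches the model
context = compose_context("debug the token refresh race in auth", snippets, budget=12)
```

The greedy packing is the crude part; the durable idea is that the budget is enforced before the model sees anything.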
This is also why I think the next good context products will look less like giant memory dumpsters and more like selective context routers. The system should not just remember more, it should forget better, fetch better, and stage information better.
That is also the idea behind Contextrie, which I have been building as a context scaffold.
The shape is simple: ingest sources, assess their relevance to the task, then compose the final context at the right density. In other words, do not just keep appending history; actually curate what the model sees.
What the future model looks like
I do not think the end state is one giant monolithic model with infinite graceful attention over everything.
The more likely direction is a split system, where a top-level model handles orchestration and work/code, while smaller workers handle retrieval, tool execution, and maybe even different slices of reasoning. Some of those workers may not even look like chat models in the product sense; they may just be trained subsystems for ranking, compression, memory selection, and evidence assembly.
This is already the shape of many agent systems (Codex with GPT-5.4 and Opus 4.6).
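A minimal sketch of that split, with entirely hypothetical workers: the orchestrator never sees raw retrieval output, only the compressed evidence that survives staging:

```python
def retrieve(query: str) -> list[str]:
    # stand-in for a retrieval worker (vector search, BM25, ...)
    corpus = [
        "token refresh races the logout handler in auth.py",
        "marketing site redesign notes",
        "auth middleware logs every failed refresh",
    ]
    return [doc for doc in corpus if any(w in doc for w in query.split())]

def compress(docs: list[str], max_docs: int = 1) -> list[str]:
    # stand-in for a compression/ranking worker; here: prefer short docs
    return sorted(docs, key=len)[:max_docs]

def orchestrate(task: str) -> str:
    """Top-level model plans and writes code; workers stage the evidence.
    Only the compressed result ever enters the orchestrator's window."""
    evidence = compress(retrieve(task))
    return f"plan for {task!r} using evidence: {evidence}"
```

The workers here are one-liners, but each slot is where a trained subsystem would go, and none of their intermediate output passes through the big model.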
So the future is probably not “the model gets rid of context management”; it is that the model stack becomes context management.
That is what I mean when I say LLMs keep eating the scaffolding. First the scaffolds live outside the model, then they prove they matter, and then they get absorbed into the system, where they become less visible and more essential.
Context is next.
Building (or fine-tuning) a Context-aware Agent will be fun (: