AnyAi.fyi - Discover ANY AI to make more online for less.

venturebeat

ACE prevents context collapse with ‘evolving playbooks’ for self-improving AI agents

A new framework from Stanford University and SambaNova addresses a critical challenge in building robust AI agents: context engineering. Called Agentic Context Engineering (ACE), the framework automatically populates and modifies the context window of large language model (LLM) applications by treating it as an “evolving playbook” that creates and refines strategies as the agent gains experience in its environment.ACE is designed to overcome key limitations of other context-engineering frameworks, preventing the model’s context from degrading as it accumulates more information. Experiments show that ACE works for both optimizing system prompts and managing an agent's memory, outperforming other methods while also being significantly more efficient.The challenge of context engineeringAdvanced AI applications that use LLMs largely rely on "context adaptation," or context engineering, to guide their behavior. Instead of the costly process of retraining or fine-tuning the model, developers use the LLM’s in-context learning abilities to guide its behavior by modifying the input prompts with specific instructions, reasoning steps, or domain-specific knowledge. This additional information is usually obtained as the agent interacts with its environment and gathers new data and experience. The key goal of context engineering is to organize this new information in a way that improves the model’s performance and avoids confusing it. This approach is becoming a central paradigm for building capable, scalable, and self-improving AI systems.Context engineering has several advantages for enterprise applications. Contexts are interpretable for both users and developers, can be updated with new knowledge at runtime, and can be shared across different models. Context engineering also benefits from ongoing hardware and software advances, such as the growing context windows of LLMs and efficient inference techniques like prompt and context caching.There are various automated context-engineering techniques, but most of them face two key limitations. The first is a “brevity bias,” where prompt optimization methods tend to favor concise, generic instructions over comprehensive, detailed ones. This can undermine performance in complex domains. The second, more severe issue is "context collapse." When an LLM is tasked with repeatedly rewriting its entire accumulated context, it can suffer from a kind of digital amnesia.“What we call ‘context collapse’ happens when an AI tries to rewrite or compress everything it has learned into a single new version of its prompt or memory,” the researchers said in written comments to VentureBeat. “Over time, that rewriting process erases important details—like overwriting a document so many times that key notes disappear. In customer-facing systems, this could mean a support agent suddenly losing awareness of past interactions... causing erratic or inconsistent behavior.”The researchers argue that “contexts should function not as concise summaries, but as comprehensive, evolving playbooks—detailed, inclusive, and rich with domain insights.” This approach leans into the strength of modern LLMs, which can effectively distill relevance from long and detailed contexts.How Agentic Context Engineering (ACE) worksACE is a framework for comprehensive context adaptation designed for both offline tasks, like system prompt optimization, and online scenarios, such as real-time memory updates for agents. Rather than compressing information, ACE treats the context like a dynamic playbook that gathers and organizes strategies over time.The framework divides the labor across three specialized roles: a Generator, a Reflector, and a Curator. This modular design is inspired by “how humans learn—experimenting, reflecting, and consolidating—while avoiding the bottleneck of overloading a single model with all responsibilities,” according to the paper.The workflow starts with the Generator, which produces reasoning paths for input prompts, highlighting both effective strategies and common mistakes. The Reflector then analyzes these paths to extract key lessons. Finally, the Curator synthesizes these lessons into compact updates and merges them into the existing playbook.To prevent context collapse and brevity bias, ACE incorporates two key design principles. First, it uses incremental updates. The context is represented as a collection of structured, itemized bullets instead of a single block of text. This allows ACE to make granular changes and retrieve the most relevant information without rewriting the entire context.Second, ACE uses a “grow-and-refine” mechanism. As new experiences are gathered, new bullets are appended to the playbook and existing ones are updated. A de-duplication step regularly removes redundant entries, ensuring the context remains comprehensive yet relevant and compact over time.ACE in actionThe researchers evaluated ACE on two types of tasks that benefit from evolving context: agent benchmarks requiring multi-turn reasoning and tool use, and domain-specific financial analysis benchmarks demanding specialized knowledge. For high-stakes industries like finance, the benefits extend beyond pure performance. As the researchers said, the framework is “far more transparent: a compliance officer can literally read what the AI learned, since it’s stored in human-readable text rather than hidden in billions of parameters.”The results showed that ACE consistently outperformed strong baselines such as GEPA and classic in-context learning, achieving average performance gains of 10.6% on agent tasks and 8.6% on domain-specific benchmarks in both offline and online settings.Critically, ACE can build effective contexts by analyzing the feedback from its actions and environment instead of requiring manually labeled data. The researchers note that this ability is a "key ingredient for self-improving LLMs and agents." On the public AppWorld benchmark, designed to evaluate agentic systems, an agent using ACE with a smaller open-source model (DeepSeek-V3.1) matched the performance of the top-ranked, GPT-4.1-powered agent on average and surpassed it on the more difficult test set.The takeaway for businesses is significant. “This means companies don’t have to depend on massive proprietary models to stay competitive,” the research team said. “They can deploy local models, protect sensitive data, and still get top-tier results by continuously refining context instead of retraining weights.”Beyond accuracy, ACE proved to be highly efficient. It adapts to new tasks with an average 86.9% lower latency than existing methods and requires fewer steps and tokens. The researchers point out that this efficiency demonstrates that “scalable self-improvement can be achieved with both higher accuracy and lower overhead.”For enterprises concerned about inference costs, the researchers point out that the longer contexts produced by ACE do not translate to proportionally higher costs. Modern serving infrastructures are increasingly optimized for long-context workloads with techniques like KV cache reuse, compression, and offloading, which amortize the cost of handling extensive context.Ultimately, ACE points toward a future where AI systems are dynamic and continuously improving. "Today, only AI engineers can update models, but context engineering opens the door for domain experts—lawyers, analysts, doctors—to directly shape what the AI knows by editing its contextual playbook," the researchers said. This also makes governance more practical. "Selective unlearning becomes much more tractable: if a piece of information is outdated or legally sensitive, it can simply be removed or replaced in the context, without retraining the model.”

Discover Copy

Rating

Innovation

Pricing

Technology

Usability

We have discovered similar tools to what you are looking for. Check out our suggestions for similar AI tools.

A year later, the Sonos Ace is finally fulfilling its potential

2024 was an awful year for Sonos. Its <a data-i13n="cpos:1;pos:1" href="https://www.engadget.com/sonos-ace-headphones-hands-on-joining-your-home-theater-setup-with-the-push- [...]

More Copy

Match Score: 137.07

venturebeat

Writer launches AI agents that can act without prompts, taking on Amazon, M

<a href="https://writer.com/">Writer</a>, the enterprise AI agent platform backed by <a href="https://salesforceventures.com/">Salesforce Ventures</a& [...]

More Copy

Match Score: 133.41

venturebeat

Meta researchers introduce 'hyperagents' to unlock self-improving

Creating self-improving AI systems is an important step toward deploying agents in dynamic environments, especially in enterprise production environments, where tasks are not always predictab [...]

More Copy

Match Score: 120.04

venturebeat

Anthropic introduces "dreaming," a system that lets AI agents lea

<a href="https://www.anthropic.com/">Anthropic</a> on Tuesday unveiled a suite of updates to its <a href="https://platform.claude.com/docs/en/managed-agents/over [...]

More Copy

Match Score: 101.52

venturebeat

Salesforce’s Agentforce Vibes 2.0 targets a hidden failure: context overl

When startup fundraising platform VentureCrowd began deploying AI coding agents, they saw the same gains as other enterprises: they cut the front-end development cycle by 90% in some projects [...]

More Copy

Match Score: 91.80

venturebeat

GAM takes aim at “context rot”: A dual-agent memory architecture that o

For all their superhuman power, today’s AI models suffer from a surprisingly human flaw: They forget. Give an AI assistant a sprawling conversation, a multi-step reasoning task or a project [...]

More Copy

Match Score: 91.62

venturebeat

Vercel breach exposes the OAuth gap most security teams cannot detect, scop

One employee at Vercel adopted an AI tool. One employee at that AI vendor got hit with an infostealer. That combination created a walk-in path to Vercel’s production environments through an [...]

More Copy

Match Score: 86.57

venturebeat

Why your LLM bill is exploding — and how semantic caching can cut it by 7

Our LLM API bill was growing 30% month-over-month. Traffic was increasing, but not that fast. When I analyzed our query logs, I found the real problem: Users ask the same questions in differe [...]

More Copy

Match Score: 83.72

venturebeat

The missing data link in enterprise AI: Why agents need streaming context,

Enterprise AI agents today face a fundamental timing problem: They can't easily act on critical business events because they aren't always aware of them in real-time.</p& [...]

More Copy

Match Score: 81.41