Reduce BMAD Agent Context: 67% To 25%
Hey guys! Let's dive into an exciting idea to seriously optimize our BMAD agents' context window usage. Currently, these agents hog a massive chunk of Claude's context, and we're going to fix that. This proposal outlines a plan to reduce the context window consumption of BMAD agents from a whopping 67% to just 25%. That's a huge improvement, and it's going to make a real difference in how our agents perform, especially on complex tasks.
PASS Framework
Problem
The main issue is that BMAD agents, including TEA, SM, and DEV, are consuming way too much of Claude's 200K context window during activation. We're talking about 67% or more! This leaves a measly 33% for actual work. The TEA agent with its full knowledge base can gobble up 86% of the context (the knowledge base alone is 571 KB / ~143K tokens). This leads to context overflow and, ultimately, degraded performance on complex tasks. Think of it like trying to run a marathon with a giant backpack full of bricks – not fun, right?
This high context consumption causes frequent overflows, especially when agents need to pull in large knowledge bases or run complex workflows, so tasks often fail partway through. The problem is most acute with the TEA agent, which leans heavily on its knowledge base: once that knowledge fills most of the window, there's barely any room left to process new information or execute a workflow.
On top of that, heavy context usage makes it hard to keep conversation history and track the progress of ongoing tasks. When overflow hits, users are forced to start a new session mid-task, losing the history and context of everything they'd done so far. That disrupts the workflow, makes it harder to collaborate and share information, and limits how well the BMAD agents scale in environments with tight resource budgets or high demand for context space.
Audience
This impacts all developers using BMAD. The impact is severe: complex tasks fail because of context overflow, forcing users to start new sessions mid-task and losing conversation history. Imagine being in the middle of coding something and suddenly losing all your progress – that's the kind of pain we're trying to avoid here!
Solution
Our solution is a 4-tier context optimization plan:
- (A) Disable unused MCP servers: ✅ Done. This phase has already shipped.
- (B) Lazy-load TEA knowledge base on-demand only: This means the knowledge base only loads when it's actually needed, not right away.
- (C) Split project-context.md into domain-specific modules: Instead of one giant file, we'll have smaller, more focused files.
- (D) Create compressed "slim" versions of knowledge files: Smaller files mean less context used.
Acceptance Criteria:
- [ ] AC-1: TEA agent activates with <10% context usage (currently 86% with full KB)
- [ ] AC-2: Knowledge loads only when explicitly requested or workflow requires it
- [ ] AC-3: DEV/SM agents load only relevant context modules (backend OR frontend, not both)
- [ ] AC-4: Typical workflow session uses <30% context (currently 67%)
- [ ] AC-5: Context overflow errors eliminated (currently occurs with TEA + full knowledge)
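To make these criteria mechanically checkable, a measured token count can be turned into a percentage of the 200K window and compared against the AC thresholds. This is a minimal sketch: the window size and thresholds come straight from the criteria above, but the function names are illustrative, not part of BMAD.

```python
# Sketch: check a measured token count against the acceptance criteria.
# The 200K window size is taken from this proposal; names are illustrative.

CONTEXT_WINDOW = 200_000  # Claude's context window, per the proposal


def context_pct(tokens: int) -> float:
    """Return context usage as a percentage of the full window."""
    return 100.0 * tokens / CONTEXT_WINDOW


def meets_ac1(activation_tokens: int) -> bool:
    """AC-1: TEA agent activates with <10% context usage."""
    return context_pct(activation_tokens) < 10.0


def meets_ac4(session_tokens: int) -> bool:
    """AC-4: typical workflow session uses <30% context."""
    return context_pct(session_tokens) < 30.0


# Today's numbers: TEA activation with full knowledge (~172,750 tokens)
# fails AC-1, while the minimal activation (~16,750 tokens) passes.
print(meets_ac1(172_750))  # False: about 86% of the window
print(meets_ac1(16_750))   # True: about 8%
```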
The main goal here is to reduce the context usage of the BMAD agents while ensuring that they can still perform their functions effectively. By implementing these optimizations, we aim to create a more efficient and scalable system that can handle complex tasks without running into context overflow issues. This will not only improve the user experience but also make it easier to maintain and extend the BMAD agents in the future.
Size
We estimate this will take about 3-5 days (M) to complete.
Metadata
- Submitted by: Tungmv
- Date: 2026-01-16
- Priority: [Leave blank - will be assigned during team review]
Technical Details
Root Cause Analysis
Here's a breakdown of where the context is being used:
| Component | Tokens | % of Context |
|---|---|---|
| Claude Code System Prompt + MCP Tools | ~35,000 | 17.5% |
| TEA (AT) + ATDD workflow (minimal) | ~16,750 | 8% |
| TEA (AT) + ATDD workflow (full knowledge) | ~172,750 | 86% |
| SM (CS) + Create-Story workflow | ~22,130 | 11% |
| DEV (DS) + Dev-Story workflow | ~19,830 | 10% |
As you can see, the TEA agent with full knowledge is the biggest culprit. Reducing its context usage will have the most significant impact.
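The token figures above can be reproduced approximately with the common "~4 characters per token" rule of thumb. A quick sketch (this is a heuristic, not a real tokenizer, so expect it to land in the ballpark rather than match exactly):

```python
# Rough token estimate for a knowledge file, using the common
# ~4-characters-per-token heuristic (an approximation, not a tokenizer).

def estimate_tokens(path: str) -> int:
    with open(path, encoding="utf-8") as f:
        text = f.read()
    return len(text) // 4


# Example: a 571 KB knowledge base comes out near the ~143K tokens
# quoted in the Problem section.
print(571 * 1024 // 4)  # 146176 -- same ballpark as 143K
```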
Implementation Plan
Here's the plan to tackle this issue:
| Phase | Task | Effort | Token Savings |
|---|---|---|---|
| A | Disable unused MCP servers | ✅ Done | 6,800 (3.4%) |
| B | Lazy-load TEA knowledge base | Low | 50K-140K (25-70%) |
| C | Split project-context.md | Medium | 4,500 (2.5%) |
| D | Create slim knowledge files | Medium | 51,000 (25%) |
By implementing these phases, we expect to see a significant reduction in context usage. Phase B, lazy-loading the TEA knowledge base, is expected to have the biggest impact, potentially saving us 50K-140K tokens.
The lazy-loading implementation will require careful consideration of how the knowledge base is accessed and utilized by the TEA agent. We need to ensure that the knowledge is loaded efficiently and only when it is needed, without introducing any performance bottlenecks or disrupting the agent's workflow. This may involve modifying the agent's code to include logic for checking the availability of the required knowledge and triggering the loading process when necessary.
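As a sketch of what Phase B could look like: knowledge fragments are registered by topic but only read from disk the first time an agent actually asks for them, then cached so repeated access is cheap. The class name, topic-to-file layout, and method names here are hypothetical, not existing BMAD code.

```python
from pathlib import Path


class LazyKnowledgeBase:
    """Load knowledge fragments only on first access (Phase B sketch).

    The topic-to-file mapping (one markdown file per topic under a root
    directory) is an illustrative assumption; BMAD's real layout may differ.
    """

    def __init__(self, root: str):
        self.root = Path(root)
        self._cache: dict[str, str] = {}

    def get(self, topic: str) -> str:
        # First access reads the file; later accesses hit the in-memory
        # cache, so the full knowledge base never enters context up front.
        if topic not in self._cache:
            path = self.root / f"{topic}.md"
            self._cache[topic] = path.read_text(encoding="utf-8")
        return self._cache[topic]

    def loaded_topics(self) -> list[str]:
        """Report which topics have actually been pulled into memory."""
        return sorted(self._cache)


# Usage: activation creates the object (near-zero context cost); knowledge
# is pulled in only when a workflow step explicitly needs it, e.g.
#   kb = LazyKnowledgeBase("tea-knowledge/")
#   kb.get("atdd-patterns")  # reads atdd-patterns.md on first call only
```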
Splitting the project-context.md file into domain-specific modules will involve analyzing the contents of the file and identifying logical groupings of information that can be separated into distinct modules. This will require a good understanding of the project's structure and the relationships between different components. The modules should be designed in a way that allows the DEV and SM agents to easily load only the relevant context for their respective domains, without having to load the entire file.
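For Phase C, the selection logic could be as simple as a mapping from domain to module files, so a DEV agent working on backend code never pays for frontend context. The module filenames below are illustrative assumptions about how project-context.md might be split, not the actual file layout.

```python
from pathlib import Path

# Phase C sketch: project-context.md split into domain modules so that
# DEV/SM agents load backend OR frontend context, never both.
# Filenames are illustrative assumptions.
MODULES = {
    "backend": ["context/backend.md", "context/shared.md"],
    "frontend": ["context/frontend.md", "context/shared.md"],
}


def load_context(domain: str, root: str = ".") -> str:
    """Concatenate only the modules relevant to the given domain."""
    parts = []
    for rel in MODULES[domain]:
        parts.append(Path(root, rel).read_text(encoding="utf-8"))
    return "\n\n".join(parts)
```

A shared module keeps cross-cutting conventions (naming, commit rules, and so on) available to both domains without duplicating them in each file.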
Creating slim knowledge files will involve identifying and removing any unnecessary or redundant information from the knowledge base. This may include removing outdated or irrelevant content, consolidating similar concepts, and optimizing the formatting of the files to reduce their size. The goal is to create a leaner and more efficient knowledge base that can be loaded and processed more quickly, while still providing the agents with the information they need to perform their functions effectively.
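The mechanical part of Phase D can be automated: strip editorial comments and collapse runs of blank lines before a human pass prunes outdated sections. A minimal sketch, assuming the knowledge files are markdown with HTML-style comments:

```python
import re


def slim(text: str) -> str:
    """Phase D sketch: produce a compressed "slim" knowledge file.

    Drops HTML-style editorial comments and collapses runs of blank lines.
    Real slimming would also prune outdated or redundant sections by hand.
    """
    text = re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)  # editorial notes
    text = re.sub(r"\n{3,}", "\n\n", text)                   # blank-line runs
    return text.strip() + "\n"
```

Running this over each knowledge file gives a cheap first-cut "slim" variant whose size reduction can then be measured with the same token heuristic used in the root cause analysis.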
Expected Outcome
Here's what we expect to see after implementing these changes:
| Metric | Before | After | Change |
|---|---|---|---|
| System overhead | 18.6% | 15.2% | -3.4 pts |
| TEA + ATDD (typical) | 67% | 25% | -42 pts |
| Available for work | 33% | 75% | +42 pts |
With these improvements, we'll have much more context available for actual work, leading to better performance and fewer context overflow errors. Reducing the system overhead will free up more resources for the agents to perform their functions efficiently. The significant reduction in TEA + ATDD context usage will make it possible to handle more complex tasks without running into context overflow issues. And the increased availability of context for work will allow the agents to process more information and execute more sophisticated workflows.