Reactive-Agents Framework Roadmap¶
Current Version: 0.1.0a6 (Alpha)
Last Updated: January 11, 2026
Status: Active Development - breaking changes expected
Real-World Test Success Rate: 80% (4/5 tests passing)
Version Milestones¶
| Version | Status | Focus |
|---|---|---|
| 0.1.0a6 | ✅ Current | Core refactoring, provider architecture, builder pattern, streaming support |
| 0.1.0a7 | 🔄 Next | Memory & context optimization (2-3x efficiency gain), Google SDK migration |
| 0.1.0a8 | 📋 Planned | Complete strategy implementations, test coverage |
| 0.1.0a9 | 📋 Planned | Token counting, advanced reasoning patterns |
| 0.1.0b1 | 📋 Planned | Beta - Production features (caching, rate limiting) |
| 0.1.0 | 🎯 Target | Stable release - Full feature parity |
Current State (v0.1.0a6)¶
Real-World Performance (January 2026)¶
Playground Test Results: 80% success rate (4/5 tests passing)
| Agent | Strategy | Result | Iterations | Duration | Efficiency |
|---|---|---|---|---|---|
| Data Analysis | plan_execute_reflect | ✅ PASS | 6 | 92.52s | 0.17 |
| Research Assistant | plan_execute_reflect | ✅ PASS | 5 | 98.03s | 0.20 |
| Code Reviewer | reflect_decide_act | ✅ PASS | 4 | 72.29s | 0.25 |
| Task Automation | plan_execute_reflect | ✅ PASS | 6 | 89.62s | 0.17 |
| Customer Support | reflect_decide_act | ❌ FAIL* | 3 | 54.38s | - |
*False negative - the agent actually succeeded, but validation missed an explicit keyword match
Key Findings:
- ✅ Strategies work correctly - all tasks completed successfully
- ✅ Zero tool failures - reliable execution
- ⚠️ Low efficiency (17-25%) - taking 2x more iterations than optimal
- ⚠️ Memory not consulted - no cross-session learning
- ⚠️ Tool redundancy - same tools called multiple times
Analysis: Framework is production-ready for common use cases but leaving significant performance on the table due to dormant memory system. See Phase 1.5 for critical improvements.
Recent Improvements¶
The framework has undergone significant refactoring with these key improvements:
- Provider Architecture: Unified dual-parameter system with OpenAI-style interface
- Builder Pattern: Type-safe ReactiveAgentBuilder with fluent API
- Component Factory: Dependency injection for all components
- Type System: Comprehensive Pydantic models (~3,900 lines)
- Event System: Type-safe EventBus with async support
- MCP Integration: First-class Model Context Protocol support
Test Coverage¶
| Category | Coverage | Status |
|---|---|---|
| Overall | 61% | Needs improvement |
| Core Engine | 87% | Good |
| Tool Manager | 87% | Good |
| Event Bus | 100% | Excellent |
| Strategies | 36-96% | Mixed |
| Providers | 14-64% | Needs work |
Provider Support¶
| Provider | Completion | Tools | Structured Output | Streaming |
|---|---|---|---|---|
| OpenAI | ✅ | ✅ | ✅ | ✅ |
| Anthropic | ✅ | ✅ | ✅ | ✅ |
| Google | ✅ | ⚠️ | ✅ | ✅ |
| Groq | ✅ | ⚠️ | ⚠️ | ✅ |
| Ollama | ✅ | ⚠️ | ⚠️ | ✅ |
Phase 1: v0.1.0a7 - Critical Fixes¶
Timeline: 1-2 weeks Goal: Migrate Google SDK, improve test coverage
1.1 Fix Failing Tests ✅ COMPLETED¶
Issue: Google provider test was failing
TestGoogleModelProvider::test_get_completion
AssertionError: Expected 'get_chat_completion' to have been called once. Called 0 times.
Resolution: Fixed mock target from get_chat_completion to _get_provider_chat_completion in test file.
1.2 Google SDK Migration ✅ COMPLETED¶
Issue: Deprecated SDK warning
FutureWarning: All support for the `google.generativeai` package has ended.
Switch to the `google.genai` package.
Migration Steps:
- Update pyproject.toml
- Update imports in reactive_agents/providers/llm/google.py
- Update API calls to match new SDK patterns (client-based architecture)
Action Items:
- Update dependencies in pyproject.toml
- Refactor GoogleModelProvider for the new SDK
- Update type hints and response handling
- Test all Google functionality (21/21 tests passing)
- Fix all diagnostic issues (0 errors, 0 warnings)
1.3 Increase Test Coverage¶
Priority Targets:
| Component | Current | Target | Priority |
|---|---|---|---|
| task_classifier.py | 13% | 70% | High |
| prompts/base.py | 34% | 60% | Medium |
| strategies/plan_execute_reflect.py | 36% | 70% | High |
| strategies/reflect_decide_act.py | 43% | 70% | High |
| providers/llm/groq.py | 14% | 60% | Medium |
| providers/llm/anthropic.py | 40% | 70% | High |
Action Items:
- Write unit tests for TaskClassifier.classify_task()
- Write unit tests for fallback classification
- Add integration tests for strategy selection
- Add provider-specific test cases
Phase 1 Deliverables¶
- All tests passing (424/429 - 5 pre-existing failures unrelated to SDK)
- Google SDK migrated to google.genai
- Critical component coverage > 60%
- No deprecation warnings
Phase 1.5: v0.1.0a7 - Memory & Context Optimization (NEW - HIGH PRIORITY)¶
Timeline: 1-2 weeks
Goal: Unlock the dormant memory system and optimize context management
Impact: 2-3x efficiency improvement in agent performance
Discovered: January 11, 2026, from real-world playground testing
Critical Finding: Memory System is Dormant 🔴¶
Real-world test analysis revealed: Memory management exists and stores data perfectly, but is never consulted during agent execution. This causes:
- 6 iterations instead of 3-4 for common tasks (efficiency: 17% vs target 40%+)
- Tool redundancy (Code Reviewer called check_security twice)
- No learning curve across sessions
- Repeated mistakes
Test Results:
✅ Data Analysis Agent: 6 iterations, efficiency 0.17 (should be 3-4 iterations, 0.40+)
✅ Task Automation: 6 iterations, efficiency 0.17 (should be 3-4 iterations, 0.40+)
✅ Code Reviewer: 4 iterations, ran same tool twice
1.5.1 Memory-Guided Execution (HIGHEST IMPACT)¶
Problem: Memory exists but isn't used during reasoning
Current State (in memory_manager.py):
- ✅ save_memory() - Works perfectly
- ✅ update_session_history() - Works perfectly
- ✅ update_tool_preferences() - Works perfectly
- ❌ get_similar_sessions(task) - DOESN'T EXIST
- ❌ get_relevant_reflections(context) - DOESN'T EXIST
- ❌ recommend_tools_for_task(task) - DOESN'T EXIST
Action Items:
- Add get_similar_sessions() to memory_manager.py
  - Use text similarity to find past tasks
  - Return strategy used, tools, iterations, success rate
  - Priority: CRITICAL
- Add get_relevant_reflections() to memory_manager.py
  - Filter reflections by context relevance
  - Return learnings from similar situations
  - Priority: HIGH
- Add recommend_tools_for_task() to memory_manager.py
  - Analyze tool preferences for similar tasks
  - Return high-success-rate tools
  - Priority: HIGH
- Integrate memory loading in engine.py
  - Call memory query before task execution
  - Surface past learnings in prompts
  - Priority: CRITICAL
- Update all strategy initialize() methods
  - Load relevant memory before starting
  - Use past insights to inform decisions
  - Priority: HIGH
Expected Impact:
- Iterations: 6 → 3-4 (33-50% reduction)
- Efficiency: 17% → 35-40% (2x improvement)
- Tool redundancy: eliminated
- Learning curve: agents improve over time
Files to Modify:
- reactive_agents/core/memory/memory_manager.py - Add query methods
- reactive_agents/core/reasoning/engine.py - Integrate memory consultation
- reactive_agents/core/reasoning/strategies/*.py - Use memory in initialization
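The text-similarity lookup described in the first action item could be sketched roughly as follows. This is a standalone illustration: the function signature, the session dict schema, and the difflib-based scoring are assumptions, not the framework's actual API.

```python
from difflib import SequenceMatcher
from typing import Any, Dict, List


def get_similar_sessions(
    sessions: List[Dict[str, Any]],
    task: str,
    top_k: int = 3,
    min_score: float = 0.3,
) -> List[Dict[str, Any]]:
    """Rank past sessions by text similarity between their task and the new task."""
    scored = []
    for session in sessions:
        # Cheap lexical similarity; a real implementation might use embeddings
        score = SequenceMatcher(None, task.lower(), session["task"].lower()).ratio()
        if score >= min_score:
            scored.append((score, session))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [session for _, session in scored[:top_k]]
```

Each returned session would carry the strategy used, tools, iteration count, and success rate, so the engine can seed its prompt with past learnings before the first iteration.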
1.5.2 LLM-Powered Context Summarization (HIGH IMPACT)¶
Problem: Line 544 of context_manager.py has naive placeholder implementation
Current Implementation:
from collections import Counter

def _generate_summary(self, messages, start_idx, end_idx) -> str:
    # Naive: just counts messages by role
    role_counts = dict(Counter(m["role"] for m in messages))
    summary = f"[Summary of {len(messages)} messages: {role_counts}]"
    # TODO: Implement more sophisticated summarization using LLM
    return summary
This is listed as Technical Debt TD-004 but is more critical than realized
Action Items:
- Implement LLM-powered summarization in _generate_summary():

async def _generate_summary(self, messages, start_idx, end_idx) -> str:
    """Generate semantic summary using agent's LLM."""
    message_text = "\n".join(
        f"{m['role']}: {m['content'][:200]}" for m in messages
    )
    prompt = f"""Summarize this conversation segment (2-3 sentences):
{message_text}
Focus on: key decisions, important results, actionable insights."""
    result = await self.agent_context.model_provider.complete(
        prompt=prompt, max_tokens=150
    )
    return f"[Context Summary {start_idx}-{end_idx}]: {result.content}"

- Priority: CRITICAL
Expected Impact:
- Context efficiency: +40-50%
- Token costs: -20-30%
- Much better information retention during long conversations
- Better decision quality with relevant historical context
Files to Modify:
- reactive_agents/core/context/context_manager.py:524
1.5.3 Tool Redundancy Detection (MEDIUM-HIGH IMPACT)¶
Problem: Agents call the same tool multiple times unnecessarily
Evidence: Code Reviewer called check_security twice in 4 iterations
Action Items:
- Add RecentToolTracker to tool_manager.py:

class RecentToolTracker:
    def __init__(self, window: int = 5):
        self.window = window
        self.recent_calls: List[Dict] = []  # Last N tool calls

    def record(self, tool_call: Dict) -> None:
        """Remember a tool call, keeping only the last `window` entries."""
        self.recent_calls.append(tool_call)
        self.recent_calls = self.recent_calls[-self.window:]

    def is_recent_duplicate(self, tool_call: Dict) -> bool:
        """Check if this exact tool call happened recently."""
        signature = self._hash_call(tool_call)
        return signature in (self._hash_call(c) for c in self.recent_calls)

    def _hash_call(self, call: Dict) -> str:
        """Create signature: tool_name:params."""
        return f"{call['name']}:{json.dumps(call.get('parameters', {}), sort_keys=True)}"

- Priority: MEDIUM-HIGH
- Integrate tracker into tool execution flow
  - Log warning when duplicate detected
  - Optionally skip duplicate calls
  - Priority: MEDIUM
Expected Impact:
- Tool redundancy: eliminated
- Iterations: -10-15%
- Better iteration efficiency
Files to Modify:
- reactive_agents/core/tools/tool_manager.py
1.5.4 Completion Prediction (MEDIUM IMPACT)¶
Problem: Agents don't know when they're close to completion
Action Items:
- Add completion score estimation to engine.py:

async def predict_completion(self, task: str, progress: Dict) -> float:
    """Estimate how close we are to completion (0.0-1.0)."""
    prompt = f"""Estimate task completion:
Task: {task}
Iterations: {progress['iterations']}
Tools Used: {progress['tools']}
Return score 0.0-1.0 (0=just started, 1.0=complete):"""
    result = await self.think(prompt)
    return float(result.content)

- Priority: MEDIUM
Expected Impact:
- Earlier completion detection
- Fewer unnecessary validation iterations
- Better resource utilization
Files to Modify:
- reactive_agents/core/reasoning/engine.py
Phase 1.5 Deliverables¶
- Memory consultation integrated - Agents load similar sessions before execution
- LLM-powered context summarization - Semantic summaries replace naive placeholders
- Tool redundancy detection - No repeated tool calls
- Completion prediction - Agents estimate progress
- Efficiency improvement - Average efficiency from 17% to 35-40%
- Iteration reduction - Common tasks: 6 iterations → 3-4
Success Metrics:
- Playground test efficiency: 17% → 35%+ (2x improvement)
- Average iterations for known tasks: -40-50%
- Tool redundancy incidents: 0
- Cross-session learning: measurable improvement on repeated task types
Phase 2: Streaming Support ✅ COMPLETED (v0.1.0a6)¶
Status: ✅ Completed Goal: Add streaming across all providers
2.1 Streaming Architecture ✅¶
Implemented in reactive_agents/core/types/provider_types.py and reactive_agents/providers/llm/base.py:
class StreamChunk(BaseModel):
"""Single chunk in streaming response."""
content: str = ""
role: Optional[str] = None
finish_reason: Optional[str] = None
tool_calls: Optional[List[Dict[str, Any]]] = None
is_final: bool = False
prompt_tokens: int = 0
completion_tokens: int = 0
total_tokens: int = 0
chunk_index: int = 0
model: Optional[str] = None
class BaseModelProvider:
async def stream_chat_completion(
self,
messages: List[Dict[str, Any]],
tools: Optional[List[Dict[str, Any]]] = None,
options: Optional[Dict[str, Any]] = None,
**kwargs,
) -> AsyncIterator[StreamChunk]:
"""Stream chat completion tokens."""
...
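Consuming this interface looks roughly like the loop below. This is a self-contained sketch: FakeProvider and the trimmed-down StreamChunk here are stand-ins for illustration, not the framework's real classes, but the AsyncIterator contract is the same.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List


@dataclass
class StreamChunk:
    """Simplified stand-in for the framework's StreamChunk model."""
    content: str = ""
    is_final: bool = False
    total_tokens: int = 0


class FakeProvider:
    async def stream_chat_completion(
        self, messages: List[dict]
    ) -> AsyncIterator[StreamChunk]:
        # Emit content chunks, then a final chunk carrying token usage
        for word in ("Hello", ", ", "world"):
            yield StreamChunk(content=word)
        yield StreamChunk(is_final=True, total_tokens=12)


async def collect(provider, messages):
    """Accumulate streamed content; read usage from the final chunk."""
    parts, usage = [], 0
    async for chunk in provider.stream_chat_completion(messages):
        if chunk.is_final:
            usage = chunk.total_tokens
        else:
            parts.append(chunk.content)
    return "".join(parts), usage


text, tokens = asyncio.run(
    collect(FakeProvider(), [{"role": "user", "content": "hi"}])
)
print(text, tokens)  # Hello, world 12
```

The key design point is that usage accounting rides on the final chunk (is_final=True), so callers can stream tokens to the UI immediately and still get accurate totals at the end.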
2.2 Provider Implementations ✅¶
All providers implemented with _stream_provider_chat_completion():
| Provider | Status | Notes |
|---|---|---|
| OpenAI | ✅ | Native streaming with stream_options |
| Anthropic | ✅ | Event-based streaming with messages.stream() |
| Google | ✅ | generate_content() with stream=True |
| Groq | ✅ | OpenAI-compatible streaming |
| Ollama | ✅ | Native async streaming |
2.3 Remaining Integration¶
Pending for future versions:
- Add streaming event types
- Integrate with ExecutionEngine
- Add stream_run() to ReactiveAgent
- Add streaming example (see docs/examples/streaming.md)
Phase 2 Deliverables ✅¶
- Streaming works in all 5 providers
- StreamChunk model defined
- Token usage tracking in final chunks
- Tool call support during streaming
Phase 3: v0.1.0a8-a9 - Complete Strategies & Token Counting¶
Timeline: 2-3 weeks Goal: Complete all reasoning strategy implementations
3.1 PlanExecuteReflect Strategy¶
Current Coverage: 36%
Missing Components:
- Plan generation with validation
- Step-by-step execution tracking
- Reflection after each step
- Plan revision based on outcomes
Files:
- reactive_agents/core/reasoning/strategies/plan_execute_reflect.py
- reactive_agents/core/reasoning/steps/plan_execute_reflect_steps.py
Action Items:
- Implement PlanStep.execute() with proper LLM prompting
- Implement ExecutionStep.execute() with tool integration
- Implement ReflectionStep.execute() with memory storage
- Add plan validation and scoring
- Add plan revision capability
- Write comprehensive tests (target: 80%)
3.2 ReflectDecideAct Strategy¶
Current Coverage: 43%
Missing Components:
- Proper reflection generation
- Decision making based on reflection
- Action selection algorithm
- Learning from outcomes
Files:
- reactive_agents/core/reasoning/strategies/reflect_decide_act.py
- reactive_agents/core/reasoning/steps/reflect_decide_act_steps.py
Action Items:
- Implement ReflectStep.execute()
- Implement DecideStep.execute() with scoring
- Implement ActStep.execute() with tool selection
- Add outcome evaluation
- Write comprehensive tests (target: 80%)
3.3 Token Counting¶
Add to all providers:
class BaseModelProvider:
def count_tokens(self, text: str) -> int:
"""Count tokens using provider's tokenizer."""
raise NotImplementedError
def get_context_window(self) -> int:
"""Get model's context window size."""
raise NotImplementedError
Action Items:
- Add count_tokens() to OpenAI (tiktoken)
- Add count_tokens() to Anthropic (anthropic-tokenizer)
- Add count_tokens() to Google
- Add count_tokens() to Groq
- Add count_tokens() to Ollama
- Add token tracking to CompletionResponse
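As a sketch of the OpenAI case above, count_tokens() could wrap tiktoken with a rough character-based fallback. The fallback heuristic and the exact integration point are assumptions, not the planned implementation.

```python
def count_tokens(text: str, model: str = "gpt-4") -> int:
    """Count tokens with tiktoken when available; fall back to a heuristic."""
    try:
        import tiktoken  # Optional dependency; may not be installed

        enc = tiktoken.encoding_for_model(model)
        return len(enc.encode(text))
    except Exception:
        # Rough fallback: ~4 characters per token for English text
        return max(1, len(text) // 4)
```

A provider would expose this alongside get_context_window() so the context manager can trim or summarize before hitting the model's limit.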
Phase 3 Deliverables¶
- PlanExecuteReflect coverage > 80%
- ReflectDecideAct coverage > 80%
- Token counting in all providers
- Overall test coverage > 75%
Phase 4: v0.1.0b1 - Production Features (Beta)¶
Timeline: 3-4 weeks Goal: Add production-grade features
4.1 Caching System¶
Components:
- LLM response cache (exact match)
- Semantic cache (similar queries)
- Tool result cache
- Pluggable backends (memory, Redis, SQLite)
class CacheConfig:
enabled: bool = True
backend: Literal["memory", "redis", "sqlite"] = "memory"
ttl_seconds: int = 3600
semantic_threshold: float = 0.95
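A minimal in-memory exact-match backend matching this config might look like the sketch below (illustrative only; class and method names are assumptions, not the planned implementation).

```python
import time
from typing import Any, Dict, Optional, Tuple


class MemoryCache:
    """Exact-match LLM response cache with lazy TTL eviction."""

    def __init__(self, ttl_seconds: int = 3600):
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.time() - stored_at > self.ttl:
            del self._store[key]  # Expired: evict lazily on read
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (time.time(), value)
```

The semantic cache would sit in front of this, hashing on an embedding neighborhood (semantic_threshold) rather than the exact prompt string.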
4.2 Rate Limiting¶
Features:
- Per-provider rate limits
- Token bucket algorithm
- Automatic retry with backoff
- Request queuing
class RateLimitConfig:
requests_per_minute: int = 60
tokens_per_minute: int = 100000
concurrent_requests: int = 10
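The token bucket algorithm behind requests_per_minute can be sketched as follows: a simplified, synchronous illustration, not the production implementation (which would also need async waiting and per-provider buckets).

```python
import time


class TokenBucket:
    """Allow bursts up to capacity, refilling at a steady per-second rate."""

    def __init__(self, requests_per_minute: int = 60):
        self.capacity = requests_per_minute
        self.tokens = float(requests_per_minute)
        self.refill_rate = requests_per_minute / 60.0  # tokens per second
        self.last_refill = time.monotonic()

    def try_acquire(self) -> bool:
        # Refill based on elapsed time, capped at capacity
        now = time.monotonic()
        self.tokens = min(
            self.capacity, self.tokens + (now - self.last_refill) * self.refill_rate
        )
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

When try_acquire() returns False, the caller would queue the request and retry with backoff, which is where the "automatic retry" and "request queuing" features plug in.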
4.3 Model Fallback¶
Features:
- Automatic failover on errors
- Health-based provider ordering
- Configurable fallback chain
agent = await (
ReactiveAgentBuilder()
.with_provider(Provider.OPENAI, "gpt-4")
.with_fallback_providers([
(Provider.ANTHROPIC, "claude-3-sonnet"),
(Provider.GROQ, "llama-3.1-70b"),
])
.build()
)
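Under the hood, the fallback chain amounts to trying providers in configured order until one succeeds. A hypothetical sketch; the complete() call and the blanket error handling are assumptions for illustration.

```python
import asyncio


async def complete_with_fallback(providers, messages):
    """Try each provider in order; return the first successful completion."""
    last_error = None
    for provider in providers:
        try:
            return await provider.complete(messages)
        except Exception as exc:  # Real code would catch provider-specific errors
            last_error = exc  # Remember the failure and fall through to the next
    raise RuntimeError("All fallback providers failed") from last_error
```

Health-based ordering would reorder `providers` before this loop runs, so chronically failing providers drop to the back of the chain.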
4.4 Observability¶
Features:
- OpenTelemetry tracing
- Prometheus metrics
- Structured logging with correlation IDs
- Grafana dashboard template
Phase 4 Deliverables¶
- LLM response caching
- Semantic caching
- Per-provider rate limiting
- Model fallback system
- OpenTelemetry integration
- Prometheus metrics
- Grafana dashboard template
Phase 5: v0.1.0 - Stable Release¶
Timeline: 2-3 weeks Goal: Polish and stabilize for production use
5.1 Advanced Multi-Agent¶
- Hierarchical agent orchestration
- Agent pools with load balancing
- Shared memory between agents
- Enhanced A2A protocol
5.2 Tool Enhancements¶
- Tool chaining/pipelines
- Tool dependency resolution
- Parallel tool execution improvements
5.3 Vision/Multimodal¶
- Image input support (OpenAI, Anthropic, Google)
- Multimodal tool results
5.4 Documentation & Polish¶
- Complete API reference
- Tutorial series
- Best practices guide
- Performance benchmarks
Technical Debt¶
Critical Priority (NEW - January 2026)¶
| ID | Description | Location | Effort | Status |
|---|---|---|---|---|
| TD-008 | Memory queries not implemented | memory_manager.py | Medium | 🔴 CRITICAL |
| TD-009 | Memory not consulted during execution | engine.py, strategies | Medium | 🔴 CRITICAL |
| TD-010 | Tool redundancy not detected | tool_manager.py | Small | 🟡 HIGH |
High Priority¶
| ID | Description | Location | Effort | Status |
|---|---|---|---|---|
| TD-004 | Context summarization TODO (now CRITICAL) | context_manager.py:524 | Medium | 🔴 CRITICAL |
| TD-001 | Google SDK migration | providers/llm/google.py | - | ✅ Completed |
| TD-002 | Incomplete strategies | core/reasoning/strategies/ | Large | ⏳ Pending |
Medium Priority¶
| ID | Description | Location | Effort |
|---|---|---|---|
| TD-005 | Plugin system TODOs | plugins/plugin_manager.py | Medium |
| TD-006 | Low provider coverage | Multiple providers | Medium |
| TD-007 | Circular import workarounds | Various | Small |
Success Metrics¶
v0.1.0a6 (Current) ✅¶
- 0 failing tests (429/429 passing)
- Streaming in 5/5 providers
- StreamChunk model with token tracking
v0.1.0a7¶
- 0 deprecation warnings (Google SDK migrated)
- Task classifier coverage > 60%
- Provider coverage improvement
v0.1.0a8¶
- Strategy coverage > 80%
- Overall coverage > 75%
v0.1.0b1¶
- < 100ms cache hit latency
- 0 rate limit errors in normal operation
- Full trace visibility
v0.1.0¶
- Production deployments
- Complete documentation
- Benchmark results published
Contributing¶
See CONTRIBUTING.md for guidelines.
Priority Areas (Updated January 2026)¶
- Memory system activation 🔴 CRITICAL - TD-008, TD-009
- Context summarization 🔴 CRITICAL - TD-004
- Tool redundancy detection 🟡 HIGH - TD-010
- Google SDK migration ✅ Completed - TD-001
- Test coverage
- Documentation
- Streaming implementation ✅ Completed
This roadmap is a living document updated as the project evolves.