Mimir - Development Roadmap

Platform-agnostic, BYOK AI coding agent CLI. TypeScript, test-driven, cross-platform.

Priority: Configuration/Teams → Tools → Agent Orchestration → Core Features → Polish

Phase 1: Foundation & Infrastructure

Goal: Core project structure, CI/CD, platform abstractions, infrastructure

Project Setup

Initialize TypeScript project with yarn
Configure tsconfig.json (strict mode)
Set up ESLint + Prettier
Configure Vitest for testing
Set up project directory structure
Create .gitignore with .mimir/ entries
Initialize Git repository

Core Infrastructure

Logging: Winston/Pino, log rotation, .mimir/logs/, context-aware logging
Error Handling: Custom error classes, global handler, Sentry integration
Monitoring: Performance hooks, metrics collection, health checks
Security: npm audit, Snyk, input validation, secrets management, rate limiting
Database: SQLite schema, migrations, connection pooling, backup strategy
Configuration: Zod validation, env configs, secrets encryption, migration system
Caching: In-memory cache for tokens/files, invalidation, size limits
Build: tsup bundling, binary compilation, multi-platform builds
Development: VSCode settings, debug configs, git hooks (pre-commit, pre-push)

CI/CD Pipeline

GitHub Actions: test.yml, build.yml, release.yml
Code coverage (Codecov)
Automated linting and type checking

Installation Scripts

install.ps1 for Windows (PowerShell)
install.sh for Unix (bash/zsh)
Test on Windows 10/11, macOS, Linux (Ubuntu, Debian, Fedora) - via GitHub Actions test-installation.yml

Platform Abstraction Layer

IFileSystem interface + cross-platform implementation (fs/promises + globby)
IProcessExecutor interface + cross-platform implementation (execa)
IDockerClient interface skeleton
Path utilities (normalize, resolve, join)
Unit tests for all abstractions
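
As a rough illustration of the path utilities item above, here is a minimal sketch of a separator-agnostic normalizer. The name normalizePath and its exact behavior are assumptions for illustration, not part of the roadmap. A real implementation would more likely delegate to Node's built-in path module, and this sketch does not special-case Windows drive roots.

```typescript
// Hypothetical helper: unify separators and collapse "." and ".." segments.
export function normalizePath(input: string): string {
  const unified = input.replace(/\\/g, "/");
  const isAbsolute = unified.startsWith("/");
  const segments: string[] = [];
  for (const part of unified.split("/")) {
    if (part === "" || part === ".") continue; // skip empty and current-dir segments
    if (part === "..") {
      if (segments.length > 0 && segments[segments.length - 1] !== "..") {
        segments.pop(); // step back one directory
      } else if (!isAbsolute) {
        segments.push(".."); // relative paths may climb above their start
      }
      continue;
    }
    segments.push(part);
  }
  const joined = segments.join("/");
  return isAbsolute ? "/" + joined : joined || ".";
}
```

Keeping a helper like this pure (string in, string out) makes it easy to unit test on every platform without touching the real file system.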

Phase 2: Configuration System & Teams/Enterprise Support

Goal: Robust configuration with Teams integration, storage abstraction, policy enforcement

See: docs/contributing/plan-enterprise-teams.md for detailed architecture

Configuration Schema

Define Zod schemas for all config types
Generate TypeScript types from schemas
Default configuration values
Configuration documentation

Configuration Loader (Enhanced)

Storage Abstraction

IStorageBackend interface
LocalSQLiteStorage implementation (existing DB code)
TeamsCloudStorage implementation (API-based)
HybridStorage implementation (local-first with background sync)
Conversation storage (save/load/list/delete)
Message storage (append/load)
Tool call recording
Permission/audit logging
Config caching (for offline Teams mode)

Teams API Client

Authentication System

Sync Manager

Background batch sync (configurable interval)
Sync queue (conversation, audit, tools, commands)
Conflict resolution strategies
Offline mode handling
Force sync command (mimir teams sync)
Sync status tracking

Policy Enforcement

Enforce Teams config (cannot be overridden locally)
Allowed models enforcement
Allowed sub-agents enforcement
Forced sub-agents (e.g., security agent)
Model selection per sub-agent enforcement
Docker sandbox mode enforcement (local/cloud/auto)
Budget limits enforcement (daily/weekly/monthly)
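
One way to picture "cannot be overridden locally" is a merge in which enforced keys are re-applied last. The sketch below assumes a flat config shape; the names TeamsPolicy, enforcedKeys, resolveConfig and isModelAllowed are hypothetical, as is the choice to treat an empty allowed-models list as "no restriction".

```typescript
type ConfigValue = string | number | boolean | string[];
type Config = Record<string, ConfigValue>;

// Hypothetical policy shape: values from the organization plus the keys it locks.
interface TeamsPolicy {
  values: Config;
  enforcedKeys: string[];
}

// Precedence: defaults < Teams values < local config, except enforced keys,
// where the Teams value always wins.
export function resolveConfig(defaults: Config, local: Config, policy: TeamsPolicy): Config {
  const merged: Config = { ...defaults, ...policy.values, ...local };
  for (const key of policy.enforcedKeys) {
    const enforced = policy.values[key];
    if (enforced !== undefined) merged[key] = enforced;
  }
  return merged;
}

// Assumption: an empty list means the organization has not restricted models.
export function isModelAllowed(model: string, allowedModels: string[]): boolean {
  return allowedModels.length === 0 || allowedModels.includes(model);
}
```

With this ordering, a local dockerMode setting is silently replaced when the policy enforces dockerMode, while unenforced keys keep their local values.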

Permission & Security Configuration

Command Allowlist
- Glob patterns and regex matching
- Default allowlist (safe commands)
- Custom allowlist in config
- Team-shared allowlist templates (.mimir/allowlist.yml)
Auto-Accept Configuration
- autoAccept: true/false/ask
- alwaysAcceptCommands list
- Command-specific auto-accept rules
Risk Assessment Levels
- Risk levels: low, medium, high, critical
- Command classification by risk
- acceptRiskLevel config setting
- Auto-block above configured risk level
NEW: Teams shared allowlist integration
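
The allowlist, risk levels and acceptRiskLevel setting above could combine roughly as follows. This is a sketch under stated assumptions: the classification rules are purely illustrative (not a vetted security policy), only "*" globs are supported, and the function names are hypothetical.

```typescript
type RiskLevel = "low" | "medium" | "high" | "critical";
type Decision = "accept" | "ask" | "block";

const RISK_ORDER: RiskLevel[] = ["low", "medium", "high", "critical"];

// Convert a simple glob such as "git *" to an anchored RegExp ("*" only).
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`);
}

export function isAllowlisted(command: string, allowlist: string[]): boolean {
  return allowlist.some((pattern) => globToRegExp(pattern).test(command));
}

// Illustrative rules only; a real classifier would need a much richer rule set.
export function classifyRisk(command: string): RiskLevel {
  if (/\brm\s+-rf\s+\/(\s|$)/.test(command)) return "critical";
  if (/\b(rm|sudo|chmod|chown)\b/.test(command)) return "high";
  if (/\b(npm|yarn|pip)\s+(install|add)\b/.test(command)) return "medium";
  return "low";
}

// Block anything above acceptRiskLevel; otherwise auto-accept allowlisted
// commands and prompt for everything else.
export function decide(command: string, allowlist: string[], acceptRiskLevel: RiskLevel): Decision {
  const risk = classifyRisk(command);
  if (RISK_ORDER.indexOf(risk) > RISK_ORDER.indexOf(acceptRiskLevel)) return "block";
  return isAllowlisted(command, allowlist) ? "accept" : "ask";
}
```

Checking the risk ceiling before the allowlist means a broad pattern such as "*" cannot auto-accept a command the configuration says to block.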

Keyboard Shortcuts Configuration

Define default keyboard shortcuts
Allow customization in config (keyBindings section)
Support for Ctrl, Alt, Shift combinations
Platform-specific defaults (Cmd on macOS, Ctrl on Windows/Linux)
Configurable shortcuts for:
- Interrupt/cancel (default: Ctrl+C)
- Mode switching (default: Shift+Tab)
- Accept command (default: Enter)
- Quick reject (default: Escape)
- Alternative instruction (default: Ctrl+E)
- Help overlay (default: ?)
Load custom shortcuts from MIMIR.md (KeyBindingsManager)

Configuration Storage

Create ~/.mimir/ on first run
Generate default config.yml template
Create .mimir/ on mimir init
Auto-add .mimir/ to .gitignore
Example configurations and templates
Example custom commands (test-coverage.md, commit.md, doctor.md)

Testing

Phase 3: LLM Provider Abstraction

Goal: Provider-agnostic LLM integration (7+ providers + local models)

Base Provider Architecture

ILLMProvider interface (chat, streamChat, countTokens, calculateCost)
BaseLLMProvider abstract class
Common HTTP request logic (APIClient)
Retry logic with exponential backoff
Error handling for API failures
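
A minimal sketch of retry with exponential backoff, assuming a 500 ms base delay, a 30 s cap and four attempts. These numbers and the names backoffDelayMs and withRetry are illustrative assumptions, not values from the roadmap. Production code would usually add random jitter and honor provider rate-limit headers.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMs.
export function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Run `operation`, retrying on failure until maxAttempts is reached or the
// error is classified as non-retryable. `sleep` is injectable for tests.
export async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 4,
  isRetryable: (error: unknown) => boolean = () => true,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (!isRetryable(error) || attempt === maxAttempts - 1) break;
      await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

A provider could wrap its HTTP call as withRetry(() => client.post(...)), passing an isRetryable predicate that accepts rate-limit and network errors but rejects authentication failures.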

Provider Implementations

DeepSeek: API integration, tiktoken, model selection, cost calc, streaming (OpenAI-compatible)
Anthropic: API integration, tiktoken approximation, model selection, cost calc, streaming
OpenAI: Coming later (similar to DeepSeek, OpenAI-compatible)
Google/Gemini: Coming later (requires Gemini SDK)
Qwen: Coming later (OpenAI-compatible, similar to DeepSeek)
Ollama (Local): Coming later (local API, no cost)

Provider Factory

ProviderFactory.create()
API key loading from config/env
Graceful handling of missing keys
Provider-specific configurations
NEW: Proxied provider for Teams mode (route through Teams API)

Shared Utilities

Pricing data (hybrid: API + static fallback, 24h cache)
Tool formatters (OpenAI ↔ Anthropic format conversion)
Stream parsers (SSE for OpenAI/Anthropic formats)
API client wrapper (axios-based with error mapping)
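
The tool formatter item can be sketched as a pair of pure functions. The two shapes below follow the publicly documented tool-definition formats (OpenAI wraps the definition in a "function" object with "parameters"; Anthropic uses a flat object with "input_schema"); the function names toAnthropic and toOpenAI are hypothetical.

```typescript
// OpenAI-style tool definition (also used by OpenAI-compatible providers).
interface OpenAITool {
  type: "function";
  function: { name: string; description: string; parameters: Record<string, unknown> };
}

// Anthropic-style tool definition.
interface AnthropicTool {
  name: string;
  description: string;
  input_schema: Record<string, unknown>;
}

export function toAnthropic(tool: OpenAITool): AnthropicTool {
  return {
    name: tool.function.name,
    description: tool.function.description,
    input_schema: tool.function.parameters, // same JSON Schema, different key
  };
}

export function toOpenAI(tool: AnthropicTool): OpenAITool {
  return {
    type: "function",
    function: { name: tool.name, description: tool.description, parameters: tool.input_schema },
  };
}
```

Because both formats carry the same JSON Schema payload, the conversion is lossless for these fields and a round trip returns the original definition.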

Testing

Mock HTTP requests (MSW)
Test DeepSeek provider (chat, streaming, tools, errors)
Test Anthropic provider (chat, streaming, tools, errors)
Test error scenarios (rate limits, network)
Test token counting accuracy
Test cost calculations
Test proxied provider (Teams mode)

Phase 4: Tool System

Goal: Comprehensive tool system with built-in tools, custom tools, MCP, and Teams integration

See: docs/contributing/plan-tools.md for detailed architecture

Tool Architecture

Tool interface (name, description, schema, enabled, tokenCost, execute)
ToolContext interface (platform, config, conversation, logger, llm, permissions)
ToolRegistry class
Tool discovery system
Tool execution wrapper
Tool result formatting
Token cost estimation
Enable/disable tools in config
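
To make the architecture above more concrete, here is a minimal sketch of the Tool interface and a registry. The field list follows the roadmap (name, description, schema, enabled, tokenCost, execute); the ToolResult shape and the registry method names are assumptions, and ToolContext is omitted for brevity.

```typescript
// Hypothetical result shape; the roadmap does not specify one.
export interface ToolResult {
  success: boolean;
  output: string;
}

export interface Tool {
  name: string;
  description: string;
  schema: Record<string, unknown>; // JSON Schema for the tool arguments
  enabled: boolean;
  tokenCost: number; // estimated tokens this tool adds to the system prompt
  execute(args: Record<string, unknown>): Promise<ToolResult>;
}

export class ToolRegistry {
  private readonly tools = new Map<string, Tool>();

  register(tool: Tool): void {
    if (this.tools.has(tool.name)) throw new Error(`Tool already registered: ${tool.name}`);
    this.tools.set(tool.name, tool);
  }

  setEnabled(name: string, enabled: boolean): void {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`Unknown tool: ${name}`);
    tool.enabled = enabled;
  }

  listEnabled(): Tool[] {
    return Array.from(this.tools.values()).filter((tool) => tool.enabled);
  }

  // Sum of token costs for enabled tools, i.e. their share of the system prompt.
  totalTokenCost(): number {
    return this.listEnabled().reduce((sum, tool) => sum + tool.tokenCost, 0);
  }
}
```

Tracking tokenCost per tool is what lets a /tools-style listing show how much of the system prompt each enabled tool consumes.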

Built-in Tools

FileOperationsTool
- Read, write (with backup), edit (find/replace, line-based)
- List directory, create directories
- Delete (with confirmation)
- Check existence, get metadata
FileSearchTool
- grep/ripgrep integration
- Glob pattern matching
- Regex search
- Include/exclude patterns
- Formatted results with line numbers
BashExecutionTool
- Execute commands in project directory
- Capture stdout/stderr
- Timeout handling
- Failure handling
- Windows (PowerShell) and Unix (bash/zsh) support
- Command allowlist/blocklist
- Permission prompt system
GitTool
- git status, diff, log, branch, commit, checkout
- Detect git repository
- Parse git output

Tool Configuration

Tool enable/disable in config.yml
Per-tool settings (timeout, permissions, etc.)
Token cost tracking
System prompt size calculation
Teams-enforced tools (cannot be disabled)

/tools Command (In-Chat Management)

/tools - list all tools with status and token costs
/tools enable <name> - enable a tool
/tools disable <name> - disable a tool (if not enforced)
/tools info <name> - show tool details (description, schema, cost)
/tools tokens - show token cost breakdown (visual bar chart)
Update config.yml when enabling/disabling tools
Display total system prompt token cost

Custom Tools (TypeScript Runtime)

YAML tool definition loader (.mimir/tools/*.yml)
JSON Schema to Zod conversion
TypeScript code compilation (esbuild)
Docker sandbox execution (isolated context)
Context injection (platform, config, conversation, logger, llm)
Permission system integration (inherit allowlist)
Tool-specific permissions (allowlist, autoAccept, riskLevel)
Error handling and timeout enforcement
Example custom tools (run_tests, analyze_dependencies)

Sandbox Runtime for Custom Tools

Docker image for tool execution (mimir/tool-sandbox:node18)
Sandbox runtime library (safe platform abstractions)
IPC/API for host communication (file ops, command execution)
Resource limits (CPU, memory, timeout)
Security: prevent escaping working directory

Model Context Protocol (MCP) Support

MCP Client: stdio/HTTP transports, server lifecycle, tool parsing
MCP Server Management: Discovery, auto-start, health checks, failure handling
MCP Configuration: Config schema, server definitions (command, args, env)
MCP Tool Registry: Dynamic registration, namespacing (server/tool), conflict handling
Built-in MCP Servers: Filesystem server, Git server (optional dependencies)
MCP Tool Adapter: Wrap MCP tools as Tool interface
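
The server/tool namespacing and conflict handling could look roughly like this. The sketch assumes a "first registration wins" conflict policy and uses a plain Map from namespaced name to server as the registry; both choices, and the function names, are illustrative assumptions.

```typescript
// Build the namespaced name described above, e.g. "filesystem/read_file".
export function namespacedName(server: string, tool: string): string {
  return `${server}/${tool}`;
}

// Register tools advertised by one MCP server. Returns the names that were
// skipped because an identical namespaced name was already registered.
export function registerMcpTools(
  registry: Map<string, string>,
  server: string,
  toolNames: string[],
): string[] {
  const skipped: string[] = [];
  for (const tool of toolNames) {
    const key = namespacedName(server, tool);
    if (registry.has(key)) {
      skipped.push(key); // assumed policy: keep the first registration
      continue;
    }
    registry.set(key, server);
  }
  return skipped;
}
```

Namespacing by server means two servers can both expose a tool called "search" without colliding; conflicts only arise when the same server name is registered twice.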

Teams Tool Integration

Load tools from Teams API (GET /orgs/{orgId}/tools)
Teams tools override local tools (if conflict)
Teams tools are enforced (cannot be disabled)
Display Teams tools in /tools command
Execute Teams tools (may route to cloud sandbox)

Syntax Highlighting

Integrate highlighter (Shiki/highlight.js)
Support major languages (TS, JS, Python, Go, Rust, .NET, etc.)
Apply to file content and code blocks

Testing

Phase 5: Docker Sandbox

Goal: Secure, isolated code execution

Docker Client

Complete DockerClient class (dockerode)
Detect Docker installation
Handle Windows Docker Desktop and Unix daemon
Connection error handling

Sandbox Images

Dockerfile.base (Alpine/Ubuntu)
Dockerfile.node (Node.js)
Dockerfile.python (Python)
Dockerfile.tool-sandbox (for custom tools)
Multi-arch builds (amd64, arm64)

Container Management

Build custom images
Run containers with commands
Resource limits (CPU, memory, timeout)
Mount project directory (read-only option)
Capture output (stdout/stderr)
Cleanup and timeout handling
Result caching

Code Execution Tool

Cloud Sandbox Integration

Detect Teams mode and dockerMode config
Route to cloud sandbox when configured
Execute via Teams API (POST /sandbox/execute)
Fallback to local on failure (if allowed)

Testing

testcontainers for integration tests
Container creation/cleanup tests
Resource limits enforcement
Timeout handling
Multi-platform Docker support
Cloud sandbox routing tests

Phase 6: ReAct Agent Loop

Goal: Core agent reasoning, action loop, interrupt handling

Agent Architecture

Interrupt & Control System

Cancel/Interrupt
- Graceful SIGINT handling (Ctrl+C)
- Save agent state before interruption
- Resume from interruption point
- Show partial results on cancel
- Resource cleanup (containers, temp files)
Mode Switching During Execution
- Pause agent, show mode menu
- Switch between plan/act/discuss modes
- Preserve context when switching
- Resume in new mode
Alternative Instructions
- On permission prompt, allow typing alternative
- “edit” option instead of just “always accept”
- Parse alternative instruction
- Update agent plan
- Show updated plan before proceeding

Reasoning

Format messages for LLM (system prompt, history, tools)
Parse LLM response for actions
Handle tool calling format
Handle “finish” action
Handle malformed responses
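
Parsing the model's response into an action, including the "finish" and malformed cases, might be sketched as below. This assumes the model replies with a JSON object of the form {"action": ..., "args": ...} or {"action": "finish", "answer": ...}; that wire format is an assumption for illustration, since providers with native tool calling return structured tool calls instead.

```typescript
type AgentAction =
  | { type: "tool"; name: string; args: Record<string, unknown> }
  | { type: "finish"; answer: string }
  | { type: "malformed"; reason: string };

export function parseAction(raw: string): AgentAction {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { type: "malformed", reason: "response is not valid JSON" };
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return { type: "malformed", reason: "response is not a JSON object" };
  }
  const record = data as Record<string, unknown>;
  if (record.action === "finish") {
    return typeof record.answer === "string"
      ? { type: "finish", answer: record.answer }
      : { type: "malformed", reason: "finish action without an answer" };
  }
  const args = record.args;
  if (typeof record.action === "string" && typeof args === "object" && args !== null && !Array.isArray(args)) {
    return { type: "tool", name: record.action, args: args as Record<string, unknown> };
  }
  return { type: "malformed", reason: "missing action or args" };
}
```

Returning a "malformed" value rather than throwing lets the loop feed the reason back to the model as an observation and ask it to try again.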

Acting

Execute tool based on LLM action
Pass arguments to tool
Handle tool errors
Format tool results
Permission & Risk Assessment
- Assess command risk before execution
- Check against allowed commands (local + Teams)
- Prompt user if not auto-accepted
- Block high-risk if configured
- Log all permission decisions

Observing

Store tool results in history
Update agent state
Log actions and observations
Track token usage per iteration

Error Handling

Testing

Phase 7: Conversation History & Memory

Goal: Persistent conversation storage with Teams sync

Storage

SQLite schema (conversations, messages, tool_calls, permissions)
Database initialization
Conversation CRUD operations
Message append operations
Permission decision audit trail
NEW: Use IStorageBackend abstraction (supports local + cloud)

Memory Management

History Management (CLI commands)

mimir history list - list recent conversations
mimir history resume <id> - continue conversation
mimir history export <id> - export to file
mimir history clear - delete conversation history

Teams Sync Integration

Background sync to Teams API (if enabled)
Load conversations from Teams API
Conflict resolution (cloud vs local)

Testing

Phase 8: Token Counting & Cost Analysis

Goal: Real-time token and cost tracking with Teams quota enforcement

Token Counting

Cost Calculation

Pricing data structure (per provider/model)
Load pricing from config
Calculate cost per message
Calculate cumulative cost per session
Store costs in database

Real-Time Display

Show token count after each message
Show cost after each message
Show session total (tokens + cost)
Color-code warnings (80%, 90% of budget)
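
The per-message cost and the 80%/90% budget warnings could be computed along these lines. The sketch assumes pricing is expressed in USD per million tokens with separate input and output rates; the type and function names are hypothetical.

```typescript
// Assumed pricing unit: USD per one million tokens.
interface ModelPricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

export function messageCost(inputTokens: number, outputTokens: number, pricing: ModelPricing): number {
  return (
    (inputTokens / 1_000_000) * pricing.inputPerMillion +
    (outputTokens / 1_000_000) * pricing.outputPerMillion
  );
}

type BudgetStatus = "ok" | "warning" | "critical" | "exceeded";

// Map spending to the warning levels above: 80% and 90% of the budget.
export function budgetStatus(spent: number, budget: number): BudgetStatus {
  if (budget <= 0) return "ok"; // assumption: a non-positive budget means no limit
  const ratio = spent / budget;
  if (ratio >= 1) return "exceeded";
  if (ratio >= 0.9) return "critical";
  if (ratio >= 0.8) return "warning";
  return "ok";
}
```

The UI layer can then map each status to a color without knowing anything about pricing.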

Budget Management

Teams Quota Enforcement

Check quota via Teams API before expensive operations
Display org-level quota usage
Enforce daily/weekly/monthly limits
Show quota in mimir teams status

Cost Analytics (CLI commands)

mimir cost today - today’s spending
mimir cost week - weekly spending
mimir cost month - monthly spending
mimir cost compare - compare providers
mimir cost export - export to CSV
In-chat display: show cost after each message

Cost Comparison Dashboard

Comparison table (DeepSeek vs others)
Calculate savings
Show historical trends
Recommend cheaper alternatives

Testing

Phase 9: CLI & Terminal UI

Goal: Polished terminal interface with modes and shortcuts

Command Structure

CLI Commands (repo/session management):

Slash Commands (in-chat, context-specific):

Interactive Chat UI (Ink)

Display user/assistant messages (streaming)
Display tool calls with spinners
Display tool results (formatted)
Show token/cost info
Status indicators (thinking, executing, etc.)
User input with autocomplete
Slash command support
Permission Prompts
- Display command and risk level (color-coded)
- Options: y/n/a(always)/never/edit/view
- On “edit”: show command, allow alternative, replan, show updated plan
- “Always accepted” indicator for auto-approved
- Show Teams allowlist status
Keyboard Shortcuts
- Implement configured shortcuts
- Help overlay (show shortcuts)
- Customizable per user config

Plan, Act, Discuss Modes

Plan Mode: Create task breakdown, get approval, allow editing
Act Mode: Execute autonomously, show progress, checkpoints, allow interruption
Architect/Discuss Mode: Interactive planning
- Agent asks clarifying questions
- Multi-turn Q&A
- Present approaches with pros/cons
- Discuss trade-offs
- Let user guide decisions
- Generate architecture plan
- Questions: scale, preferences, performance vs maintainability, existing patterns
- Switch to Act mode after approval

Mode Switching

Smooth transitions between modes
Mode indicator in UI
Commands: /mode plan, /mode act, /mode discuss
Keyboard shortcut (configurable, default Shift+Tab)
Preserve agent state
Cancel current operation with confirmation

Task Display

Todo list (checkboxes)
Progress bars
Tree view for nested tasks
Status updates (pending, in-progress, done, failed)

Syntax Highlighting

Code in chat messages
File diffs
Command output

Logging

Structured logger (Winston/Pino)
Log levels (debug, info, warn, error)
Write to .mimir/logs/
--verbose flag for debug
--quiet flag for minimal output

Testing

Phase 10: Agent Orchestration & Multi-Agent System

Goal: Task decomposition, specialized agents, parallel execution

See: docs/contributing/plan-agent-orchestration.md for detailed architecture

Agent Orchestrator Core

AgentOrchestrator interface and implementation
Task complexity detection (single vs multi-agent)
Task decomposition (LLM-based parallel task planning)
Dependency graph construction
Topological sort (execution order)
Parallel execution engine
Result aggregation
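
The dependency graph, topological sort and parallel execution items fit together naturally if tasks are grouped into "waves": every task in a wave has all of its dependencies satisfied by earlier waves, so the tasks within one wave can run in parallel. A minimal sketch, with the Task shape and the planWaves name as assumptions:

```typescript
interface Task {
  id: string;
  dependsOn: string[];
}

// Layered topological sort: returns task ids grouped into waves that can
// each be executed in parallel. Throws on cycles or unknown dependencies.
export function planWaves(tasks: Task[]): string[][] {
  const remaining = new Map<string, Set<string>>();
  for (const task of tasks) remaining.set(task.id, new Set(task.dependsOn));

  const waves: string[][] = [];
  while (remaining.size > 0) {
    // Tasks with no unmet dependencies are ready to run now.
    const ready: string[] = [];
    remaining.forEach((deps, id) => {
      if (deps.size === 0) ready.push(id);
    });
    if (ready.length === 0) throw new Error("cycle or unknown dependency in task graph");

    for (const id of ready) remaining.delete(id);
    // Mark the finished wave as satisfied for everything still waiting.
    remaining.forEach((deps) => {
      for (const id of ready) deps.delete(id);
    });
    waves.push(ready);
  }
  return waves;
}
```

An execution engine could then run each wave with Promise.all and aggregate the results before starting the next wave.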

Sub-Agent Management

Sub-agent creation (createSubAgent)
Nested agent creation (createNestedAgent)
Agent lifecycle management (start, pause, resume, stop)
Agent status tracking (pending, running, waiting, completed, failed, interrupted)
Budget enforcement per agent (tokens, cost, duration)
Nesting depth limits (configurable, default: 2 levels)

Specialized Agent Roles

Main Agent: Orchestrator, full tool access
Finder Agent: Quick searches, file discovery (Haiku/Qwen, read-only tools)
Oracle Agent: Deep reasoning, complex bugs (o3/GPT-5, full tools)
Librarian Agent: API/docs research (Sonnet 4.5, read-only)
Refactoring Agent: Code refactoring (Sonnet 4.5, write tools)
Reviewer Agent: Code review, security (Sonnet 4.5/o3, read + git)
Tester Agent: Test generation (Sonnet 4.5, write + bash)
Rush Agent: Quick targeted loops (Haiku, limited tools, 3-5 iterations)

Role-Based Tool Restrictions

Map agent roles to allowed tools
Enforce tool restrictions per agent
Override with Teams configuration (if enforced)

Interactive Agent Plan UI

Display parallel task plan to user
Show recommended role, model, estimated cost per task
Allow user to edit task descriptions
Allow user to change models (if not enforced)
Options: approve, cancel, edit, manual configuration
Configurable auto-approval mode

Multi-Agent Execution UI

Display all agents in one pane (stacked vertically)
Show status icon per agent (○ pending, ● running, ✓ completed, ✗ failed, ◌ waiting)
Show elapsed time, cost, tokens per agent
Show compact todo list per agent (first 3 items + “… +N more”)
Keyboard shortcut to expand agent details
Real-time updates (500ms refresh)

Communication & Context

Centralized message queue (orchestrator manages all messages)
Inter-agent messaging (sendMessage, broadcastMessage)
Shared context (working directory, findings, decisions)
Agent-to-orchestrator communication
Result sharing between agents

Teams Enforcement for Agents

Enforce allowed models per agent
Enforce allowed sub-agent roles
Forced sub-agents (e.g., security agent on every write)
Model selection per sub-agent enforcement
Nesting depth limits (enterprise policy)

Testing

Phase 11: Model Switching & Context Management

Goal: Dynamic model switching, intelligent context pruning

Model Switching (slash commands in chat)

/model <provider> - switch provider mid-conversation
/models - list available models
Context transfer when switching
Preserve conversation history
Adjust token limits per model
Check against allowed models (Teams enforcement)

Context Compaction

Smart Context Management

Relevance scoring for messages
Keep important context (system prompts, recent)
Prune low-relevance old messages
Preserve critical info (file paths, decisions)
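
One possible reading of the pruning rules above: system prompts, pinned messages and the most recent messages are never dropped; everything else gets a relevance score and the lowest-scoring messages are removed until the context fits. The scoring formula here (recency plus a bonus for mentioning a file path) and all names are illustrative assumptions.

```typescript
interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  tokens: number;
  pinned?: boolean; // hypothetical flag for decisions that must survive pruning
}

export function pruneContext(messages: Message[], maxTokens: number, keepRecent = 4): Message[] {
  const scored = messages.map((message, index) => {
    const isProtected =
      message.role === "system" || message.pinned === true || index >= messages.length - keepRecent;
    const recency = index / Math.max(1, messages.length - 1);
    // Crude heuristic: text that looks like "dir/file.ext" is worth keeping.
    const mentionsPath = /[\w./-]+\.\w{1,5}\b/.test(message.content) ? 0.5 : 0;
    return { message, index, score: isProtected ? Infinity : recency + mentionsPath };
  });

  let total = messages.reduce((sum, message) => sum + message.tokens, 0);
  const dropped = new Set<number>();
  const candidates = scored
    .filter((entry) => entry.score !== Infinity)
    .sort((a, b) => a.score - b.score);

  // Drop the least relevant messages first until the budget is met.
  for (const candidate of candidates) {
    if (total <= maxTokens) break;
    dropped.add(candidate.index);
    total -= candidate.message.tokens;
  }
  return messages.filter((_, index) => !dropped.has(index));
}
```

If the protected messages alone exceed maxTokens, this sketch returns them anyway; a fuller implementation would fall back to summarization in that case.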

Local Model Support

Testing

Model switching
Context compaction strategies
Various context sizes
Teams allowed models enforcement

Phase 12: Custom Commands & Checkpoints

Goal: User extensibility, code safety

Custom Slash Commands

Command file format (Markdown with frontmatter)
Load from .mimir/commands/ and ~/.mimir/commands/
Parse arguments ($1, $2, $ARGUMENTS)
Bash execution support (!command)
Register with agent
/help shows custom commands
Example commands
Permissions in Commands
- Specify required permissions
- Inherit risk level from definition
- Support auto-accept: true in frontmatter
Teams custom commands (loaded from Teams API)
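
Argument substitution for custom commands ($1, $2, $ARGUMENTS) can be done in a single pass over the template. A minimal sketch, assuming positional placeholders are 1-based and missing arguments expand to an empty string; the function name is hypothetical.

```typescript
// Expand $1, $2, ... and $ARGUMENTS in a command template.
// A single regex pass avoids re-expanding placeholders that appear inside
// the argument values themselves.
export function expandCommandTemplate(template: string, args: string[]): string {
  return template.replace(/\$(ARGUMENTS|\d+)/g, (_match, token: string) => {
    if (token === "ARGUMENTS") return args.join(" ");
    return args[Number(token) - 1] ?? ""; // 1-based; missing args become ""
  });
}
```

For example, a command body of "Run tests for $1 with $2" invoked with the arguments auth and --coverage expands to "Run tests for auth with --coverage".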

Checkpoint System

Auto-create before file changes
Store git diff and file backups in .mimir/checkpoints/
CLI Commands:
- mimir checkpoint list - show all checkpoints
- mimir checkpoint restore <id> - restore checkpoint
Slash Commands:
- /checkpoint - create checkpoint now
- /undo - undo last operation
Show diff before restore
Confirm destructive operations
Auto cleanup (keep last N)

Doctor Command (CLI)

mimir doctor - run full diagnostics
- Node.js version check
- Docker installation check
- API keys configured
- File permissions
- Network connectivity
- LLM provider connection test
- MCP server health
- Teams connection test (if authenticated)
Suggest fixes
Auto-fix when possible

MIMIR.md Support

Load context from MIMIR.md (or custom file via config)
Include in system prompt
Hierarchical MIMIR.md (global + project)
Template generation with mimir init

Testing

Phase 13: Polish & Launch

Goal: Production-ready release

Code Quality

85%+ test coverage
Fix all linting errors
Address TypeScript strict mode issues
Optimize performance bottlenecks
Memory leak detection and fixes

Documentation

Examples

Error Messages & UX

Review error messages for clarity
Add helpful suggestions
Improve onboarding
Interactive setup wizard

Performance

Benchmark common operations
Optimize slow paths
Implement caching
Profile memory usage

Security Audit

User input handling
Command injection vulnerabilities
Path traversal vulnerabilities
Docker sandbox security
Permission system security
Dependency audit (npm audit)
Teams API security (auth, encryption)

Release

Version 1.0.0
Build binaries (all platforms)
Publish to npm
GitHub release

Success Metrics

MVP (Phases 1-9)

v1.0 (Phases 1-13)

Implementation Priority

Config/Teams Foundation (Phase 2) - Prepare for enterprise features
Tool System (Phase 4) - Core functionality for agents
Agent Orchestration (Phase 10) - Enable multi-agent workflows
Polish remaining phases (5-9, 11-13) - Complete MVP features
