Architecture
How model slots, the orchestrator, bridges, and episodes work together.
Overview
Iron Rain is organized in four layers:
```
┌─────────────────────────────────────────────────┐
│ CLI / TUI           User interface layer        │
│   @howlerops/iron-rain-cli                      │
│   @howlerops/iron-rain-tui                      │
├─────────────────────────────────────────────────┤
│ Orchestrator        Dispatch + routing layer    │
│   OrchestratorKernel                            │
│   ├── ModelSlotManager                          │
│   ├── ToolRouter                                │
│   └── SlotWorker (per slot)                     │
├─────────────────────────────────────────────────┤
│ Context             @ references + injection    │
│   parseReferences │ DispatchController          │
├─────────────────────────────────────────────────┤
│ Bridges             Provider abstraction layer  │
│   AnthropicBridge │ OllamaBridge │ OpenAICompat │
│   ClaudeCodeBridge │ CodexBridge │ GeminiBridge │
└─────────────────────────────────────────────────┘
```
Model Slots
The ModelSlotManager holds three named slots, each assigned to a provider + model pair:
- Cortex (main) — Strategy, planning, high-level reasoning
- Scout (explore) — Search, read, codebase understanding
- Forge (execute) — Code edits, bash commands, file writes
Slots can be reassigned at runtime via slots.setSlot() or by modifying the config. The manager is the single source of truth for which model handles which category of work.
Key methods
```typescript
class ModelSlotManager {
  getSlot(name: SlotName): SlotConfig;
  setSlot(name: SlotName, config: SlotConfig): void;
  cycleSlot(name: SlotName, available: SlotConfig[]): SlotConfig;
  getSlotForTool(toolType: ToolType): SlotName;
  serialize(): SlotAssignment;
}
```
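The cycling behavior can be sketched as follows. This is an illustrative sketch with simplified `SlotName`/`SlotConfig` shapes, not the real package types:

```typescript
type SlotName = "main" | "explore" | "execute";

interface SlotConfig {
  provider: string;
  model: string;
}

class SlotManagerSketch {
  private slots = new Map<SlotName, SlotConfig>();

  getSlot(name: SlotName): SlotConfig | undefined {
    return this.slots.get(name);
  }

  setSlot(name: SlotName, config: SlotConfig): void {
    this.slots.set(name, config);
  }

  // Advance the slot to the next available config, wrapping around the list.
  cycleSlot(name: SlotName, available: SlotConfig[]): SlotConfig {
    const current = this.getSlot(name);
    const idx = available.findIndex((c) => c.model === current?.model);
    const next = available[(idx + 1) % available.length];
    this.setSlot(name, next);
    return next;
  }
}
```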
Tool Router
The tool router automatically maps tool types to the correct slot:
| Tool type | Routed to | Rationale |
|---|---|---|
| edit, write, bash | Forge | Needs precise instruction following |
| grep, glob, read, search | Scout | Read-only, can use a cheaper model |
| strategy, plan, conversation | Cortex | Needs the most capable model |
You can override routing by setting targetSlot explicitly on an OrchestratorTask.
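The routing table above can be sketched as a plain lookup with `targetSlot` taking precedence. This sketch uses the internal `main`/`explore`/`execute` identifiers and is not the actual ToolRouter source:

```typescript
type SlotName = "main" | "explore" | "execute";
type ToolType =
  | "edit" | "write" | "bash"
  | "grep" | "glob" | "read" | "search"
  | "strategy" | "plan" | "conversation";

// Default routing, mirroring the table above.
const TOOL_ROUTES: Record<ToolType, SlotName> = {
  edit: "execute", write: "execute", bash: "execute",
  grep: "explore", glob: "explore", read: "explore", search: "explore",
  strategy: "main", plan: "main", conversation: "main",
};

// An explicit targetSlot on the task overrides the default route.
function routeTask(toolType: ToolType, targetSlot?: SlotName): SlotName {
  return targetSlot ?? TOOL_ROUTES[toolType];
}
```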
Orchestrator Kernel
The OrchestratorKernel is the central dispatch system. It receives tasks, routes them to the correct slot worker, and records episode summaries.
Task dispatch flow
1. Task comes in with a toolType (e.g., "edit")
2. Tool router maps "edit" → execute slot
3. Kernel finds the SlotWorker for execute
4. Worker calls the bridge (e.g., OpenAICompatBridge)
5. Bridge makes the API call and returns a BridgeResult
6. Worker wraps result into a WorkerResult
7. Kernel converts it to an EpisodeSummary and logs it
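The seven steps above can be compressed into a minimal sketch, using local stand-ins for the real bridge and worker types:

```typescript
type SlotName = "main" | "explore" | "execute";

interface BridgeResult { output: string; tokens: number; }

// Stand-in for a bridge call (step 5 would be a real API request).
type Bridge = (prompt: string) => Promise<BridgeResult>;

async function dispatchSketch(
  task: { toolType: string; prompt: string },
  route: (toolType: string) => SlotName,      // step 2: tool router
  workers: Record<SlotName, Bridge>,          // step 3: one worker per slot
) {
  const slot = route(task.toolType);          // steps 1-2: map toolType to a slot
  const result = await workers[slot](task.prompt); // steps 4-5: bridge call
  return {                                    // steps 6-7: wrap into an episode summary
    slot,
    task: task.prompt,
    result: result.output,
    tokens: result.tokens,
    status: "success" as const,
  };
}
```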
Key methods
```typescript
class OrchestratorKernel {
  // Dispatch a single task to the appropriate slot
  dispatch(task: OrchestratorTask): Promise<EpisodeSummary>;

  // Dispatch multiple tasks in parallel
  orchestrate(tasks: OrchestratorTask[]): Promise<EpisodeSummary[]>;

  // Merge external episodes into the log
  integrate(episodes: EpisodeSummary[]): void;

  // Get all recorded episodes
  getEpisodes(): ReadonlyArray<EpisodeSummary>;

  // Re-initialize workers after config change
  refreshWorkers(): void;
}
```
Parallel orchestration
The orchestrate() method dispatches all tasks in parallel using Promise.allSettled, so tasks on different slots run concurrently. Failed tasks still produce an episode with status: 'failure'.
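The allSettled pattern can be sketched like this, with failures converted into episodes rather than rejected promises. The `EpisodeLike` shape here is a simplification of the real summary type:

```typescript
interface EpisodeLike {
  id: string;
  status: "success" | "failure";
  result: string;
}

async function orchestrateSketch(
  tasks: Array<() => Promise<string>>,
): Promise<EpisodeLike[]> {
  // allSettled never rejects: each task settles independently.
  const settled = await Promise.allSettled(tasks.map((t) => t()));
  return settled.map((s, i) =>
    s.status === "fulfilled"
      ? { id: String(i), status: "success", result: s.value }
      : { id: String(i), status: "failure", result: String(s.reason) },
  );
}
```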
Bridges
Bridges are the abstraction layer between slots and providers. Every bridge implements the CLIBridge interface:
```typescript
interface CLIBridge {
  name: string;
  available(): Promise<boolean>;
  execute(prompt: string, options?: BridgeOptions): Promise<BridgeResult>;
  stream(prompt: string, options?: BridgeOptions): AsyncIterable<BridgeChunk>;
}
```
Streaming chunks
The stream() method yields BridgeChunk objects with structured tool call data:
```typescript
interface BridgeChunk {
  type: "text" | "thinking" | "tool_use" | "error" | "done";
  content: string;
  tokens?: { input: number; output: number };
  toolCall?: { id: string; name: string; status: "start" | "end" };
}
```
Tool call chunks carry enriched display names (e.g. Read schema.ts instead of just Read), derived from tool input arguments such as file_path, pattern, or command. Chunks flow through the pipeline Bridge → SlotWorker.stream() → OrchestratorKernel.dispatchStreaming() → DispatchController → UI signals.
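A consumer of this chunk stream can be sketched as follows. The fake stream and `collect` helper here are illustrative, not part of the package:

```typescript
interface ChunkSketch {
  type: "text" | "thinking" | "tool_use" | "error" | "done";
  content: string;
  toolCall?: { id: string; name: string; status: "start" | "end" };
}

// A fake bridge stream with one enriched tool call and a text chunk.
async function* fakeStream(): AsyncIterable<ChunkSketch> {
  yield { type: "tool_use", content: "", toolCall: { id: "1", name: "Read schema.ts", status: "start" } };
  yield { type: "tool_use", content: "", toolCall: { id: "1", name: "Read schema.ts", status: "end" } };
  yield { type: "text", content: "The schema defines..." };
  yield { type: "done", content: "" };
}

// Accumulate text and record each tool call once, at its "start" chunk.
async function collect(stream: AsyncIterable<ChunkSketch>) {
  let text = "";
  const tools: string[] = [];
  for await (const chunk of stream) {
    if (chunk.type === "text") text += chunk.content;
    if (chunk.type === "tool_use" && chunk.toolCall?.status === "start") {
      tools.push(chunk.toolCall.name);
    }
    if (chunk.type === "done") break;
  }
  return { text, tools };
}
```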
Bridge types
| Bridge | Type | How it works |
|---|---|---|
| AnthropicBridge | API | Direct HTTP to Anthropic Messages API |
| OpenAICompatBridge | API | HTTP to any OpenAI-compatible endpoint |
| OllamaBridge | API | HTTP to local Ollama server |
| GeminiBridge | API | HTTP to Google Generative Language API |
| ClaudeCodeBridge | CLI | Spawns claude -p subprocess |
| CodexBridge | CLI | Spawns codex exec subprocess |
| GeminiCLIBridge | CLI | Spawns gemini -p subprocess |
Episodes
Every task dispatch produces an EpisodeSummary — the unit of observability in Iron Rain.
```typescript
interface EpisodeSummary {
  id: string;                // Unique episode ID
  slot: SlotName;            // Which slot ran this task
  task: string;              // The original prompt
  result: string;            // The model's response
  tokens: number;            // Total tokens (input + output)
  duration: number;          // Time in milliseconds
  filesModified?: string[];  // Files changed
  status: 'success' | 'failure' | 'partial';
}
```
Episodes accumulate in the kernel and can be inspected for debugging, cost tracking, or fed into context management systems.
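For example, token usage can be aggregated per slot from the accumulated episodes. This is a sketch over a reduced episode shape, not a package API:

```typescript
interface EpisodeTokens {
  slot: string;
  tokens: number;
}

// Sum total tokens per slot, e.g. as an input to cost tracking.
function tokensBySlot(episodes: EpisodeTokens[]): Record<string, number> {
  return episodes.reduce<Record<string, number>>((acc, e) => {
    acc[e.slot] = (acc[e.slot] ?? 0) + e.tokens;
    return acc;
  }, {});
}
```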
Context References
The context reference system lets users inject external context into prompts using @ prefixes. This is handled by the parseReferences() function in the core package.
```
┌─────────────────────────────────────────────────┐
│ User Input                                      │
│   "@./file.ts @git:diff explain changes"        │
├─────────────────────────────────────────────────┤
│ parseReferences()                               │
│   ├── Detect slot routing (@cortex, @scout...)  │
│   ├── Resolve file refs → read + wrap in tags   │
│   ├── Resolve dir refs → listing                │
│   ├── Resolve git refs → exec whitelisted cmds  │
│   └── Resolve image refs → base64 encode        │
├─────────────────────────────────────────────────┤
│ DispatchController.buildTask()                  │
│   Injects resolved content into system prompt   │
└─────────────────────────────────────────────────┘
```
Reference types
| Type | Syntax | Resolved as |
|---|---|---|
| File | @./path or @file:path | File contents in <file> tags (100KB max) |
| Directory | @dir:path | Directory listing in <directory> tags |
| Git | @git:cmd | Git command output in <git> tags (5s timeout) |
| Image | @image:path or @./img.png | Base64 data for multimodal models (20MB max) |
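Reference detection alone can be sketched with a regex. This is illustrative only: the real parseReferences() also resolves content, handles slot routing, and treats image extensions specially:

```typescript
// Matches "@type:target" prefixed refs and bare "@./path" file refs.
const REF_RE = /@(?:(git|dir|file|image):(\S+)|(\.\/\S+))/g;

function findReferences(input: string): Array<{ type: string; target: string }> {
  const refs: Array<{ type: string; target: string }> = [];
  for (const m of input.matchAll(REF_RE)) {
    // Group 3 is the bare "@./path" form; groups 1-2 are the "@type:target" form.
    refs.push(m[3] ? { type: "file", target: m[3] } : { type: m[1], target: m[2] });
  }
  return refs;
}
```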
Multimodal Support
Bridges support multimodal ChatMessage content. The MessageContent type is a union of string and Array<TextContent | ImageContent>. Image references are encoded as base64 and passed through to providers that support vision (Anthropic, OpenAI, Gemini). Other providers receive the text-only portion via the getTextContent() helper.
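The text-only fallback shape can be sketched like this, with simplified content part types standing in for the real ones:

```typescript
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image"; data: string; mediaType: string };
type MessageContentSketch = string | Array<TextPart | ImagePart>;

// Keep only the text parts for providers without vision support.
function getTextContentSketch(content: MessageContentSketch): string {
  if (typeof content === "string") return content;
  return content
    .filter((p): p is TextPart => p.type === "text")
    .map((p) => p.text)
    .join("\n");
}
```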
Planner
The planner subsystem (packages/core/src/planner/) provides two high-level execution patterns:
Plan Generation & Execution
1. /plan "Add auth" → PlanGenerator
2. Cortex generates PRD (system prompt pass)
3. Cortex breaks PRD into tasks (JSON array)
4. User reviews: approve / reject / edit
5. PlanExecutor runs tasks sequentially via Forge
6. Each task gets prior results as context
7. Optional auto-commit after each task
Plans are stored in .iron-rain/plans/<id>/ and can be paused, resumed, or listed.
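The sequential execution in steps 5-6 can be sketched as follows; the `runTask` callback is a hypothetical stand-in for a Forge dispatch:

```typescript
// Each task receives all prior results as context (step 6 above).
async function executePlanSketch(
  tasks: string[],
  runTask: (task: string, priorResults: string[]) => Promise<string>,
): Promise<string[]> {
  const results: string[] = [];
  for (const task of tasks) {
    // Copy so the callback cannot mutate the accumulated history.
    results.push(await runTask(task, [...results]));
  }
  return results;
}
```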
Ralph Loop (iterative execution)
The RalphLoop runs a task repeatedly until a completion condition is met. Each iteration includes prior actions as context, enabling progressive refinement. If stuck for 3+ iterations, it auto-suggests a different approach.
1. /loop "Fix tests" --until "ALL TESTS PASSING"
2. Iteration 1: Forge attempts fix → check condition
3. Iteration 2: Forge sees prior result → tries again
4. ... until condition is met or max iterations reached
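The loop-until pattern above can be sketched like this. Names and signatures are illustrative, not the real RalphLoop API, and the stuck-detection heuristic is omitted:

```typescript
async function loopUntil(
  run: (priorActions: string[]) => Promise<string>,   // one iteration, with history as context
  isDone: (result: string) => boolean,                // the completion condition
  maxIterations = 10,
): Promise<{ result: string; iterations: number }> {
  const history: string[] = [];
  for (let i = 1; i <= maxIterations; i++) {
    const result = await run(history);  // each iteration sees prior actions
    history.push(result);
    if (isDone(result)) return { result, iterations: i };
  }
  return { result: history[history.length - 1] ?? "", iterations: maxIterations };
}
```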
Context Compaction (RLM)
Long conversations are automatically managed using a Retrieval-augmented Language Model pattern:
- Hot window — last 6 messages kept verbatim
- Cold archive — older messages summarized into a compact form
- RLM retrieval — keyword-based relevance scoring pulls relevant old messages back into context
Compaction triggers when the message count exceeds 8 (configurable). Token estimation uses ~4 chars = 1 token.
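The hot/cold split and the ~4-chars-per-token estimate described above can be sketched as:

```typescript
// Last 6 messages stay verbatim; older ones are candidates for summarization.
const HOT_WINDOW = 6;

// Rough token estimate: ~4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function splitForCompaction<T>(messages: T[]): { cold: T[]; hot: T[] } {
  const cut = Math.max(0, messages.length - HOT_WINDOW);
  return { cold: messages.slice(0, cut), hot: messages.slice(cut) };
}
```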
Error Handling & Resilience
All bridges use a shared error handling layer:
- BridgeError — typed errors with status code and provider name
- Exponential backoff — with jitter (50-100% of computed delay), retries on 429 and 5xx errors
- Circuit breaker — opens after 5 consecutive failures, auto-resets after 60 seconds
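The retry-delay math can be sketched as exponential growth capped at a ceiling, then scaled by 50-100% jitter. The base and cap values here are illustrative assumptions, not the library's actual defaults:

```typescript
// Returns a retry delay for the given attempt number (0-based).
function backoffDelay(attempt: number, baseMs = 500, capMs = 30_000): number {
  const computed = Math.min(capMs, baseMs * 2 ** attempt);
  const jitter = 0.5 + Math.random() * 0.5; // keep 50-100% of the computed delay
  return computed * jitter;
}
```

Jitter spreads out retries from many concurrent callers so they do not hammer a recovering provider in lockstep.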
Skills System
Skills are markdown files with YAML frontmatter that extend Iron Rain with custom slash commands.
- Auto-discovery — scans .iron-rain/skills/, ~/.iron-rain/skills/, and .claude/skills/
- Execution — skill instructions are injected into the system prompt for the dispatch
- Registration — skills appear in the slash command menu and in the /skills list
Session Persistence
The TUI uses a SQLite database (~/.iron-rain/sessions.db) for persistence:
- Sessions — stored with model, timestamps, and message history
- Lessons — persistent cross-session memory with optional TTL and full-text search
- Activities — subagent activities per message (slot, task, status, duration, tokens)
Falls back to a no-op NullSessionDB in non-Bun environments.
Mid-Stream Context Injection
The DispatchController supports pausing a streaming response to inject additional user context:
- User submits text while the agent is streaming
- injectAndContinue() aborts the current stream
- The partial response is saved as an assistant message
- A continuation prompt is dispatched with the full updated history
This enables iterative refinement without losing the agent's partial work.
Subagent Display
When tasks run across multiple slots, the TUI shows a responsive SubagentGrid — a flex-wrap grid of cards showing:
- Status dot (running/done/error/interrupted) with slot label
- Task title and tool-call tree with checkmark/error markers
- Footer with duration, token count, and cost
Cards adapt to terminal width via flexGrow + minWidth.
Streaming Agent Card
During active streaming, a StreamingAgentCard replaces the simple spinner with a live-updating bordered card that shows:
- Animated spinner with slot label and truncated task description
- Tool call tree with ├/└ connectors and ✓/✗/→ status indicators
- Lifecycle phases: "System prompt loaded", "Thinking..." entries
- Truncated content preview as text streams in
- Live elapsed timer with cancel/inject hints
The DispatchController accumulates tool calls from tool_use chunks during streaming and copies them into the final SlotActivity.toolCalls for the completed message display.
Packages
| Package | Purpose |
|---|---|
| @howlerops/iron-rain | Core library — slots, orchestrator, bridges, context references, config. Zero UI deps. |
| @howlerops/iron-rain-tui | Terminal UI components built with SolidJS — session view, subagent grid, settings |
| @howlerops/iron-rain-cli | CLI binary — launches TUI or runs headless |
| @howlerops/iron-rain-plugin | Plugin SDK — hooks and type definitions |