Brain OS 🧠

Stop giving your AI amnesia.

Brain OS is a biologically-inspired, central cognitive engine written in pure Rust. Instead of every script, coding assistant, and chat UI keeping its own isolated, fragmented context, Brain OS acts as your single source of truth.

It routes intents through a Thalamus, scores importance via an Amygdala, and stores everything in a unified Hippocampus (FTS5 + HNSW vector search). Whether you connect via HTTP, WebSocket, gRPC, or MCP, your AI tools share one local, ever-growing memory that runs 24/7 on your machine.

Your data never leaves your hardware. Your AI never forgets.

How It Works

Every input — regardless of protocol — flows through the same pipeline:

Input → Intent Classification → Authorization → Importance Scoring → Memory Store/Recall → LLM Response

The memory engine combines vector search (HNSW) with full-text search (BM25 FTS5), fuses results via Reciprocal Rank Fusion, and reranks by importance and recency. A consolidation pass runs every 24 hours by default, applying a forgetting curve to prune low-value memories and promoting reinforced episodes to permanent semantic facts.

Beyond memory: the kernel it grew into

Memory is the hook — but the same daemon also mediates what your AI tools can do. Every capability it exposes — search the web, run a sandboxed command, send a notification, probe a host, audit its own config — is a typed entry in one capability manifest, each tagged with a safety tier and routed through the same consent, audit, and budget gates. Whether a request comes from your terminal, an MCP client, or Brain’s own resident reasoner, it sees the same manifest and is held to the same rules.

Design Principles

Principle	Description
Local-first	Runs on your machine. No cloud, no telemetry, no account.
Protocol-agnostic	HTTP, WebSocket, gRPC, MCP — one memory behind every surface.
Memory that earns its place	Importance scoring + forgetting curve keep the signal sharp.
Open to any LLM	Ollama, OpenAI, OpenRouter, or any OpenAI-compatible endpoint.
Fail safe, never silently	Degraded-but-functional is the target state.

Installation

Requirements

Ollama (or any OpenAI-compatible API)
Rust 1.91+ (only for building from source)
Docker (optional — for SearXNG web search backend)

From crates.io (recommended)

cargo install brainos          # requires Rust 1.91+
brain init                     # creates ~/.brain/ with config, database, vector index
ollama pull qwen2.5-coder:7b
ollama pull nomic-embed-text
brain deps up                  # optional: upgrade web search from DuckDuckGo (default) to SearXNG

From source

git clone https://github.com/keshavashiya/brain.git && cd brain
cargo install --path crates/cli
brain init

One-liner (pre-built binary)

curl -fsSL https://raw.githubusercontent.com/keshavashiya/brain/main/scripts/install.sh | sh

This downloads a pre-built binary when available, falling back to cargo install from source.

External services & auto-start

Docker (optional web search):

brain deps up       # Start SearXNG
brain deps status   # Check if running
brain deps down     # Stop

Auto-start on login:

brain service install    # launchd (macOS) / systemd (Linux) / Task Scheduler (Windows)
brain service uninstall  # Remove

Verify your install

brain doctor   # verify Ollama, models, ports — fix anything red
brain start    # wake the daemon
brain status   # check daemon health

Quick Start

Recommended setup order

# 1. Initialize (one-time)
brain init

# 2. Quick test — direct daemon
brain start
brain status
brain stop

# 3. Production — auto-start on login
brain service install    # registers launchd/systemd/Task Scheduler
# Brain now wakes automatically on every login

Lifecycle commands

brain start    # Start daemon
brain stop     # Stop daemon
brain status   # Check daemon status
brain tail     # Stream BrainEvent bus (observability tap for headless/SSH)

Interactive usage

brain chat                           # Interactive chat
brain chat "remember that I use bun" # One-shot message

Foreground mode (development)

brain serve               # All adapters (foreground)
brain serve --http        # HTTP only
brain serve --http --ws   # HTTP + WebSocket
brain serve --grpc        # gRPC only
brain serve --mcp         # MCP HTTP only

Checking memory

# Search memory
brain chat "what do I know about Rust?"

# Store a fact
brain chat "remember that my favorite editor is Neovim"

# List grants
brain chat "show me my grants"

Architecture Overview

Brain OS is built as a collection of specialized crates, many of them modelled on a biological brain structure. The system is organized around a single SignalProcessor that all adapters share.

Crate Map

brain/
├── crates/
│   ├── core/           # BrainConfig + shared config types
│   ├── signal/         # SignalProcessor — the single shared engine
│   ├── thalamus/       # Intent classification (regex + LLM fallback)
│   ├── amygdala/       # Importance scoring
│   ├── hippocampus/    # Memory engine (episodic + semantic + search)
│   ├── cortex/         # Reasoning core (LLM, context, action dispatch)
│   ├── cerebellum/     # Procedural memory (trigger → action patterns)
│   ├── ganglia/        # Proactivity / habit engine
│   ├── audit/          # Append-only audit trail
│   ├── confirm/        # Confirmation engine (nonce-based)
│   ├── budget/         # Cost/token budget enforcement
│   ├── sandbox/        # Command execution sandbox
│   ├── vault/          # Credential vault
│   ├── orchestrate/    # Task decomposition + execution
│   ├── delegate/       # External agent delegation
│   ├── channel/        # Channel routing + presets
│   ├── observe/        # Observability bus + BrainEvent
│   ├── identity/       # Principal, tier, authorization
│   ├── intent/         # Intent Token + capability routing
│   ├── mcphost/        # MCP host for external servers
│   ├── reflex/         # Reactive signal sources
│   ├── resilience/     # Circuit breaker, retry, rate limit
│   ├── storage/        # SQLite pool + migrations
│   ├── backends/       # World-touching backends
│   ├── selfmodel/      # Self-model (host, capability, connectivity)
│   ├── metrics/        # Performance metrics
│   ├── bridge/         # Bridge library for external relays
│   ├── adapters/       # Transport adapters (HTTP, WS, gRPC, MCP, Terminal)
│   └── cli/            # CLI binary (thin wrapper over backends)
├── docs/               # Documentation (mdBook)
├── scripts/            # Build + release scripts
└── docker/             # Docker compose for SearXNG

Design Principle: One Capability, Many Faces

A capability is a typed entry — id, safety tier, preconditions — in a single registry. All transports (CLI, HTTP, WS, gRPC, MCP) and the resident reasoner are faces over that one registry. They hold no private capabilities and no business logic.

Signal Pipeline

Every input to Brain flows through a single pipeline:

Input → Intent Classification → Authorization → Importance Scoring
    → Memory Store/Recall → LLM Response → Output

Processing stages

Signal Ingestion — signals arrive via any adapter (HTTP, WS, gRPC, MCP, CLI) as a typed Signal carrying content, namespace, principal, and metadata.
Intent Classification — the Thalamus classifies each signal into one of 31 intent variants using a regex fast-path with async LLM fallback and timeout.
Authorization — the IdentityStore enforces tier-based authorization on every signal. The pipeline gate runs after classification, checking the principal’s rights against the required tier.
Importance Scoring — the Amygdala scores memories on a [0,1] scale using keyword heuristics + per-process novelty detection. No LLM cost.
Memory — the Hippocampus handles storage and recall, combining BM25 FTS5 full-text search with HNSW vector search, fused via Reciprocal Rank Fusion.
LLM Response — the Cortex builds a token-budgeted prompt, invokes the configured LLM provider (with failover chain), and streams the response.

The Capability Loop

For autonomous actions (tool calls), Brain runs a consent-gated tool loop:

LLM Response → Tool Call Request → Authorization → Confirmation
    → Execution → Audit → Result → LLM (next turn)

Each tool is a registered capability with a safety tier. Destructive and external actions require user confirmation via a nonce-based approval flow.
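The confirmation gate in the loop above can be sketched as follows. This is a minimal illustration, not Brain’s actual code: the tier names come from this README, while `Approvals`, `requires_confirmation`, the counter-based nonce, and the `fs.delete` capability id are all hypothetical.

```rust
use std::collections::HashMap;

// Tier names follow the README; everything else here is illustrative.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SafetyTier {
    Read,
    Write,
    Execute,
    Destructive,
    External,
}

/// Destructive and external actions need explicit user confirmation.
fn requires_confirmation(tier: SafetyTier) -> bool {
    matches!(tier, SafetyTier::Destructive | SafetyTier::External)
}

/// Pending approvals keyed by nonce.
/// Assumption: a counter stands in for the random nonce a real system would use.
struct Approvals {
    next_nonce: u64,
    pending: HashMap<u64, String>,
}

impl Approvals {
    fn new() -> Self {
        Approvals { next_nonce: 1, pending: HashMap::new() }
    }

    /// Park a tool call until the user answers; returns the nonce to quote back.
    fn request(&mut self, capability_id: &str) -> u64 {
        let nonce = self.next_nonce;
        self.next_nonce += 1;
        self.pending.insert(nonce, capability_id.to_string());
        nonce
    }

    /// Consume a nonce. Replaying the same nonce yields None.
    fn approve(&mut self, nonce: u64) -> Option<String> {
        self.pending.remove(&nonce)
    }
}

fn main() {
    let mut approvals = Approvals::new();
    if requires_confirmation(SafetyTier::Destructive) {
        let nonce = approvals.request("fs.delete"); // hypothetical capability id
        println!("approval needed, nonce {}", nonce);
        println!("first approve: {:?}", approvals.approve(nonce));
        println!("replayed approve: {:?}", approvals.approve(nonce));
    }
}
```

Removing the entry on approval is what makes each nonce single-use: a replayed approval finds nothing to execute.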

Memory Model

Brain stores three kinds of memory, mirroring human memory structure:

Type	What it stores	Storage
Episodic	Timestamped conversation history	SQLite + FTS5
Semantic	Subject–predicate–object facts	SQLite + HNSW vector
Procedural	Trigger → action patterns	SQLite

Retrieval

Memory retrieval is hybrid — combining vector similarity (HNSW) with keyword matching (BM25 FTS5), fused via Reciprocal Rank Fusion (RRF). Results are reranked by importance and recency before being used as LLM context.
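As an illustration of the fusion step, here is a small sketch of Reciprocal Rank Fusion over two ranked id lists. It is not taken from the Hippocampus crate: the function name `rrf_fuse` and the memory ids are made up, and `k = 60` mirrors the `rrf_k` default shown in the configuration reference.

```rust
use std::collections::HashMap;

/// Fuse ranked lists of ids. Each id scores the sum of 1 / (k + rank),
/// where rank starts at 1 in every list it appears in.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, id) in ranking.iter().enumerate() {
            *scores.entry(id.to_string()).or_insert(0.0) += 1.0 / (k + (i as f64 + 1.0));
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties fall back to id order for stability.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}

fn main() {
    let vector_hits = vec!["m1", "m2", "m3"]; // hypothetical HNSW ranking
    let keyword_hits = vec!["m2", "m4", "m1"]; // hypothetical BM25 ranking
    for (id, score) in rrf_fuse(&[vector_hits, keyword_hits], 60.0) {
        println!("{}: {:.5}", id, score);
    }
}
```

An id that appears in both lists (here `m1` and `m2`) outranks one that appears in only a single list, which is the point of the fusion. The importance and recency reranking happens after this step and is not shown.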

Forgetting Curve

retention = importance × e^(-decay_rate × hours_since_last_access)

High-importance, frequently-accessed memories persist indefinitely. Low-importance, stale memories decay and are pruned during nightly consolidation.
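To make the formula concrete, the sketch below evaluates it with the `decay_rate` (0.01) and `forgetting_threshold` (0.05) defaults from the configuration reference. The function names are illustrative, and treating the threshold as a direct cutoff on retention is an assumption.

```rust
/// retention = importance * e^(-decay_rate * hours_since_last_access)
fn retention(importance: f64, decay_rate: f64, hours_since_last_access: f64) -> f64 {
    importance * (-decay_rate * hours_since_last_access).exp()
}

/// Hours of inactivity after which retention falls to `threshold`.
/// Returns None when the memory is already at or below the threshold,
/// or when it never decays.
fn hours_until_pruned(importance: f64, decay_rate: f64, threshold: f64) -> Option<f64> {
    if importance <= threshold || decay_rate <= 0.0 {
        return None;
    }
    Some((importance / threshold).ln() / decay_rate)
}

fn main() {
    let (decay_rate, threshold) = (0.01, 0.05);
    for &(importance, hours) in &[(0.9, 24.0), (0.9, 336.0), (0.2, 168.0)] {
        let r = retention(importance, decay_rate, hours);
        println!(
            "importance={} idle_hours={} retention={:.4} prune={}",
            importance, hours, r, r < threshold
        );
    }
    if let Some(h) = hours_until_pruned(0.9, decay_rate, threshold) {
        println!("importance 0.9 survives about {:.0} idle hours", h);
    }
}
```

Because the clock is `hours_since_last_access`, every recall resets the decay, which is how frequently used memories stay above the threshold.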

Namespaces

Memory can be scoped to a namespace (default: "personal"). Namespaces allow project-specific facts, clean separation of domains (work, personal, codebase), and residency policies (local_only vs any).

Consolidation

A background loop runs every 24 hours to:

Prune low-retention memories using the forgetting curve
Promote reinforced episodes into semantic facts (with idempotency guards)
Apply data-residency enforcement

Memory Trust

Every memory stores its source agent. Brain supports provenance-weighted recall: per-agent trust weights determine how much agent-written memories influence context assembly. Unattested agent writes land quarantined until reviewed.

Capability System

Brain’s capability system is the unified substrate for everything the system can do.

Capability Registry

All capabilities are registered in a single ToolRegistry. Three producers feed into it:

Native backends — built-in capabilities declared by each backend module
MCP mounts — external MCP servers registered at runtime
Skill packs — (future) declarative capability bundles

Every capability has:

ID — unique identifier
Safety tier — Read / Write / Execute / Destructive / External
Preconditions — what must be true for it to work
When-to-use — guidance for the LLM
Embedding — semantic descriptor for hybrid retrieval

Capability Discovery

The IntentRouter and CapabilityIndex route intents to registered capabilities using hybrid (cosine + keyword) scoring. A learned CapabilityFitnessStore tracks per-tool success/failure and provides a bounded tiebreaker in ranking.
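One way to picture hybrid scoring is a weighted blend of embedding cosine similarity and keyword overlap. The sketch below is an assumption about the general shape only: the blend weight `alpha`, the whitespace tokenizer, and all function names are hypothetical, and the fitness tiebreaker is left out.

```rust
use std::collections::HashSet;

/// Cosine similarity of two equal-length vectors; 0.0 if either is all zeros.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Fraction of query words that occur in the capability description
/// (case-insensitive, whitespace-tokenized).
fn keyword_overlap(query: &str, description: &str) -> f32 {
    let description_lc = description.to_lowercase();
    let known: HashSet<&str> = description_lc.split_whitespace().collect();
    let words: Vec<String> = query.split_whitespace().map(|w| w.to_lowercase()).collect();
    if words.is_empty() {
        return 0.0;
    }
    let hits = words.iter().filter(|w| known.contains(w.as_str())).count();
    hits as f32 / words.len() as f32
}

/// Assumption: a linear blend, with `alpha` weighting the cosine term.
fn hybrid_score(cos: f32, kw: f32, alpha: f32) -> f32 {
    alpha * cos + (1.0 - alpha) * kw
}

fn main() {
    let query_vec = [0.9, 0.1, 0.0]; // made-up embeddings
    let tool_vec = [0.8, 0.2, 0.1];
    let cos = cosine(&query_vec, &tool_vec);
    let kw = keyword_overlap("search web", "Search the web for pages");
    println!("cosine={:.3} keyword={:.3} hybrid={:.3}", cos, kw, hybrid_score(cos, kw, 0.7));
}
```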

Runtime Health

Capabilities carry runtime health state:

Verified — working normally
Degraded — dependency unavailable (e.g., embedder down for memory.store)
BreakerOpen — circuit breaker tripped
PreconditionFailed — missing prerequisite

The capability digest renders all health state in chat, and tools/list annotates per-tool status.

Intent Taxonomy

Brain exposes 31 intent variants covering all user-facing actions. Intents are classified by the Thalamus using a regex fast-path with LLM fallback.

Intent Categories

Category	Intents
Inspection	`Recall`, `MemorySummary`, `SystemStatus`, `ProactivityStatus`, `BudgetStatus`, `List`, `TaskStatus`, `QueryAgents`, `QueryAudit`, `ChannelPreferences`
Memory	`StoreFact`, `Forget`
Action	`ExecuteCommand`, `WebSearch`, `SendMessage`, `DelegateTask`
Lifecycle	`Schedule`, `DecomposeTask`, `Cancel`, `OpenTerminalSession`, `CloseTerminalSession`, `MountMcpServer`, `UnmountMcpServer`, `ReconsentMcpServer`
Governance	`RespondToApproval`, `ApproveMemoryWriter`, `PruneAudit`, `SetChannelPreference`, `SetProactivity`
Capability	`ToolCall`
Conversation	`Chat`

Standardized Intent Tokens (SIT)

Every intent can be expressed as a typed IntentToken — a JSON object with a verb and optional object/modifiers. This enables programmatic intent dispatch alongside natural language classification.

Verb Vocabulary

The verb vocabulary is a compile-time constant set, cross-checked by tests that ensure every Intent variant resolves through the registry and matches its declared tier hint. Adding a verb requires code changes to the intent enum, auth mapping, handler, and tier hint — so a TOML registry would add friction without flexibility.

HTTP API

Brain exposes a REST API on port 19789 (default) for all operations.

Authentication

All endpoints (except /health) require an API key via the Authorization: Bearer <key> header. The key is generated at brain init and stored in ~/.brain/config.yaml.

Endpoints

Health

GET /health

Returns 200 OK when the daemon is running.

Signals

POST /v1/signals
Content-Type: application/json
Authorization: Bearer <api_key>

{
  "content": "Remember I like Rust",
  "namespace": "personal",
  "source": "curl",
  "channel": "http",
  "sender": "tester"
}
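For clients that prefer not to pull in an HTTP library, the request above can be assembled by hand. The following is a rough sketch using only the Rust standard library; the endpoint, port, and header come from this page, while `build_signal_request`, `send`, and the `DEMO_API_KEY` environment variable are invented for the example.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Build the raw HTTP/1.1 request text for POST /v1/signals.
fn build_signal_request(api_key: &str, json_body: &str) -> String {
    format!(
        "POST /v1/signals HTTP/1.1\r\n\
         Host: 127.0.0.1:19789\r\n\
         Authorization: Bearer {}\r\n\
         Content-Type: application/json\r\n\
         Content-Length: {}\r\n\
         Connection: close\r\n\
         \r\n\
         {}",
        api_key,
        json_body.len(), // Content-Length counts bytes, which is what len() returns
        json_body
    )
}

/// Send the request to the default HTTP adapter and return the raw response.
fn send(request: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect("127.0.0.1:19789")?;
    stream.write_all(request.as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}

fn main() {
    let body = r#"{"content": "Remember I like Rust", "namespace": "personal"}"#;
    // DEMO_API_KEY is a made-up variable for this sketch, not a Brain setting.
    match std::env::var("DEMO_API_KEY") {
        Ok(key) => match send(&build_signal_request(&key, body)) {
            Ok(response) => println!("{}", response),
            Err(e) => eprintln!("request failed: {}", e),
        },
        // Without a key, just show what would be sent.
        Err(_) => println!("{}", build_signal_request("<api_key>", body)),
    }
}
```

The response is printed unparsed, status line and headers included.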

Memory

POST /v1/memory/search
Authorization: Bearer <api_key>

{
  "query": "Rust",
  "namespace": "personal",
  "top_k": 5
}

POST /v1/memory/store
Authorization: Bearer <api_key>

{
  "content": "User prefers dark mode",
  "namespace": "personal",
  "kind": "fact"
}

Webhooks

POST /v1/webhooks/:id

UI

GET /ui          # Live dashboard
GET /ui/approvals
GET /ui/memory
GET /ui/audit

Metrics

GET /metrics     # Prometheus-formatted metrics

Adapter ports

Adapter	Port	Host
HTTP	19789	127.0.0.1
WebSocket	19790	127.0.0.1
MCP HTTP	19791	127.0.0.1
gRPC	19792	127.0.0.1
Terminal	19793	127.0.0.1

WebSocket API

Brain exposes a WebSocket API on port 19790 for streaming interactions.

Connection

const ws = new WebSocket('ws://localhost:19790');

Message format

Messages are JSON with the following structure:

{
  "content": "remember my favorite editor is Neovim",
  "namespace": "personal",
  "source": "web-client"
}

The server streams responses as JSON-encoded SignalResponse messages.

gRPC API

Brain exposes a gRPC API on port 19792 for the memory service.

The gRPC adapter provides the same memory operations available via HTTP, with the efficiency of binary protobuf serialization. This is the recommended transport for programmatic clients doing high-volume memory operations.

MCP Integration

Any MCP-compatible client can launch Brain as a stdio MCP server and connect to it.

Configuration

Add to your MCP client config:

{
  "mcpServers": {
    "brain": {
      "command": "brain",
      "args": ["mcp"]
    }
  }
}

Tools exposed via MCP

Tool	Arguments	Description
`memory_search`	`query`, `top_k?`, `namespace?`	Semantic search of Brain memory (facts & episodes)
`memory_store`	`subject`, `predicate`, `object`	Store a structured semantic fact
`memory_facts`	`subject`	Retrieve all stored facts about a subject
`memory_episodes`	`limit?`, `namespace?`	Retrieve recent conversation episodes
`user_profile`	—	Retrieve user profile & Brain OS configuration
`memory_procedures`	`action`, `trigger?`, `steps?`	Manage learned procedures (list/store/delete)
`brain_capabilities`	—	List Brain’s live capability manifest

MCP Host

Brain can also mount external MCP servers as capabilities. Configure servers in ~/.brain/config.yaml:

mcphost:
  servers:
    - name: "filesystem"
      transport: "stdio"
      command: "npx"
      args: ["-y", "@modelcontextprotocol/server-filesystem"]

Mounted servers’ tools are registered in the capability manifest alongside native tools, available to every face (CLI, chat, HTTP, etc.).

Configuration

Brain’s configuration lives in ~/.brain/config.yaml, generated by brain init. Every key has a safe default, so you only set what you want to change.

Precedence (highest wins):

Env vars prefixed BRAIN_ with __ as the section separator (e.g. BRAIN_LLM__API_KEY=…, BRAIN_ADAPTERS__HTTP__PORT=8080)
~/.brain/config.yaml
Embedded defaults (crates/core/default.yaml)
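The naming rule and the precedence order can be sketched as below. This shows only the convention described above and is not Brain’s config loader: `env_to_config_path` and `resolve` are hypothetical helpers, and lowercasing the key segments is an assumption.

```rust
/// Map an environment variable name to a config key path.
/// "BRAIN_ADAPTERS__HTTP__PORT" -> ["adapters", "http", "port"]
fn env_to_config_path(var: &str) -> Option<Vec<String>> {
    let rest = var.strip_prefix("BRAIN_")?;
    if rest.is_empty() {
        return None;
    }
    // "__" separates sections; a single "_" stays inside a key name.
    Some(rest.split("__").map(|s| s.to_lowercase()).collect())
}

/// Highest precedence wins: env var, then config file, then embedded default.
fn resolve<T>(env: Option<T>, file: Option<T>, default: T) -> T {
    env.or(file).unwrap_or(default)
}

fn main() {
    for var in &["BRAIN_LLM__API_KEY", "BRAIN_ADAPTERS__HTTP__PORT", "PATH"] {
        println!("{} -> {:?}", var, env_to_config_path(var));
    }
    // Env override present: the file value and the default are ignored.
    println!("port = {}", resolve(Some(8080), Some(9000), 19789));
    // Nothing set anywhere: the embedded default applies.
    println!("port = {}", resolve(None, None, 19789));
}
```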

This page is a complete reference to every section. Each block is independent — copy the ones you want to experiment with into your config and leave the rest out.

LLM

Brain probes each provider entry at startup, picks the first reachable one, and fails over to the next on rate-limit or error.

llm:
  temperature: 0.7
  max_tokens: 4096
  # The active model's input context window, in tokens. Drives how much
  # file/attachment + memory content the prompt assembler packs in. Raise to
  # your model's real size (e.g. 32768, 128000) so large-window models read in
  # detail instead of clipping to the conservative 8k default.
  context_window: 8192

  providers:
    - name: ollama
      kind: ollama                 # ollama | groq | openai | openrouter | deepseek | together | gemini-compat
      base_url: "http://localhost:11434"
      model: "qwen2.5-coder:7b"
      preferred_models: ["qwen2.5-coder:7b", "llama3.1:8b"]
    # - name: groq
    #   kind: groq
    #   api_key: "gsk_..."
    #   model: "llama-3.3-70b-versatile"
    #   preferred_models: ["llama-3.3-70b-versatile", "llama-3.1-8b-instant"]
    # - name: openrouter
    #   kind: openrouter
    #   api_key: "sk-or-..."
    #   model: "meta-llama/llama-3.1-8b-instruct:free"

  # Legacy single-provider fallback — only used when `providers` is empty.
  provider: "ollama"
  model: "qwen2.5-coder:7b"
  base_url: "http://localhost:11434"
  api_key: ""

Model tiers (per-task routing)

Each tier is an ordered failover chain of provider names from the list above. Kernel chores (classification fallback, importance scoring, history compaction, web-search synthesis, background nudges) use fast; chat and task decomposition use deep; everything else uses balanced.

llm:
  tiers:
    fast: ["ollama"]               # keep chores fully local
    deep: ["openrouter", "ollama"] # cloud for chat, local fallback

An empty or omitted tier aliases the default chain, so leaving the block out changes nothing. Putting a local provider in fast guarantees those chores never leave your machine even when chat rides a cloud provider. Unknown provider names fail closed at startup.
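The tier rules above (an empty tier aliases the default chain, unknown names fail closed, the first reachable provider wins) can be sketched like this. The function names are invented, and using the provider list in declaration order as the default chain is an assumption.

```rust
use std::collections::HashMap;

/// Resolve a tier name to its failover chain.
/// Assumption: the default chain is the provider list in declaration order.
fn resolve_tier<'a>(
    tiers: &HashMap<&'a str, Vec<&'a str>>,
    providers: &[&'a str],
    tier: &str,
) -> Result<Vec<&'a str>, String> {
    let chain = match tiers.get(tier) {
        Some(names) if !names.is_empty() => names.clone(),
        _ => providers.to_vec(), // empty or omitted tier aliases the default chain
    };
    for name in &chain {
        if !providers.contains(name) {
            // Fail closed: a typo must not silently route elsewhere.
            return Err(format!("tier '{}' names unknown provider '{}'", tier, name));
        }
    }
    Ok(chain)
}

/// Pick the first provider in the chain that the probe reports as reachable.
fn first_reachable<'a>(chain: &[&'a str], is_up: impl Fn(&str) -> bool) -> Option<&'a str> {
    chain.iter().copied().find(|name| is_up(*name))
}

fn main() {
    let providers = ["ollama", "openrouter"];
    let mut tiers = HashMap::new();
    tiers.insert("fast", vec!["ollama"]);
    tiers.insert("deep", vec!["openrouter", "ollama"]);

    for tier in &["fast", "deep", "balanced"] {
        println!("{} -> {:?}", tier, resolve_tier(&tiers, &providers, tier));
    }
    // Simulate the cloud provider being down: deep falls back to the local one.
    let deep = resolve_tier(&tiers, &providers, "deep").unwrap();
    println!("serving deep from {:?}", first_reachable(&deep, |p| p == "ollama"));
}
```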

Embedding

Run ollama pull nomic-embed-text before first start. The dimensions value must match the model’s actual output size exactly.

embedding:
  model: "nomic-embed-text"
  dimensions: 768

Memory

memory:
  semantic:
    similarity_threshold: 0.65     # min cosine similarity for a semantic hit
    max_results: 20
  search:
    rrf_k: 60                      # Reciprocal Rank Fusion constant
    pre_fusion_limit: 50           # candidates from each source (BM25, ANN) before fusion
    importance_weight: 0.3         # weight for importance in final reranking
    recency_weight: 0.2            # weight for recency in final reranking
    decay_rate: 0.01               # forgetting-curve decay (higher = faster forgetting)
  consolidation:
    enabled: true
    interval_hours: 24
    forgetting_threshold: 0.05     # memories whose retention falls below this are dropped

Namespaces (data residency)

A namespace marked local_only never reaches a non-local provider: its memories are withheld from prompts bound for remote LLMs, embedded only by a loopback embedder, and marked in exports. An entry also covers its name/… sub-namespaces. Store into one with namespace: private on any client, or via a transport’s namespace setting.

memory:
  namespaces:
    private:
      residency: local_only        # any | local_only
    # work:
    #   residency: any
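A rough sketch of how such a residency check might behave, including the sub-namespace rule. The names `Residency`, `residency_for`, and `may_reach` are hypothetical, and defaulting unlisted namespaces to `any` is an assumption rather than documented behaviour.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Residency {
    Any,
    LocalOnly,
}

/// Look up the residency for a namespace. An entry named "private" covers
/// "private" itself and any "private/..." sub-namespace, but not "privateer".
fn residency_for(namespace: &str, configured: &[(&str, Residency)]) -> Residency {
    for (name, residency) in configured {
        let covers = namespace == *name
            || (namespace.starts_with(*name) && namespace[name.len()..].starts_with('/'));
        if covers {
            return *residency;
        }
    }
    Residency::Any // assumption: unlisted namespaces are unrestricted
}

/// A local_only memory may only be sent to a local (loopback) provider.
fn may_reach(residency: Residency, provider_is_local: bool) -> bool {
    residency == Residency::Any || provider_is_local
}

fn main() {
    let configured = [("private", Residency::LocalOnly)];
    for ns in &["private", "private/health", "privateer", "work"] {
        let r = residency_for(ns, &configured);
        println!("{:<15} {:?} remote_ok={}", ns, r, may_reach(r, false));
    }
}
```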

Per-agent memory trust

Recall scoring multiplies each memory’s score by the trust weight [0–1] of the agent that wrote it, so a low-trust agent’s memory cannot dominate context assembly no matter how its content is crafted. Memories from your own input always weigh 1.0.

memory:
  trust:
    default_agent_trust: 1.0
    # agents:
    #   some-external-agent: 0.4
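The multiplication described above is simple enough to show directly. In this sketch the struct mirrors the two config keys, while the type and method names are illustrative, and representing the user’s own input as `None` is an assumption.

```rust
use std::collections::HashMap;

struct TrustConfig {
    default_agent_trust: f64,
    agents: HashMap<String, f64>,
}

impl TrustConfig {
    /// Trust weight for a memory's source.
    /// Assumption: `None` marks a memory written from the user's own input.
    fn weight(&self, source_agent: Option<&str>) -> f64 {
        match source_agent {
            None => 1.0,
            Some(agent) => self
                .agents
                .get(agent)
                .copied()
                .unwrap_or(self.default_agent_trust)
                .clamp(0.0, 1.0),
        }
    }
}

/// Recall score after provenance weighting.
fn weighted_score(raw_score: f64, trust: f64) -> f64 {
    raw_score * trust
}

fn main() {
    let mut agents = HashMap::new();
    agents.insert("some-external-agent".to_string(), 0.4);
    let trust = TrustConfig { default_agent_trust: 1.0, agents };

    let candidates = [
        ("user note", None, 0.70),
        ("agent claim", Some("some-external-agent"), 0.95),
    ];
    for &(label, source, raw) in &candidates {
        let w = trust.weight(source);
        println!("{:<12} raw={:.2} trust={:.1} final={:.2}", label, raw, w, weighted_score(raw, w));
    }
}
```

With a weight of 0.4, the agent memory’s raw score of 0.95 drops below the user note’s 0.70, so carefully crafted content alone cannot push it to the top.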

Encryption

At-rest encryption of the local stores. Run brain init --encrypt to generate a salt and enable it.

encryption:
  enabled: false

Security

The sandbox that governs Brain’s own command execution and filesystem reads.

security:
  # Binaries the sandbox may execute. Intentionally narrow — read-only
  # inspection plus the toolchain. To run anything else (docker, brew, ssh,
  # custom scripts), add it here explicitly.
  exec_allowlist:
    - ls
    - cat
    - grep
    - git
    - cargo
    # `sh` enables the shell-wrapped tier for commands with pipes/redirects.
    # When used via that tier the per-binary allowlist is bypassed for the
    # wrapped command; rlimits, network deny, timeout, and the forbidden list
    # still apply.
    - sh
  exec_timeout_seconds: 30
  # Roots that read-only filesystem inspection may touch. Empty defaults to
  # $HOME. Set explicit entries like ["~/code", "~/work"] to restrict further.
  # Paths outside any allowed root (after canonicalization) are rejected.
  allowed_paths: []
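The two checks described in the comments above, a binary allowlist and a canonicalized root check, might look roughly like this. It is a simplified sketch with made-up function names: it does not expand `~`, does not model the shell-wrapped tier, and ignores the forbidden list.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// A binary may run only if it is named in the allowlist.
fn exec_allowed(binary: &str, allowlist: &[&str]) -> bool {
    allowlist.contains(&binary)
}

/// A path is allowed only if, after canonicalization (symlinks and ".."
/// resolved), it sits under one of the allowed roots.
fn path_allowed(path: &Path, roots: &[PathBuf]) -> io::Result<bool> {
    let real = path.canonicalize()?; // errors if the path does not exist
    Ok(roots
        .iter()
        .filter_map(|root| root.canonicalize().ok())
        .any(|root| real.starts_with(&root)))
}

fn main() {
    let allowlist = ["ls", "cat", "grep", "git", "cargo", "sh"];
    println!("git allowed: {}", exec_allowed("git", &allowlist));
    println!("docker allowed: {}", exec_allowed("docker", &allowlist));

    // The temp directory stands in for a configured root such as ~/code.
    let root = std::env::temp_dir();
    match path_allowed(&root, &[root.clone()]) {
        Ok(ok) => println!("{} allowed: {}", root.display(), ok),
        Err(e) => println!("cannot resolve {}: {}", root.display(), e),
    }
}
```

Canonicalizing before the prefix check is what defeats `../` tricks and symlinks that point outside an allowed root.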

Actions

What Brain is allowed to do in the world. Most actions are off by default.

actions:
  web_search:
    enabled: true
    provider: "duckduckgo"         # duckduckgo | searxng | tavily | custom
    endpoint: "http://localhost:8888"  # searxng/custom only
    api_key: ""                    # required for tavily
    timeout_ms: 3000
    default_top_k: 5

  scheduling:
    enabled: false                 # WRITE axis: lets Brain create/persist scheduled
                                   # intents. Firing them is the FIRE axis — see
                                   # `reflex.cron` below. Both are required to run.
    mode: "persist_only"

  messaging:
    enabled: false
    timeout_ms: 3000
    channels: {}
    # Webhook channels work for Discord, Telegram, Slack, or any HTTP endpoint.
    # Template vars: {{channel}} {{recipient}} {{content}} {{namespace}} {{timestamp}}
    #
    #   discord:
    #     url: "https://discord.com/api/webhooks/<ID>/<TOKEN>"
    #     body: '{"content": "{{content}}"}'
    #     headers: {}
    #   telegram:
    #     url: "https://api.telegram.org/bot<TOKEN>/sendMessage"
    #     body: '{"chat_id": "<CHAT_ID>", "text": "{{content}}", "parse_mode": "Markdown"}'
    #     headers: {}

  resilience:
    max_retries: 2
    retry_base_ms: 500
    circuit_breaker_threshold: 5
    circuit_breaker_cooldown_secs: 60

  # How many of a task plan's independent ready steps the orchestrator runs at
  # once. 1 = strictly sequential; higher exploits parallel branches in the
  # plan's dependency graph. Approval prompts are always resolved one at a time
  # — only auto-approved/approved actions overlap.
  max_parallel_steps: 4
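To show how the `resilience` values above could interact, here is a small circuit breaker plus retry-delay sketch. It is an assumption about the general technique, not the resilience crate: the type and function names are invented, and doubling the delay on each retry is a guess, since the config names only a base delay.

```rust
use std::time::{Duration, Instant};

/// Opens after `threshold` consecutive failures and rejects calls until
/// `cooldown` has elapsed.
struct CircuitBreaker {
    threshold: u32,
    cooldown: Duration,
    failures: u32,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    fn new(threshold: u32, cooldown: Duration) -> Self {
        CircuitBreaker { threshold, cooldown, failures: 0, opened_at: None }
    }

    /// May a call proceed at time `now`?
    fn allow(&mut self, now: Instant) -> bool {
        match self.opened_at {
            Some(opened) if now.duration_since(opened) < self.cooldown => false,
            Some(_) => {
                // Cooldown over: close the breaker and let a trial call through.
                self.opened_at = None;
                self.failures = 0;
                true
            }
            None => true,
        }
    }

    /// Record the outcome of a call made at time `now`.
    fn record(&mut self, success: bool, now: Instant) {
        if success {
            self.failures = 0;
        } else {
            self.failures += 1;
            if self.failures >= self.threshold {
                self.opened_at = Some(now);
            }
        }
    }
}

/// Assumption: exponential backoff, base * 2^attempt (attempt starts at 0).
fn retry_delay_ms(base_ms: u64, attempt: u32) -> u64 {
    base_ms.saturating_mul(1u64 << attempt.min(20))
}

fn main() {
    let start = Instant::now();
    let mut breaker = CircuitBreaker::new(5, Duration::from_secs(60));
    for _ in 0..5 {
        breaker.record(false, start);
    }
    println!("right after 5 failures: allow={}", breaker.allow(start));
    println!("after cooldown: allow={}", breaker.allow(start + Duration::from_secs(60)));
    for attempt in 0..2 {
        println!("retry {} waits {} ms", attempt + 1, retry_delay_ms(500, attempt));
    }
}
```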

Proactivity

Whether and how Brain reaches out to you on its own.

proactivity:
  enabled: true
  max_per_day: 2
  min_interval_minutes: 60
  quiet_hours:
    start: "20:00"
    end: "10:00"
    timezone: "UTC"                # IANA timezone, e.g. "America/New_York"
  delivery:
    outbox: true
    broadcast: true
    webhook_channels: []           # channel keys from actions.messaging.channels
    max_outbox_age_days: 7
  open_loop:                       # detect unresolved threads and follow up
    enabled: true
    scan_window_hours: 72
    resolution_window_hours: 24
    check_interval_minutes: 120
  discovery:                       # gentle, slow-cadence suggestions
    enabled: true
    interval_hours: 24
    unused_capabilities: true      # "did you know Brain can…" for unused capabilities
    mcp_servers: true              # propose MCP servers from other tools' configs

Discovery is a gentle companion behaviour with two independent loops. Both run on a slow cadence (a day by default), are gated by the proactivity toggle and quiet hours, and deliver ordinary low-priority nudges. Each item is suggested at most once:

unused_capabilities — finds an authored, user-facing capability with no recorded use yet (learned fitness) and surfaces one as a “did you know Brain can…” suggestion, so a faculty you never knew about doesn’t stay invisible. It declines to suggest anything when learned fitness is disabled (without that signal “unused” can’t be told from “untracked”).
mcp_servers — reads other MCP clients’ config files on this machine (Claude Desktop, Cursor, Windsurf) and proposes mounting any MCP server Brain doesn’t already run, as a copy-paste /mcp-mount command. The scan is read-only and local; mounting stays a consented action with its own egress scopes.
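The default quiet_hours window in the block above (20:00 to 10:00) wraps past midnight, which is easy to get wrong. A minimal sketch of the comparison follows; the helper names are made up, the window is assumed to include its start and exclude its end, and timezone conversion is left out.

```rust
/// Parse "HH:MM" into minutes since midnight.
fn parse_hhmm(s: &str) -> Option<u32> {
    let (h, m) = s.split_once(':')?;
    let (h, m): (u32, u32) = (h.parse().ok()?, m.parse().ok()?);
    if h < 24 && m < 60 { Some(h * 60 + m) } else { None }
}

/// Is `now` inside the quiet window? All values are minutes since midnight,
/// already converted to the configured timezone.
fn in_quiet_hours(now: u32, start: u32, end: u32) -> bool {
    if start <= end {
        now >= start && now < end // same-day window, e.g. 13:00 to 15:00
    } else {
        now >= start || now < end // window wraps midnight, e.g. 20:00 to 10:00
    }
}

fn main() {
    let start = parse_hhmm("20:00").unwrap();
    let end = parse_hhmm("10:00").unwrap();
    for t in &["23:30", "09:59", "10:00", "15:00"] {
        let now = parse_hhmm(t).unwrap();
        println!("{} quiet={}", t, in_quiet_hours(now, start, end));
    }
}
```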

Adapters

The network surfaces Brain exposes. Disable any you don’t use.

adapters:
  http:     { enabled: true, host: "127.0.0.1", port: 19789, cors: true }
  ws:       { enabled: true, port: 19790 }
  mcp:      { enabled: true, port: 19791 }
  grpc:     { enabled: true, port: 19792 }
  terminal: { enabled: true, port: 19793 }

Reflex (reactive signal sources)

Default is empty — no reflex tasks spawn unless configured here. Each firing becomes a Signal and flows through the normal pipeline (identity, confirmation, dispatch).

reflex:
  fs: []                           # filesystem watchers, one entry per path set
  # fs:
  #   - name: project-watch
  #     paths: ["~/notes", "~/projects"]
  #     recursive: true
  #     debounce_ms: 200

  cron:
    enabled: false                 # FIRE axis: fires due scheduled_intents through
                                   # the pipeline. Required for actions.scheduling
                                   # intents to ever run.
    poll_interval_seconds: 60

  sys:
    enabled: false                 # edge-triggered system state
    poll_interval_seconds: 30
    rules: []
    # Kinds (all edge-triggered — they fire on a transition, not a level):
    #   battery_below (needs `threshold`), on_ac_changed — platform power source
    #     (pmset on macOS, /sys on Linux).
    #   network_changed — the kernel's online/offline view; needs
    #     `monitoring.connectivity` enabled with targets, else it never flips.
    #   lock_changed — systemd-logind (`LockedHint`) on Linux, CoreGraphics
    #     (`CGSSessionScreenIsLocked`) on macOS; inert where no GUI session is
    #     reachable (headless / ssh).
    # rules:
    #   - kind: battery_below
    #     threshold: 20
    #   - kind: on_ac_changed
    #   - kind: network_changed
    #   - kind: lock_changed

Logging

Drives the tracing subscriber. RUST_LOG still overrides the computed filter at runtime. Long-running services (serve, mcp) log to a rotating file at ~/.brain/logs/brain.log; one-shot commands log to stderr.

logging:
  level: "info"                    # base level for the `brain` target
  format: "pretty"                 # "pretty" (human) or "json" (structured)
  rotation: "daily"                # "daily" | "hourly" | "never"
  targets: {}                      # per-subsystem overrides
  # targets:
  #   hippocampus: "debug"
  #   signal: "info"

Learning (capability fitness)

Brain records whether each tool succeeds or fails, decays those observations under the forgetting curve, and uses them as a tie-breaker when ranking the tools it offers the chat model. Awareness only — execution stays consent-gated.

learning:
  capability_fitness:
    enabled: true
    half_life_days: 30             # how long an observation keeps half its weight
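A half-life of 30 days means an observation counts for half as much after 30 days and a quarter after 60. The sketch below illustrates that decay; aggregating observations into a weighted success ratio is an assumption about how such a score could be built, and the function names are hypothetical.

```rust
/// Weight of an observation that is `age_days` old.
fn decayed_weight(age_days: f64, half_life_days: f64) -> f64 {
    0.5f64.powf(age_days / half_life_days)
}

/// Decay-weighted success ratio over (succeeded, age_days) observations.
/// Returns None when there is nothing to go on.
fn fitness(observations: &[(bool, f64)], half_life_days: f64) -> Option<f64> {
    let (mut ok, mut total) = (0.0, 0.0);
    for &(succeeded, age_days) in observations {
        let w = decayed_weight(age_days, half_life_days);
        total += w;
        if succeeded {
            ok += w;
        }
    }
    if total > 0.0 { Some(ok / total) } else { None }
}

fn main() {
    for age in &[0.0, 30.0, 60.0] {
        println!("age {:>2} days -> weight {:.2}", age, decayed_weight(*age, 30.0));
    }
    // A fresh success outweighs a month-old failure.
    println!("fitness = {:?}", fitness(&[(true, 0.0), (false, 30.0)], 30.0));
}
```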

Brain also learns whether its answers helped. It classifies each turn into a coarse task kind, judges how your next message reacted to the previous answer (off the hot path), and reinforces a per-(task-kind, model) quality score on the same forgetting curve. When more than one model is configured across llm.tiers, a model that measurably answers a task kind worse than a cheaper tier loses that kind’s turns to the cheaper tier, provided the cheaper tier has enough evidence of its own. The shift is bounded by an evidence floor (min_judged_turns) and a margin, and routing never leaves your configured tiers. A single-model install is unaffected.

learning:
  answer_fitness:
    enabled: true
    half_life_days: 30             # how long a judged outcome keeps half its weight
    min_judged_turns: 8            # evidence (per tier) required before routing shifts
    margin: 0.15                   # success-ratio lead a cheaper tier needs to win a kind
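One reading of how `min_judged_turns` and `margin` combine is sketched below. The decision rule is inferred from the comments above, not from the source code: `TierEvidence` and `cheaper_tier_wins` are hypothetical names, and requiring the evidence floor on both tiers is an assumption.

```rust
/// Judged outcomes for one (task kind, tier) pair.
struct TierEvidence {
    judged_turns: u32,
    success_ratio: f64, // decay-weighted share of turns judged helpful
}

/// Should this task kind move from the current tier to the cheaper one?
/// Assumption: both tiers need `min_judged_turns` of evidence, and the
/// cheaper tier must lead by at least `margin`.
fn cheaper_tier_wins(
    current: &TierEvidence,
    cheaper: &TierEvidence,
    min_judged_turns: u32,
    margin: f64,
) -> bool {
    current.judged_turns >= min_judged_turns
        && cheaper.judged_turns >= min_judged_turns
        && cheaper.success_ratio - current.success_ratio >= margin
}

fn main() {
    let deep = TierEvidence { judged_turns: 12, success_ratio: 0.55 };
    let fast = TierEvidence { judged_turns: 9, success_ratio: 0.80 };
    println!("shift to cheaper tier: {}", cheaper_tier_wins(&deep, &fast, 8, 0.15));

    // Same lead, but too little evidence on the cheaper tier: no shift yet.
    let fast_sparse = TierEvidence { judged_turns: 3, success_ratio: 0.80 };
    println!("shift with sparse evidence: {}", cheaper_tier_wins(&deep, &fast_sparse, 8, 0.15));
}
```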

Observability

A background task samples process RSS, CPU, open SQLite connections, and ~/.brain disk usage; crossing a ceiling emits a ResourcePressure event (visible in brain tail, brain doctor --deep, and /status). Ceilings are generous and fail-safe — set any threshold to 0 to disable it.

observability:
  resource_sample_secs: 30
  thresholds:
    rss_mb: 2048                   # resident-set-size ceiling (MiB)
    cpu_pct: 90.0                  # process CPU ceiling (% single-core basis)
    disk_mb: 10240                 # ~/.brain disk-usage ceiling (MiB)
    open_fds: 1024                 # open file-descriptor ceiling (fd-leak warning)
  log_sampling:
    high_volume_1_in_n: 1          # emit 1 in N high-volume log lines; 1 = log all

Monitoring

External service health

Each entry spawns one bounded probe loop (HTTP GET or raw TCP connect). Probes are edge-triggered: a notification fires only when a service crosses between reachable and unreachable, never once per interval. Empty by default.

monitoring:
  services: []
  # - name: ollama
  #   kind: http                   # http | tcp
  #   target: "http://localhost:11434/api/tags"
  #   interval_secs: 60
  #   timeout_secs: 10
  #   expect_status: 200           # http only; omit to accept any 2xx
  # - name: postgres
  #   kind: tcp
  #   target: "127.0.0.1:5432"
  #   interval_secs: 30
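Edge-triggering means a notification follows a change of state rather than a probe result. The sketch below isolates that idea with simulated probe results instead of real network calls; `EdgeDetector` is an invented name, and treating the first observation as a silent baseline is an assumption.

```rust
/// Remembers the last probe result and reports only transitions.
struct EdgeDetector {
    last: Option<bool>,
}

impl EdgeDetector {
    fn new() -> Self {
        EdgeDetector { last: None }
    }

    /// Feed one probe result. Returns Some(new_state) when the service
    /// crossed between reachable and unreachable, None otherwise.
    /// Assumption: the very first observation only sets the baseline.
    fn observe(&mut self, reachable: bool) -> Option<bool> {
        let changed = self.last.map_or(false, |prev| prev != reachable);
        self.last = Some(reachable);
        if changed { Some(reachable) } else { None }
    }
}

fn main() {
    let mut detector = EdgeDetector::new();
    // Simulated results of successive probes of one service.
    let probes = [true, true, false, false, false, true];
    for (i, &reachable) in probes.iter().enumerate() {
        match detector.observe(reachable) {
            Some(true) => println!("probe {}: service recovered", i),
            Some(false) => println!("probe {}: service went down", i),
            None => println!("probe {}: no change, no notification", i),
        }
    }
}
```

Three consecutive failed probes produce a single "went down" notification, not three.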

Connectivity

The kernel’s online / degraded / offline view. Targets default to the configured remote LLM provider endpoints, so probing never adds a new egress destination; a fully-local install has nothing to probe and stays online. While offline, chat rides the first fully-local model tier and web search degrades with an honest explanation instead of timing out.

monitoring:
  connectivity:
    enabled: true
    interval_secs: 60
    timeout_secs: 5                # per-target TCP-connect timeout
    targets: []                    # host:port overrides; empty = derive from llm.providers

Power

The kernel’s external / battery view, read from the platform (pmset on macOS, /sys/class/power_supply on Linux; no network). While on battery, heavy background maintenance holds until external power returns. Desktops and platforms without a readable power source stay pinned external.

monitoring:
  power:
    enabled: true
    interval_secs: 60
    defer_maintenance: true        # hold consolidation/sweeps while on battery

Manifest health

A periodic sweep that stamps each registered capability verified / degraded / breaker-open by probing what it depends on (the embedding model, network connectivity, per-tool circuit breakers). The capability digest and tools/list annotate unhealthy tools so the reasoner never promises a faculty that is dead right now.

monitoring:
  manifest_health:
    enabled: true
    interval_secs: 120

Per-turn telemetry

After each chat turn the pipeline publishes one turn_completed event on the observability bus, summarising what the turn cost: the serving model and its locality, the kernel’s connectivity at the time, prompt/completion token usage, the number of model⇄tool rounds and calls dispatched, and wall-clock latency. It is pure observation — nothing about how a turn runs changes — and makes each turn legible to brain events --kind turn_completed and the trust console. With no observability bus wired (CLI one-shots) nothing is emitted.

monitoring:
  telemetry:
    enabled: true

Learned-normal monitoring

Alongside the static resource ceilings, the daemon learns each runtime gauge’s normal range — an exponentially-weighted moving baseline of its mean and variance — and emits a metric_anomaly event when a reading lands far outside that learned band. This catches a gauge climbing abnormally fast while still under its configured ceiling (an early warning a fixed threshold can’t give), and stays quiet on a machine whose normal load is simply high. It is edge-triggered (one alert per excursion) and silent until it has seen warmup_samples readings, so the minutes after boot never alarm. Read the signal with brain events --kind metric_anomaly.

The same detector also watches the per-turn telemetry stream (see Per-turn telemetry above): it learns a normal turn’s latency and token cost and raises a metric_anomaly (turn.latency_ms / turn.tokens) when a turn falls far outside that learned band — catching “your turns are suddenly much slower than usual” or a one-off token blowout. The same learned_normal settings govern it.

monitoring:
  learned_normal:
    enabled: true
    sensitivity: 4.0      # learned standard deviations out before it's an anomaly
    warmup_samples: 30    # samples observed before any anomaly can fire
    alpha: 0.1            # EWMA smoothing factor — larger adapts faster to recent readings
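
The three settings map onto a small detector. The sketch below is a minimal version, assuming a standard incremental EWMA update for mean and variance; the type `LearnedNormal` and its method are hypothetical, and the daemon's actual update rule may differ.

```rust
/// EWMA baseline of mean and variance with edge-triggered anomaly alerts.
/// Field names mirror the `learned_normal` config keys; the update rule is
/// an assumption, not taken from the daemon.
struct LearnedNormal {
    sensitivity: f64,
    warmup_samples: u32,
    alpha: f64,
    mean: f64,
    var: f64,
    seen: u32,
    in_excursion: bool,
}

impl LearnedNormal {
    fn new(sensitivity: f64, warmup_samples: u32, alpha: f64) -> Self {
        Self { sensitivity, warmup_samples, alpha, mean: 0.0, var: 0.0, seen: 0, in_excursion: false }
    }

    /// Feeds one reading; returns true only on the first reading of an excursion.
    fn observe(&mut self, x: f64) -> bool {
        if self.seen == 0 {
            // The first reading seeds the baseline and can never alarm.
            self.mean = x;
            self.seen = 1;
            return false;
        }
        // Judge the reading against the band learned before folding it in.
        let std_dev = self.var.sqrt();
        let outside = self.seen >= self.warmup_samples
            && (x - self.mean).abs() > self.sensitivity * std_dev;

        // Incremental EWMA update of mean and variance.
        let delta = x - self.mean;
        self.mean += self.alpha * delta;
        self.var = (1.0 - self.alpha) * (self.var + self.alpha * delta * delta);
        self.seen += 1;

        // Edge trigger: alert when entering an excursion, not while inside it.
        let fire = outside && !self.in_excursion;
        self.in_excursion = outside;
        fire
    }
}
```

Because the baseline keeps adapting during an excursion, a sustained new level is eventually absorbed as the new normal. Freezing the baseline while out of band is an alternative design with the opposite trade-off.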

Channel (relays & transports)

Bidirectional gateways that connect Brain to chat platforms. Unlike one-shot actions.messaging webhooks, these are long-lived connections — approval responses from any channel are correlated automatically.

channel:
  # Long-lived WebSocket gateways.
  relays: []
  # - id: telegram
  #   label: "Telegram"
  #   url: "ws://127.0.0.1:7000/brain"
  #   namespace: "personal"
  #   api_key: ""
  #   initial_backoff_ms: 1000
  #   max_backoff_ms: 60000

  # Generic preset-driven transports (http_polled / webhook_inbound /
  # webhook_outbound). Each names a preset that ships embedded under
  # crates/channel/presets/ (discord, slack, telegram) or lives at
  # ~/.brain/presets/<id>.yaml.
  transports: []
  # - id: chat-main
  #   label: "Telegram"
  #   preset: telegram
  #   namespace: personal
  #   credential: "<bot-token-or-webhook-url>"   # plugged into the preset's templates
  #   signing_secret: "<hmac-or-pubkey>"         # webhook_inbound presets only
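
The relay keys initial_backoff_ms and max_backoff_ms bound the reconnect delay. The config does not state the growth rule, so the sketch below assumes the common choice of doubling per failed attempt, capped at the maximum; `backoff_ms` is a hypothetical helper.

```rust
/// Delay before reconnect attempt `attempt` (0-based), assuming the delay
/// doubles after each failure and is capped at `max_backoff_ms`.
fn backoff_ms(initial_backoff_ms: u64, max_backoff_ms: u64, attempt: u32) -> u64 {
    // checked_shl returns None once the shift reaches the bit width, so very
    // large attempt counts saturate instead of overflowing.
    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    initial_backoff_ms.saturating_mul(factor).min(max_backoff_ms)
}
```

With the commented defaults (1000 and 60000), the delay under this assumption grows 1 s, 2 s, 4 s and so on, then holds at 60 s from the seventh attempt onward.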

Agents (delegation)

Specialist CLI agents the orchestrator hands multi-step work to. Auto-discovery (on by default) finds well-known agents on $PATH without manual entries; use delegates for bespoke binaries.

agents:
  delegates: []
  fallbacks: []
  retry_on_timeout: true
  auto_discovery: true             # find claude_code, aider, cursor, … on $PATH
  # delegates:
  #   - name: script
  #     kind: subprocess
  #     binary: "/usr/local/bin/my-agent"
  #     args: ["--task", "{task_id}"]
  #     prompt_via_stdin: true
  #     tags: ["custom"]
  #
  # Per-agent overrides for the auto-discovered registry, keyed by canonical id
  # (claude_code, aider, cursor, …). Every field is optional.
  # discovery_overrides:
  #   claude_code:
  #     binary: "/opt/homebrew/bin/claude"
  #     args: ["--print", "--task", "{task_id}"]
  #     prompt_via_stdin: true
  #     disabled: false
  #     capabilities:
  #       tags: ["code-edit", "plan", "rust"]
  #       languages: ["rust", "typescript"]
  #       max_concurrency: 2
  #       needs_network: true
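
Since every override field is optional, an override can be read as a partial record laid over the auto-discovered entry. The sketch below shows that merge plus the {task_id} substitution. The types `AgentSpec` and `Override` and both functions are hypothetical, and the capabilities sub-block is left out for brevity.

```rust
/// A delegate's launch settings (illustrative shape, not Brain's real type).
#[derive(Debug, Clone, PartialEq)]
struct AgentSpec {
    binary: String,
    args: Vec<String>,
    prompt_via_stdin: bool,
    disabled: bool,
}

/// One `discovery_overrides` entry; `None` keeps the discovered value.
#[derive(Debug, Clone, Default)]
struct Override {
    binary: Option<String>,
    args: Option<Vec<String>>,
    prompt_via_stdin: Option<bool>,
    disabled: Option<bool>,
}

/// Lays an override on top of an auto-discovered spec, field by field.
fn apply_override(base: &AgentSpec, ov: &Override) -> AgentSpec {
    AgentSpec {
        binary: ov.binary.clone().unwrap_or_else(|| base.binary.clone()),
        args: ov.args.clone().unwrap_or_else(|| base.args.clone()),
        prompt_via_stdin: ov.prompt_via_stdin.unwrap_or(base.prompt_via_stdin),
        disabled: ov.disabled.unwrap_or(base.disabled),
    }
}

/// Substitutes the `{task_id}` placeholder in every argument.
fn render_args(args: &[String], task_id: &str) -> Vec<String> {
    args.iter().map(|a| a.replace("{task_id}", task_id)).collect()
}
```

An override that sets only binary leaves the discovered args, stdin behaviour, and enabled state untouched.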

Access (API keys & rate limiting)

A random key is generated on brain init and printed once to stdout.

access:
  api_keys: []
  # - key: "brk_..."                  # `brain init` generates one in this format
  #   name: "laptop"
  #   permissions: ["read", "write"]   # read | write | export | admin
  #   agent_id: "my-laptop"            # binds the key to an identity principal
  rate_limit:
    enabled: true
    tokens_per_refill: 60
    refill_interval_ms: 60000
    burst_capacity: 20

Scopes do not imply each other — write does not grant read; list both if needed. admin is an implicit superset of every scope.
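
The scope rule is small enough to state as code. This is a sketch with hypothetical names (`Scope`, `allows`), not the daemon's actual check:

```rust
/// The four permission scopes an API key can carry.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Scope {
    Read,
    Write,
    Export,
    Admin,
}

/// A key allows an operation only if it lists that exact scope,
/// or if it carries `Admin`, which covers every scope.
fn allows(granted: &[Scope], needed: Scope) -> bool {
    granted.contains(&Scope::Admin) || granted.contains(&needed)
}
```

A key with only write therefore cannot read back what it stored; such a key needs both read and write.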

Confirm (standing approvals)

Pre-authorize specific (agent, verb) pairs so they don’t prompt for human confirmation every time. Entries are loaded at startup into the standing-approval store, and loading is idempotent across launches: an existing active grant for the same (agent_id, verb_ns, verb_action) triple is left alone.

confirm:
  standing_approvals: []
  # - agent_id: "my-laptop"
  #   verb_ns: "net"
  #   verb_action: "http"
  #   note: "trusted automation"
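
Idempotent loading can be pictured as inserting into a set keyed by the grant triple. The sketch below assumes an in-memory set and hypothetical names (`StandingApprovals`, `load`, `is_pre_approved`); the real store persists grants and tracks more state than this.

```rust
use std::collections::HashSet;

/// (agent_id, verb_ns, verb_action)
type Grant = (String, String, String);

/// In-memory stand-in for the standing-approval store.
#[derive(Default)]
struct StandingApprovals {
    active: HashSet<Grant>,
}

impl StandingApprovals {
    /// Loads configured entries and returns how many were newly added.
    /// Entries that already have an active grant are left alone.
    fn load(&mut self, entries: &[(&str, &str, &str)]) -> usize {
        let mut added = 0;
        for (agent, ns, action) in entries {
            if self.active.insert((agent.to_string(), ns.to_string(), action.to_string())) {
                added += 1;
            }
        }
        added
    }

    /// True when this exact (agent, verb) combination has a standing grant.
    fn is_pre_approved(&self, agent: &str, ns: &str, action: &str) -> bool {
        self.active
            .contains(&(agent.to_string(), ns.to_string(), action.to_string()))
    }
}
```

Loading the same config on a second launch adds nothing, which is the property the paragraph above describes.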

Identity (principals & authorization)

Binds requests to a principal and constrains what each may do. Default is empty — signals carry no principal and the identity gate is silently skipped.

Risk tiers, ordered by escalating risk: read < write < execute < destructive < external. Only destructive and external block on explicit human approval.

identity:
  user_id: ""
  principals: []
  # - agent_id: "my-laptop"
  #   tier: execute               # read | write | execute | destructive | external
  #   scopes: ["memory", "net"]
  #   # Path prefixes the principal may read/write. Empty = no path-scoped ops.
  #   path_allowlist: ["~/code", "~/work"]
  #   # General per-(verb, modifier) allowlists. Empty = unconstrained beyond
  #   # tier/scope/path. Opt-in.
  #   constraints:
  #     - verb: "net.http"         # exact, "net.*" wildcard, or "*"
  #       modifier: "host"         # request modifier key, e.g. host / command
  #       match_kind: host_suffix  # exact | prefix | host_suffix
  #       allow: ["example.com"]   # empty = deny everything for this (verb, modifier)
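
The tier ordering and the constraint matching rules can be sketched as follows. All names here are hypothetical, and two details are assumptions: host_suffix is taken to match on a label boundary (so example.com admits api.example.com but not badexample.com), and a "net.*" pattern is taken to cover any verb in the net namespace.

```rust
/// Risk tiers in escalating order; the derived `Ord` follows declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Tier {
    Read,
    Write,
    Execute,
    Destructive,
    External,
}

/// Only the two highest tiers block on explicit human approval.
fn needs_human_approval(tier: Tier) -> bool {
    tier >= Tier::Destructive
}

#[derive(Debug, Clone, Copy)]
enum MatchKind {
    Exact,
    Prefix,
    HostSuffix,
}

/// Checks a request modifier value against an allowlist.
/// An empty allowlist denies everything, as in the config comment.
fn value_allowed(kind: MatchKind, allow: &[&str], value: &str) -> bool {
    allow.iter().any(|a| match kind {
        MatchKind::Exact => value == *a,
        MatchKind::Prefix => value.starts_with(*a),
        // Assumption: suffix must start at a DNS label boundary.
        MatchKind::HostSuffix => value == *a || value.ends_with(&format!(".{a}")),
    })
}

/// Matches a verb against an exact name, a "ns.*" wildcard, or "*".
fn verb_matches(pattern: &str, verb: &str) -> bool {
    if pattern == "*" || pattern == verb {
        return true;
    }
    match pattern.strip_suffix(".*") {
        Some(ns) => verb
            .strip_prefix(ns)
            .map_or(false, |rest| rest.starts_with('.')),
        None => false,
    }
}
```

Deriving `Ord` on the enum keeps the comparison and the documented ordering in one place, so reordering the variants is the only way to change which tiers need approval.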

Storage

Internal defaults — safe to leave unchanged.

brain:
  version: "0.5.0"
  data_dir: "~/.brain"

storage:
  ruvector_path: "~/.brain/ruvector/"
  sqlite_path: "~/.brain/db/brain.db"
  hnsw:
    ef_construction: 200
    m: 16
    ef_search: 50
    # HNSW pre-allocates the index graph for max_elements up-front, so this is
    # a real memory cost. 100k covers personal-scale installs; raise to
    # 1_000_000+ for a team or large corpus.
    max_elements: 100000

Running & Deployment

Daemon lifecycle

brain start      # Start (or via service if installed)
brain stop       # Stop
brain status     # Health check
brain tail       # Stream events (observability)

Service management

brain service install    # launchd (macOS) / systemd (Linux) / Task Scheduler (Windows)
brain service uninstall  # Remove auto-start

Docker

Brain uses Docker to run the optional SearXNG web search backend:

brain deps up       # Start SearXNG
brain deps status   # Check
brain deps down     # Stop

Data layout

Paths created at brain init:

~/.brain/config.yaml — user config
~/.brain/db/brain.db — SQLite database
~/.brain/ruvector/ — HNSW vector store
~/.brain/vault/ — encrypted credentials
~/.brain/logs/brain.log

Encryption

Enable encryption-at-rest with brain init --encrypt. This uses AES-256-GCM with Argon2id key derivation. Encrypted exports are the default when at-rest encryption is enabled.

Security

Brain’s security model is built on layered guarantees.

Authentication

Every API request requires an API key (generated at brain init). Keys can be scoped to specific agents with limited permissions.

Authorization tiers

All capabilities are tagged with a safety tier:

Tier	Examples	Requires confirmation?
Read	Memory search, status, audit query	No
Write	Store fact, set preference	No
Execute	Run command, web search	Yes (nonce-based)
Destructive	Delete memory, prune audit	Yes + budget check
External	Send message, delegate task	Yes + cost check

Confirmation engine

Destructive and external actions require a nonce-based approval flow. The engine supports:

Standing approvals (with optional TTL and scope)
Confirmation timeouts (pauses when user is away)
Cross-channel confirmation correlation

Audit trail

Every action is recorded in an append-only SQLite audit trail whose immutability is enforced by triggers. The audit records who did what, when, and the authorization decision.

Sandbox

Command execution runs in a sandbox with:

Process-group SIGKILL on timeout
Binary allowlist
rlimits (CPU, address space, file count, file size)
macOS sandbox-exec / Linux unshare network isolation

Data residency

Namespaces can be marked local_only, preventing their data from reaching any non-local LLM provider. Enforcement happens at every egress point — recall, embedding, export.
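
The residency rule reduces to one predicate that each egress point would consult. The sketch below uses hypothetical names (`Residency`, `may_send`) and models the set of local_only namespaces as a plain set of strings.

```rust
use std::collections::HashSet;

/// Tracks which namespaces are marked `local_only` (illustrative type).
struct Residency {
    local_only: HashSet<String>,
}

impl Residency {
    /// Data may leave for a provider if the provider is local, or if the
    /// namespace carries no `local_only` marker.
    fn may_send(&self, namespace: &str, provider_is_local: bool) -> bool {
        provider_is_local || !self.local_only.contains(namespace)
    }
}
```

Calling the same predicate at recall, embedding, and export is what keeps a single missed code path from leaking a marked namespace.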

Credential vault

Secrets are stored in the OS-native keychain (macOS Keychain, Linux Secret Service) with an encrypted-file fallback (Argon2id + AES-256-GCM).

Export & Import

Export

brain export                    # Plaintext JSON export
brain export --encrypt          # Encrypted envelope (default when at-rest encryption is on)
brain export --output file.json

The export envelope includes:

All episodic and semantic memories
Procedural memory (trigger patterns)
Memory namespaces and residency markers
Config metadata (no secrets)

Encrypted exports are self-contained envelopes using AES-256-GCM with a fresh Argon2id-derived key and embedded salt — portable across machines.

Import

brain import file.json          # Import plaintext
brain import file.enc           # Import encrypted (auto-detected)

Imports preserve namespace boundaries. Encrypted imports prompt for the passphrase.

Contributing

We welcome contributions! The project is organized as a Rust workspace with 29 crates.

Getting started

git clone https://github.com/keshavashiya/brain.git
cd brain
cargo build --workspace
cargo test --workspace

Development tools

just build       # Build workspace (debug)
just test        # Run all tests
just ci          # fmt + clippy + tests
just fmt         # Format code
just lint        # Clippy
just serve-dev   # Start with debug logging

PR checklist

cargo fmt --all --check
cargo clippy --workspace --all-targets -- -D warnings
cargo test --workspace
One intent per PR
Conventional commits (feat:, fix:, docs:, etc.)

Key conventions

Single-word crate names matching folder names
brainos- package prefix for crates.io
Every capability lives in its backend crate, not in cli
Operator commands (init, doctor, service, vault, config) stay CLI-only
CI parity enforced before every push

Documentation

Public docs are at keshavashiya.github.io/brain
Root ARCHITECTURE.md covers the high-level design
CHANGELOG.md tracks user-facing changes

Codebase Conventions

Crate naming

Layer	Rule	Example
Folder (`crates/<name>/`)	Single word, lowercase	`crates/mcphost/`
Package name	`brainos-<word>`	`brainos-mcphost`
Workspace alias	Single word, matches folder	`mcphost = { workspace = true }`
Rust import	`brainos_<word>`	`use brainos_mcphost::RmcpHost;`
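
The table's rules can be written as three small functions. The helper names are hypothetical, and reading "single word, lowercase" as ASCII lowercase letters only is an assumption.

```rust
/// Package name published to crates.io for a crate folder.
fn package_name(folder: &str) -> String {
    format!("brainos-{folder}")
}

/// Root of the Rust import path for a crate folder.
fn import_root(folder: &str) -> String {
    format!("brainos_{folder}")
}

/// Assumed reading of "single word, lowercase": ASCII lowercase letters only,
/// so separators such as `_` or `-` are rejected.
fn is_valid_folder(folder: &str) -> bool {
    !folder.is_empty() && folder.chars().all(|c| c.is_ascii_lowercase())
}
```

The workspace alias needs no function because it is the folder name itself.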

Workspace deps

All internal crates declared in root [workspace.dependencies]
Consumers always use <name> = { workspace = true }
Workspace version locked across all crates

Comments + docs

No internal labels (Phase / PR references) in source code
No stale feature references in docs
PR memos and commit messages may use internal labels freely

CI parity

Every push runs: cargo fmt --all --check + cargo clippy --workspace --all-targets -- -D warnings + cargo test --workspace + cargo check --workspace --no-default-features.

Release Process

Versioning

Brain follows semantic versioning. The workspace version is locked across all 29 crates.

Release pipeline

Releases are driven by scripts/release.sh (local) and .github/workflows/release.yml (CI):

Local (human-driven)

scripts/release.sh X.Y.Z            # Validate → CI check → publish → tag
scripts/release.sh X.Y.Z --dry-run  # Dry run (no publish)
scripts/release.sh X.Y.Z --skip-ci  # Skip CI check (re-runs)

Steps:

Validates clean tree, version match, populated CHANGELOG
Runs CI parity (fmt + clippy + tests)
Publishes all crates in dependency order via scripts/publish-order.sh
Creates annotated vX.Y.Z tag and pushes

CI (automated off the pushed tag)

Triggered by pushing a v* tag:

Builds brain-<target>.tar.gz + .sha256 for macOS/Linux (x86_64 + aarch64)
Generates SPDX SBOM
Creates GitHub Release with binaries + checksums

Changelog

Every release requires an updated CHANGELOG.md with the [X.Y.Z] section populated. Extract release notes with:

scripts/changelog-extract.sh X.Y.Z

Product Vision

Brain OS is a biologically-inspired, central cognitive engine — the persistent memory and mediation layer between AI tools and the user’s world.

The vision in one sentence

Your AI tools share one localized, ever-growing memory that runs 24/7 on your machine — and they all play by the same rules.

Core principles

Local-first, always — Your data never leaves your hardware. There is no account, no cloud, no telemetry.
One capability ontology — A capability is a typed entry in one registry. CLI, HTTP, MCP, and the reasoner are faces over it.
Memory that earns its place — Importance scoring, forgetting curves, and consolidation keep the signal sharp.
Fail safe, never silently — Every error path is explicit. Degraded-but-functional is the target.
Open to any LLM — Ollama, OpenAI, OpenRouter, or any OpenAI-compatible endpoint.

Version arc

Version	Focus	Status
v0.1.0	Memory layer (episodic + semantic + procedural)	✅ Released
v0.2.0	Autonomous agent layer (safety, isolation, orchestration, delegation)	✅ Released
v0.3.0	Natural language interface (chat, intents, approvals)	✅ Released
v0.4.0	Wire pillars + fix stubs (all 30+ crates wired)	✅ Released
v0.5.0	Structural polish (release automation, capability coherence)	✅ Released
v0.6.0	The Connector (SDKs, connector protocol, situated kernel)	🔴 In progress
v0.7.0	Skill packs (declarative capability bundles)	🔴 Planned
v0.8.0	Multi-device (cuttable)	🔴 Planned
v0.9.0	Brain Studio (trust console)	🔴 Planned
v1.0.0	Launch — the private home for your AI life	🔴 Planned

What Brain is not

A cloud service
A platform integration hub
A consumer product with a polished GUI
A replacement for tool-specific context windows

The web UI at /ui is a diagnostic tool for power users, not the primary interface.

Roadmap

This page summarizes the public-facing plan.

Active: Close the Loops

The current development focus is closing feedback loops across the system. Derived from a four-lens architecture review, the work is organized into tracks:

Track 0 — Ship & hygiene ✅

Tag and publish v0.5.0, fence tool outputs as untrusted, quarantine MCP servers on hash change, add clock to prompt, fix architecture doc drift.

Track 1 — Trustworthy substrate

Namespace data-residency enforcement, memory-trust (provenance-weighted recall + unattested-writer quarantine), grants ledger, TTL/scoped standing approvals, encrypted export, semantic capability retrieval. This track is a prerequisite for inviting third-party connectors and writers.

Multi-device CRDT sync (v1.1)
IDE integration (v1.1)
Dual-memory unification (v1.1)
