Claude Code · Routing Class

Why this class exists

Claude Code talks to one model by default: Anthropic's. But most turns in your session (a file walk, a grep result summary, a comment cleanup) don't need Opus-level reasoning. CCRouter sits between your CC client and any provider, routes each request to the model that fits the work, and stays invisible to the agent loop. The cost asymmetry alone funds the setup. The model diversity unlocks workflows the single-vendor path can't reach.

What you'll learn

What CCRouter actually intercepts and why it's safe.
The six routes (default, background, think, longContext, webSearch, image) and when each one triggers.
The minimal config.json that gets you off Anthropic billing for routine work.
Eight production routing patterns, including cost ceilings, failover, per-project overrides, and custom transformers.

Built against

Claude Code v2.1.143 · @musistudio/claude-code-router v1.0.x. CCRouter is a third-party open-source project (26.4k stars as of 2026-05) and moves fast — config shapes here are validated against the canonical docs but verify the field names against your installed version before copy-pasting.

§1

Why Route?

foundations

TL;DR — three pressures push you off the single-vendor default: cost asymmetry (Opus pricing on routine work), model diversity (Gemini's long context, DeepSeek's reasoning-per-dollar, local Ollama for privacy), and vendor lock-off (keep CC's UX, pick the model). CCRouter is the cheapest way to address all three at once.

▸ Cost asymmetry — the math nobody runs

A typical Claude Code session mixes a few heavy reasoning turns with many cheap ones: file reads, directory walks, summaries of tool output. Treating every turn as Opus-class means paying Opus prices for Haiku-class work. Community reports from heavy CC users routinely put savings in the 50–99% range once routine traffic is offloaded; the exact number depends on your read/write ratio.
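To see how the routine share drives that range, here is a back-of-envelope sketch. The prices, the monthly token volume, and the 80% routine share are made-up placeholders, not quotes from any provider; plug in your own numbers.

```javascript
// Blended cost of a month of traffic when a share of tokens goes to a cheaper model.
// All prices are hypothetical, in USD per million tokens.
function blendedCost({ totalMTokens, routineShare, premiumPrice, cheapPrice }) {
  const routine = totalMTokens * routineShare; // tokens sent to the cheap model
  const heavy = totalMTokens - routine;        // tokens that stay on the premium model
  return heavy * premiumPrice + routine * cheapPrice;
}

// Savings relative to sending everything to the premium model.
function savingsPercent(params) {
  const baseline = params.totalMTokens * params.premiumPrice;
  return ((baseline - blendedCost(params)) / baseline) * 100;
}

const example = {
  totalMTokens: 100,  // assumed monthly volume, in millions of tokens
  routineShare: 0.8,  // assumed: 80% of tokens are routine work
  premiumPrice: 15,   // hypothetical premium price per MTok
  cheapPrice: 0.5     // hypothetical cheap price per MTok
};

console.log(blendedCost(example).toFixed(2));           // 340.00
console.log(savingsPercent(example).toFixed(1) + "%");  // 77.3%
```

Push routineShare toward 1 and the savings approach the raw price ratio; pull it toward 0 and they vanish. That is why the reported range is so wide.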

▸ Model diversity — capabilities Anthropic doesn't ship

Gemini 2.5 Pro — 2M token context. Useful when you need to reason over a whole monorepo, a long log, or a giant transcript.
DeepSeek V3 / R1 — high reasoning quality at a fraction of Opus pricing. Good fit for "think" routes.
Ollama (local) — runs offline, never sends bytes off your machine. The privacy/regulated-industry play.
Groq / SiliconFlow / Together — very low latency for the "default" path when responsiveness matters more than top-end quality.

▸ Vendor lock-off — same UX, any model

CC's value is the loop, the permission model, the hook system, the slash commands — not the model. CCRouter lets you keep all of that while picking the inference layer per task. The official position is "Use Claude Code as the foundation for coding infrastructure, allowing you to decide how to interact with the model while enjoying updates from Anthropic" — same UX, your model decisions.

▸ When NOT to install CCRouter

If your sessions are small, your spend is under $20/month, or you've never had reason to look at your bill, don't bother. CCRouter adds operational surface area; the right time to pick it up is when your CC bill is large enough that the setup cost is less than one month's savings. For most hobby users, the answer is "no". For users running headless /goal sessions in CI nightly, the answer is "probably yes".

Deep dive → /goal headless (the canonical heavy-spend use case)

Next → §2 · The architecture

§2

The Architecture

internals

TL;DR — CCRouter is a local proxy on 127.0.0.1:3456. It speaks Anthropic's API shape on the client side and any provider's API shape on the server side. CC thinks it's talking to Anthropic; the provider thinks it's talking to its native client. Transformers do the translation.

▸ The request path

Claude Code (with ANTHROPIC_BASE_URL=http://127.0.0.1:3456)
   │
   ▼
CCRouter (local proxy, your machine)
   │
   │  1. parse Anthropic-shape request
   │  2. classify route: default | background | think | longContext | webSearch | image
   │  3. look up Router[route] → provider + model
   │  4. apply transformers (request side)
   │
   ▼
Provider (OpenRouter · DeepSeek · Ollama · Gemini · Groq · …)
   │
   ▼  response
CCRouter (transformers, response side)
   │
   ▼
Claude Code (sees Anthropic-shape response, never knew)

▸ What CCRouter sees

Every POST /v1/messages CC makes — the full prompt, system message, tool definitions.
Token counts (for the longContextThreshold decision).
Tool calls and their arguments — for transformers that need to rewrite tool shapes per provider.

▸ What CCRouter doesn't see

Your filesystem. CC reads files; CCRouter only sees what CC has already serialised into the request.
Hooks. They run inside CC's process, before/after the request boundary CCRouter intercepts.
Your Anthropic API key — when routing to a non-Anthropic provider, the Anthropic key is never sent.

▸ Transformers — the glue between API shapes

Each provider has its own quirks: DeepSeek returns a reasoning_content field, Gemini wants safetySettings, OpenRouter expects an HTTP-Referer header. Built-in transformers (deepseek, gemini, openrouter, groq, maxtoken, tooluse, reasoning, sampling, enhancetool, cleancache, vertex-gemini) handle the well-known cases. Custom transformers handle the rest — see recipe R8.
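To make "translation between API shapes" concrete, here is a minimal sketch of the request-side step for a text-only conversation. It is not CCRouter's code: the function name is hypothetical, and it deliberately drops tool definitions, images, and other non-text blocks that a real transformer has to handle.

```javascript
// Flatten Anthropic content (a string or an array of blocks) to plain text.
function textOf(content) {
  if (typeof content === "string") return content;
  return content
    .filter((block) => block.type === "text")
    .map((block) => block.text)
    .join("\n");
}

// Convert an Anthropic Messages request into an OpenAI-style
// chat-completions request. Text only; everything else is ignored.
function anthropicToOpenAI(request, targetModel) {
  const messages = [];
  // Anthropic carries the system prompt at the top level;
  // OpenAI-style APIs expect it as the first message.
  if (request.system) {
    messages.push({ role: "system", content: textOf(request.system) });
  }
  for (const message of request.messages) {
    messages.push({ role: message.role, content: textOf(message.content) });
  }
  return {
    model: targetModel,
    messages,
    max_tokens: request.max_tokens,
    stream: Boolean(request.stream)
  };
}

const translated = anthropicToOpenAI(
  {
    system: "You are a terse assistant.",
    max_tokens: 256,
    messages: [{ role: "user", content: [{ type: "text", text: "List the files." }] }]
  },
  "deepseek-chat"
);
console.log(JSON.stringify(translated, null, 2));
```

The response side runs the same idea in reverse: the provider's choices array is mapped back into Anthropic-shape content blocks so CC never notices the detour.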

▸ Why this is safe

The proxy binds to 127.0.0.1, not 0.0.0.0, so no external traffic can reach it. The optional top-level APIKEY field requires clients (including CC) to send a matching Authorization header, defending against other local processes hitting the endpoint. The NO_PROXY=127.0.0.1 env var that ccr activate sets keeps localhost requests from being sent through your corporate HTTP proxy, where they could be inspected.
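The APIKEY check amounts to comparing a header against a configured secret. The sketch below shows one careful way to do that comparison; the Bearer prefix handling and the function name are assumptions for illustration, not a description of how CCRouter implements it.

```javascript
const crypto = require("node:crypto");

// Returns true when the request may proceed.
// headers: lower-cased header map; expectedKey: the configured APIKEY (may be empty).
function isAuthorized(headers, expectedKey) {
  if (!expectedKey) return true; // no key configured, nothing to enforce

  const header = headers["authorization"] || "";
  // Assumed format: accept either "Bearer <key>" or the bare key.
  const presented = header.startsWith("Bearer ") ? header.slice(7) : header;

  // Hash both sides so the buffers have equal length, then compare in
  // constant time to avoid leaking how many leading characters matched.
  const a = crypto.createHash("sha256").update(presented).digest();
  const b = crypto.createHash("sha256").update(expectedKey).digest();
  return crypto.timingSafeEqual(a, b);
}

console.log(isAuthorized({ authorization: "Bearer local-secret" }, "local-secret")); // true
console.log(isAuthorized({ authorization: "Bearer wrong" }, "local-secret"));        // false
```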

Deep dive → official CCRouter docs

Next → §3 · Install + first config

§3

Install + First Config

craft

TL;DR — npm install -g @musistudio/claude-code-router, write a minimal ~/.claude-code-router/config.json, launch with ccr code instead of claude. Five minutes from zero to first routed turn.

▸ Install

# global install
npm install -g @musistudio/claude-code-router

# verify
ccr --version

▸ Config location

The config lives at ~/.claude-code-router/config.json (note: not under ~/.config/, a common mistake). Create the directory if it doesn't exist.

▸ A minimal config — DeepSeek for everything

{
  "LOG": false,
  "API_TIMEOUT_MS": 600000,
  "Providers": [
    {
      "name": "deepseek",
      "api_base_url": "https://api.deepseek.com/v1/chat/completions",
      "api_key": "sk-...your-deepseek-key...",
      "models": ["deepseek-chat", "deepseek-reasoner"],
      "transformer": { "use": ["deepseek"] }
    }
  ],
  "Router": {
    "default":     "deepseek,deepseek-chat",
    "background":  "deepseek,deepseek-chat",
    "think":       "deepseek,deepseek-reasoner",
    "longContext": "deepseek,deepseek-chat",
    "webSearch":   "deepseek,deepseek-chat"
  }
}

▸ Launch — three paths

ccr code — starts the router AND launches Claude Code in one go. The simplest path.
ccr start then claude — run the proxy as a separate process, use the normal claude binary. Useful when several CC instances share one router.
eval "$(ccr activate)" — exports ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN, NO_PROXY into your shell. After this, any claude invocation in that shell uses the router until you exit.

▸ Verify it's working

# in another terminal, with CCR running
curl -s http://127.0.0.1:3456/health
# → {"ok":true}

# in your CC session
/cost
# → look at the model field; if it says "deepseek-chat" instead of an
#   Anthropic model name, you're routed.

▸ Other useful CLI commands

ccr ui — web config editor at http://127.0.0.1:3456/ui. Easier than hand-editing JSON.
ccr model — interactive model picker for the current session (overrides Router.default).
ccr stop — kill the proxy.

Deep dive → official README (install + first config)

Next → §4 · The six routes

§4

The Six Routes

craft

TL;DR — CCRouter classifies every incoming request into one of six routes and maps the route to a provider+model. default is the fallback; the other five trigger on observable request features.

▸ The six routes — when each fires

Route         Fires when
default       every request that doesn't match a more specific route
background    sub-agent dispatches; bulk reads/writes; non-interactive work
think         requests where the system prompt or tools imply reasoning depth
longContext   prompt+history token count > longContextThreshold (default 60000)
webSearch     request uses the WebSearch tool
image         request contains image content (beta)

Routes are evaluated in order of specificity. longContext beats default; webSearch beats default. Specifics beat generics, and exactly one route fires per request.
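The "exactly one route fires" rule can be sketched as a first-match classifier. Everything here is illustrative: the boolean request features (usesWebSearch, hasImage, thinking, isSubAgent) are hypothetical pre-computed flags, and the precedence order is an assumption, so check your installed version's behaviour before relying on it.

```javascript
const DEFAULT_LONG_CONTEXT_THRESHOLD = 60000;

// Pick exactly one route for a request. A specific route only wins if it is
// actually configured in the Router block; otherwise we fall through to default.
function classifyRoute(request, router) {
  const threshold = router.longContextThreshold ?? DEFAULT_LONG_CONTEXT_THRESHOLD;

  // Ordered from most to least specific (assumed order).
  const candidates = [
    ["longContext", request.tokenCount > threshold],
    ["webSearch", request.usesWebSearch === true],
    ["image", request.hasImage === true],
    ["think", request.thinking === true],
    ["background", request.isSubAgent === true]
  ];

  for (const [route, matches] of candidates) {
    if (matches && router[route]) return route;
  }
  return "default";
}

const router = {
  default: "anthropic,claude-sonnet-4-6",
  think: "deepseek,deepseek-reasoner",
  longContext: "gemini,gemini-2.5-pro"
};

console.log(classifyRoute({ tokenCount: 75000, thinking: true }, router)); // longContext
console.log(classifyRoute({ tokenCount: 1200, thinking: true }, router));  // think
console.log(classifyRoute({ tokenCount: 1200, usesWebSearch: true }, router)); // default
```

The last call falls back to default because this Router block has no webSearch entry, which is the behaviour you want from a fallback route.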

▸ A multi-provider Router block

"Router": {
  "default":     "anthropic,claude-sonnet-4-6",
  "background":  "deepseek,deepseek-chat",
  "think":       "deepseek,deepseek-reasoner",
  "longContext": "gemini,gemini-2.5-pro",
  "longContextThreshold": 60000,
  "webSearch":   "openrouter,perplexity/sonar-pro",
  "image":       "anthropic,claude-sonnet-4-6"
}

Read it as a cascade: interactive and image turns stay on Sonnet; background and think work goes to the cheaper DeepSeek models; long-context requests go to the model with the biggest window; web search goes out through OpenRouter.

▸ The longContextThreshold field

A numeric field on the Router object: the token count above which the longContext route fires. Default 60000. Lower it (to 40000, say) to hand off to the long-context model earlier; raise it if Gemini calls are eating your budget on borderline-size requests.
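A quick way to build intuition for the threshold is to run one borderline request against several values. The four-characters-per-token estimate below is a rough heuristic assumed for illustration; the router's real count comes from a tokenizer and will differ.

```javascript
// Rough token estimate: about 4 characters per token (heuristic, not a tokenizer).
function estimateTokens(request) {
  const text =
    JSON.stringify(request.messages ?? []) + JSON.stringify(request.system ?? "");
  return Math.ceil(text.length / 4);
}

// Would this request be sent down the longContext route at a given threshold?
function firesLongContext(request, threshold = 60000) {
  return estimateTokens(request) > threshold;
}

// A request carrying about 200,000 characters of history (roughly 50k tokens).
const borderline = { messages: [{ role: "user", content: "x".repeat(200000) }] };

for (const threshold of [40000, 60000, 120000]) {
  console.log(threshold, firesLongContext(borderline, threshold));
}
// 40000 true
// 60000 false
// 120000 false
```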

▸ Provider config — anatomy

{
  "name": "gemini",                                       // unique slug
  "api_base_url": "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions",
  "api_key": "AIza...your-key...",
  "models": ["gemini-2.5-pro", "gemini-2.5-flash"],       // models you'll route to
  "transformer": { "use": ["gemini"] }                    // request/response adapter
}

The name field is what you reference in the Router block (e.g. "longContext": "gemini,gemini-2.5-pro" — the comma separates provider name from model name).
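A typo in either half of a "provider,model" string is an easy mistake to make, so a small pre-flight check helps. The sketch below is a hypothetical helper, not part of CCRouter; it assumes every routed model should also appear in that provider's models list, as the examples in this class do.

```javascript
// Split "provider,model" at the first comma only, so model names that
// contain "/" or ":" (perplexity/sonar-pro, qwen2.5-coder:32b) survive intact.
function parseTarget(value) {
  const comma = value.indexOf(",");
  if (comma === -1) throw new Error(`expected "provider,model", got "${value}"`);
  return { provider: value.slice(0, comma), model: value.slice(comma + 1) };
}

// Return a list of human-readable problems; an empty list means the
// Router block only references providers and models that are declared.
function findBrokenRoutes(config) {
  const declared = new Map(config.Providers.map((p) => [p.name, new Set(p.models)]));
  const problems = [];
  for (const [route, value] of Object.entries(config.Router)) {
    if (typeof value !== "string") continue; // skip numeric fields such as longContextThreshold
    const { provider, model } = parseTarget(value);
    if (!declared.has(provider)) {
      problems.push(`${route}: unknown provider "${provider}"`);
    } else if (!declared.get(provider).has(model)) {
      problems.push(`${route}: model "${model}" is not listed under "${provider}"`);
    }
  }
  return problems;
}

const config = {
  Providers: [{ name: "gemini", models: ["gemini-2.5-pro", "gemini-2.5-flash"] }],
  Router: {
    default: "gemini,gemini-2.5-flash",
    longContext: "gemni,gemini-2.5-pro", // deliberate typo in the provider name
    longContextThreshold: 60000
  }
};

console.log(findBrokenRoutes(config));
// [ 'longContext: unknown provider "gemni"' ]
```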

▸ Eleven built-in transformers

deepseek, gemini, openrouter, groq, maxtoken (cap output tokens), tooluse (rewrite tool calls), reasoning (handle reasoning_content), sampling (alter temperature/top_p), enhancetool, cleancache, vertex-gemini. Apply at the provider level via transformer.use, or scope to specific models via transformer.<model-name>.use.
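The two scoping levels can be pictured as a lookup that combines the provider-wide list with any model-specific list. This is a sketch of the idea only: the helper is hypothetical, and the ordering (provider-level first, then model-level) is an assumption rather than documented behaviour.

```javascript
// Resolve which transformers apply to one model of one provider.
function transformersFor(provider, model) {
  const transformer = provider.transformer ?? {};
  const providerLevel = transformer.use ?? [];      // applies to every model
  const modelLevel = transformer[model]?.use ?? []; // applies to this model only
  return [...providerLevel, ...modelLevel];
}

const openrouter = {
  name: "openrouter",
  models: ["perplexity/sonar-pro", "google/gemini-2.5-pro"],
  transformer: {
    use: ["openrouter"],
    "perplexity/sonar-pro": { use: ["maxtoken"] }
  }
};

console.log(transformersFor(openrouter, "perplexity/sonar-pro"));  // [ 'openrouter', 'maxtoken' ]
console.log(transformersFor(openrouter, "google/gemini-2.5-pro")); // [ 'openrouter' ]
```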

Continue → 8 routing cookbook recipes

Send background tasks to a cheap model

cost

When — sub-agent dispatches, indexing passes, bulk file reads — work that's necessary but doesn't need Opus-level reasoning. Keep the parent agent on premium, send everything background-tagged to DeepSeek (or another cheap-and-good model).

▸ Config fragment

"Providers": [
  { "name": "anthropic", "api_base_url": "https://api.anthropic.com/v1/messages",
    "api_key": "sk-ant-...", "models": ["claude-opus-4-7", "claude-sonnet-4-6"] },
  { "name": "deepseek", "api_base_url": "https://api.deepseek.com/v1/chat/completions",
    "api_key": "sk-...", "models": ["deepseek-chat"],
    "transformer": { "use": ["deepseek"] } }
],
"Router": {
  "default":    "anthropic,claude-sonnet-4-6",
  "background": "deepseek,deepseek-chat"
}

▸ Wire-up

Drop into ~/.claude-code-router/config.json. Launch with ccr code. CC's normal sub-agent dispatches now run on DeepSeek — no change to your Agent tool calls.

Gotcha — DeepSeek's tool-use behaviour differs from Anthropic's; the deepseek transformer handles the common cases but if a sub-agent expects very strict JSON output you may need the tooluse transformer too: "transformer": { "use": ["deepseek", "tooluse"] }.

Route long-context to Gemini 2.5 Pro

capability

When — you're reasoning across a whole monorepo, a multi-thousand-line log, or a giant transcript. Anthropic's window is wide but Gemini 2.5 Pro at 2M tokens is wider. Route only the requests that need it.

▸ Config fragment

"Providers": [
  { "name": "anthropic", "api_base_url": "https://api.anthropic.com/v1/messages",
    "api_key": "sk-ant-...", "models": ["claude-sonnet-4-6"] },
  { "name": "gemini",
    "api_base_url": "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions",
    "api_key": "AIza...", "models": ["gemini-2.5-pro"],
    "transformer": { "use": ["gemini"] } }
],
"Router": {
  "default":     "anthropic,claude-sonnet-4-6",
  "longContext": "gemini,gemini-2.5-pro",
  "longContextThreshold": 60000
}

▸ Wire-up

The threshold is a count of incoming tokens (prompt + history). Once you cross it, every turn for the rest of the session likely stays above — so Gemini stays in play. Drop the threshold to 40000 if you want earlier handoff; raise to 120000 if you only want true repo-wide tasks routing out.

Gotcha — Gemini's safety filters can refuse content that Anthropic accepts (e.g. security research, certain code samples). The gemini transformer disables overly aggressive defaults, but production code touching anything adversarial — auth tests, fuzzing — may bounce. Keep a fallback chain (recipe R5).

Local-first via Ollama (offline / privacy)

privacy

When — air-gapped work, regulated industry (health, defense, finance with client data), offline travel, or just preference: no tokens leave your machine. Ollama runs an OpenAI-compatible server locally; CCRouter routes to it like any other provider.

▸ Prerequisites

# install ollama + pull a coding model
brew install ollama
ollama serve &
ollama pull qwen2.5-coder:32b      # or deepseek-coder-v2, llama3.3:70b, etc.

▸ Config fragment

"Providers": [
  { "name": "ollama",
    "api_base_url": "http://127.0.0.1:11434/v1/chat/completions",
    "api_key": "ollama",                          // any string; ollama ignores it
    "models": ["qwen2.5-coder:32b"] }
],
"Router": {
  "default":    "ollama,qwen2.5-coder:32b",
  "background": "ollama,qwen2.5-coder:32b",
  "think":      "ollama,qwen2.5-coder:32b"
}

▸ Wire-up

Ollama needs to be running before ccr start (or ccr code). On macOS, brew services start ollama keeps it running as a launchd service; on Linux you'll want a systemd unit. Confirm it's up with curl localhost:11434/api/tags.

Gotcha — local models trail frontier models substantially on tool-use and reasoning. Expect more "I'll need to think about this" loops, occasional malformed tool calls, and slower throughput. A 32B model on a Mac M-series can deliver 20–40 tokens/sec; smaller models are faster but worse. The privacy/cost win is real, the quality gap is also real.

Sub-agents to Haiku, parent to Opus

cost

When — your parent agent dispatches many sub-agents (research, file walks, validation) and each sub-agent only needs to follow a tight handoff. Run the parent on Opus for orchestration; run sub-agents on Haiku. Same vendor, dramatic cost split.

▸ Config fragment

"Providers": [
  { "name": "anthropic",
    "api_base_url": "https://api.anthropic.com/v1/messages",
    "api_key": "sk-ant-...",
    "models": ["claude-opus-4-7", "claude-haiku-4-5-20251001"] }
],
"Router": {
  "default":    "anthropic,claude-opus-4-7",
  "background": "anthropic,claude-haiku-4-5-20251001"
}

▸ Wire-up

Sub-agent dispatches are classified as background automatically by CC's request shape (they carry a different system prompt and have parent_tool_use_id). No config needed on CC's side — the parent stays Opus, dispatches transparently downgrade.

Gotcha — if you've enforced a 5-field delegation handoff (see cookbook R6), Haiku is fine. If your handoffs are loose, Haiku will fill in the gaps with guesses that Opus wouldn't have made. The discipline at the dispatch boundary is what makes this safe.

Failover chain: Anthropic → OpenRouter

reliability

When — production headless workloads where a single provider outage shouldn't kill your /goal session. CCRouter doesn't have a built-in chain operator, but OpenRouter is itself a meta-provider with fallback semantics — chain at the OpenRouter level, point CC's secondary route there.

▸ Config fragment

"Providers": [
  { "name": "anthropic",
    "api_base_url": "https://api.anthropic.com/v1/messages",
    "api_key": "sk-ant-...",
    "models": ["claude-sonnet-4-6"] },
  { "name": "openrouter",
    "api_base_url": "https://openrouter.ai/api/v1/chat/completions",
    "api_key": "sk-or-...",
    "models": ["anthropic/claude-sonnet-4.6", "openai/gpt-5", "google/gemini-2.5-pro"],
    "transformer": { "use": ["openrouter"] } }
],
"Router": {
  "default":     "anthropic,claude-sonnet-4-6",
  "longContext": "openrouter,anthropic/claude-sonnet-4.6"
}

▸ Wire-up

When direct Anthropic is up, your default path hits it directly (cheaper, lower latency). If it fails (rare), you can switch "default" to the OpenRouter route with one config edit + ccr stop && ccr start (or use ccr ui for a live edit). True automatic failover requires a custom router script — out of scope for this recipe.
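For a sense of what automatic failover involves, here is the core loop in isolation. It is a generic sketch, not CCRouter's custom-router interface: send is a hypothetical function you would supply, and a production version would also need to decide which errors are worth retrying.

```javascript
// Try each "provider,model" target in order and return the first success.
// send(target) must return a promise that rejects when the provider fails.
async function sendWithFailover(targets, send) {
  const errors = [];
  for (const target of targets) {
    try {
      return { target, response: await send(target) };
    } catch (err) {
      errors.push(`${target}: ${err.message}`); // remember why this hop failed
    }
  }
  throw new Error(`all targets failed: ${errors.join("; ")}`);
}

// Demo with a fake sender: the direct Anthropic target always fails.
async function fakeSend(target) {
  if (target.startsWith("anthropic,")) throw new Error("503 from upstream");
  return { ok: true, servedBy: target };
}

sendWithFailover(
  ["anthropic,claude-sonnet-4-6", "openrouter,anthropic/claude-sonnet-4.6"],
  fakeSend
).then((result) => console.log(result.target));
// prints "openrouter,anthropic/claude-sonnet-4.6"
```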

Gotcha — OpenRouter adds its own margin (~5%) on top of provider pricing, and adds one network hop. Don't make it your primary path unless you specifically value the chain semantics. It's an insurance layer, not a daily driver.

Daily budget cap with API_TIMEOUT_MS

cost

When — you're running unattended /goal sessions and a runaway loop could burn a week's budget overnight. CCRouter doesn't have a built-in $-budget enforcer, but combining API_TIMEOUT_MS with claude -p --max-budget-usd at the CC level gives you two ceilings.

▸ Config fragment

{
  "API_TIMEOUT_MS": 120000,                       // 2-min ceiling per request
  "LOG": true,
  "LOG_LEVEL": "info",
  "Providers": [ /* ... */ ],
  "Router": { /* ... */ }
}

▸ Wire-up — pair with CC budget cap

# headless /goal with budget AND per-request timeout
ccr start &
claude -p \
  --permission-mode bypassPermissions \
  --max-turns 40 \
  --max-budget-usd 2.00 \
  --output-format json \
  "/goal every issue labeled 'needs-triage' is processed. Stop after 40 turns."

Gotcha — API_TIMEOUT_MS in CCRouter only caps a single request. A runaway 100-turn loop can still rack up spend if each turn is fast. The --max-budget-usd flag on claude -p is the real hard cap; treat API_TIMEOUT_MS as a "don't hang on a flaky provider" guard, not a budget primitive.
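The difference between the two ceilings shows up in a toy simulation. The per-turn cost is a made-up figure and the budget logic is a simplified model of a spend cap, not the actual behaviour of any CLI flag; amounts are in cents to keep the arithmetic exact.

```javascript
// Run turns until the next one would push spend past the budget.
// A per-request timeout never appears here: fast turns sail under it.
function runUntilBudget(turnCostsCents, budgetCents) {
  let spent = 0;
  let turns = 0;
  for (const cost of turnCostsCents) {
    if (spent + cost > budgetCents) break; // the budget cap stops the loop
    spent += cost;
    turns += 1;
  }
  return { turns, spentCents: spent };
}

// A runaway loop: 100 quick turns at a hypothetical 5 cents each.
const runaway = Array(100).fill(5);

console.log(runUntilBudget(runaway, Infinity)); // { turns: 100, spentCents: 500 }
console.log(runUntilBudget(runaway, 200));      // { turns: 40, spentCents: 200 }
```

With only a timeout in place the loop spends all five dollars; with a two-dollar cap it stops at turn 40.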

Per-project routing override

scope

When — most projects want the cheap-default chain, but one specific project (regulated data, a high-stakes refactor) needs to stay on Anthropic for everything. Switch CCR off for that one workspace without uninstalling.

▸ Approach 1 — workspace-scoped env

# in the project's .envrc (direnv) or shell rc
unset ANTHROPIC_BASE_URL
unset ANTHROPIC_AUTH_TOKEN
# in this workspace, claude talks directly to Anthropic

▸ Approach 2 — distinct configs by path

# launch CCR with a project-specific config
CCR_CONFIG_PATH=~/.claude-code-router/configs/regulated.json ccr code
# (or symlink ~/.claude-code-router/config.json before each session)

▸ Wire-up — direnv example

# .envrc in the high-stakes project
export ANTHROPIC_API_KEY="sk-ant-..."   # direct
unset ANTHROPIC_BASE_URL                 # so claude doesn't hit CCR
unset ANTHROPIC_AUTH_TOKEN
# all other projects keep ccr activate active

Gotcha — there's no first-class "per-project config" in CCRouter today; the workspace-scoped env approach is the pragmatic answer. If you switch often, keep two named configs (config-cheap.json, config-direct.json) and symlink — easier than remembering to unset/reset env every time.
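If you go the symlink route, a small script keeps the switch to one command. This is a sketch under the naming assumed above (config-cheap.json, config-direct.json in the router directory); the helper itself is hypothetical, and it refuses to touch a config.json that is a regular file so it cannot clobber a hand-edited config.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Point config.json at config-<name>.json inside the router directory.
function activateConfig(name, dir = path.join(os.homedir(), ".claude-code-router")) {
  const source = path.join(dir, `config-${name}.json`);
  const link = path.join(dir, "config.json");

  if (!fs.existsSync(source)) throw new Error(`missing ${source}`);

  // Only replace an existing entry if it is a symlink we can safely recreate.
  const existing = fs.lstatSync(link, { throwIfNoEntry: false });
  if (existing && !existing.isSymbolicLink()) {
    throw new Error(`${link} is a regular file; move it aside first`);
  }
  if (existing) fs.unlinkSync(link);

  fs.symlinkSync(source, link);
  return fs.readlinkSync(link); // the path config.json now points at
}

// Usage (commented out so the sketch has no side effects when loaded):
// activateConfig("cheap");
// activateConfig("direct");
```

Restart the router after switching (ccr stop && ccr start) so the new file is read.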

Custom transformer — strip reasoning tokens

compat

When — you're routing to a reasoning model (DeepSeek R1, o3-style) and the reasoning_content output is leaking into CC's transcript view, eating context. A custom transformer can strip it server-side before CC sees the response.

▸ Config fragment — register the transformer

{
  "transformers": [
    {
      "path": "~/.claude-code-router/plugins/strip-reasoning.js",
      "options": { "preserveOnError": true }
    }
  ],
  "Providers": [
    { "name": "deepseek-reasoner",
      "api_base_url": "https://api.deepseek.com/v1/chat/completions",
      "api_key": "sk-...",
      "models": ["deepseek-reasoner"],
      "transformer": { "use": ["deepseek", "strip-reasoning"] } }
  ],
  "Router": { "think": "deepseek-reasoner,deepseek-reasoner" }
}

▸ The transformer script — minimal shape

// ~/.claude-code-router/plugins/strip-reasoning.js
module.exports = {
  name: "strip-reasoning",
  transformResponse(response, _request, _options) {
    if (response?.choices) {
      for (const c of response.choices) {
        if (c.message && c.message.reasoning_content) {
          delete c.message.reasoning_content;
        }
      }
    }
    return response;
  }
};

▸ Wire-up

Restart CCR after editing (ccr stop && ccr start). Test by issuing a request that hits the think route and inspecting CC's transcript — the reasoning chain should be gone, only the answer remains.

Gotcha — stripping reasoning helps context budget but you lose visibility into why the model made a decision. For high-stakes work, log reasoning_content to a file in the transformer (don't just delete) so post-mortems are possible. The built-in reasoning transformer handles the official cases; this one is for fine control.
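A log-then-strip variant might look like the sketch below. It reuses the transformResponse hook shape from the script above, which you should confirm against your installed version; the factory function, the log path argument, and the JSON-lines format are all assumptions made for illustration.

```javascript
const fs = require("node:fs");

// Build a transformer that appends each reasoning chain to a JSON-lines
// file before removing it from the response.
function makeStripReasoning(logPath) {
  return {
    name: "strip-reasoning",
    transformResponse(response, _request, _options) {
      for (const choice of response?.choices ?? []) {
        const reasoning = choice.message?.reasoning_content;
        if (reasoning === undefined) continue; // nothing to log or strip

        // One JSON object per line keeps the file easy to grep later.
        const entry = { at: new Date().toISOString(), reasoning };
        fs.appendFileSync(logPath, JSON.stringify(entry) + "\n");

        delete choice.message.reasoning_content;
      }
      return response;
    }
  };
}

// Usage (hypothetical path):
// module.exports = makeStripReasoning("/tmp/ccr-reasoning.jsonl");
```

Keep the log outside any directory CC reads, or the reasoning you just stripped will find its way back into context on the next file walk.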
