▮ CLAUDE-CORE · ACADEMY · CLAUDE CODE v2.1.143 · CCROUTER · MODEL LAYER ▮
ROUTE THE MODEL LAYER
Use Claude Code's UX with any model — without losing the loop. Four modules of theory, eight copy-paste routing configs. ~25 minutes.
Why this class exists
Claude Code talks to one vendor's models by default: Anthropic's. But the turns that make up most of a session (a file walk, a grep result summary, a comment cleanup) don't need Opus-level reasoning. CCRouter sits between your CC client and any provider, routes each request to the model that fits the work, and stays invisible to the agent loop. The cost asymmetry alone funds the setup. The model diversity unlocks workflows the single-vendor path can't reach.
What you'll learn
What CCRouter actually intercepts and why it's safe.
The six routes (default, background, think, longContext, webSearch, image) and which trigger when.
The minimal config.json that gets you off Anthropic billing for routine work.
Claude Code v2.1.143 · @musistudio/claude-code-router v1.0.x. CCRouter is a third-party open-source project (26.4k stars as of 2026-05) and moves fast. Config shapes here are validated against the canonical docs, but verify the field names against your installed version before copy-pasting.
§1
Why Route?
foundations
TL;DR — three pressures push you off the single-vendor default: cost asymmetry (Opus pricing on routine work), model diversity (Gemini's long context, DeepSeek's reasoning-per-dollar, local Ollama for privacy), and vendor lock-off (keep CC's UX, pick the model). CCRouter is the cheapest way to address all three at once.
▸ Cost asymmetry — the math nobody runs
A typical Claude Code session mixes a few heavy reasoning turns with many cheap ones: file reads, directory walks, summaries of tool output. Treating every turn as Opus-class means paying Opus prices for Haiku-class work. Community-reported savings for heavy CC users typically land in the 50–99% range when routine traffic is offloaded; the exact number depends on your read/write ratio.
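The arithmetic behind that range can be sketched in a few lines. The prices below are hypothetical placeholders, not any provider's actual rates, and `sessionCost` / `savingsPercent` are illustrative helper names; substitute your own numbers.

```javascript
// Hypothetical per-million-token prices (assumptions, not real rate cards).
const PRICE_PER_MTOK = { premium: 15.0, cheap: 0.5 };

// Cost of a session when `routineShare` (0..1) of its tokens go to the cheap model.
function sessionCost(totalMTok, routineShare) {
  const routine = totalMTok * routineShare;
  const heavy = totalMTok - routine;
  return heavy * PRICE_PER_MTOK.premium + routine * PRICE_PER_MTOK.cheap;
}

// Savings relative to sending every token to the premium model.
function savingsPercent(totalMTok, routineShare) {
  const baseline = sessionCost(totalMTok, 0);
  const routed = sessionCost(totalMTok, routineShare);
  return ((baseline - routed) / baseline) * 100;
}

// 10M tokens, 80% of them routine work.
console.log(savingsPercent(10, 0.8).toFixed(1) + "% saved"); // → 77.3% saved
```

With these placeholder prices, the result is driven almost entirely by the routine share, which is why the read/write ratio matters so much.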
▸ Model diversity — capabilities Anthropic doesn't ship
Gemini 2.5 Pro — 1M-token context. Useful when you need to reason over a whole monorepo, a long log, or a giant transcript.
DeepSeek V3 / R1 — high reasoning quality at a fraction of Opus pricing. Good fit for "think" routes.
Ollama (local) — runs offline, never sends bytes off your machine. The privacy/regulated-industry play.
Groq / SiliconFlow / Together — very low latency for the "default" path when responsiveness matters more than top-end quality.
▸ Vendor lock-off — same UX, any model
CC's value is the loop, the permission model, the hook system, the slash commands — not the model. CCRouter lets you keep all of that while picking the inference layer per task. The CCRouter project states its goal as "Use Claude Code as the foundation for coding infrastructure, allowing you to decide how to interact with the model while enjoying updates from Anthropic". Same UX, your model decisions.
▸ When NOT to install CCRouter
If your session is small, your spend is <$20/month, or you've never thought about it — don't bother. CCRouter is operations surface area; the right time to pick it up is when your CC bill is large enough that setup cost < one month's savings. For most hobby users, the answer is "no". For users running headless /goal sessions in CI nightly, the answer is "probably yes".
TL;DR — CCRouter is a local proxy on 127.0.0.1:3456. It speaks Anthropic's API shape on the client side and any provider's API shape on the server side. CC thinks it's talking to Anthropic; the provider thinks it's talking to its native client. Transformers do the translation.
▸ What CCRouter sees
Every POST /v1/messages CC makes — the full prompt, system message, tool definitions.
Token counts (for the longContextThreshold decision).
Tool calls and their arguments — for transformers that need to rewrite tool shapes per provider.
▸ What CCRouter doesn't see
Your filesystem. CC reads files; CCRouter only sees what CC has already serialised into the request.
Hooks. They run inside CC's process, before/after the request boundary CCRouter intercepts.
Your Anthropic API key — when routing to a non-Anthropic provider, the Anthropic key is never sent.
▸ Transformers — the glue between API shapes
Each provider has its own quirks: DeepSeek returns a reasoning_content field, Gemini wants safetySettings, OpenRouter expects an HTTP-Referer header. Built-in transformers (deepseek, gemini, openrouter, groq, maxtoken, tooluse, reasoning, sampling, enhancetool, cleancache, vertex-gemini) handle the well-known cases. Custom transformers handle the rest — see recipe R8.
▸ Why this is safe
The proxy binds to 127.0.0.1, not 0.0.0.0 — no external traffic can reach it. The optional top-level APIKEY field requires clients (including CC) to send a matching Authorization header, defending against other local processes hitting the endpoint. The NO_PROXY=127.0.0.1 env var that ccr activate sets prevents your corporate HTTP proxy from snooping localhost traffic.
TL;DR — npm install -g @musistudio/claude-code-router, write a minimal ~/.claude-code-router/config.json, launch with ccr code instead of claude. Five minutes from zero to first routed turn.
ccr code — starts the router AND launches Claude Code in one go. The simplest path.
ccr start then claude — run the proxy as a separate process, use the normal claude binary. Useful when several CC instances share one router.
eval "$(ccr activate)" — exports ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN, NO_PROXY into your shell. After this, any claude invocation in that shell uses the router until you exit.
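For reference, a minimal ~/.claude-code-router/config.json could look like the sketch below. It assumes a single DeepSeek provider and a placeholder key; the field names (Providers, api_base_url, api_key, models, transformer.use, Router) follow the project's published examples, so check them against your installed version.

```json
{
  "Providers": [
    {
      "name": "deepseek",
      "api_base_url": "https://api.deepseek.com/chat/completions",
      "api_key": "sk-xxx",
      "models": ["deepseek-chat"],
      "transformer": { "use": ["deepseek"] }
    }
  ],
  "Router": {
    "default": "deepseek,deepseek-chat"
  }
}
```

With only a default route defined, every request goes to that one provider and model.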
▸ Verify it's working
# in another terminal, with CCR running
curl -s http://127.0.0.1:3456/health
# → {"ok":true}
# in your CC session
/cost
# → look at the model field; if it says "deepseek-chat" instead of an
# Anthropic model name, you're routed.
▸ Other useful CLI commands
ccr ui — web config editor at http://127.0.0.1:3456/ui. Easier than hand-editing JSON.
ccr model — interactive model picker for the current session (overrides Router.default).
TL;DR — CCRouter classifies every incoming request into one of six routes and maps the route to a provider+model. default is the fallback; the other five trigger on observable request features.
▸ The six routes — when each fires
Route         Fires when
default       every request that doesn't match a more specific route
background    sub-agent dispatches; bulk reads/writes; non-interactive work
think         requests where the system prompt or tools imply reasoning depth
longContext   prompt+history token count > longContextThreshold (default 60000)
webSearch     request uses the WebSearch tool
image         request contains image content (beta)
Routes are evaluated in order of specificity. longContext beats default; webSearch beats default. Specifics beat generics, and exactly one route fires per request.
Read it as a cascade: anything heavy/reasoning stays on premium models; routine and bulk work goes to the cheap path; long-context goes to the model with the biggest window.
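One way to picture the cascade is as a small classifier. The sketch below is illustrative only and is not CCRouter's source: the order among the specific routes, the `isBackground` flag, and the request field names are assumptions made for the example.

```javascript
// Illustrative classifier. `req` loosely mirrors an Anthropic-style request body;
// `isBackground` is a hypothetical flag standing in for however the router
// detects background work.
function pickRoute(req, tokenCount, router) {
  const threshold = router.longContextThreshold ?? 60000;
  const usesWebSearch = (req.tools ?? []).some((t) =>
    String(t.type ?? "").startsWith("web_search")
  );
  const hasImage = (req.messages ?? []).some(
    (m) => Array.isArray(m.content) && m.content.some((part) => part.type === "image")
  );

  // Specific routes first; a route only fires if it is configured.
  if (tokenCount > threshold && router.longContext) return "longContext";
  if (usesWebSearch && router.webSearch) return "webSearch";
  if (hasImage && router.image) return "image";
  if (req.thinking && router.think) return "think";
  if (req.isBackground && router.background) return "background";
  return "default"; // the fallback, so exactly one route is always returned
}

const exampleRouter = {
  default: "deepseek,deepseek-chat",
  longContext: "gemini,gemini-2.5-pro",
  longContextThreshold: 60000,
};
console.log(pickRoute({ tools: [] }, 75000, exampleRouter)); // → longContext
console.log(pickRoute({ tools: [] }, 1200, exampleRouter)); // → default
```

Because each check also requires the route to be configured, an unconfigured route simply falls through to default.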
▸ The longContextThreshold field
A numeric field on the Router object — the token count above which the longContext route fires. Default 60000. Drop it lower (40000) if you want to bias toward the long-context model earlier, raise it higher if Gemini calls are eating your budget on borderline-size requests.
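As a fragment, a lowered threshold might look like this. It assumes a provider named gemini is defined elsewhere in the same config.json.

```json
{
  "Router": {
    "longContext": "gemini,gemini-2.5-pro",
    "longContextThreshold": 40000
  }
}
```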
Each entry in the Providers array has a name field, and that name is what you reference in the Router block (e.g. "longContext": "gemini,gemini-2.5-pro", where the comma separates provider name from model name).
▸ Eleven built-in transformers
deepseek, gemini, openrouter, groq, maxtoken (cap output tokens), tooluse (rewrite tool calls), reasoning (handle reasoning_content), sampling (alter temperature/top_p), enhancetool, cleancache, vertex-gemini. Apply at the provider level via transformer.use, or scope to specific models via transformer.<model-name>.use.
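The two scopes can be combined in a single provider entry. In this sketch (placeholder key, field names as in the project's published examples), deepseek applies to every model on the provider while tooluse applies only to deepseek-chat.

```json
{
  "name": "deepseek",
  "api_base_url": "https://api.deepseek.com/chat/completions",
  "api_key": "sk-xxx",
  "models": ["deepseek-chat", "deepseek-reasoner"],
  "transformer": {
    "use": ["deepseek"],
    "deepseek-chat": { "use": ["tooluse"] }
  }
}
```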
Each recipe: When · Config · Wire-up · Gotcha. All configs are fragments — drop them into the right slot in your config.json.
R1
Send background tasks to a cheap model
cost
When — sub-agent dispatches, indexing passes, bulk file reads — work that's necessary but doesn't need Opus-level reasoning. Keep the parent agent on premium, send everything background-tagged to DeepSeek (or another cheap-and-good model).
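A fragment for this recipe might look like the following. It only adds a DeepSeek provider and the background route; your existing default route is assumed to stay as it is, and the key is a placeholder.

```json
{
  "Providers": [
    {
      "name": "deepseek",
      "api_base_url": "https://api.deepseek.com/chat/completions",
      "api_key": "sk-xxx",
      "models": ["deepseek-chat"],
      "transformer": { "use": ["deepseek"] }
    }
  ],
  "Router": {
    "background": "deepseek,deepseek-chat"
  }
}
```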
Drop into ~/.claude-code-router/config.json. Launch with ccr code. CC's normal sub-agent dispatches now run on DeepSeek — no change to your Agent tool calls.
Gotcha — DeepSeek's tool-use behaviour differs from Anthropic's; the deepseek transformer handles the common cases but if a sub-agent expects very strict JSON output you may need the tooluse transformer too: "transformer": { "use": ["deepseek", "tooluse"] }.
R2
Route long-context to Gemini 2.5 Pro
capability
When — you're reasoning across a whole monorepo, a multi-thousand-line log, or a giant transcript. Anthropic's window is wide, but Gemini 2.5 Pro's 1M-token window is wider. Route only the requests that need it.
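A fragment for this recipe could look like the sketch below. The Gemini base URL and key are placeholders in the shape used by the project's examples; verify both against your installed version and your Google AI account.

```json
{
  "Providers": [
    {
      "name": "gemini",
      "api_base_url": "https://generativelanguage.googleapis.com/v1beta/models/",
      "api_key": "your-gemini-key",
      "models": ["gemini-2.5-pro"],
      "transformer": { "use": ["gemini"] }
    }
  ],
  "Router": {
    "longContext": "gemini,gemini-2.5-pro",
    "longContextThreshold": 60000
  }
}
```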
The threshold is a count of incoming tokens (prompt + history). Once you cross it, every turn for the rest of the session likely stays above — so Gemini stays in play. Drop the threshold to 40000 if you want earlier handoff; raise to 120000 if you only want true repo-wide tasks routing out.
Gotcha — Gemini's safety filters can refuse content that Anthropic accepts (e.g. security research, certain code samples). The gemini transformer disables overly aggressive defaults, but production code touching anything adversarial — auth tests, fuzzing — may bounce. Keep a fallback chain (recipe R5).
R3
Local-first via Ollama (offline / privacy)
privacy
When — air-gapped work, regulated industry (health, defense, finance with client data), offline travel, or just preference: no tokens leave your machine. Ollama runs an OpenAI-compatible server locally; CCRouter routes to it like any other provider.
▸ Prerequisites
# install ollama + pull a coding model
brew install ollama
ollama serve &
ollama pull qwen2.5-coder:32b # or deepseek-coder-v2, llama3.3:70b, etc.
Ollama needs to be running before ccr start (or ccr code). On macOS, the brew formula installs a launchd service; on Linux you'll want a systemd unit. Confirm it's up with curl localhost:11434/api/tags.
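With Ollama up, a provider entry along these lines should be enough. This is a sketch: it assumes Ollama's OpenAI-compatible endpoint on its default port and the model pulled above. The api_key value is a dummy, since a local Ollama server does not check it.

```json
{
  "Providers": [
    {
      "name": "ollama",
      "api_base_url": "http://localhost:11434/v1/chat/completions",
      "api_key": "ollama",
      "models": ["qwen2.5-coder:32b"]
    }
  ],
  "Router": {
    "default": "ollama,qwen2.5-coder:32b",
    "background": "ollama,qwen2.5-coder:32b"
  }
}
```

Pointing both default and background at the local model keeps every routed request on your machine.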
Gotcha — local models trail frontier models substantially on tool-use and reasoning. Expect more "I'll need to think about this" loops, occasional malformed tool calls, and slower throughput. A 32B model on a Mac M-series can deliver 20–40 tokens/sec; smaller models are faster but worse. The privacy/cost win is real, the quality gap is also real.
R4
Sub-agents to Haiku, parent to Opus
cost
When — your parent agent dispatches many sub-agents (research, file walks, validation) and each sub-agent only needs to follow a tight handoff. Run the parent on Opus for orchestration; run sub-agents on Haiku. Same vendor, dramatic cost split.
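A sketch of the split is below. Treat the provider entry as an assumption: the direct Anthropic endpoint URL and the "Anthropic" transformer name are not among the eleven built-ins listed earlier, so confirm them in your installed version. The two model IDs are left as placeholders to fill in with the current Opus and Haiku identifiers.

```json
{
  "Providers": [
    {
      "name": "anthropic",
      "api_base_url": "https://api.anthropic.com/v1/messages",
      "api_key": "sk-ant-xxx",
      "models": ["<opus-model-id>", "<haiku-model-id>"],
      "transformer": { "use": ["Anthropic"] }
    }
  ],
  "Router": {
    "default": "anthropic,<opus-model-id>",
    "background": "anthropic,<haiku-model-id>"
  }
}
```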
Sub-agent dispatches are classified as background automatically by CC's request shape (they carry a different system prompt and have parent_tool_use_id). No config needed on CC's side — the parent stays Opus, dispatches transparently downgrade.
Gotcha — if you've enforced a 5-field delegation handoff (see cookbook R6), Haiku is fine. If your handoffs are loose, Haiku will fill in the gaps with guesses that Opus wouldn't have made. The discipline at the dispatch boundary is what makes this safe.
R5
Failover chain: Anthropic → OpenRouter
reliability
When — production headless workloads where a single provider outage shouldn't kill your /goal session. CCRouter doesn't have a built-in chain operator, but OpenRouter is itself a meta-provider with fallback semantics — chain at the OpenRouter level, point CC's secondary route there.
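One possible shape for the config is sketched here. It assumes a provider named anthropic is already defined elsewhere in your config.json, leaves its model ID as a placeholder, and uses an example OpenRouter model slug that you should replace with whichever model you want as the fallback.

```json
{
  "Providers": [
    {
      "name": "openrouter",
      "api_base_url": "https://openrouter.ai/api/v1/chat/completions",
      "api_key": "sk-or-xxx",
      "models": ["anthropic/claude-sonnet-4"],
      "transformer": { "use": ["openrouter"] }
    }
  ],
  "Router": {
    "default": "anthropic,<your-anthropic-model-id>"
  }
}
```

During an outage, the manual switch is changing "default" to "openrouter,anthropic/claude-sonnet-4" and restarting the router.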
When direct Anthropic is up, your default path hits it directly (cheaper, lower latency). If it fails (rare), you can switch "default" to the OpenRouter route with one config edit + ccr stop && ccr start (or use ccr ui for a live edit). True automatic failover requires a custom router script — out of scope for this recipe.
Gotcha — OpenRouter adds its own margin (~5%) on top of provider pricing, and adds one network hop. Don't make it your primary path unless you specifically value the chain semantics. It's an insurance layer, not a daily driver.
R6
Daily budget cap with API_TIMEOUT_MS
cost
When — you're running unattended /goal sessions and a runaway loop could burn a week's budget overnight. CCRouter doesn't have a built-in $-budget enforcer, but combining API_TIMEOUT_MS with claude -p --max-budget-usd at the CC level gives you two ceilings.
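The router-side half is a single top-level field in config.json. The value is in milliseconds; 600000 (ten minutes) is only an example, so pick a ceiling that matches your slowest legitimate request.

```json
{
  "API_TIMEOUT_MS": 600000
}
```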
# headless /goal with budget AND per-request timeout
ccr start &
claude -p \
  --permission-mode bypassPermissions \
  --max-turns 40 \
  --max-budget-usd 2.00 \
  --output-format json \
  "/goal every issue labeled 'needs-triage' is processed. Stop after 40 turns."
Gotcha — API_TIMEOUT_MS in CCRouter only caps a single request. A runaway 100-turn loop can still rack up spend if each turn is fast. The --max-budget-usd flag on claude -p is the real hard cap; treat API_TIMEOUT_MS as a "don't hang on a flaky provider" guard, not a budget primitive.
R7
Per-project routing override
scope
When — most projects want the cheap-default chain, but one specific project (regulated data, a high-stakes refactor) needs to stay on Anthropic for everything. Switch CCR off for that one workspace without uninstalling.
▸ Approach 1 — workspace-scoped env
# in the project's .envrc (direnv) or shell rc
unset ANTHROPIC_BASE_URL
unset ANTHROPIC_AUTH_TOKEN
# in this workspace, claude talks directly to Anthropic
▸ Approach 2 — distinct configs by path
# launch CCR with a project-specific config
CCR_CONFIG_PATH=~/.claude-code-router/configs/regulated.json ccr code
# (or symlink ~/.claude-code-router/config.json before each session)
▸ Wire-up — direnv example
# .envrc in the high-stakes project
export ANTHROPIC_API_KEY="sk-ant-..." # direct
unset ANTHROPIC_BASE_URL # so claude doesn't hit CCR
unset ANTHROPIC_AUTH_TOKEN
# all other projects keep ccr activate active
Gotcha — there's no first-class "per-project config" in CCRouter today; the workspace-scoped env approach is the pragmatic answer. If you switch often, keep two named configs (config-cheap.json, config-direct.json) and symlink — easier than remembering to unset/reset env every time.
R8
Custom transformer — strip reasoning tokens
compat
When — you're routing to a reasoning model (DeepSeek R1, o3-style) and the reasoning_content output is leaking into CC's transcript view, eating context. A custom transformer can strip it server-side before CC sees the response.
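// ~/.claude-code-router/plugins/strip-reasoning.js
// Removes reasoning_content from each choice before the response reaches CC.
// The hook name and response shape below follow this recipe; check them against
// the transformer interface of your installed CCRouter version.
module.exports = {
  name: "strip-reasoning",
  transformResponse(response, _request, _options) {
    // Payloads without a choices array (e.g. errors) pass through untouched.
    if (response?.choices) {
      for (const c of response.choices) {
        if (c.message && c.message.reasoning_content) {
          delete c.message.reasoning_content;
        }
      }
    }
    return response;
  }
};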
▸ Wire-up
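Registration might look like the fragment below. It assumes custom transformers are declared in a top-level transformers array with a path field and then referenced by name in a provider's transformer.use. The path shown is a hypothetical absolute path, so adjust it to your home directory.

```json
{
  "transformers": [
    { "path": "/Users/you/.claude-code-router/plugins/strip-reasoning.js" }
  ],
  "Providers": [
    {
      "name": "deepseek",
      "api_base_url": "https://api.deepseek.com/chat/completions",
      "api_key": "sk-xxx",
      "models": ["deepseek-reasoner"],
      "transformer": { "use": ["deepseek", "strip-reasoning"] }
    }
  ],
  "Router": {
    "think": "deepseek,deepseek-reasoner"
  }
}
```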
Restart CCR after editing (ccr stop && ccr start). Test by issuing a request that hits the think route and inspecting CC's transcript — the reasoning chain should be gone, only the answer remains.
Gotcha — stripping reasoning helps context budget but you lose visibility into why the model made a decision. For high-stakes work, log reasoning_content to a file in the transformer (don't just delete) so post-mortems are possible. The built-in reasoning transformer handles the official cases; this one is for fine control.
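A log-then-strip variant could be sketched as follows. It reuses the hook shape from the plugin above, which is itself an assumption about your CCRouter version. LOG_PATH is a hypothetical location in the temp directory, so point it somewhere durable if you need the logs for post-mortems.

```javascript
// Sketch: append reasoning_content to a JSON-lines file, then strip it.
// Hook shape and LOG_PATH are assumptions; adapt both to your setup.
const fs = require("fs");
const os = require("os");
const path = require("path");

const LOG_PATH = path.join(os.tmpdir(), "ccr-reasoning.log");

const stripAndLogReasoning = {
  name: "strip-and-log-reasoning",
  transformResponse(response, _request, _options) {
    for (const choice of response?.choices ?? []) {
      const reasoning = choice.message?.reasoning_content;
      if (reasoning) {
        // One JSON object per line keeps the log easy to grep and parse.
        const entry = JSON.stringify({ at: new Date().toISOString(), reasoning });
        fs.appendFileSync(LOG_PATH, entry + "\n");
        delete choice.message.reasoning_content;
      }
    }
    return response;
  },
};

module.exports = stripAndLogReasoning;
```

The synchronous append keeps the sketch simple; on a busy router an asynchronous write or a proper logger would be the gentler choice.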