THE /goal COMMAND
One command, eight modules, ~25 minutes. Set a finish line, walk away — and understand what's running under the hood.
Why a whole class on one command
/goal looks like syntactic sugar — type a condition, Claude keeps working until the condition holds — but the architectural move underneath is significant: the agent doing the work is no longer the agent deciding it's done. A second, smaller model judges the condition every turn. Get the condition right and you've turned a chat loop into a contract.
What you'll learn
- The exact syntax, including aliases and the 4,000-character condition budget.
- When /goal beats /loop and a Stop hook, and when it doesn't.
- The three ingredients of a condition that survives 20+ turns.
- How the evaluator actually decides "done", and what it can never see.
- Headless /goal in CI, plus the anti-patterns that burn money silently.
Built against Claude Code v2.1.143. The command itself shipped in v2.1.139 (May 2026). The trust dialog must be accepted in the workspace, since /goal is a session-scoped wrapper around the hooks system.
What /goal Is
foundations
TL;DR: /goal <condition> sets a session-scoped completion condition. After every turn, a small fast model judges whether the condition holds. If not, Claude takes another turn automatically, with no "keep going" prompt from you.
▸ The three uses, same command
- /goal <condition>: set (or replace) the active goal. A turn starts immediately, with the condition itself as the directive.
- /goal (no args): show status, including the condition, runtime, turns evaluated, token spend, and the evaluator's last reason.
- /goal clear: remove the active goal. Aliases: stop · off · reset · none · cancel.
▸ The architectural insight
The agent doing the work is not the agent deciding it's done, and that separation is what makes the command pay off. The worker stays optimistic ("almost done"); the evaluator is skeptical ("you said tests pass, show me"). Two roles, two models, one loop.
▸ A first goal
/goal all tests in test/auth pass and the lint step is clean
While the goal is active, a ◎ /goal active indicator shows how long it's been running. The condition can be up to 4,000 characters, long enough to spell out constraints.
/goal vs /loop vs Stop Hook vs Auto Mode
foundations
TL;DR: four mechanisms keep a session running between prompts. Pick by what should start the next turn and who decides it's done.
▸ The four-way comparison
▸ When to reach for each
- /goal — substantial work with a verifiable end state, where the evaluator can read the proof in the transcript. Migration until tests pass. Splitting a file until each piece is under a size budget.
- /loop — schedule-driven polling. "Check the deploy every 5 min." "Babysit a PR queue." Not a completion condition; a re-run cadence.
- Stop hook — when you want a deterministic check (shell command, exit code) or custom prompt logic that lives in settings and applies session-wide.
- Auto mode — pair with any of the above when you don't want per-tool permission prompts. Critical for unattended use; orthogonal to "when does the loop stop".
Anatomy of a Good Condition
craft
TL;DR: a condition that survives 20+ turns has three ingredients: one measurable end state, a stated check Claude can run, and constraints that name what must not change.
▸ Ingredient 1 — Measurable end state
An observable quantity that flips from "not yet" to "yes" exactly once. Not a vibe. Examples that work: a test run's exit code, a file count, a queue length, "git status is clean". Examples that fail: "the code is good", "users will be happy", "production-ready".
▸ Ingredient 2 — Stated check
How Claude should prove the end state. The evaluator reads the transcript; if proof isn't in the transcript, the goal stalls. Spell out the command Claude should run: "npm test exits 0", "jq . config.json succeeds", "the build/ dir contains index.html".
▸ Ingredient 3 — Constraints
What must not change while reaching the end state. Without these the worker may "fix" tests by deleting them or downgrade a dep instead of fixing the bug.
/goal all tests in test/auth pass (npm test --silent exits 0),
the diff touches only src/auth/* and test/auth/*,
and package.json is unchanged.
▸ The 4,000-char budget
The condition can be long — use it. A two-paragraph condition with constraints typically outperforms a five-word one because the evaluator has more cues to judge against. Past ~1,000 chars the worker also reads it as a small spec.
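As a rough illustration of the three ingredients and the length budget, the sketch below assembles a condition string and refuses to return one that exceeds 4,000 characters. The helper name `build_condition` and its parameters are hypothetical, and it assumes the budget applies to the condition text only, not to the `/goal ` prefix.

```python
MAX_CONDITION_CHARS = 4000  # budget stated in the text; assumed to cover the condition only


def build_condition(end_state: str, check: str, constraints: list[str]) -> str:
    """Join end state, stated check, and constraints into one condition string."""
    parts = [f"{end_state} ({check})"]
    parts.extend(constraints)
    condition = ",\n".join(parts) + "."
    if len(condition) > MAX_CONDITION_CHARS:
        raise ValueError(
            f"condition is {len(condition)} chars; the budget is {MAX_CONDITION_CHARS}"
        )
    return condition


if __name__ == "__main__":
    condition = build_condition(
        end_state="all tests in test/auth pass",
        check="npm test --silent exits 0",
        constraints=[
            "the diff touches only src/auth/* and test/auth/*",
            "package.json is unchanged",
        ],
    )
    # Paste the printed line into the session as the goal.
    print(f"/goal {condition}")
```

Keeping the ingredients as separate arguments makes it harder to forget the check or the constraints when writing a long condition.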
Deep dive → Master Class §4 Permissions (the safety pairing)
Good vs Bad: A Gallery
craft
TL;DR: read these side by side. The pattern reveals itself faster than any rule.
▸ Tests
✓ /goal all tests in test/auth pass (npm test --silent exits 0)
✗ /goal fix the tests
▸ Migration
✓ /goal every call site of db.queryOld has been replaced with db.queryV2,
bun tsc --noEmit exits 0, no .ts file outside src/db touches db.queryOld
✗ /goal migrate the database calls
▸ Cleanup
✓ /goal CHANGELOG.md has one entry per PR merged in the last 7 days
(verified by gh pr list --state merged --search "merged:>=2026-05-10");
each entry has a date, type, and link.
✗ /goal update the changelog
▸ Refactor
✓ /goal src/parser.ts has been split so every resulting file is ≤200 lines,
all callers compile, bun test exits 0, no new dependencies in package.json
✗ /goal break the parser into smaller files
▸ Backlog
✓ /goal every issue with label "good-first" is either closed,
labeled "blocked", or has a comment from me explaining the next step.
Stop after 30 turns even if not done.
✗ /goal triage the issues
▸ The pattern
Every "good" example names a thing to count, a command to run, and what stays untouched. Every "bad" example names a verb and trusts the worker to define done.
Bounding the Run
craft
TL;DR: there is no built-in cap on a /goal session. Add a stop clause to the condition itself, or burn money in your sleep.
▸ Built-in: nothing
A goal keeps running until the evaluator says "yes" or you run /goal clear. There is no default max-turn, no default budget cap, no default timeout.
▸ Three ways to bound
- Inline in the condition. Append "or stop after 20 turns" or "or stop after 1 hour". The worker reports progress each turn; the evaluator reads the progress and judges the clause.
- Headless caps. When running with -p, add --max-turns N and --max-budget-usd X at the CLI level. These are enforced by Claude Code itself, not the evaluator.
- External kill switch. Ctrl+C in interactive mode, or kill the process for headless. Crude but reliable.
▸ A bounded goal in practice
/goal flake-2-out-of-5 in test/network/* is resolved
(running each suite 5× and showing 5/5 passes in the output),
or stop after 25 turns. Constraint: don't touch test infra
unless a single test file's setup is the cause.
▸ A real reason to set a cap
Without a cap, a goal that can't converge will iterate forever, each turn costing real money. The author of claude-goal, a community tool that predates the official command, defaulted to 500 continuations as runaway protection, which tells you how bad it can get.
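The external kill switch for a headless run can be approximated with a wall-clock deadline around the process. The sketch below is one way to do that, not part of Claude Code; `run_with_deadline` is a hypothetical helper, and the demo uses a sleeping Python child as a stand-in for the real command.

```python
import subprocess
import sys


def run_with_deadline(argv: list[str], timeout_s: float) -> bool:
    """Run argv; return True if it exited on its own, False if the deadline hit."""
    try:
        subprocess.run(argv, timeout=timeout_s, check=False)
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising TimeoutExpired.
        return False


if __name__ == "__main__":
    # Stand-in for a headless goal run that never converges.
    stuck = [sys.executable, "-c", "import time; time.sleep(30)"]
    finished = run_with_deadline(stuck, timeout_s=0.5)
    print("finished on its own" if finished else "killed at the deadline")
```

For a real run, `argv` would be the headless command line. Note that this only kills the direct child, so any processes it spawned may need separate cleanup.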
How the Evaluator Works
internals
TL;DR: /goal is a wrapper around a session-scoped prompt-based Stop hook. After every turn the condition and transcript are sent to the configured small fast model (default: Haiku). It returns yes/no plus a short reason. That's the whole machine.
▸ The loop, in detail
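The loop described in the TL;DR can be sketched as a small simulation. This is an illustrative model only, not Claude Code's implementation: `run_goal`, `Verdict`, and the fake worker and evaluator are hypothetical names, and the `max_turns` parameter is added so the sketch always terminates (the real command has no built-in cap).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    done: bool
    reason: str


def run_goal(
    condition: str,
    worker_turn: Callable[[str, list[str]], str],
    evaluate: Callable[[str, list[str]], Verdict],
    max_turns: int,
) -> tuple[bool, int, str]:
    """Alternate worker turns and evaluator checks until done or out of turns."""
    transcript: list[str] = []
    reason = "no turns taken"
    for turn in range(1, max_turns + 1):
        # The worker acts; only what it prints lands in the transcript.
        transcript.append(worker_turn(condition, transcript))
        # A separate judge sees the condition and the transcript, nothing else.
        verdict = evaluate(condition, transcript)
        reason = verdict.reason
        if verdict.done:
            return True, turn, reason
    return False, max_turns, reason


def fake_worker(condition: str, transcript: list[str]) -> str:
    """Pretend to fix one failing test per turn, starting from two failures."""
    failing = max(0, 2 - len(transcript))
    exit_code = 0 if failing == 0 else 1
    return f"ran npm test: {failing} failing, exit code {exit_code}"


def fake_evaluator(condition: str, transcript: list[str]) -> Verdict:
    """Say yes only when the latest transcript entry shows a passing run."""
    if transcript and "exit code 0" in transcript[-1]:
        return Verdict(True, "last test run exited 0")
    return Verdict(False, "no passing test run in transcript yet")


if __name__ == "__main__":
    print(run_goal("npm test exits 0", fake_worker, fake_evaluator, max_turns=10))
```

The point of the structure is that `evaluate` never calls the worker's tools; it judges purely from the transcript it is handed.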
▸ What the evaluator CAN see
- Every user message and assistant message in the session transcript.
- The condition you set.
- Tool calls and their results, including stdout and exit codes that Claude already ran.
▸ What the evaluator CAN'T see
- Files on disk, unless their contents already appear in the transcript.
- Anything Claude didn't print. A test that passes silently doesn't count until Claude shows the exit code.
- Live state. The evaluator can't open a browser, can't hit an API, can't call tools.
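Since the evaluator only judges what appears in the transcript, one way to make a check visible is to wrap it so the command and its exit code are always printed. The sketch below assumes such a wrapper script is available to the worker; `run_and_report` is a hypothetical helper, and the demo runs a trivial Python child instead of a real test suite.

```python
import shlex
import subprocess
import sys


def run_and_report(argv: list[str]) -> int:
    """Run a check and print the command line and its exit code explicitly."""
    print(f"$ {shlex.join(argv)}")
    completed = subprocess.run(argv, check=False)
    # A silent success still leaves a line of proof in the output.
    print(f"exit code: {completed.returncode}")
    return completed.returncode


if __name__ == "__main__":
    # Stand-in for a quiet test command that prints nothing on success.
    run_and_report([sys.executable, "-c", "pass"])
```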
▸ The cost
Evaluator tokens are billed on the small-fast-model tier (Haiku-class) and are typically negligible compared to the main turn. Concretely: a 20-turn goal session that costs $4 in worker tokens might add $0.05 in evaluator tokens.
▸ The implementation note that matters
Because /goal IS a Stop hook, it's unavailable when disableAllHooks is set or when allowManagedHooksOnly blocks it. The trust dialog must be accepted for the workspace. The command tells you why it's unavailable instead of silently doing nothing.
Headless /goal
production
TL;DR: pass /goal <condition> as the prompt to claude -p and the entire loop runs in one invocation. Pair with --max-turns, --max-budget-usd, and auto mode to get an unattended agent with hard ceilings.
▸ The headless shape
claude -p "/goal CHANGELOG.md has an entry for every PR merged this week" \
  --permission-mode bypassPermissions \
  --max-turns 30 \
  --max-budget-usd 1.50 \
  --output-format json
▸ A real GitHub Actions step
- name: Nightly issue triage
  run: |
    claude -p \
      --permission-mode bypassPermissions \
      --max-turns 40 \
      --max-budget-usd 2.00 \
      --output-format json \
      "/goal every issue labeled 'needs-triage' is either
      (a) labeled with one of {bug, feature, question, won't-fix},
      (b) closed with a reason, or
      (c) commented on with a follow-up question to the reporter.
      Stop after 40 turns even if not done."
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
▸ Resume behavior
If a session ends with a goal still active, --resume or --continue restores the condition. The turn count, timer, and token-spend baseline reset on resume. An achieved goal is not restored.
▸ The three discipline rules
- Always set --max-turns AND --max-budget-usd. Headless without caps is a runaway disguised as a workflow.
- Treat the --output-format json envelope as a contract: fail the CI step if the JSON is malformed instead of letting downstream consume garbage.
- Use bypassPermissions mode only in CI, never on a real workstation. CI sandboxes are ephemeral; your laptop's ~ isn't.
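The second rule, treating the JSON output as a contract, could be enforced with a small gate between the headless run and anything downstream. The sketch below only checks that the output parses as a JSON object; it deliberately assumes nothing about which fields the envelope contains. `parse_envelope` and `exit_code_for` are hypothetical helper names.

```python
import json
import sys


def parse_envelope(raw: str) -> dict:
    """Return the parsed JSON object, or raise ValueError if it is not one."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError(f"expected a JSON object, got {type(payload).__name__}")
    return payload


def exit_code_for(raw: str) -> int:
    """Map captured output to a CI exit code: 0 if well-formed, 1 otherwise."""
    try:
        parse_envelope(raw)
    except ValueError as exc:
        print(f"refusing to continue: {exc}", file=sys.stderr)
        return 1
    return 0
```

A CI step could capture the command's stdout, pass it to `exit_code_for`, and exit with the returned code so that truncated or malformed output fails the job.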
Limitations & Anti-Patterns
production
TL;DR: five failure modes burn money or land bad code. Read these before you set your first long-running goal.
▸ #1 — The condition isn't observable
"Make the app production-ready" can never be proven in a transcript. The evaluator has no notion of "ready"; it can only check what Claude printed. Symptom: every turn the evaluator says "not yet" and the loop runs forever. Fix: rewrite as a list of countable end states ("0 lint errors, 0 type errors, all tests pass, >80% line coverage measured by vitest --coverage").
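▸ #2: The worker "fixes" by deleting
"All tests pass" can be satisfied by deleting failing tests. Symptom: the goal is achieved in 3 turns and the suite is empty. Fix: add a constraint, such as "no test file is deleted; git diff --name-status test/ shows only M (modified) entries, no D (deleted) entries". Note that --name-only would not work here, because it lists paths without saying whether they were modified or deleted.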
▸ #3 — Compound goals overwhelm the worker
"Migrate the DB AND refactor the parser AND write the docs" is too many independent end states. The worker context-switches, the evaluator gets confused by mixed signals, the loop never converges. Fix: sequence them — three goals, one at a time, set the next one when the previous achieves.
▸ #4 — No turn cap on an exploratory goal
Goals on tasks where it's unclear how many turns are needed (debugging a flake, exploring an unfamiliar codebase) easily run 100+ turns. Fix: always include "or stop after N turns" for exploratory work. You can always set another goal afterward.
▸ #5 — Trusting the achieved entry
The evaluator can be wrong. "Yes the tests pass" might mean "the last npm test command Claude ran exited 0" — but Claude may have stubbed out the failing assertion. Fix: for high-stakes goals, add an external check (a separate CI run, a code review) after the goal achieves. /goal is a strong attention focuser, not a substitute for review.
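One possible external check, covering the deleted-tests failure mode from #2, is to scan the diff for deletions after the goal is achieved. The sketch below assumes its input is the text printed by `git diff --name-status test/`, where each line is a status letter, a tab, and a path; `deleted_paths` is a hypothetical helper name.

```python
def deleted_paths(name_status_output: str) -> list[str]:
    """Return paths marked as deleted (status D) in name-status diff output."""
    deleted = []
    for line in name_status_output.splitlines():
        if not line.strip():
            continue
        # Each line looks like "M<TAB>path"; renames look like "R100<TAB>old<TAB>new".
        status, _, path = line.partition("\t")
        if status.startswith("D"):
            deleted.append(path)
    return deleted


if __name__ == "__main__":
    sample = "M\ttest/auth/login.test.ts\nD\ttest/auth/token.test.ts\n"
    removed = deleted_paths(sample)
    if removed:
        print("test files were deleted:", ", ".join(removed))
```

Running this in a separate CI job, outside the goal session, keeps the check independent of anything the worker printed.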
▸ When NOT to use /goal at all
- Quick edits. A two-line fix doesn't need a loop; just ask.
- Genuinely exploratory work. If you don't know what "done" looks like, /goal isn't the tool. Try /loop with a poll interval, or just iterate.
- Work that requires human judgment per turn. Code review, design decisions, anything with subjective acceptance. The evaluator can't see your taste.