Claude Code · /goal Master Class

Why a whole class on one command

/goal looks like syntactic sugar — type a condition, Claude keeps working until the condition holds — but the architectural move underneath is significant: the agent doing the work is no longer the agent deciding it's done. A second, smaller model judges the condition every turn. Get the condition right and you've turned a chat loop into a contract.

What you'll learn

The exact syntax, including aliases and the four-character-thousand condition budget.
When /goal beats /loop and a Stop hook — and when it doesn't.
The three ingredients of a condition that survives 20+ turns.
How the evaluator actually decides "done", and what it can never see.
Headless /goal in CI, plus the anti-patterns that burn money silently.

Built against

Claude Code v2.1.143. The command itself shipped in v2.1.139 (May 2026). Trust dialog must be accepted in the workspace, since /goal is a session-scoped wrapper around the hooks system.

§1

What /goal Is

foundations

TL;DR — /goal <condition> sets a session-scoped completion condition. After every turn, a small fast model judges whether the condition holds. If not, Claude takes another turn — automatically, no "keep going" prompt from you.

▸ The three uses, same command

/goal <condition> — set (or replace) the active goal. A turn starts immediately, condition itself as the directive.
/goal (no args) — show status: condition, runtime, turns evaluated, token spend, evaluator's last reason.
/goal clear — remove the active goal. Aliases: stop · off · reset · none · cancel.

▸ The architectural insight

YOU — set condition once │ ▼ WORKER (Claude main turn) — does the work │ ▼ EVALUATOR (small fast model, default Haiku) — reads transcript, decides "done?" │ ├─ yes → goal cleared, turn returned to you └─ no → reason added as guidance, worker takes another turn

That separation is what makes the command pay off. The worker stays optimistic ("almost done"); the evaluator is skeptical ("you said tests pass — show me"). Two roles, two models, one loop.

▸ A first goal

/goal all tests in test/auth pass and the lint step is clean

While the goal is active a ◎ /goal active indicator shows how long it's been running. The condition can be up to 4,000 characters — long enough to spell out constraints.

Deep dive → official /goal docs

Next → §2 · /goal vs alternatives

§2

/goal vs /loop vs Stop Hook vs Auto Mode

foundations

TL;DR — four mechanisms keep a session running between prompts. Pick by what should start the next turn and who decides it's done.

▸ The four-way comparison

/goal next turn: previous turn finishes · stops: evaluator model says "done" /loop next turn: time interval elapses · stops: you stop OR Claude judges done Stop hook next turn: previous turn finishes · stops: YOUR script/prompt decides Auto mode next turn: never (single turn) · stops: Claude judges work done /goal and Stop hook both fire after every turn. /goal is session-scoped sugar on top of a prompt-based Stop hook. Auto mode is orthogonal — removes per-tool prompts, doesn't start new turns. Combine: /goal + auto mode = unattended loop.

▸ When to reach for each

/goal — substantial work with a verifiable end state, where the evaluator can read the proof in the transcript. Migration until tests pass. Splitting a file until each piece is under a size budget.
/loop — schedule-driven polling. "Check the deploy every 5 min." "Babysit a PR queue." Not a completion condition; a re-run cadence.
Stop hook — when you want a deterministic check (shell command, exit code) or custom prompt logic that lives in settings and applies session-wide.
Auto mode — pair with any of the above when you don't want per-tool permission prompts. Critical for unattended use; orthogonal to "when does the loop stop".

Deep dive → scheduling options (Anthropic docs)

Next → §3 · Anatomy of a good condition

§3

Anatomy of a Good Condition

craft

TL;DR — a condition that survives 20+ turns has three ingredients: one measurable end state, a stated check Claude can run, and constraints that name what must not change.

▸ Ingredient 1 — Measurable end state

An observable quantity that flips from "not yet" to "yes" exactly once. Not a vibe. Examples that work: a test run's exit code, a file count, a queue length, "git status is clean". Examples that fail: "the code is good", "users will be happy", "production-ready".

▸ Ingredient 2 — Stated check

How Claude should prove the end state. The evaluator reads the transcript; if proof isn't in the transcript, the goal stalls. Spell out the command Claude should run: "npm test exits 0", "jq . config.json succeeds", "the build/ dir contains index.html".

▸ Ingredient 3 — Constraints

What must not change while reaching the end state. Without these the worker may "fix" tests by deleting them or downgrade a dep instead of fixing the bug.

/goal all tests in test/auth pass (npm test --silent exits 0),
the diff touches only src/auth/* and test/auth/*,
and package.json is unchanged.

▸ The 4,000-char budget

The condition can be long — use it. A two-paragraph condition with constraints typically outperforms a five-word one because the evaluator has more cues to judge against. Past ~1,000 chars the worker also reads it as a small spec.

Deep dive → Master Class §4 Permissions (the safety pairing)

Next → §4 · Good vs Bad gallery

§4

Good vs Bad — A Gallery

craft

TL;DR — read these side by side. The pattern reveals itself faster than any rule.

▸ Tests

✓ /goal all tests in test/auth pass (npm test --silent exits 0)
✗ /goal fix the tests

▸ Migration

✓ /goal every call site of db.queryOld has been replaced with db.queryV2,
   bun tsc --noEmit exits 0, no .ts file outside src/db touches db.queryOld
✗ /goal migrate the database calls

▸ Cleanup

✓ /goal CHANGELOG.md has one entry per PR merged in the last 7 days
   (verified by gh pr list --state merged --search "merged:>=2026-05-10");
   each entry has a date, type, and link.
✗ /goal update the changelog

▸ Refactor

✓ /goal src/parser.ts has been split so every resulting file is ≤200 lines,
   all callers compile, bun test exits 0, no new dependencies in package.json
✗ /goal break the parser into smaller files

▸ Backlog

✓ /goal every issue with label "good-first" is either closed,
   labeled "blocked", or has a comment from me explaining the next step.
   Stop after 30 turns even if not done.
✗ /goal triage the issues

▸ The pattern

Every "good" example names a thing to count, a command to run, and what stays untouched. Every "bad" example names a verb and trusts the worker to define done.

Next → §5 · Bounding the run

§5

Bounding the Run

craft

TL;DR — there is no built-in cap on a /goal session. Add a stop clause to the condition itself, or burn money in your sleep.

▸ Built-in: nothing

A goal keeps running until the evaluator says "yes" or you run /goal clear. There is no default max-turn, no default budget cap, no default timeout.

▸ Three ways to bound

Inline in the condition. Append "or stop after 20 turns" or "or stop after 1 hour". The worker reports progress each turn; the evaluator reads the progress and judges the clause.
Headless caps. When running with -p, add --max-turns N and --max-budget-usd X at the CLI level. These are enforced by Claude Code itself, not the evaluator.
External kill switch. Ctrl+C in interactive mode, or kill the process for headless. Crude but reliable.

▸ A bounded goal in practice

/goal flake-2-out-of-5 in test/network/* is resolved
(running each suite 5× and showing 5/5 passes in the output),
or stop after 25 turns. Constraint: don't touch test infra
unless a single test file's setup is the cause.

▸ A real reason to set a cap

Without a cap, a goal that can't converge will iterate forever, each turn costing real money. The author of the community pre-official claude-goal tool defaulted to 500 continuations as a runaway protection — which tells you how bad it can get.

Deep dive → Master Class §9 Cost & Model Fit

Next → §6 · How the evaluator works

§6

How the Evaluator Works

internals

TL;DR — /goal is a wrapper around a session-scoped prompt-based Stop hook. After every turn the condition + transcript are sent to the configured small fast model (default: Haiku). It returns yes/no + a short reason. That's the whole machine.

▸ The loop, in detail

turn N finishes │ ▼ Stop hook fires (session-scoped, installed by /goal) │ ▼ build payload: {condition, transcript-so-far} │ ▼ send to small fast model (provider's "haiku" tier by default) │ ▼ parse response: {decision: "yes" | "no", reason: "..."} │ ├─ yes → goal cleared, achieved entry written to transcript, control returns └─ no → reason injected as guidance, turn N+1 starts immediately

▸ What the evaluator CAN see

Every user message and assistant message in the session transcript.
The condition you set.
Tool calls and their results, including stdout and exit codes that Claude already ran.

▸ What the evaluator CAN'T see

Files on disk it doesn't already see in the transcript.
Anything Claude didn't print. A test that passes silently doesn't count until Claude shows the exit code.
Live state. The evaluator can't open a browser, can't hit an API, can't call tools.

▸ The cost

Evaluator tokens are billed on the small-fast-model tier (Haiku-class) and are typically negligible compared to the main turn. Concretely: a 20-turn goal session that costs $4 in worker tokens might add $0.05 in evaluator tokens.

▸ The implementation note that matters

Because /goal IS a Stop hook, it's unavailable when disableAllHooks is set or when allowManagedHooksOnly blocks it. The trust dialog must be accepted for the workspace. The command tells you why it's unavailable instead of silently doing nothing.

Deep dive → Hook Cookbook §2 Stop hook recipe

Next → §7 · Headless /goal

§7

Headless /goal

production

TL;DR — pass /goal <condition> as the prompt to claude -p and the entire loop runs in one invocation. Pair with --max-turns + --max-budget-usd + auto mode to get an unattended agent with hard ceilings.

▸ The headless shape

claude -p "/goal CHANGELOG.md has an entry for every PR merged this week"
        --permission-mode bypassPermissions
        --max-turns 30
        --max-budget-usd 1.50
        --output-format json

▸ A real GitHub Actions step

- name: Nightly issue triage
  run: |
    claude -p \
      --permission-mode bypassPermissions \
      --max-turns 40 \
      --max-budget-usd 2.00 \
      --output-format json \
      "/goal every issue labeled 'needs-triage' is either
       (a) labeled with one of {bug, feature, question, won't-fix},
       (b) closed with a reason, or
       (c) commented on with a follow-up question to the reporter.
       Stop after 40 turns even if not done."
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}

▸ Resume behavior

If a session ends with a goal still active, --resume or --continue restores the condition. The turn count, timer, and token-spend baseline reset on resume. An achieved goal is not restored.

▸ The three discipline rules

Always set --max-turns AND --max-budget-usd. Headless without caps is a runaway disguised as a workflow.
Treat the --output-format json envelope as a contract — fail the CI step if the JSON is malformed instead of letting downstream consume garbage.
Use bypassPermissions mode only in CI, never on a real workstation. CI sandboxes are ephemeral; your laptop's ~ isn't.

Deep dive → Master Class §12 CI/CD & Headless

Next → §8 · Limitations & anti-patterns

§8

Limitations & Anti-Patterns

production

TL;DR — five failure modes burn money or land bad code. Read these before you set your first long-running goal.

▸ #1 — The condition isn't observable

"Make the app production-ready" can never be proven in a transcript. The evaluator has no notion of "ready"; it can only check what Claude printed. Symptom: every turn the evaluator says "not yet" and the loop runs forever. Fix: rewrite as a list of countable end states ("0 lint errors, 0 type errors, all tests pass, >80% line coverage measured by vitest --coverage").

▸ #2 — The worker "fixes" by deleting

"All tests pass" can be satisfied by deleting failing tests. Symptom: goal achieves in 3 turns, suite is empty. Fix: add a constraint — "no test file is deleted; git diff --name-only test/ shows only modifications, not deletions".

▸ #3 — Compound goals overwhelm the worker

"Migrate the DB AND refactor the parser AND write the docs" is too many independent end states. The worker context-switches, the evaluator gets confused by mixed signals, the loop never converges. Fix: sequence them — three goals, one at a time, set the next one when the previous achieves.

▸ #4 — No turn cap on an exploratory goal

Goals on tasks where it's unclear how many turns are needed (debugging a flake, exploring an unfamiliar codebase) easily run 100+ turns. Fix: always include "or stop after N turns" for exploratory work. You can always set another goal afterward.

▸ #5 — Trusting the achieved entry

The evaluator can be wrong. "Yes the tests pass" might mean "the last npm test command Claude ran exited 0" — but Claude may have stubbed out the failing assertion. Fix: for high-stakes goals, add an external check (a separate CI run, a code review) after the goal achieves. /goal is a strong attention focuser, not a substitute for review.

▸ When NOT to use /goal at all

Quick edits. A two-line fix doesn't need a loop; just ask.
Genuinely exploratory work. If you don't know what "done" looks like, /goal isn't the tool. Try /loop with a poll interval, or just iterate.
Work that requires human judgment per turn. Code review, design decisions, anything with subjective acceptance. The evaluator can't see your taste.

Deep dive → Master Class §4 Permissions (pair /goal with caps + deny lists)