Claude Code’s source code leaked. Setting aside the surveillance concerns and inevitable spaghetti of any real codebase, it’s a genuinely well-designed harness.
I’ve been digging through it, picking out patterns worth understanding. This is one of them: the Query Engine - and why “state machine” misses the point.
The Problem
Every agent starts with a simple loop:
while (true) {
const response = await api.sendMessage(messages)
messages.push(response)
if (response.stop_reason === 'end_turn') break
const toolResults = await executeTools(response.tool_calls)
messages.push(toolResults)
}
A hundred lines. The amazing agent loop with some retry logic, done.
Then production shows up. Streaming (don’t buffer). Thinking blocks (preserve for trajectory). Context limits (compact or fail). Output limits (escalate or nudge). Rate limits (backoff and retry). Budget tracking (don’t exceed).
The simple loop becomes a recovery coordinator - not because someone wanted a state machine, but because every production failure mode needs a specific recovery strategy.
State Machine for Failure Recovery
The state fields for query engine tell a story of all recovery attempts.
type State = {
messages: Message[]
// Compaction tracking
autoCompactTracking: AutoCompactTrackingState | undefined
// Output token recovery
maxOutputTokensRecoveryCount: number // how many times we've retried
maxOutputTokensOverride: number | undefined // escalated limit (8k -> 64k)
// Reactive compact guard
hasAttemptedReactiveCompact: boolean // only one attempt per query
// Turn tracking
turnCount: number
transition: Continue | undefined // why previous iteration continued
}
Each field exists because something can fail. maxOutputTokensRecoveryCount exists because output limits exist. hasAttemptedReactiveCompact exists because 413 errors exist.
Two Actions Yielded from the Loop
The query engine has a simple vocabulary.
continuemeaning the loop continues, there may or may not have been an error.returnwhen you’re done (or truly stuck).
Both associate with a reason to avoid an ambiguous retry flag confusing consumers.
Continue
continue means “we are still in the loop, even if something broke, but I know how to handle it.” The model sees the new context (compact summary, resume prompt, blocking error) but never sees the transition reason itself.
// Normal flow - execute tools and continue
transition: { reason: 'next_turn' }
// 413 prompt too long - compact context and retry
transition: { reason: 'reactive_compact_retry' }
// Hit 8k output cap - escalate to 64k and retry same request
transition: { reason: 'max_output_tokens_escalate' }
// Hit output limit - inject resume prompt and continue
transition: { reason: 'max_output_tokens_recovery', attempt: 2 }
// Hook rejected something - inject error, force model to fix
transition: { reason: 'stop_hook_blocking' }
// Budget exceeded - nudge model to wrap up
transition: { reason: 'token_budget_continuation' }
Return
Not all errors are recoverable, such as “Prompt too long” or “Max turn reached”. Terminal states. No more recovery attempts. The loop exits.
// Normal completion
return { reason: 'completed' }
// Hit configured turn limit
return { reason: 'max_turns', turnCount: 15 }
// Context exceeds limit, compaction failed
return { reason: 'prompt_too_long' }
// User cancelled during stream
return { reason: 'aborted_streaming' }
Context Surgery
The query engine doesn’t just pass messages through. It manipulates context to achieve reliable behavior and satisfy API constraints.
Meta Message Injection
Messages with isMeta: true are hidden from UI but visible to the model. The harness uses them to steer behavior without cluttering the transcript.
// Output limit hit - steer recovery
const recoveryMessage = createUserMessage({
content: `Output token limit hit. Resume directly — no apology, no recap.
Pick up mid-thought if that is where the cut happened.`,
isMeta: true,
})
// Budget exceeded - prevent premature summary
const nudgeMessage = createUserMessage({
content: `Stopped at 78% of token target. Keep working — do not summarize.`,
isMeta: true,
})
It feels like Claude Code is whispering at the model, giving it guidance to keep it in lane without the user noticing that recovery is happening under the hood.
Error Withholding
Errors are surfaced far less often than they actually occur. Requiring users to retry every time minor errors happen is a bad experience. Claude Code handles this by withholding the error until recovery exhausts.
if (isWithheldMaxOutputTokens(message)) {
withheld = true // Don't yield yet
}
// ... try recovery ...
if (recoveryExhausted) {
yield withheldMessage // Only now surface it
}
The pattern: inject prompts to steer, withhold errors until recovery fails, never let the model see the raw failure.
Clean before Retry
Sometimes messages in context become invalid and must be removed before retry. Two common scenarios:
- Streaming fallback: Switching models mid-stream leaves partial thinking blocks with signatures bound to the original model.
- Credential changes: After
/login, thinking block signatures become invalid.
Tombstones signal downstream consumers (UI, transcript) to remove orphaned messages.
if (streamingFallbackOccured) {
for (const msg of assistantMessages) {
yield { type: 'tombstone', message: msg }
}
assistantMessages.length = 0
}
Strip Signatures - When credentials change (/login) or model switches, thinking block signatures become invalid. The API returns 400 on replay. Strip them before retry:
messagesForQuery = stripSignatureBlocks(messagesForQuery)
Brief
The simple agent loop is the happy path only. Real harnesses don’t emerge from a simple loop. The agent goes south instantly when errors are not handled properly.
By hiding unnecessary detail from the model and whispering guidance through meta messages, the agent is able to keep running in its lane for longer. Not saying Claude Code is perfect, but it does tell a good story of providing resilience through controlled opacity.
Related
Learn From Claude Code: MCP Integration
by a Developer
Apr 2026
by a Developer
agent
ai
Anthropic invented MCP a year ago. Now they're experimenting with skill-ifying it. Here's what the leaked source reveals about the pivot.
Learn From Claude Code: Memory
by a Developer
Apr 2026
by a Developer
agent
ai
Learning Claude Code's memory system by inspecting its leaked source code.
Learn From Claude Code: Permission System
by a Developer
Apr 2026
by a Developer
agent
ai
Learning Claude Code's multi-layer permission architecture by inspecting its leaked source code.
Learn From Claude Code: Tool System
by a Developer
Apr 2026
by a Developer
agent
ai
Claude Code has 45+ tools. OpenClaw preached minimal tools. Both work. What does that tell us?