Learn From Claude Code: Query Engine

Learning Claude Code's query engine by inspecting its leaked source code.

#agent

#ai

Created on Apr 02, 2026, Last Updated on May 11, 2026, By a Developer

Claude Code’s source code leaked. Setting aside the surveillance concerns and inevitable spaghetti of any real codebase, it’s a genuinely well-designed harness.

I’ve been digging through it, picking out patterns worth understanding. This is one of them: the Query Engine - and why “state machine” misses the point.

Other posts in this series: App State, Tool System, Permission System, Memory, Context Compaction, MCP Integration, Multi-Agent, Agent Spawning.

The Problem

Every agent starts with a simple loop:

while (true) {
  const response = await api.sendMessage(messages)
  messages.push(response)

  if (response.stop_reason === 'end_turn') break

  const toolResults = await executeTools(response.tool_calls)
  messages.push(toolResults)
}

A hundred lines. The amazing agent loop with some retry logic, done.

Then production shows up. Streaming (don’t buffer). Thinking blocks (preserve for trajectory). Context limits (compact or fail). Output limits (escalate or nudge). Rate limits (backoff and retry). Budget tracking (don’t exceed).

The simple loop becomes a recovery coordinator - not because someone wanted a state machine, but because every production failure mode needs a specific recovery strategy. The App State stores the recovery counters, and context compaction handles the actual context reduction.

State Machine for Failure Recovery

The state fields for query engine tell a story of all recovery attempts.

type State = {
  messages: Message[]

  // Compaction tracking
  autoCompactTracking: AutoCompactTrackingState | undefined

  // Output token recovery
  maxOutputTokensRecoveryCount: number    // how many times we've retried
  maxOutputTokensOverride: number | undefined  // escalated limit (8k -> 64k)

  // Reactive compact guard
  hasAttemptedReactiveCompact: boolean    // only one attempt per query

  // Turn tracking
  turnCount: number
  transition: Continue | undefined        // why previous iteration continued
}

Each field exists because something can fail. maxOutputTokensRecoveryCount exists because output limits exist. hasAttemptedReactiveCompact exists because 413 errors exist.

Two Actions Yielded from the Loop

The query engine has a simple vocabulary.

continue meaning the loop continues, there may or may not have been an error.
return when you’re done (or truly stuck).

Both associate with a reason to avoid an ambiguous retry flag confusing consumers.

Continue

continue means “we are still in the loop, even if something broke, but I know how to handle it.” The model sees the new context (compact summary, resume prompt, blocking error) but never sees the transition reason itself.

// Normal flow - execute tools and continue
transition: { reason: 'next_turn' }

// 413 prompt too long - compact context and retry
transition: { reason: 'reactive_compact_retry' }

// Hit 8k output cap - escalate to 64k and retry same request
transition: { reason: 'max_output_tokens_escalate' }

// Hit output limit - inject resume prompt and continue
transition: { reason: 'max_output_tokens_recovery', attempt: 2 }

// Hook rejected something - inject error, force model to fix
transition: { reason: 'stop_hook_blocking' }

// Budget exceeded - nudge model to wrap up
transition: { reason: 'token_budget_continuation' }

Return

Not all errors are recoverable, such as “Prompt too long” or “Max turn reached”. Terminal states. No more recovery attempts. The loop exits.

// Normal completion
return { reason: 'completed' }

// Hit configured turn limit
return { reason: 'max_turns', turnCount: 15 }

// Context exceeds limit, compaction failed
return { reason: 'prompt_too_long' }

// User cancelled during stream
return { reason: 'aborted_streaming' }

Context Surgery

The query engine doesn’t just pass messages through. It manipulates context to achieve reliable behavior and satisfy API constraints.

Meta Message Injection

Messages with isMeta: true are hidden from UI but visible to the model. The harness uses them to steer behavior without cluttering the transcript.

// Output limit hit - steer recovery
const recoveryMessage = createUserMessage({
  content: `Output token limit hit. Resume directly — no apology, no recap.
    Pick up mid-thought if that is where the cut happened.`,
  isMeta: true,
})

// Budget exceeded - prevent premature summary
const nudgeMessage = createUserMessage({
  content: `Stopped at 78% of token target. Keep working — do not summarize.`,
  isMeta: true,
})

It feels like Claude Code is whispering at the model, giving it guidance to keep it in lane without the user noticing that recovery is happening under the hood.

Error Withholding

Errors are surfaced far less often than they actually occur. Requiring users to retry every time minor errors happen is a bad experience. Claude Code handles this by withholding the error until recovery exhausts.

if (isWithheldMaxOutputTokens(message)) {
  withheld = true  // Don't yield yet
}
// ... try recovery ...
if (recoveryExhausted) {
  yield withheldMessage  // Only now surface it
}

The pattern: inject prompts to steer, withhold errors until recovery fails, never let the model see the raw failure.

Clean before Retry

Sometimes messages in context become invalid and must be removed before retry. Two common scenarios:

Streaming fallback: Switching models mid-stream leaves partial thinking blocks with signatures bound to the original model.
Credential changes: After /login, thinking block signatures become invalid.

Tombstones signal downstream consumers (UI, transcript) to remove orphaned messages.

if (streamingFallbackOccured) {
  for (const msg of assistantMessages) {
    yield { type: 'tombstone', message: msg }
  }
  assistantMessages.length = 0
}

Strip Signatures - When credentials change (/login) or model switches, thinking block signatures become invalid. The API returns 400 on replay. Strip them before retry:

messagesForQuery = stripSignatureBlocks(messagesForQuery)

Brief

The tool system provides the tools the loop calls. The simple agent loop is the happy path only. Real harnesses don’t emerge from a simple loop. The agent goes south instantly when errors are not handled properly.

By hiding unnecessary detail from the model and whispering guidance through meta messages, the agent is able to keep running in its lane for longer. Every agent architecture starts with this loop — Claude Code just adds more recovery handles. Not saying Claude Code is perfect, but it does tell a good story of providing resilience through controlled opacity.

How Hermes-Agent Evolves Over Time

by a Developer

May 2026

by a Developer

agent

Hermes-Agent claims to be "the agent that grows with you". It's not marketing — it's three counters, a background fork, and a 100-line prompt.

Learn From Claude Code: Agent Spawning

by a Developer

Apr 2026

by a Developer

agent

Learning Claude Code's agent spawning by inspecting its leaked source code.

Learn From Claude Code: App State Machine

by a Developer

Apr 2026

by a Developer

agent

Learning Claude Code's App State by inspecting its leaked source code.

Learn From Claude Code: MCP Integration

by a Developer

Apr 2026

by a Developer

agent

Anthropic invented MCP a year ago. Now they're experimenting with skill-ifying it. Here's what the leaked source reveals about the pivot.

ZANE.C

Learn From Claude Code: Query Engine

Learn From Claude Code: Query Engine

Learning Claude Code's query engine by inspecting its leaked source code.

The Problem

State Machine for Failure Recovery

Two Actions Yielded from the Loop

Continue

Return

Context Surgery

Meta Message Injection

Error Withholding

Clean before Retry

Brief

On this page

Related

How Hermes-Agent Evolves Over Time

Learn From Claude Code: Agent Spawning

Learn From Claude Code: App State Machine

Learn From Claude Code: MCP Integration