Claude Agent SDK Adapter

Wrap Claude Agent SDK invocations in Kitaru checkpoints, capture session context, and replay completed Claude calls honestly

The Claude Agent SDK gives you Claude Code as a library: Claude can read files, edit files, run commands, use MCP servers, call tools, follow permissions, and keep a session transcript. Kitaru does not replace that agent loop.

Kitaru adds an outer durable execution boundary around it:

one completed Claude Agent SDK invocation = one Kitaru checkpoint

That boundary is useful when a Claude call is one part of a larger workflow. Imagine this flow:

collect inputs → ask Claude to analyze them → write report → notify reviewer

If Claude finishes the analysis and the later write-report checkpoint fails, Kitaru can replay the flow and reuse the completed Claude result instead of calling Claude again. You keep the Claude session ID, final text, usage/cost metadata, message records, and Kitaru artifacts that explain what happened.

The adapter focuses on the completed Claude SDK invocation as the durable unit: one prompt enters the SDK, Claude finishes, and Kitaru stores the completed result and capture envelope.

The mental model

Think of Claude Agent SDK as the driver and Kitaru as the trip recorder and checkpoint gate.

Claude still drives inside the invocation:

Claude prompt
  └─ Claude Agent SDK / Claude Code loop
      ├─ model calls
      ├─ built-in tools
      ├─ optional Bash commands
      ├─ optional MCP/custom tool calls
      ├─ permissions and hooks
      └─ final ResultMessage with a session_id

Kitaru wraps the outside:

@kitaru.flow
  └─ claude_summary_claude_invocation checkpoint
      └─ one call to claude_agent_sdk.query(...)

So Kitaru can say:

“This Claude invocation completed. Here is the result and session context. On replay, I can return that completed boundary output without calling Claude again.”

Kitaru cannot honestly say:

“I can resume from Claude's sixth internal tool call and deterministically avoid every side effect Claude already performed.”

The difference matters. If Claude runs Bash and creates report.md, Kitaru can store the final Claude result saying that happened. Kitaru does not automatically snapshot and recreate report.md in a fresh workspace. If that file must be a durable workflow output, write it in a Kitaru-owned checkpoint after Claude returns the content.
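The distinction can be sketched with a toy model. This is not Kitaru's implementation: the `completed` dict stands in for a durable checkpoint store, and `fake_claude_invocation` and `run_checkpoint` are hypothetical names invented for illustration. The sketch shows that replay returns the saved boundary output without calling the invocation again, and that the file written inside the invocation does not appear in a fresh workspace.

```python
import tempfile
from pathlib import Path

# Toy stand-in for a durable checkpoint store; not Kitaru's implementation.
completed: dict[str, str] = {}
claude_calls = 0


def fake_claude_invocation(workspace: Path) -> str:
    """Pretend to be Claude: write a file as a side effect, then report it."""
    global claude_calls
    claude_calls += 1
    (workspace / "report.md").write_text("# Findings\n")
    return "I wrote report.md"


def run_checkpoint(name: str, workspace: Path) -> str:
    """Return the saved output if this boundary has already completed."""
    if name in completed:
        return completed[name]  # replay-skip: the invocation is not re-run
    completed[name] = fake_claude_invocation(workspace)
    return completed[name]


with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as fresh:
    # First run executes the invocation in the original workspace.
    first_output = run_checkpoint("claude_summary_claude_invocation", Path(first))
    # Replay runs in a fresh workspace and reuses the saved output.
    replay_output = run_checkpoint("claude_summary_claude_invocation", Path(fresh))
    file_in_first = (Path(first) / "report.md").exists()
    file_in_fresh = (Path(fresh) / "report.md").exists()

print(replay_output == first_output, claude_calls, file_in_first, file_in_fresh)
```

The replayed output matches the first one and the fake invocation runs only once, but `report.md` exists only in the first workspace. The saved text still claims the file was written, which is why durable files belong in a Kitaru-owned checkpoint.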

What you get

The adapter gives existing Claude Agent SDK users:

one durable Kitaru checkpoint around each completed Claude invocation
replay-skip for completed Claude invocations inside larger Kitaru flows
a typed ClaudeRunResult with final text, session ID, stop reason, turn count, usage, model usage, and cost fields when the SDK reports them
captured SDK message records
best-effort local transcript capture when the Claude SDK writes a transcript file where the adapter can find it
a redacted options manifest for debugging and audit trails
Kitaru event-log and run-summary artifacts
an explicit resume path through Claude's native session_id

Claude's own docs are still the source of truth for the inner SDK behavior.

Install

Add the Claude Agent SDK extra. Include local if you want the local Kitaru server and dashboard:

uv add "kitaru[claude-agent-sdk,local]"

Initialize the project once:

kitaru init
kitaru login        # local server; add a URL for a deployed one
kitaru status

For direct Anthropic API usage, set ANTHROPIC_API_KEY before making a real Claude call:

export ANTHROPIC_API_KEY='<your-anthropic-api-key>'

The Claude SDK also supports Bedrock and Vertex modes when their provider-specific environment is configured.

Minimal flow

This example keeps Claude's tools disabled so the first run is easy to reason about: one prompt goes in, one completed Claude result comes back, and Kitaru stores that completed invocation.

from pathlib import Path

from claude_agent_sdk import ClaudeAgentOptions
from kitaru import flow
from kitaru.adapters.claude_agent_sdk import (
    ClaudeRunRequest,
    ClaudeRunResult,
    KitaruClaudeRunner,
)

runner = KitaruClaudeRunner(
    name="claude_summary",
    options_factory=lambda request: ClaudeAgentOptions(
        # Empty allowed_tools keeps this first example non-destructive.
        allowed_tools=[],
        cwd=request.cwd,
        resume=request.resume_session_id,
        max_turns=request.max_turns,
    ),
)

@flow
def summarize(prompt: str) -> ClaudeRunResult:
    request = ClaudeRunRequest.start(
        prompt,
        cwd=str(Path.cwd()),
        max_turns=1,
    )
    return runner.run_sync(request)

result = summarize.run("Explain Kitaru checkpoints in five sentences.").wait()
print(result.final_text)
print(result.session_id)

The checkpoint name is derived from the runner name. In this example, the adapter-created checkpoint is named:

claude_summary_claude_invocation

How a run works, step by step

When runner.run_sync(ClaudeRunRequest.start(...)) executes inside a Kitaru flow, this is the concrete sequence:

1. Kitaru opens one synthetic checkpoint.
2. The adapter builds ClaudeAgentOptions from options_factory(request).
3. The adapter calls claude_agent_sdk.query(prompt=..., options=...).
4. The adapter drains the SDK messages until the final ResultMessage arrives.
5. It extracts final text, session_id, usage, cost, stop reason, and turn count.
6. It tries to locate the local Claude transcript file for that session.
7. It saves configured artifacts: messages, transcript, manifest, output, usage,
   event log, and run summary.
8. It returns ClaudeRunResult as the checkpoint output.
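Steps 3 to 5 can be sketched as draining an async message stream until the final result message arrives. This is a minimal illustration, not the adapter's code: `fake_query`, `FakeAssistantMessage`, and `FakeResultMessage` are hypothetical stand-ins for the SDK stream and its message types, and the returned dict is only a simplified picture of the result.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Optional


@dataclass
class FakeAssistantMessage:
    text: str


@dataclass
class FakeResultMessage:
    session_id: str
    num_turns: int


async def fake_query(prompt: str) -> AsyncIterator[object]:
    """Hypothetical stand-in for the SDK's message stream."""
    yield FakeAssistantMessage(text=f"Answer to: {prompt}")
    yield FakeResultMessage(session_id="session-123", num_turns=1)


async def drain(prompt: str) -> dict:
    """Collect every message, then summarize once the final result is seen."""
    messages = []
    final: Optional[FakeResultMessage] = None
    async for message in fake_query(prompt):
        messages.append(message)  # every message is kept for later capture
        if isinstance(message, FakeResultMessage):
            final = message
    if final is None:
        # No completed invocation means nothing to store as checkpoint output.
        raise RuntimeError("stream ended without a final result message")
    texts = [m.text for m in messages if isinstance(m, FakeAssistantMessage)]
    return {
        "final_text": texts[-1] if texts else None,
        "session_id": final.session_id,
        "num_turns": final.num_turns,
        "message_count": len(messages),
    }


summary = asyncio.run(drain("Explain checkpoints"))
print(summary["session_id"], summary["message_count"])
```

The sketch raises when the stream ends without a final result, which mirrors the rule that only a completed invocation produces a reusable checkpoint output.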

On replay, if this checkpoint is already complete and cache/replay rules allow Kitaru to reuse it, Kitaru serves the saved ClaudeRunResult. Claude is not called again for that completed invocation.

Requests, sessions, and resume

Use ClaudeRunRequest.start(...) for a fresh Claude SDK session:

request = ClaudeRunRequest.start(
    "Review this policy for missing controls.",
    cwd="/path/to/project",
    max_turns=3,
    metadata={"document": "it-policy"},
)

Use ClaudeRunRequest.resume(...) when you want the Claude SDK to continue from a previous Claude session_id:

request = ClaudeRunRequest.resume(
    "Now explain the highest-risk finding.",
    session_id=previous_result.session_id,
    cwd="/path/to/project",
    max_turns=2,
)

If the prompt comes from a previous checkpoint or from a flow-body kitaru.llm() call, remember that the value in the flow body is a Kitaru checkpoint output handle, not the concrete string yet. Load it before building a Claude request when Claude needs the actual text:

prompt_text = render_prompt(topic).load()
request = ClaudeRunRequest.start(prompt_text)

If you want to preserve Kitaru's durable data edge instead, pass the original handle into a downstream @checkpoint and build the ClaudeRunRequest inside that checkpoint.

ClaudeRunRequest.resume(...) uses Claude's native session mechanism. Kitaru records the session ID and passes it back through ClaudeAgentOptions(resume=...) when you resume.

Session resume and Kitaru replay are related but different:

Thing	What it means
Claude `session_id`	Claude's conversation/session handle. Use it to continue a Claude SDK session.
Kitaru checkpoint	Kitaru's durable workflow boundary. Use it to skip or replay completed workflow work.
Claude transcript	The conversation record written by Claude SDK when available. Useful for audit/debugging.
Workspace files	Real files on disk. Kitaru does not snapshot them in this adapter.

A practical consequence: if a Claude invocation fails halfway through, Kitaru has no completed invocation result to reuse. Retrying that checkpoint starts the Claude invocation again. If you need stronger mid-invocation transcript persistence, look at Claude's session and session-store features in the official SDK docs; this adapter's Kitaru boundary is the completed invocation.

On Kubernetes (or any distributed orchestrator), treat local Claude transcript files and workspace files as pod-local best-effort state. A later checkpoint or replay may run in a different pod with a different filesystem. The durable part is the completed Kitaru checkpoint output/artifacts that Kitaru persists; local Claude JSONL transcript files are only reliably reusable across pods when you back them with shared persistent storage and/or a Claude session store.

Result shape

KitaruClaudeRunner.run_sync(...) returns a ClaudeRunResult with fields such as:

final_text
session_id
transcript_path
usage
cost_usd
model_usage
stop_reason
num_turns
artifact names for captured messages, transcript, output, usage, event log, and run summary
warnings for best-effort capture gaps, such as a missing transcript file or a non-fatal artifact/event/log persistence failure

Failed Claude invocations raise an exception instead of returning a ClaudeRunResult(status="failed"). If the failure happens inside a Kitaru checkpoint, the adapter still records best-effort failure metadata before the exception propagates.

Capture policy

By default, the adapter captures the boundary data that is useful for replay inspection and audits:

prompt and SDK message records
best-effort local transcript JSONL payload
redacted options manifest
final output
usage and cost information when the SDK reports it
one invocation event and one run summary

You can reduce what is stored with ClaudeCapturePolicy:

from kitaru.adapters.claude_agent_sdk import ClaudeCapturePolicy, KitaruClaudeRunner

runner = KitaruClaudeRunner(
    name="private_claude_run",
    capture=ClaudeCapturePolicy(
        save_prompt=False,
        save_messages=False,
        save_transcript_file=False,
        save_usage=True,
    ),
)

Treat messages and transcripts as conversation data. They may contain prompts, retrieved document snippets, tool arguments, command output, and model output. The options manifest is redacted by default, including common secret-bearing mapping keys, key/value sequence pairs such as [("Authorization", "Bearer ...")], [("x-api-key", "...")], cookie/env pairs, and env-list entries shaped like {"name": "ANTHROPIC_API_KEY", "value": "..."}. Message/transcript artifacts are not a secret store; turn them off when the conversation itself is sensitive.
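To make the redaction shapes concrete, here is a rough sketch of how such a pass could work. It is not the adapter's actual redaction logic: `redact`, `looks_secret`, and the `SECRET_HINTS` list are assumptions invented for this example, and the real key list may differ.

```python
from typing import Any

# Hypothetical key hints; the adapter's real list is not documented here.
SECRET_HINTS = ("key", "token", "secret", "authorization", "cookie", "password")
REDACTED = "[REDACTED]"


def looks_secret(name: Any) -> bool:
    """True when a key name contains one of the secret-bearing hints."""
    return isinstance(name, str) and any(h in name.lower() for h in SECRET_HINTS)


def redact(value: Any) -> Any:
    """Return a copy of value with secret-looking entries masked."""
    if isinstance(value, dict):
        # Env-list entry shaped like {"name": ..., "value": ...}.
        if set(value) == {"name", "value"} and looks_secret(value["name"]):
            return {"name": value["name"], "value": REDACTED}
        return {k: REDACTED if looks_secret(k) else redact(v) for k, v in value.items()}
    # Key/value sequence pair such as ("Authorization", "Bearer ...").
    if isinstance(value, tuple) and len(value) == 2 and looks_secret(value[0]):
        return (value[0], REDACTED)
    if isinstance(value, (list, tuple)):
        return type(value)(redact(item) for item in value)
    return value


manifest = {
    "cwd": "/path/to/project",
    "env": {"ANTHROPIC_API_KEY": "sk-example"},
    "headers": [("Authorization", "Bearer abc"), ("Accept", "text/plain")],
    "env_list": [{"name": "ANTHROPIC_API_KEY", "value": "sk-example"}],
}
clean = redact(manifest)
print(clean)
```

Substring matching like this over-redacts harmless keys that happen to contain a hint, which is usually the safer trade-off for an audit manifest. It does nothing for secrets that appear inside prompt or transcript text.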

Artifact, event-log, run-summary, and metadata persistence are best-effort by default. That matters for replay safety: if Claude has already completed and a later Kitaru save/log call fails, the adapter does not fail the completed Claude call just because observability capture had a problem. Instead, the returned ClaudeRunResult.warnings and ClaudeRunResult.metadata describe what failed, and failed artifact-name fields are left empty.

If you want observability persistence to be fail-fast, opt in explicitly:

runner = KitaruClaudeRunner(
    name="strict_claude_run",
    capture=ClaudeCapturePolicy(
        fail_on_artifact_capture_error=True,
        fail_on_event_persistence_error=True,
    ),
)

Even in strict mode, a failed attempt to save failure metadata does not hide the original Claude SDK error.
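The difference between best-effort and fail-fast persistence can be sketched in a few lines. This is an illustrative model only: `ToyResult`, `save_best_effort`, `broken_store`, and the `fail_fast` flag are hypothetical names, not part of the adapter.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ToyResult:
    final_text: str
    output_artifact: Optional[str] = None
    warnings: list[str] = field(default_factory=list)


def save_best_effort(
    result: ToyResult,
    save: Callable[[str], str],
    *,
    fail_fast: bool = False,
) -> ToyResult:
    """Try to persist the output; keep the completed result by default."""
    try:
        result.output_artifact = save(result.final_text)
    except Exception as exc:
        if fail_fast:
            raise  # strict mode: surface the persistence error
        # Best-effort mode: record the gap and leave the artifact name empty.
        result.warnings.append(f"output artifact not saved: {exc}")
    return result


def broken_store(_: str) -> str:
    """Simulate an artifact store that is down."""
    raise OSError("artifact store unavailable")


lenient = save_best_effort(ToyResult("done"), broken_store)
print(lenient.final_text, lenient.output_artifact, lenient.warnings)
```

In the lenient call the completed text survives and the failure shows up as a warning. Passing `fail_fast=True` with the same broken store raises the `OSError` instead.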

Options factory

Prefer options_factory over passing one static options object when request fields should affect the Claude SDK call.

runner = KitaruClaudeRunner(
    name="claude_project_agent",
    options_factory=lambda request: ClaudeAgentOptions(
        cwd=request.cwd,
        resume=request.resume_session_id,
        max_turns=request.max_turns,
        allowed_tools=[],
        setting_sources=["project"],
    ),
)

Why this matters:

cwd controls which project directory Claude sees.
resume_session_id must become ClaudeAgentOptions(resume=...) for resumed requests.
max_turns is often different per call.
MCP servers, hooks, permission callbacks, and session stores can be live Python objects. Building fresh options per request avoids mutating shared state.

If you pass a static options= object and then send a request with cwd, resume_session_id, or max_turns, the adapter raises instead of guessing how to mutate your options.
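A guard of that kind might look roughly like the sketch below. It is a toy model under stated assumptions: `ToyRequest` and `check_static_options` are hypothetical, and the exception type the real adapter raises is not specified here, so `ValueError` is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyRequest:
    prompt: str
    cwd: Optional[str] = None
    resume_session_id: Optional[str] = None
    max_turns: Optional[int] = None


def check_static_options(request: ToyRequest) -> None:
    """Refuse request fields that a static options object cannot honor."""
    conflicting = [
        name
        for name in ("cwd", "resume_session_id", "max_turns")
        if getattr(request, name) is not None
    ]
    if conflicting:
        # Placeholder exception type; the adapter's real error may differ.
        raise ValueError(
            "static options cannot apply request fields: " + ", ".join(conflicting)
        )


check_static_options(ToyRequest("plain prompt"))  # no request fields, so no error

try:
    check_static_options(ToyRequest("resume me", resume_session_id="session-123"))
except ValueError as exc:
    message = str(exc)
print(message)
```

Raising early is the conservative choice: silently ignoring `resume_session_id` would start a fresh session while the caller believes it resumed one.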

Checkpoint strategy

The Claude Agent SDK adapter currently supports one strategy:

KitaruClaudeRunner(name="claude_reviewer", checkpoint_strategy="invocation")

"invocation" is also the default. Strategies such as "calls", "runner_call", "model_call", and "tool_call" are rejected on purpose because they would imply granular durability that this adapter does not provide.

checkpoint_config= accepts the same small checkpoint knobs used by other adapter-created checkpoints:

runner = KitaruClaudeRunner(
    name="claude_reviewer",
    checkpoint_config={"cache": True, "retries": 1},
)

runtime="isolated" is not supported for adapter-managed Claude checkpoints. Claude SDK options can contain live process objects such as MCP servers, hooks, and callbacks, and those are not reconstructible across Kitaru's isolated runtime boundary yet.

Calling from inside an existing checkpoint

By default, runner.run(...) and runner.run_sync(...) reject calls made from inside an existing Kitaru checkpoint. The reason is concrete: Kitaru cannot open the adapter-created Claude invocation checkpoint inside another checkpoint. If the adapter silently called Claude directly there, replaying the outer checkpoint could call Claude again and duplicate tool actions, file edits, or API cost.

The recommended pattern is:

@flow
def review_flow(prompt: str) -> ClaudeRunResult:
    return runner.run_sync(ClaudeRunRequest.start(prompt, max_turns=1))

Use the direct-execution opt-in only when you have accepted that replay risk:

runner = KitaruClaudeRunner(
    name="claude_inside_existing_checkpoint",
    allow_direct_execution_inside_checkpoint=True,
)

When this opt-in is used, the returned result includes a warning and metadata["direct_execution_inside_checkpoint"] = True.

Claude tools, MCP, Bash, hooks, and permissions

You can still use Claude Agent SDK features through ClaudeAgentOptions:

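runner = KitaruClaudeRunner(
    name="claude_code_review",
    options_factory=lambda request: ClaudeAgentOptions(
        cwd=request.cwd,
        # Forward per-request fields so resumed or turn-limited requests are honored.
        resume=request.resume_session_id,
        max_turns=request.max_turns,
        # Read-only tools; Bash stays disallowed.
        allowed_tools=["Read", "Grep", "Glob"],
        disallowed_tools=["Bash"],
        setting_sources=["project"],
    ),
)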

Kitaru passes those options to Claude. Claude's own runtime decides what tools are available, what permissions apply, and which hooks run. See the official hooks and permissions docs for that layer.

The important Kitaru point is: these features remain inside the one Claude invocation checkpoint.

A concrete failure story:

1. Claude runs Bash and writes report.md.
2. Claude returns "I wrote report.md".
3. Kitaru stores the invocation result as a checkpoint.
4. Later, the flow crashes in a different checkpoint.
5. You replay from after the Claude checkpoint.

On replay, Kitaru can return the saved ClaudeRunResult without calling Claude again. But if the replay is running in a fresh workspace, Kitaru does not magically recreate report.md, because that file write happened inside Claude's own Bash/tool loop, not in a Kitaru-owned checkpoint. The adapter also does not snapshot the working directory before or after Claude runs.

If a side effect must be durable, make it a Kitaru-owned step:

from pathlib import Path

import kitaru

@kitaru.checkpoint
def write_report(text: str, path: str) -> str:
    Path(path).write_text(text)
    return path

@kitaru.flow
def report_flow(prompt: str) -> str:
    claude_result = runner.run_sync(ClaudeRunRequest.start(prompt, max_turns=1))
    return write_report(claude_result.final_text or "", "report.md")

That way the durable file write is visible to Kitaru as its own checkpoint.

Claude file checkpointing is different

Claude Agent SDK has its own file checkpointing feature for rewinding certain file changes made by Claude's built-in file-edit tools. That is useful, but it is not the same thing as a Kitaru checkpoint.

Keep the two ideas separate:

Feature	Owned by	What it helps with
Kitaru checkpoint	Kitaru	Replay/skip completed workflow work and store typed outputs.
Claude session	Claude SDK	Continue a Claude conversation from a session ID.
Claude file checkpointing	Claude SDK	Rewind supported Claude file-edit tool changes in a session.
Workspace snapshot	Not provided by this adapter	Recreate arbitrary files/process state after a crash.

The adapter records what it can observe at the invocation boundary. It does not turn Claude file checkpointing into Kitaru checkpointing, and it does not provide a general workspace snapshot system.

Recommended durability pattern

Put the Claude runner call directly in the flow body so the adapter can create its own checkpoint around the invocation. Put side effects that must be durable in separate Kitaru checkpoints after Claude returns.

That gives you a concrete sequence like this:

flow body asks Claude
  -> adapter-created Claude invocation checkpoint completes
  -> Kitaru-owned write_report checkpoint writes a durable file
  -> Kitaru-owned notify_reviewer checkpoint sends a notification

If a later checkpoint fails, Kitaru can reuse the completed Claude invocation result instead of calling Claude again.

Runnable example

Run the educational integration example:

uv sync --extra local --extra claude-agent-sdk
uv run kitaru init
export ANTHROPIC_API_KEY='<your-anthropic-api-key>'
uv run python examples/integrations/claude_agent_sdk_agent/claude_agent_sdk_adapter.py

To inspect the example without making a Claude API call:

uv run python examples/integrations/claude_agent_sdk_agent/claude_agent_sdk_adapter.py --help

The example prints the final text, session ID, usage/cost details when reported, and Kitaru artifact names. It uses allowed_tools=[] and max_turns=1 so the first run teaches the adapter boundary without introducing tool side effects.

Larger Claude example

For a richer workflow, see the compliance-review example:

uv run examples/end_to_end/compliance_review/stage_1_single_turn.py

That example shows staged audit checkpoints, artifact persistence, and wait/resume around Claude turns. It uses the same high-level boundary: one Claude SDK invocation is one Kitaru checkpoint. It also has a custom transcript materializer for its multi-turn remote-stack story. That extra materializer is example-specific; the generic adapter still does not claim granular replay of Claude-internal side effects.

Troubleshooting

“Why do I need `@kitaru.flow`?”

The adapter creates a Kitaru checkpoint. Checkpoints need a flow execution to belong to. Wrap Claude calls in @kitaru.flow so Kitaru can record and replay that boundary.

“Why did `checkpoint_strategy="calls"` fail?”

Because Claude's Python SDK does not expose replay-safe Python call bodies for each internal model/tool/Bash/MCP action. The adapter would be lying if it called hook observations “call checkpoints.” Use checkpoint_strategy="invocation" and move durable side effects into explicit Kitaru checkpoints.

“Why did calling the runner from my checkpoint fail?”

The adapter refused the call because it could not create its own Claude invocation checkpoint inside your existing checkpoint. Move the runner call to the flow body when possible. If you deliberately want Claude to run directly inside the existing checkpoint, create the runner with allow_direct_execution_inside_checkpoint=True and treat replay of the outer checkpoint as capable of calling Claude again.

“Why is my transcript artifact missing?”

The adapter captures SDK-emitted messages whenever save_messages=True. Transcript-file capture is best-effort because Claude owns where and when local JSONL transcripts are written. A missing transcript file produces a warning, not a failed run.

“Why is an artifact name missing even though Claude completed?”

Capture and event persistence are best-effort by default. If Claude completed but saving one artifact or log entry failed, the adapter returns the Claude result and records the persistence problem in result.warnings and result.metadata instead of turning the completed Claude call into a failure. Use fail_on_artifact_capture_error=True or fail_on_event_persistence_error=True when you want strict behavior.

“Can I use live streaming output?”

Not in this adapter version. Kitaru/ZenML streaming support is planned separately. The current adapter waits for the completed Claude result and stores that completed boundary output.

“Can this resume a half-finished Claude invocation?”

Not by itself. If the invocation does not complete, Kitaru has no completed checkpoint output to reuse. Claude session/session-store features may help with conversation recovery at the Claude layer, but the Kitaru adapter boundary is still one completed invocation.
