Kitaru

How It Works

What runs where when you execute a Kitaru flow — server, runner, execution targets, and the contract between them.

When you call .run() on a flow, three components work together to make the run durable: the Kitaru server (shared metadata, auth, deployment registry), the runner (per-run durable control flow), and one or more execution targets (where each checkpoint's code actually executes). During local development, all three collapse into a single Python process. In production, they separate across your infrastructure.
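
Concretely, a minimal flow might look like the sketch below. The decorator and .run() names are taken from this page; the exact import path and call signatures are assumptions, so check the SDK reference for the real spellings.

```python
# Sketch only: the import path and decorator spellings are assumed from names
# used on this page (@checkpoint, .run()), not copied from a reference.
from kitaru import checkpoint, flow


@checkpoint
def expand_query(question: str) -> list[str]:
    # Runs as one durable unit; its output is persisted before the run advances.
    return [question, f"{question} (rephrased)"]


@checkpoint
def retrieve(queries: list[str]) -> list[str]:
    return [f"document for: {q}" for q in queries]


@flow
def answer(question: str) -> list[str]:
    return retrieve(expand_query(question))


if __name__ == "__main__":
    # Locally, the server, runner, and execution target all live in this process.
    result = answer.run("What changed in the Q3 filing?")
    print(result)
```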

Client (your laptop, CI, or service): SDK · CLI · UI · MCP
  ↓ submit · input · replay
Server (central coordination): execution metadata · checkpoint state · log metadata · auth + credentials
  ↓ state · logs · results
Runner (your Python process or pod): runs checkpoints · calls kitaru.llm() · writes artifacts · can wait + resume
  ↓
Cloud storage (your bucket): S3 · GCS · Azure Blob · local fs (dev)

Locally: all three collapse into one Python process. In production: separate tiers across your stack.

Kitaru separates durable control flow from code execution:

  • The Kitaru server stores shared metadata, deployment snapshots, checkpoint state, execution logs, and control-plane data.
  • For each run, a runner (the durable brain of an execution) executes the selected flow snapshot, manages checkpoint order, persists state, and handles retry, replay, resume, and wait.
  • Individual checkpoints can run inline in the runner or in an isolated runtime (a separate container, Kubernetes job, or cloud job on the configured stack). The runner/target split is also where sandboxes, external tools, and custom compute backends conceptually plug in — the two shipped execution targets today are inline and isolated.

Key idea. The runner owns the durable run: checkpoint order, state, retry, replay, resume, and wait. Execution targets do the work. Checkpoints are the contract between the two.
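
As a sketch of that contract: each checkpoint is an ordinary Python function boundary, and where it executes is configuration rather than code. The target= keyword below is a hypothetical placeholder; the shipped target options and their real spelling live on the Checkpoints page.

```python
# Sketch: `target=` is a hypothetical placeholder for however your Kitaru
# version selects an execution target; the shipped options are documented
# on the Checkpoints page.
from kitaru import checkpoint, flow


@checkpoint  # inline: executes in the runner's own process
def plan(task: str) -> list[str]:
    return [f"step 1 for {task}", f"step 2 for {task}"]


@checkpoint(target="isolated")  # hypothetical spelling: run in a separate container/job
def execute_steps(steps: list[str]) -> str:
    # The runner delegates this checkpoint, waits for the result,
    # persists the output, and only then advances the run.
    return f"completed {len(steps)} steps"


@flow
def agent_task(task: str) -> str:
    return execute_steps(plan(task))
```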

Consumer (user · service · upstream agent)
  ↓ invoke
Kitaru invocation API (CLI · SDK · MCP · HTTP)
Control plane (long-lived, shared): auth & session · flow / deployment registry · execution metadata · checkpoint state · log metadata · credential brokering
  ↓ schedules run on your stack
Orchestration plane (per-run, durable)
Runner (the durable brain of one execution): loads the selected flow snapshot · controls checkpoint order · persists state after every checkpoint · retry · replay · resume · wait · can wait for days without burning compute
  ↓ delegates each checkpoint
Execution plane (where your code actually runs): inline (same process as runner) · isolated job (separate container / pod) · sandbox (restricted egress / capabilities) · external / MCP tool (remote capability or API)
(Inline and isolated job are the shipped execution targets; sandbox and external / MCP tool are conceptual, via adapters or your platform.)
  ↓ persists outputs
Persistence (your cloud)
Artifact / state store (your S3 / GCS / Azure Blob): checkpoint outputs · files · errors · logs · replay lineage
Metadata store (runs · versions · statuses)
  ↓ read by
Operations: Kitaru UI (browse · replay) · CLI (kitaru executions) · Python SDK (KitaruClient) · MCP tools (for AI assistants)

Kitaru separates durable control flow (orchestration plane) from code execution (execution plane). Checkpoints are the contract between them.

Control / orchestration / execution

Kitaru splits runtime responsibilities into three planes. (This is separate from the harness / runtime / platform split, which is about where Kitaru sits in the broader agent stack — not about how a single run executes.)

Control plane (long-lived · shared · the Kitaru server): auth · deployment registry · execution metadata · checkpoint state · log metadata
Orchestration plane (per-run · durable · the runner): checkpoint order · replay · resume · wait / suspend · retry policy · state durability
Execution plane (where code runs · what @checkpoint targets): inline · isolated job · sandbox · external / MCP tool · custom backend
Inline and isolated job are shipped execution targets; sandbox, external / MCP tool, and custom backend are conceptual, offered through the same contract via adapters or your platform.
The three planes run independently. The control plane survives if a runner dies. A runner survives if an execution target dies.
Plane | What lives here | Responsibility
Control plane | Kitaru server, UI, metadata DB, deployment registry, CLI/SDK/MCP APIs, auth and credential brokering | Knows what exists and who can call what
Orchestration plane | The runner for a single execution | Owns durable control flow for one run
Execution plane | Inline process or isolated container (shipped today); sandboxes, external tools, and custom backends are conceptual extensions of the same contract | Performs work

The control plane is long-lived and shared. The orchestration plane is per-run and durable. The execution plane is where your code (and your agent's code) actually executes.

What runs where?

The most common confusion is which component runs user code. This table makes it explicit.

Component | What it does | Runs user code?
Kitaru server (control plane) | Stores deployment registry, execution metadata, checkpoint state, log metadata, auth and session state | No
Runner (orchestration plane) | Runs the selected flow snapshot, controls checkpoint order, persists durable state, handles retry / replay / resume / wait | Yes, for inline checkpoints
Inline execution | Runs a checkpoint inside the runner process/pod | Yes
Isolated runtime | Runs a checkpoint in a separate container, job, pod, or remote compute backend | Yes
Sandbox (conceptual) | The same contract as isolated, tightened with stronger isolation or restricted egress; not a shipped Kitaru execution target today, provided via adapters / your platform | Yes, where integrated
External tool / MCP server | Performs work through a remote API or capability | Outside Kitaru
Metadata store | Stores runs, versions, checkpoint statuses, replay lineage | No
Artifact / state store | Stores checkpoint outputs, files, logs, replay lineage | No

The run, step by step

Here is what actually happens when a consumer invokes a flow.

Request arrives. A user, service, or upstream agent calls the Kitaru invocation API (via CLI, SDK, MCP, or HTTP).

Server resolves the flow. The server authenticates the caller, resolves the target flow (and optionally a version or tag), validates the input schema, and creates a run record plus a FlowHandle.

Runner starts. The control plane schedules a runner on your configured stack — a Kubernetes pod, a cloud job, or the local process in dev. The runner loads the selected flow snapshot.

Runner executes checkpoints in order. For each checkpoint, the runner either executes inline or delegates to an isolated target. It waits for the result, persists the output to the artifact/state store, and advances.

State is durable the entire time. If a checkpoint fails, if the runner dies, or if a kitaru.wait() suspends the run, the server retains everything needed to retry, replay, or resume later.

Consumer observes results. The caller uses the returned FlowHandle (or the UI / CLI / SDK / MCP) to tail logs, inspect checkpoints, provide human input, replay, or cancel.
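
From the caller's side, that sequence might look like the sketch below. KitaruClient and FlowHandle are named on this page, but the specific method names used here (run_flow, logs, result) are illustrative assumptions rather than the documented API.

```python
# Sketch: KitaruClient and FlowHandle are named on this page; the method
# names below (run_flow, logs, result) are illustrative assumptions.
from kitaru import KitaruClient

client = KitaruClient()  # authenticates against the Kitaru server (control plane)

# The server resolves the flow and version, validates the input, creates a
# run record, and returns a FlowHandle while a runner starts on your stack.
handle = client.run_flow("answer", input={"question": "Summarize the Q3 filings"})

# The handle is how the consumer observes the run: logs, checkpoints, results.
for line in handle.logs():
    print(line)

print(handle.result())
```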

Runner vs sandbox

This is the idea that tends to click last and matter most.

The runner is the durable brain of a run.
The sandbox (or isolated runtime) is the hands that perform work.

If a sandbox dies mid-execution — a container evicted, a network partition, a pod OOM — the runner still holds durable checkpoint state and can retry that single checkpoint, resume from the last known boundary, or replay the run with a modified input or code version. The sandbox's failure is localized to the checkpoint that was executing, not the whole agent.

This is why platform teams should not confuse "I have a sandbox provider" with "I have durable execution". A sandbox is a bounded execution environment. Durable execution is a property of the surrounding runner — and of the checkpoints it persists.
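
A sketch of what that recovery can look like from the SDK, with heavily hypothetical method names (get_execution, checkpoints, retry_checkpoint); the point is only that the durable state lives with the runner and server, not with the sandbox that died.

```python
# Sketch with hypothetical method names: the checkpoint state the runner
# persisted is what makes this possible after an isolated target dies.
from kitaru import KitaruClient

client = KitaruClient()
handle = client.get_execution("run-1234")  # hypothetical lookup by run id

# Everything before the failed checkpoint is already persisted, so recovery
# is scoped to the single checkpoint that was executing when the sandbox died.
failed = [c for c in handle.checkpoints() if c.status == "failed"]
if failed:
    handle.retry_checkpoint(failed[0].name)
```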

Inline vs isolated checkpoints

Every checkpoint picks an execution target. Two are built in today: inline (same process as the runner) and isolated (a separate container or job on the configured stack). Code examples, decision rules for when to reach for isolated, and the interaction with .submit() live on the Checkpoints page.

A failed checkpoint is durable context

In classical pipelines, a failed step is a crash. In Kitaru, a failed checkpoint is durable context — something the runner, the agent loop, a human, or a retry policy can reason about.

Consider a document-synthesis agent:

query_expansion: success · artifact: expanded query set
retrieval: soft failure · artifact: 'document missing / entitlement denied'
synthesis: not yet run
  ↓ failure becomes durable context
Recovery paths:
  • Retry: same input, same code
  • Replay with new input: e.g. corrected document id
  • Replay with new code: e.g. new retrieval strategy
  • Feed error into the agent loop: let the agent self-correct
  • Wait for human correction: kitaru.wait(), then resume
In classical pipelines a failed step is a crash. In Kitaru it's a typed artifact that every recovery path can read.

Because the retrieval checkpoint's failure is persisted as a typed artifact, a downstream consumer has several real options:

  • Retry the same checkpoint with the same input
  • Replay with a modified input (e.g. a corrected document id)
  • Replay with modified code (e.g. a new retrieval strategy)
  • Feed the error artifact back into the agent loop so it can self-correct
  • Wait for a human to provide a correction via kitaru.wait(), then resume

This is what "agent-native error handling" means in practice: failures become data, and durable state survives them.
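
For example, the wait-for-human path might look roughly like this. kitaru.wait() is named on this page, but its exact signature and the failure-artifact shape shown here are illustrative assumptions.

```python
# Sketch: kitaru.wait() appears on this page, but its exact signature and
# the failure-artifact shape below are illustrative assumptions.
import kitaru
from kitaru import checkpoint, flow


@checkpoint
def retrieval(doc_id: str) -> dict:
    if not doc_id.startswith("doc-"):
        # A soft failure: return a typed artifact instead of crashing the run.
        return {"status": "failed", "reason": f"document missing: {doc_id}"}
    return {"status": "ok", "text": f"contents of {doc_id}"}


@flow
def synthesize_report(doc_id: str) -> dict:
    result = retrieval(doc_id)
    if result["status"] == "failed":
        # Suspend durably until a human supplies a corrected document id,
        # then resume from this boundary without re-running earlier checkpoints.
        corrected_id = kitaru.wait("corrected_doc_id")
        result = retrieval(corrected_id)
    return result
```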

How deep do you integrate?

You don't have to restructure your agent to get value. Pick the depth that fits.

Level 0 — Black-box harness

Wrap the entire agent run as one checkpoint.

flow
└── checkpoint: run_agent()
    └── PydanticAI / LangGraph / Claude Agent SDK / custom loop
  • Fastest integration
  • Minimal code changes
  • Framework-agnostic

The tradeoff: the replay boundary is coarse (one per agent run), and you see less of the agent's internal state.
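
A sketch of this level, assuming the import path from earlier examples; run_my_agent stands in for whatever harness entry point you already have.

```python
# Sketch: run_my_agent is a stand-in for your existing PydanticAI /
# LangGraph / Claude Agent SDK / custom loop, wrapped unchanged.
from kitaru import checkpoint, flow


def run_my_agent(task: str) -> str:
    # Your existing agent loop goes here, unmodified.
    return f"agent answer for: {task}"


@checkpoint
def run_agent(task: str) -> str:
    return run_my_agent(task)


@flow
def agent_flow(task: str) -> str:
    # One checkpoint: one durable replay boundary for the whole agent run.
    return run_agent(task)
```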

Level 1 — Coarse workflow checkpoints

Add checkpoints around the phases that matter to your team.

flow
├── checkpoint: plan()
├── checkpoint: retrieve()
├── checkpoint: act()
├── checkpoint: synthesize()
└── checkpoint: validate()
  • Useful replay points
  • Better audit trail
  • Good balance of portability and durability

The tradeoff: you (not the framework) decide where the boundaries go.
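
The same agent at this level, sketched with the phase boundaries as checkpoints (names and import path illustrative, as above):

```python
# Sketch: each phase becomes its own replay boundary; the runner persists
# state after every one of these checkpoints.
from kitaru import checkpoint, flow


@checkpoint
def plan(task: str) -> list[str]:
    return [f"research {task}", "draft answer"]


@checkpoint
def retrieve(steps: list[str]) -> list[str]:
    return [f"evidence for: {s}" for s in steps]


@checkpoint
def synthesize(evidence: list[str]) -> str:
    return " | ".join(evidence)


@flow
def agent_flow(task: str) -> str:
    # You decide where the boundaries go; replay picks up from any of them.
    return synthesize(retrieve(plan(task)))
```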

Level 2 — Framework-aware adapter

Use a Kitaru adapter that tracks the framework's internals (model calls, tool calls, intermediate state) as child events under the enclosing checkpoint.

flow
└── checkpointed framework runtime
    ├── model calls
    ├── tool calls
    ├── intermediate state
    └── final output
  • Richer introspection
  • Better debugging
  • Tighter developer experience

The tradeoff: adapters are per-framework and need maintenance. The Pydantic AI adapter is the first one shipped.

Framework-agnostic by construction

Kitaru does not require your agent to be written as a graph. @checkpoint wraps ordinary Python function boundaries, independent of the harness.

That means a platform team supporting multiple harnesses — PydanticAI here, LangGraph there, Claude Agent SDK for one team, a raw-Python loop for another — can still standardize durability, replay, and execution metadata on a single runtime primitive. The harness choice stays a per-team decision.

Fits behind your platform

Kitaru can be used directly through its invocation API, or placed behind your existing platform/gateway:

Consumer (internal user · product · upstream agent)
  ↓
Your gateway / product API (what your org owns): auth · entitlements · rate limits · policy · interceptors · guardrails · product-specific endpoints
  ↓
Kitaru invocation API (the runtime primitive): version / tag resolution · schema validation · run record + FlowHandle · credential brokering
  ↓
Runner on your stack (durable execution): checkpoint order · replay · resume · wait · artifacts + state · retry + isolation
  ↓
Artifacts + state (your S3 / GCS / Azure Blob bucket)

Kitaru drops in underneath your existing platform. Your auth, UI, and governance stay yours.

You keep your auth, entitlements, interceptors, observability, and UI. Kitaru handles the durable execution layer underneath. This is how Kitaru drops into an internal agent platform in finance or other regulated industries without asking you to rebuild the surrounding system.
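
As a sketch of that shape, here is a minimal gateway endpoint in front of Kitaru. FastAPI stands in for whatever your org's gateway is, and the client method names are the same illustrative assumptions used earlier on this page.

```python
# Sketch: FastAPI stands in for your gateway; run_flow and handle.id are
# illustrative assumptions about the KitaruClient surface.
from fastapi import FastAPI
from kitaru import KitaruClient

app = FastAPI()
client = KitaruClient()


@app.post("/reports")
def create_report(payload: dict) -> dict:
    # Your auth, entitlements, rate limits, and guardrails run here,
    # before Kitaru is ever involved.
    handle = client.run_flow("synthesize_report", input=payload)
    # Expose your own resource id; the Kitaru run stays an implementation detail.
    return {"report_run_id": handle.id}
```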

Local development

When you are developing locally, all three components run inside a single Python process on your machine. The server is embedded — no separate service to start, no database to configure. Checkpoint outputs are written to your local filesystem.

This means you can install Kitaru, run kitaru init, and have a fully working durable execution environment in under a minute. Your flows behave exactly the same as they will in production — same checkpointing, same replay, same observability — just without the cloud infrastructure underneath.

Optionally, you can run a local server with the web UI to browse executions.

Production

In production, the three components separate across your infrastructure:

  • The server runs as a long-lived Kubernetes pod (deployed via Helm). It stores execution state in a database and serves the UI. Your whole team connects to it.
  • The runner runs on the compute backend defined by your stack — Kubernetes, Vertex AI, SageMaker, AzureML. When you call .run(), the client fetches short-lived credentials from the server and dispatches the execution directly to the compute backend. The runner executes your checkpoints and writes outputs to cloud storage. If the execution crashes, replay picks up from the last completed checkpoint.
  • Artifacts and state live in your own S3 / GCS / Azure Blob bucket. The server tracks metadata but does not access storage directly; when a client needs to read files, it fetches temporary credentials brokered by the server.

There is no mandatory SaaS control plane in the path of your agent's data.

On this page