How It Works
What runs where when you execute a Kitaru flow
When you call .run() on a flow, three components work together to make it
durable: the agent runtime, the server, and the client. During local
development they all live on your machine. In production they separate across
your infrastructure.
Agent runtime
Where your flow code actually executes. The runtime runs your checkpoints,
calls kitaru.llm(), and persists outputs to storage. Locally this is your
Python process. In production this is typically Kubernetes — each flow
execution becomes a pod.
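As a rough sketch, a flow might look like the following. Only kitaru.llm() and .run() are named on this page; the decorator names, the import, and the helper fetch_page are illustrative assumptions, so check the API reference for the real flow and checkpoint definitions.

```python
import kitaru  # import path assumed

# @kitaru.flow / @kitaru.checkpoint are hypothetical names for illustration.
@kitaru.flow
def summarize(url: str):
    # Each checkpoint's output is persisted by the runtime. On replay,
    # completed checkpoints are skipped and their stored outputs reused.
    text = fetch_page(url)                       # hypothetical helper, checkpoint 1
    summary = kitaru.llm(f"Summarize: {text}")   # checkpoint 2
    return summary

# Locally this executes inside your Python process; in production the
# same call becomes a pod on your compute backend.
summarize.run("https://example.com")
```

The point of the sketch is the execution model, not the exact decorators: whatever the real definitions look like, the runtime persists each checkpoint's output before moving on.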
Server
The central coordination layer. The agent runtime reports execution state, checkpoint results, and logs to the server. The server stores everything and makes it queryable.
Locally, the server runs embedded in your Python process (no setup needed). For production, you deploy it on Kubernetes so your team can share executions and agents can run independently of your machine.
Client
How you observe and interact with executions. The client connects to the server to inspect runs, view logs, provide human input, replay, and cancel executions. This can be:
- KitaruClient — the Python SDK for programmatic access
- CLI — kitaru executions list, kitaru executions logs, etc.
- UI — the web interface for browsing executions visually
- MCP server — for AI assistants to query and manage executions
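All of these reach the same executions through the server. A minimal sketch of the SDK path follows; only the KitaruClient class name appears on this page, so the import path and method names below are assumptions, not the confirmed API.

```python
from kitaru import KitaruClient  # import path assumed

# With no arguments, assume this connects to the locally embedded server.
client = KitaruClient()

# Illustrative method names -- check the SDK reference for the real ones.
for execution in client.executions.list():
    print(execution.id, execution.status)
```

The CLI equivalents are the documented `kitaru executions list` and `kitaru executions logs` commands.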
Local development
When you are developing locally, all three components run inside a single Python process on your machine. The server is embedded — no separate service to start, no database to configure. Checkpoint outputs are written to your local filesystem.
This means you can install Kitaru, run kitaru init, and have a fully working
durable execution environment in under a minute. Your flows behave exactly the
same as they will in production — same checkpointing, same replay, same
observability — just without the cloud infrastructure underneath.
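Getting to that state looks roughly like this. Of these commands, only kitaru init and kitaru executions list appear on this page; the pip package name and the flow script are assumptions.

```shell
pip install kitaru       # package name assumed
kitaru init              # scaffold a project; the server runs embedded
python my_flow.py        # .run() executes durably, outputs on local disk
kitaru executions list   # inspect the run you just made
```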
Optionally, you can run the server and UI as separate local processes to browse your executions in the web interface.
Production
In production, the three components separate across your infrastructure:
- The server runs as a long-lived Kubernetes pod (deployed via Helm). It stores execution state in a database and serves the UI. Your whole team connects to it.
- The agent runtime runs on the compute backend defined by your stack — Kubernetes pods, Vertex AI jobs, SageMaker jobs, or any other supported environment. When you call .run(), the client fetches short-lived credentials from the server and dispatches the execution directly to the compute backend. The runtime runs your checkpoints and writes outputs to cloud storage. If the execution crashes, replay picks up from the last completed checkpoint.
- The client runs wherever you are — your laptop, CI, or another service. It connects to the server to kick off executions, tail logs, provide human input to waiting flows, or replay failed runs.
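Pointing a client at a deployed server might look like the sketch below. The constructor argument and the replay method are assumptions for illustration; only the KitaruClient class name and the replay capability are stated on this page.

```python
from kitaru import KitaruClient  # import path assumed

# Hypothetical: address of your team's deployed server.
client = KitaruClient("https://kitaru.internal.example.com")

# Illustrative method name: replay a failed run from the last completed
# checkpoint. The execution itself lands on the stack's compute backend,
# not on the machine running this client.
client.executions.replay("exec-1234")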
The agent runtime writes checkpoint outputs and artifacts to the cloud storage backend (S3, GCS, or Azure Blob) defined by the stack. The server tracks execution metadata but does not access storage directly. When a client needs to read files — artifacts, checkpoint outputs, logs — it can fetch temporary credentials brokered by the server to access storage directly.
What goes where
| Component | What it does | Where it runs |
|---|---|---|
| Agent runtime | Executes your flow code, writes checkpoint outputs | Your Python process locally; Kubernetes pods or other compute backends in production |
| Server | Stores execution state, checkpoint results, logs | Embedded locally or Kubernetes pod |
| Client | Observe executions, provide input, replay, cancel | Your machine — SDK, CLI, UI, or MCP |
| Storage | Persists all data | Local filesystem, S3, GCS, Azure Blob |
Next steps
- Stacks — how compute, storage, and container registry are bundled into a named runtime
- Deploy Your Agent — move from local to production