Kitaru by ZenML Open source · Apache 2.0

The open agent runtime.

Replay, don’t guess. Rerun production runs against new models, prompts, or skills. Compare evidence. Ship the variant that wins.

pip install kitaru
THE MISSING LAYER

The inner harness is getting absorbed. The outer runtime is still missing.

Models and SDKs increasingly own the inner loop — call, tool, observe, repeat. The layer around it stays unsolved: recovery, sandboxes, approvals, replay, cheap waiting. Keep your SSO, RBAC, cloud accounts, and agent SDKs. Kitaru fills the gap.

Model layer reasoning and native tool-use behavior
OpenAI Anthropic Google Open-weights Fine-tuned in-house your pick
Harness layer the inner loop around the model
Pydantic AI / Harness LangGraph Claude Agent SDK OpenAI Agents SDK Raw Python your pick
Runtime layer Kitaru — the outer runtime
checkpoints replay resume wait() isolated execution versions + tag routing invocation artifacts + state
Platform layer how the org governs and exposes agents
auth entitlements interceptors guardrails observability product UI yours, unchanged
Durable agent state
Recovery

Agent died at hour 11? Resume from hour 10.

Every durable checkpoint persists its inputs and outputs. Fix the failure and replay from the boundary that broke instead of re-running every model call and tool call upstream.

One agent, three runtimes
Trust boundaries

The brain and the hand should not share a process.

Run risky tools in controlled execution targets while the durable runner owns state, replay, and policy hooks. A sandbox failure becomes one failed checkpoint, not a lost agent.

Runs you can actually compare
Cheap waiting

Wait on humans, webhooks, or agents without burning compute.

wait() suspends the run and releases the worker. When the input arrives minutes, hours, or days later, Kitaru rehydrates the execution and continues.

Kitaru isn’t an agent platform. It’s the runtime layer your platform runs agents on.

Kubernetes can restart a pod. It cannot replay an agent from checkpoint 10 or wait three days for a human approval without burning compute.

Bundled agent platforms ship one model, one harness, one cloud. Kitaru is the open assembled stack — same primitives, your stack.

Open source. Self-hosted. No lock-in to a model, harness, or platform.

WORKS WITH YOUR AGENT SDK

Keep the inner harness. Add the outer runtime.

Keep the model loop your team already picked. Wrap the boundaries that matter — model calls, tool calls, approvals, and long-running phases — with durable state and replay.

Before agent.py
from pydantic_ai import Agent

agent = Agent(
    "openai:gpt-5.4",
    system_prompt="You are a compliance reviewer.",
    tools=[search_docs, fetch_policy],
)

result = await agent.run(task)
With Kitaru agent.py
from kitaru.adapters.pydantic_ai import KitaruAgent
from pydantic_ai import Agent

agent = KitaruAgent(Agent(
    "openai:gpt-5.4",
    system_prompt="You are a compliance reviewer.",
    tools=[search_docs, fetch_policy],
))

result = await agent.run(task)
CORE RUNTIME PRIMITIVES

The outer runtime production agents keep needing.

The model can own the loop. The runtime has to own what happens when the loop waits, fails, forks, or moves to remote infrastructure.

01 · Cheap Waiting

Pause. Release compute. Continue later.

Agents can wait on a human, another agent, or a webhook for hours or days. Kitaru persists the run, shuts down the worker, and rehydrates when input arrives.

1
2
3
4
5
6
Waiting for input...
02 · Durable State

Crash at step 6? Replay from step 6.

Every checkpoint writes real outputs to the artifact store. Fix the issue and replay from the failed boundary instead of re-burning upstream model and tool calls.

1
2
3
4
5
6
7
Running agent...
03 · No Harness Bet

Keep the inner loop your team picked.

OpenAI Agents SDK, Anthropic Agent SDK, PydanticAI, LangGraph, or raw Python. Kitaru adds the runtime around the harness instead of replacing it.

kitaru
OpenAI Agents SDK
Anthropic Agent SDK
PydanticAI
LangGraph
04 · Parallel Recovery

Fan out work without losing the thread.

checkpoint.submit() dispatches branches concurrently. Each branch keeps its own durable boundary, so one failed search does not erase the rest.

SEE THE CODE

The outer runtime is just Python.

agent.py
import kitaru
from kitaru import flow, checkpoint
 
kitaru.configure(stack="prod-k8s")
 
@checkpoint
def research(topic: str) -> dict:
    results = run_agent_search(topic)
    kitaru.save("sources", results)
    return summarize(results)
 
@checkpoint(runtime="isolated")
def write_draft(context: str, prev_id: str) -> str:
    prior = kitaru.load(prev_id, "sources")
    return kitaru.llm(
        "Draft a report on: " + context,
        model="gpt-4o",
    )
 
@flow
def report_agent(topic: str, prev_id: str) -> str:
    data = research(topic)
    draft = write_draft(str(data), prev_id)
    kitaru.log(topic=topic, words=len(draft.split()))
 
    approved = kitaru.wait(
        schema=bool, question="Publish?"
    )
    if approved:
        publish(draft)
    return draft
@flow

Top-level durable boundary for the agent run.

@checkpoint

Persists output. Crash at step 3? Steps 1-2 never re-run.

kitaru.wait()

Suspends the run. Resume when a human, webhook, or agent responds.

kitaru.llm()

Resolves the model alias, injects the key, and captures usage.

kitaru.log()

Structured metadata on every execution. Query it in the dashboard.

kitaru.save()

Persist the real intermediate outputs your agent will need later.

kitaru.load()

Retrieve saved artifacts from an earlier execution by ID.

kitaru.configure()

Move the same flow from local dev to remote infrastructure.

FROM LAPTOP TO CLOUD

From laptop session to enterprise runtime.

Run the same agent flow locally, then move it to Kubernetes, SageMaker, Vertex AI, or AzureML with artifacts in your own bucket.

uv add kitaru && kitaru init
Kitaru dashboard showing an execution with its checkpoints, logs, and cost
Checkpoints, artifacts, replay, and logs ship with the server.
EXECUTION ARCHITECTURE

The brain and the hand should not have to share a process.

The runner owns durable control flow — checkpoint order, state, retry, replay, resume, and wait. Execution targets do the work: inline, in an isolated job, inside a sandbox, or through an external tool. Checkpoints are the contract between them.

Inner Harness
OpenAI Agents SDK Anthropic Agent SDK PydanticAI LangGraph Raw Python
Model loop
Kitaru SDK
@flow @checkpoint wait() llm() log() save() load() configure()
Outer runtime primitives
Kitaru Runtime
Checkpoint order Replay Resume Wait Artifacts & state Versions
Durable brain
Execution Targets
Inline Isolated job Sandbox External / MCP tool Custom backend
Hands that do work
Your Infrastructure
Kubernetes AWS / GCP / Azure S3 / GCS SQL Database
Self-hosted substrate
agent.py
import kitaru
from kitaru import flow, checkpoint

@flow
def coding_agent(issue: str) -> str:
    plan = analyze_issue(issue)
    patch = write_code(plan)

    # Pauses. Resumes when input arrives.
    approved = kitaru.wait(
        bool, question="Merge this PR?"
    )
    if approved:
        merge(patch)
    return patch

Your agent died at hour 11.
Don’t restart from hour 1.

pip install kitaru

Open source (Apache 2.0). Add the runtime around the agent you already have.