Kitaru by ZenML Open source · Apache 2.0

The open agent runtime.

Replay, don’t guess. Rerun production runs against new models, prompts, or skills. Compare evidence. Ship the variant that wins.

pip install kitaru

Book a demo View on GitHub

replay · legal-research-prod compare

Pack high-cost-contracts · 50 production runs · last 7d

Baseline legal-research-prod v2.3

Candidate branch-cache +caching wrapper on clause extraction

# replay both against the same frozen pack

kitaru.compare(

baseline="legal-research-prod",

candidate="branch-cache",

pack="high-cost-contracts",

)

Metric Baseline Candidate Delta

Median cost €0.34 €0.19 −44%

Avg LLM calls 14.2 6.8 −52%

Evaluator pass rate user-supplied 84% 82% −2pp

Changed outputs — 2 / 50 review

Same cohort, different variant. Outputs hold.

THE MISSING LAYER

The inner harness is getting absorbed. The outer runtime is still missing.

Models and SDKs increasingly own the inner loop — call, tool, observe, repeat. The layer around it stays unsolved: recovery, sandboxes, approvals, replay, cheap waiting. Keep your SSO, RBAC, cloud accounts, and agent SDKs. Kitaru fills the gap.

Model layer reasoning and native tool-use behavior

OpenAI Anthropic Google Open-weights Fine-tuned in-house your pick

Harness layer the inner loop around the model

Pydantic AI / Harness LangGraph Claude Agent SDK OpenAI Agents SDK Raw Python your pick

Runtime layer Kitaru — the outer runtime

checkpoints replay resume wait() isolated execution versions + tag routing invocation artifacts + state

Platform layer how the org governs and exposes agents

auth entitlements interceptors guardrails observability product UI yours, unchanged

Recovery

Agent died at hour 11? Resume from hour 10.

Every durable checkpoint persists its inputs and outputs. Fix the failure and replay from the boundary that broke instead of re-running every model call and tool call upstream.

Trust boundaries

The brain and the hand should not share a process.

Run risky tools in controlled execution targets while the durable runner owns state, replay, and policy hooks. A sandbox failure becomes one failed checkpoint, not a lost agent.

Cheap waiting

Wait on humans, webhooks, or agents without burning compute.

wait() suspends the run and releases the worker. When the input arrives minutes, hours, or days later, Kitaru rehydrates the execution and continues.

Kitaru isn’t an agent platform. It’s the runtime layer your platform runs agents on.

Kubernetes can restart a pod. It cannot replay an agent from checkpoint 10 or wait three days for a human approval without burning compute.

Bundled agent platforms ship one model, one harness, one cloud. Kitaru is the open assembled stack — same primitives, your stack.

Open source. Self-hosted. No lock-in to a model, harness, or platform.

WORKS WITH YOUR AGENT SDK

Keep the inner harness. Add the outer runtime.

Keep the model loop your team already picked. Wrap the boundaries that matter — model calls, tool calls, approvals, and long-running phases — with durable state and replay.

Before agent.py

from agents import Runner
from agents.sandbox import SandboxAgent, Manifest

agent = SandboxAgent(
    name="Compliance Reviewer",
    model="gpt-5.4",
    default_manifest=manifest,
)

result = await Runner.run(agent, task)

With Kitaru agent.py

from kitaru.adapters.openai_agents import KitaruRunner, OpenAIRunRequest
from agents.sandbox import SandboxAgent, Manifest

agent = SandboxAgent(
    name="Compliance Reviewer",
    model="gpt-5.4",
    default_manifest=manifest,
)

runner = KitaruRunner(agent, checkpoint_strategy="calls")
result = await runner.run(OpenAIRunRequest.start(task))

Before agent.py

from claude_agent_sdk import query, ClaudeAgentOptions

options = ClaudeAgentOptions(
    system_prompt="You are a compliance reviewer.",
    allowed_tools=["search_docs", "fetch_policy"],
)

async for msg in query(prompt=task, options=options):
    process(msg)

With Kitaru agent.py

from kitaru import flow, checkpoint
from claude_agent_sdk import query, ClaudeAgentOptions

options = ClaudeAgentOptions(
    system_prompt="You are a compliance reviewer.",
    allowed_tools=["search_docs", "fetch_policy"],
)

@checkpoint
async def run_claude(task: str) -> list:
    return [msg async for msg in query(prompt=task, options=options)]

@flow
async def review(task: str):
    for msg in await run_claude(task):
        process(msg)

Before agent.py

from pydantic_ai import Agent

agent = Agent(
    "openai:gpt-5.4",
    system_prompt="You are a compliance reviewer.",
    tools=[search_docs, fetch_policy],
)

result = await agent.run(task)

With Kitaru agent.py

from kitaru.adapters.pydantic_ai import KitaruAgent
from pydantic_ai import Agent

agent = KitaruAgent(Agent(
    "openai:gpt-5.4",
    system_prompt="You are a compliance reviewer.",
    tools=[search_docs, fetch_policy],
))

result = await agent.run(task)

Before agent.py

from anthropic import Anthropic

client = Anthropic()

def my_agent(task: str) -> str:
    plan = analyze(client, task)
    # crash here? everything above is lost.
    result = execute(client, plan)
    return result

With Kitaru agent.py

from kitaru import flow, checkpoint
from anthropic import Anthropic

client = Anthropic()

@flow
def my_agent(task: str) -> str:
    plan = checkpoint(analyze)(client, task)
    result = checkpoint(execute)(client, plan)
    return result

CORE RUNTIME PRIMITIVES

The outer runtime production agents keep needing.

The model can own the loop. The runtime has to own what happens when the loop waits, fails, forks, or moves to remote infrastructure.

01 · Cheap Waiting

Pause. Release compute. Continue later.

Agents can wait on a human, another agent, or a webhook for hours or days. Kitaru persists the run, shuts down the worker, and rehydrates when input arrives.

Waiting for input...

02 · Durable State

Crash at step 6? Replay from step 6.

Every checkpoint writes real outputs to the artifact store. Fix the issue and replay from the failed boundary instead of re-burning upstream model and tool calls.

Running agent...

03 · No Harness Bet

Keep the inner loop your team picked.

OpenAI Agents SDK, Anthropic Agent SDK, PydanticAI, LangGraph, or raw Python. Kitaru adds the runtime around the harness instead of replacing it.

kitaru

OpenAI Agents SDK

Anthropic Agent SDK

PydanticAI

LangGraph

04 · Parallel Recovery

Fan out work without losing the thread.

checkpoint.submit() dispatches branches concurrently. Each branch keeps its own durable boundary, so one failed search does not erase the rest.

SEE THE CODE

The outer runtime is just Python.

agent.py

import kitaru
from kitaru import flow, checkpoint
 
kitaru.configure(stack="prod-k8s")
 
@checkpoint
def research(topic: str) -> dict:
    results = run_agent_search(topic)
    kitaru.save("sources", results)
    return summarize(results)
 
@checkpoint(runtime="isolated")
def write_draft(context: str, prev_id: str) -> str:
    prior = kitaru.load(prev_id, "sources")
    return kitaru.llm(
        "Draft a report on: " + context,
        model="gpt-4o",
    )
 
@flow
def report_agent(topic: str, prev_id: str) -> str:
    data = research(topic)
    draft = write_draft(str(data), prev_id)
    kitaru.log(topic=topic, words=len(draft.split()))
 
    approved = kitaru.wait(
        schema=bool, question="Publish?"
    )
    if approved:
        publish(draft)
    return draft

@flow

Top-level durable boundary for the agent run.

@checkpoint

Persists output. Crash at step 3? Steps 1-2 never re-run.

kitaru.wait()

Suspends the run. Resume when a human, webhook, or agent responds.

kitaru.llm()

Resolves the model alias, injects the key, and captures usage.

kitaru.log()

Structured metadata on every execution. Query it in the dashboard.

kitaru.save()

Persist the real intermediate outputs your agent will need later.

kitaru.load()

Retrieve saved artifacts from an earlier execution by ID.

kitaru.configure()

Move the same flow from local dev to remote infrastructure.

FROM LAPTOP TO CLOUD

From laptop session to enterprise runtime.

Run the same agent flow locally, then move it to Kubernetes, SageMaker, Vertex AI, or AzureML with artifacts in your own bucket.

uv add kitaru && kitaru init

Kitaru dashboard showing an execution with its checkpoints, logs, and cost — Checkpoints, artifacts, replay, and logs ship with the server.

Star on GitHub See the roadmap

EXECUTION ARCHITECTURE

The brain and the hand should not have to share a process.

The runner owns durable control flow — checkpoint order, state, retry, replay, resume, and wait. Execution targets do the work: inline, in an isolated job, inside a sandbox, or through an external tool. Checkpoints are the contract between them.

Inner Harness

OpenAI Agents SDK Anthropic Agent SDK PydanticAI LangGraph Raw Python

Model loop

Kitaru SDK

@flow @checkpoint wait() llm() log() save() load() configure()

Outer runtime primitives

Kitaru Runtime

Checkpoint order Replay Resume Wait Artifacts & state Versions

Durable brain

Execution Targets

Inline Isolated job Sandbox External / MCP tool Custom backend

Hands that do work

Your Infrastructure

Kubernetes AWS / GCP / Azure S3 / GCS SQL Database

Self-hosted substrate

agent.py

import kitaru
from kitaru import flow, checkpoint

@flow
def coding_agent(issue: str) -> str:
    plan = analyze_issue(issue)
    patch = write_code(plan)

    # Pauses. Resumes when input arrives.
    approved = kitaru.wait(
        bool, question="Merge this PR?"
    )
    if approved:
        merge(patch)
    return patch

Your agent died at hour 11.
Don’t restart from hour 1.

pip install kitaru

Book a demo Star on GitHub

Open source (Apache 2.0). Add the runtime around the agent you already have.

The open agent runtime.

The inner harness is getting absorbed. The outer runtime is still missing.

Agent died at hour 11? Resume from hour 10.

The brain and the hand should not share a process.

Wait on humans, webhooks, or agents without burning compute.

Keep the inner harness. Add the outer runtime.

The outer runtime production agents keep needing.

Pause. Release compute. Continue later.

Crash at step 6? Replay from step 6.

Keep the inner loop your team picked.

Fan out work without losing the thread.

The outer runtime is just Python.

From laptop session to enterprise runtime.

The brain and the hand should not have to share a process.

Foundations and proof for Kitaru

From ZenML to Kitaru

Why agents need durable execution

Agents need more than traces

Your agent died at hour 11. Don’t restart from hour 1.

Your agent died at hour 11.
Don’t restart from hour 1.