Coming soon — early access March 2026

Your agent runs.
Now make it survive.

Lightweight durable execution for AI agents in Python. Crash recovery, cost tracking, human-in-the-loop, and full lineage — without the distributed systems baggage.

Open source. Free to start. No credit card required.

# Use any agent framework you like
from pydantic_ai import Agent
from kitaru import KitaruAgent

agent = Agent(
    'anthropic:claude-sonnet-4-6',
    system_prompt='You are a code reviewer.',
    tools=[read_file, run_tests],
)

durable_agent = KitaruAgent(agent)
result = await durable_agent.run(
    "Review PR #42 and suggest fixes"
)
# Durable. Auditable. Resumable.
# No framework needed — just Python
from kitaru import workflow, step

@step
async def analyze_code(pr_diff: str):
    response = await openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": pr_diff}],
    )
    return response.choices[0].message.content

@workflow
async def review_pr(pr_id: int):
    diff = await fetch_pr(pr_id)
    return await analyze_code(diff)

You built the agent.
Now run it without duct tape.

Agents aren't microservices. They don't need microservice infrastructure.

Temporal / DBOS

Too heavy

Built for microservice transactions, not agents. Python as an afterthought. Weeks to set up. You need a distributed systems degree to debug the event history.

LangGraph

Owns your agent

Opinionated about memory, message format, state schema. When you rewrite your agent next month, you rewrite everything.

Cloud functions

Locked to one cloud

AWS Step Functions, Azure Durable Functions, Cloudflare Workers. Different APIs, different limits, no portability.

DIY

Months of glue

Temporal + LangSmith + custom retry logic + cost tracking scripts + deployment infra. You're building infrastructure, not agents.

Infrastructure that survives
every framework rewrite.

Wrap your existing agent. Kitaru handles durability, cost tracking, replay, and human-in-the-loop underneath.

01

Crash recovery without replay complexity

Kitaru checkpoints every step output. On failure, your workflow re-executes and skips completed steps via cache hits. No determinism constraints. No replay brittleness. Deploy new code without breaking running agents.

workflow.py
from kitaru import workflow, step, call_llm

@step
async def plan_research(query: str):
    return await call_llm(f"Plan: {query}")

@workflow
async def research(query: str):
    plan = await plan_research(query)
    # If this crashes, plan_research
    # won't re-run. Checkpointed.
    return await execute(plan)
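The checkpoint-and-skip mechanics can be sketched in plain, self-contained Python. Everything below (the `step` decorator, the `CHECKPOINTS` store, the step functions) is an illustrative stand-in, not Kitaru's actual implementation:

```python
import asyncio
import functools
import hashlib
import json

# Illustrative stand-ins, not Kitaru's actual API: a checkpoint store
# keyed by step name plus a hash of the inputs, and a log of which
# steps really executed.
CHECKPOINTS: dict[str, str] = {}
EXECUTIONS: list[str] = []

def step(fn):
    @functools.wraps(fn)
    async def wrapper(*args):
        key = fn.__name__ + ":" + hashlib.sha256(
            json.dumps(args).encode()).hexdigest()
        if key in CHECKPOINTS:          # cache hit: skip re-execution
            return CHECKPOINTS[key]
        EXECUTIONS.append(fn.__name__)  # cache miss: run and checkpoint
        result = await fn(*args)
        CHECKPOINTS[key] = result
        return result
    return wrapper

@step
async def plan_research(query: str) -> str:
    return f"plan for {query}"

@step
async def execute(plan: str) -> str:
    return f"done: {plan}"

async def research(query: str) -> str:
    plan = await plan_research(query)
    return await execute(plan)

# Simulate a crash after plan_research, then a full retry: the retry
# skips plan_research via its checkpoint and only runs execute.
asyncio.run(plan_research("agents"))
result = asyncio.run(research("agents"))
print(EXECUTIONS)  # ['plan_research', 'execute']
```

Keying checkpoints by step name plus an input hash is what lets a retry skip completed work without Temporal-style deterministic replay.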
02

Cost tracking you didn't have to build

Every LLM call is automatically instrumented. Tokens, cost, latency, model version, prompt hash — all queryable from the metadata store. Sort runs by cost. Set per-agent budgets. No Langfuse bolted on.

Terminal
$ kitaru runs list --sort-by=cost
 
Run ID    Agent           Cost    Duration
run-847   research_agent  $0.42   3m 12s
run-846   research_agent  $1.87   8m 45s
run-845   code_reviewer   $0.03   12s
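The per-call accounting described above can be sketched without any Kitaru code. The `track_cost` wrapper, `PRICES` table, and `fake_llm_call` below are hypothetical, and the prices are placeholders:

```python
import functools

# Hypothetical per-1M-token prices; real prices vary by model and date.
PRICES = {"gpt-4o": {"input": 2.50, "output": 10.00}}
RUN_LOG: list[dict] = []  # stand-in for a queryable metadata store

def track_cost(model: str):
    """Record tokens and dollar cost for every wrapped LLM call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result, in_tok, out_tok = fn(*args, **kwargs)
            price = PRICES[model]
            cost = (in_tok * price["input"] + out_tok * price["output"]) / 1e6
            RUN_LOG.append({"model": model, "input_tokens": in_tok,
                            "output_tokens": out_tok, "cost": round(cost, 6)})
            return result
        return wrapper
    return decorator

@track_cost("gpt-4o")
def fake_llm_call(prompt: str):
    # A real call would return the response text plus the usage counts
    # reported by the provider's API.
    return "some answer", 1000, 200

fake_llm_call("Review this diff")
total = sum(entry["cost"] for entry in RUN_LOG)
print(f"total: ${total:.4f}")  # total: $0.0045
```

Because every call lands in one store with model, tokens, and cost, sorting runs by spend or enforcing a budget becomes a query rather than a bolt-on tracing product.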
03

Replay from any step. Change the input. Compare.

Your agent made a bad plan? Go back to that step, modify the input, replay from there. Compare both runs side-by-side in the dashboard. Content-addressable, versioned, diffable checkpoints — lineage tracking for agents.

debug.py
# Run failed at step 3? Resume with new input.
from kitaru import replay

fixed = replay(
    run_id="run-847",
    from_step="plan_research",
    input="Focus on gene editing only",
)

# Compare runs side-by-side in the dashboard
# Content-addressable, versioned, diffable
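A minimal sketch of the replay idea, assuming a linear pipeline: upstream checkpoints are reused, the chosen step re-runs with the new input, and downstream steps recompute. The `replay` helper and step functions here are illustrative, not Kitaru's API:

```python
def fetch_sources(topic: str) -> str:
    return f"sources on {topic}"

def plan_research(sources: str) -> str:
    return f"plan covering {sources}"

def summarize(plan: str) -> str:
    return f"summary of [{plan}]"

# Ordered steps of a hypothetical linear workflow.
PIPELINE = [("fetch_sources", fetch_sources),
            ("plan_research", plan_research),
            ("summarize", summarize)]

def replay(recorded: dict, from_step: str, new_input: str) -> dict:
    out = {}
    replaying = False
    value = None
    for name, fn in PIPELINE:
        if name == from_step:
            replaying = True
            value = fn(new_input)   # re-run this step with the new input
        elif replaying:
            value = fn(value)       # downstream steps recompute
        else:
            value = recorded[name]  # upstream steps: reuse checkpoints
        out[name] = value
    return out

# Checkpointed outputs from a hypothetical original run.
recorded = {
    "fetch_sources": "sources on CRISPR",
    "plan_research": "plan covering sources on CRISPR",
    "summarize": "summary of [plan covering sources on CRISPR]",
}

fixed = replay(recorded, from_step="plan_research",
               new_input="gene editing only")
print(fixed["summarize"])  # summary of [plan covering gene editing only]
```

Comparing `recorded` and `fixed` step by step is exactly the side-by-side diff the dashboard renders.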
04

Human-in-the-loop as a primitive

Not a hack, not a webhook — a first-class primitive. wait_for_input() suspends execution, releases compute, and resumes when a human provides input. Hours or days later.

approval.py
from kitaru import workflow, wait_for_input

@workflow
async def agent_with_approval(query):
    plan = await generate_plan(query)

    # Pod dies. Compute released.
    # Resume hours later with input.
    decision = await wait_for_input(plan)

    if decision.approved:
        return await execute_plan(plan)
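The suspend-and-resume pattern behind `wait_for_input()` can be sketched with a persisted pending state. `STORE`, `run_until_input`, and `resume_with_input` are illustrative stand-ins, not Kitaru's scheduler:

```python
# Illustrative sketch of suspend-and-resume around human input: the
# workflow checkpoints a pending state and returns, releasing compute.
# A later invocation (possibly a different process, hours later) picks
# the run back up with the human's decision.
STORE: dict = {}  # stand-in for a persistent metadata store

def run_until_input(run_id: str, query: str) -> None:
    plan = f"plan for {query}"
    STORE[run_id] = {"status": "awaiting_input", "plan": plan}
    # Nothing left to do until a human responds; the process can exit.

def resume_with_input(run_id: str, approved: bool) -> str:
    state = STORE.pop(run_id)
    assert state["status"] == "awaiting_input"
    if approved:
        return f"executed: {state['plan']}"
    return "rejected"

run_until_input("run-123", "refactor auth module")
# ...process exits; hours pass; a human approves in the dashboard...
result = resume_with_input("run-123", approved=True)
print(result)  # executed: plan for refactor auth module
```

The key property is that nothing blocks between the two calls: the waiting run costs storage, not compute.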

Start local. Deploy anywhere.
No workers. No queues. No BS.

Agents aren't microservices — they don't need microservice infrastructure. No Temporal servers, no worker fleets, no event sourcing. Just kitaru dev on your laptop, then kitaru deploy when you're ready.

Terminal
$ pip install kitaru
Installed kitaru-0.1.0
 
$ kitaru dev
Local server running at localhost:8080
Dashboard at localhost:8080/dashboard
Ready. Run your agent.
 
$ # When you're ready for production:
$ kitaru deploy --cloud aws
Deployed to AWS us-east-1
Dashboard at https://app.kitaru.ai

What you don't need

  • No Temporal server: no server cluster to manage, no worker processes to scale
  • No message queues: no RabbitMQ, no SQS, no Kafka — checkpoints, not events
  • No determinism constraints: write normal Python. No replay rules. No side-effect restrictions.
  • No vendor lock-in: self-host anywhere. AWS, GCP, Azure, or your own Kubernetes.

Understand your agent
at a glance.

No 500-line event history. No distributed tracing PhD. A clean dashboard that shows exactly what your agent did, what it cost, and where it went wrong.

app.kitaru.ai/runs/run-847

run-847 completed
code_reviewer / $0.42 / 3m 12s

  fetch_pr            0.2s             $0.00
  analyze_code        45s              $0.38   [LLM]
  wait_for_approval   awaiting input           [HITL]
  apply_fixes         pending

Human input required
Agent wants to apply 3 fixes to src/auth.py. Approve?

Wrap any agent framework
PydanticAI
OpenAI Agents SDK
CrewAI
Anthropic SDK
Any Python agent
Deploy to your cloud
AWS
GCP
Azure
Kubernetes
Local

Not just durability.
The full agent lifecycle.

Built-in tools to build, debug, and iterate on your agents. MCP servers for tool discovery. Skills for reusable capabilities. Replay loops for debugging. Observability integrations for production.

MCP servers

Built-in MCP servers for tool discovery and management. Your agents find and use tools through a standard protocol — no custom integrations.

Debug and replay

Your agent made a bad decision at step 3? Go back, change the input, replay from there. Compare both runs side-by-side. Iterate until it works.

Observability

Plays nicely with your existing observability stack. Export traces, connect to your preferred monitoring tools. We capture the data — you choose where it goes.

Skills and templates

Reusable agent capabilities you can compose. Pre-built skills for common patterns — code review, data analysis, research — that you can customize and extend.

Not another framework.
The layer underneath.

                         Temporal       LangGraph   DBOS                       Kitaru
Crash recovery           replay         ~           checkpoints (DBOS Cloud)   checkpoints
Versioned step outputs   —              —           —                          built-in
Run diffing              —              —           —                          built-in
Cost tracking            —              per run     —                          automatic
Cross-run lineage        —              —           —                          built-in
Python-native DX         ~              painful     decorators                 ✓
Framework-agnostic       ~              LangChain   DBOS only                  any agent
No determinism tax       strict rules   ✓           linter                     ✓
Self-hosted / any cloud  ✓              LangSmith   DBOS Cloud                 any cloud

Built on the foundation of ZenML.
Battle-tested at scale.

Kitaru is built by the team behind ZenML — the open-source MLOps framework trusted by hundreds of teams to orchestrate production ML pipelines. The same engine that runs thousands of pipelines now powers your agents.

9,000+
GitHub stars
500+
Production teams
Millions
Pipeline runs orchestrated
3 years
Production-hardened

Kitaru uses the same checkpoint engine, metadata store, and cloud connectors that power ZenML — now purpose-built for AI agents that need to run for hours, survive crashes, and scale to thousands of concurrent executions.

Ship agents.
Not infrastructure.

Kitaru is launching soon. Join the waitlist for early access, and we'll tell you the moment it's ready.

Open source. Free to start. No credit card required.

Built by the team behind ZenML — production ML orchestration trusted by hundreds of teams.