# Overview

Durable execution for AI agents.

Kitaru is an orchestration layer that makes AI agent workflows persistent, replayable, and observable, without requiring you to learn a graph DSL or change your Python control flow.
## Create a durable agent
```python
import kitaru
from kitaru import checkpoint, flow


@checkpoint
def research(topic: str) -> str:
    return kitaru.llm(f"Summarize {topic} in two sentences.")


@checkpoint
def draft_report(summary: str) -> str:
    return kitaru.llm(f"Write a short report based on: {summary}")


@flow
def research_agent(topic: str) -> str:
    summary = research(topic)
    return draft_report(summary)


if __name__ == "__main__":
    research_agent.run(topic="Why do AI agents need durable execution?")
```

Each `@checkpoint` is a durable unit of work: its output is persisted automatically. If the flow fails at `draft_report`, replaying it skips `research` and reuses its recorded result. `kitaru.llm()` tracks model calls with prompt, response, usage, and cost capture built in.

See the Quickstart to install and run this yourself.
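To build intuition for what checkpointing buys you, here is a minimal, hand-rolled sketch of checkpoint-style replay. This is a toy memoizing decorator that persists results to a local JSON file; it is not Kitaru's implementation, and the file name and key scheme are illustrative only:

```python
import json
from pathlib import Path

STORE = Path("checkpoints.json")   # illustrative store; Kitaru manages its own
STORE.unlink(missing_ok=True)      # start the demo from a clean slate


def checkpoint(fn):
    # Toy stand-in for a durability decorator: persist each step's result
    # keyed by function name + arguments, and replay it on later runs.
    def wrapper(*args):
        store = json.loads(STORE.read_text()) if STORE.exists() else {}
        key = f"{fn.__name__}:{args!r}"
        if key in store:
            return store[key]           # recorded result: skip the work
        result = fn(*args)              # first run: actually do the work
        store[key] = result
        STORE.write_text(json.dumps(store))
        return result
    return wrapper


calls = []


@checkpoint
def research(topic):
    calls.append("ran")                 # track how often the body executes
    return f"summary of {topic}"


print(research("durable agents"))       # first call runs the body
print(research("durable agents"))       # second call replays from disk
print(len(calls))                       # → 1: the body ran only once
```

The same idea, applied across process restarts, is what lets a failed flow resume without re-running expensive LLM calls.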
## What your agent can do with Kitaru

Add these primitives to any Python agent code — no rewrites required.
- **Crash recovery**: Wrap steps in `@checkpoint` and your agent picks up where it left off — no re-running expensive LLM calls
- **Human-in-the-loop**: Add `kitaru.wait()` and let agents hand off to a human while compute is released — resume minutes or months later
- **Tracked LLM calls**: Use `kitaru.llm()` and every call gets automatic secret resolution, prompt/response capture, and cost tracking
- **Persistent data**: `kitaru.save()` / `kitaru.load()` let agents store and retrieve files, objects, and results across executions
- **Structured observability**: `kitaru.log()` attaches key-value metadata to any checkpoint or flow for debugging and the UI
- **Runtime configuration**: `kitaru.configure()` sets your model, log store, and stack defaults in one call
- **Execution management**: `KitaruClient` lets you inspect, replay, retry, resume, and cancel any execution from code or CLI
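The human-in-the-loop primitive is easiest to see as state, not control flow: a paused flow persists its progress, releases compute, and a later call continues the same execution. Here is a self-contained toy sketch of that wait/resume shape — not Kitaru's API, and the file name, function, and marker value are all illustrative:

```python
import json
from pathlib import Path

STATE = Path("flow_state.json")    # illustrative store; Kitaru manages its own
STATE.unlink(missing_ok=True)      # start the demo from a clean slate


def review_flow(human_input=None):
    # Toy sketch of wait/resume: persist partial progress, return a
    # waiting marker, and continue the SAME execution when input arrives
    # instead of starting the work over.
    state = json.loads(STATE.read_text()) if STATE.exists() else {}
    if "draft" not in state:
        state["draft"] = "draft report"   # expensive step, done only once
        STATE.write_text(json.dumps(state))
    if human_input is None:
        return "waiting-for-review"       # compute could be released here
    return f"{state['draft']} (reviewer said: {human_input})"


print(review_flow())          # → waiting-for-review
print(review_flow("LGTM"))    # → draft report (reviewer said: LGTM)
```

Because the draft is already on disk, the resuming call skips straight to incorporating the reviewer's input — whether it arrives minutes or months later.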
## Next Steps
- **Installation**: Install Kitaru with uv or pip
- **Quickstart**: Run a tiny flow end to end
- **Examples**: Browse runnable workflows grouped by goal
- **Core Concepts**: Understand flows, checkpoints, and the execution model
- **Execution Management**: Inspect runs, replay, retry, resume, and fetch logs
- **Wait, Input, and Resume**: Pause flows for external input and continue the same execution
- **Tracked LLM Calls**: Use `kitaru.llm()` with aliases, secrets, and captured artifacts
- **Secrets + Model Registration**: Store provider credentials, register a model alias, and use `kitaru.llm()`
- **Configuration**: Set runtime defaults and understand override precedence
- **Stacks**: Create, inspect, switch, and clean up local, Kubernetes, Vertex, SageMaker, and AzureML stacks
- **MCP Server**: Query and manage executions via MCP tools
- **Claude Code Skills**: Install the Kitaru scoping and authoring skills
- **CLI Reference**: Browse the generated command reference
- **Blog**: Read essays on durable execution, agent infrastructure, and Kitaru's design