# Overview

Durable execution for AI agents.

Kitaru is an orchestration layer that makes AI agent workflows persistent, replayable, and observable, without requiring you to learn a graph DSL or change your Python control flow.
## Create a durable agent
```python
import kitaru
from kitaru import checkpoint, flow


@checkpoint
def research(topic: str) -> str:
    return kitaru.llm(f"Summarize {topic} in two sentences.")


@checkpoint
def draft_report(summary: str) -> str:
    return kitaru.llm(f"Write a short report based on: {summary}")


@flow
def research_agent(topic: str) -> str:
    summary = research(topic)
    return draft_report(summary)


if __name__ == "__main__":
    research_agent.run(topic="Why do AI agents need durable execution?")
```

Each `@checkpoint` is a durable unit of work: its output is persisted automatically. If the flow fails at `draft_report`, replaying it skips `research` and reuses its recorded result. `kitaru.llm()` tracks model calls with prompt, response, usage, and cost capture built in.

See the Quickstart to install and run this yourself.
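To build intuition for what checkpointing buys you, here is a minimal, hand-rolled sketch of checkpoint-style replay. This is a toy memoizing decorator that persists results to a local JSON file; it is not Kitaru's implementation, and the file name and key scheme are illustrative only:

```python
import json
from pathlib import Path

STORE = Path("checkpoints.json")   # illustrative store; Kitaru manages its own
STORE.unlink(missing_ok=True)      # start the demo from a clean slate


def checkpoint(fn):
    # Toy stand-in for a durability decorator: persist each step's result
    # keyed by function name + arguments, and replay it on later runs.
    def wrapper(*args):
        store = json.loads(STORE.read_text()) if STORE.exists() else {}
        key = f"{fn.__name__}:{args!r}"
        if key in store:
            return store[key]           # recorded result: skip the work
        result = fn(*args)              # first run: actually do the work
        store[key] = result
        STORE.write_text(json.dumps(store))
        return result
    return wrapper


calls = []


@checkpoint
def research(topic):
    calls.append("ran")                 # track how often the body executes
    return f"summary of {topic}"


print(research("durable agents"))       # first call runs the body
print(research("durable agents"))       # second call replays from disk
print(len(calls))                       # → 1: the body ran only once
```

The same idea, applied across process restarts, is what lets a failed flow resume without re-running expensive LLM calls.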
## What your agent can do with Kitaru

Add these primitives to any Python agent code — no rewrites required.
- **Crash recovery**: Wrap steps in `@checkpoint` and your agent picks up where it left off — no re-running expensive LLM calls
- **Human-in-the-loop**: Add `kitaru.wait()` and let agents hand off to a human while compute is released — resume minutes or months later
- **Tracked LLM calls**: Use `kitaru.llm()` and every call gets automatic secret resolution, prompt/response capture, and cost tracking
- **Persistent data**: `kitaru.save()` / `kitaru.load()` let agents store and retrieve files, objects, and results across executions
- **Structured observability**: `kitaru.log()` attaches key-value metadata to any checkpoint or flow for debugging and the UI
- **Runtime configuration**: `kitaru.configure()` sets your model, log store, and stack defaults in one call
- **Execution management**: `KitaruClient` lets you inspect, replay, retry, resume, and cancel any execution from code or CLI
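The human-in-the-loop primitive is easiest to see as state, not control flow: a paused flow persists its progress, releases compute, and a later call continues the same execution. Here is a self-contained toy sketch of that wait/resume shape — not Kitaru's API, and the file name, function, and marker value are all illustrative:

```python
import json
from pathlib import Path

STATE = Path("flow_state.json")    # illustrative store; Kitaru manages its own
STATE.unlink(missing_ok=True)      # start the demo from a clean slate


def review_flow(human_input=None):
    # Toy sketch of wait/resume: persist partial progress, return a
    # waiting marker, and continue the SAME execution when input arrives
    # instead of starting the work over.
    state = json.loads(STATE.read_text()) if STATE.exists() else {}
    if "draft" not in state:
        state["draft"] = "draft report"   # expensive step, done only once
        STATE.write_text(json.dumps(state))
    if human_input is None:
        return "waiting-for-review"       # compute could be released here
    return f"{state['draft']} (reviewer said: {human_input})"


print(review_flow())          # → waiting-for-review
print(review_flow("LGTM"))    # → draft report (reviewer said: LGTM)
```

Because the draft is already on disk, the resuming call skips straight to incorporating the reviewer's input — whether it arrives minutes or months later.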
## Next Steps
- **Installation**: Install Kitaru with uv or pip
- **Quickstart**: Run a tiny flow end to end
- **Examples**: Browse runnable workflows grouped by goal
- **Core Concepts**: Understand flows, checkpoints, and the execution model
- **Execution Management**: Inspect runs, replay, retry, resume, and fetch logs
- **Wait, Input, and Resume**: Pause flows for external input and continue the same execution
- **Tracked LLM Calls**: Use `kitaru.llm()` with aliases, secrets, and captured artifacts
- **Secrets + Model Registration**: Store provider credentials, register a model alias, and use `kitaru.llm()`
- **Configuration**: Set runtime defaults and understand override precedence
- **Stacks**: Create, inspect, switch, and clean up local, Kubernetes, Vertex, SageMaker, and AzureML stacks
- **MCP Server**: Query and manage executions via MCP tools
- **Claude Code Skills**: Install the Kitaru scoping and authoring skills
- **CLI Reference**: Browse the generated command reference
- **Blog**: Read essays on durable execution, agent infrastructure, and Kitaru's design