Kitaru

Harness, Runtime, Platform

Where Kitaru fits — and doesn't — in an agent stack.

Agent tooling spans four layers, and most "is Kitaru a competitor to X?" questions come from confusing them.

  • Model layer — the LLM itself · your pick: OpenAI, Anthropic, Google, open-weights, fine-tuned in-house
  • Harness layer — the loop around the model: Pydantic AI / Harness, LangGraph, Claude Agent SDK, OpenAI Agents SDK, raw Python
  • Runtime layer — how the agent survives · Kitaru lives here: checkpoints, replay, resume, wait, versions + tag routing, invocation, artifacts + state, isolated execution
  • Platform layer — how the org governs · stays yours, composable (your choice): auth + entitlements, interceptors + guardrails, observability, product UI, policy
Harnesses define behavior. Kitaru defines durable execution. Platforms define governance.
  • Model layer — the LLM itself. A compute unit over a context window, picked per-call or per-agent: OpenAI, Anthropic, Google, open-weights, fine-tuned in-house.
  • Harness layer — the loop around the model. Prompts, tools, model loop, context management, structured outputs, in-turn memory. Picked per-agent or per-team.
  • Runtime layer — how the agent survives and executes over time. Checkpoints, replay, resume, wait states, versioned deployments, invocation routing, artifact + state handling, execution placement.
  • Platform layer — how the organization governs. Auth, entitlements, interceptors, observability, product UI, policy. Usually lives in your existing stack.

Kitaru sits in the runtime layer. It is not a harness and it is not a packaged platform. It gives platform teams durable execution primitives that attach to whatever harness their app teams picked and to the platform their org already runs.

Where Kitaru is — and isn't

Tool                               Primary layer                         What it optimizes for
Pydantic AI / Pydantic AI Harness  Harness                               Typed, ergonomic Python agent logic
Claude Agent SDK                   Harness                               Claude-native autonomous coding / tool loops
OpenAI Agents SDK                  Harness                               Hosted-tool agents on the OpenAI stack
LangGraph                          Harness + runtime (in its own model)  Graph-native agents with built-in checkpointer
Deep Agents                        Harness (on LangGraph)                Opinionated multi-agent pattern
LangSmith Deployment               Runtime + platform (packaged)         Adopting the LangChain-hosted stack
Temporal                           Runtime (general-purpose)             Polyglot, deterministic workflow engine
DBOS                               Runtime (general-purpose)             Postgres-backed durable workflows
Kitaru                             Runtime (Python-agent-shaped)         Framework-agnostic durable execution primitives

The overlap

Several tools in the runtime rows above are real alternatives to Kitaru, so it is worth naming the overlap before drawing the distinction.

  • LangGraph has its own checkpointer, resume, and time-travel — powerful inside its graph/state-machine model. Kitaru's difference is that @checkpoint wraps ordinary Python boundaries independent of any harness.
  • LangSmith Deployment delivers durable execution + sandboxes + auth proxy as a packaged platform. Kitaru ships just the runtime primitives so platform teams bring their own auth, sandbox provider, and governance.
  • Temporal is a battle-tested polyglot durable workflow engine. Kitaru is Python-first, agent-shaped (first-class kitaru.llm(), memory, kitaru.wait(), artifact lineage), with a simpler single-service deployment.
  • DBOS is a Postgres-backed durable workflow library with deterministic workflow bodies. Kitaru flows are plain Python with no determinism requirement; state and artifacts live in your own cloud bucket, not Postgres.
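The distinction the first bullet draws — checkpoints at ordinary Python function boundaries, with replay skipping completed steps — can be sketched with a toy journal. This is a conceptual stand-in, not Kitaru's implementation: the `toy_checkpoint` decorator, the in-memory `journal`, and the example functions are all invented for illustration.

```python
import functools

# Toy stand-in for a durable-execution journal: results of completed
# steps are persisted (here, just a dict) and replay returns them
# instead of re-running the step.
journal: dict[str, object] = {}
calls: list[str] = []  # records which steps actually executed

def toy_checkpoint(fn):
    @functools.wraps(fn)
    def wrapper(*args):
        key = f"{fn.__name__}:{args!r}"
        if key in journal:          # replay path: step already completed
            return journal[key]
        calls.append(key)           # live path: run the step and record it
        result = fn(*args)
        journal[key] = result
        return result
    return wrapper

@toy_checkpoint
def plan(question: str) -> str:
    return f"plan for {question}"

@toy_checkpoint
def retrieve(p: str) -> list[str]:
    return [f"doc about {p}"]

def research(question: str) -> list[str]:
    # Ordinary Python control flow between checkpointed boundaries;
    # no graph or state-machine model required.
    return retrieve(plan(question))

first = research("pricing")
second = research("pricing")  # replays from the journal: no new step runs
```

The point of the sketch is that the boundaries are plain function calls, so any harness (or none) can live inside them.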

Two worldviews

Harness-first
  "Let's give developers a better way to build agents"
    → agent logic → tools → memory → state → deployment

Runtime-first (Kitaru)
  "Agent work is long-running infrastructure work"
    → runtime → checkpoints → execution targets → harness integration

Neither is universally better. They optimize for different buyers.

Individual or small team building one agent · optimize for velocity
  • pick a harness (Pydantic AI / Harness, LangGraph, Claude SDK…)
  • adopt its runtime if it has one
  • Kitaru is probably overkill

Platform team supporting many agent teams · optimize for durability + portability
  • teams pick their own harness
  • durable execution must be harness-independent
  • infra must be self-hosted
  • Kitaru is the right-sized primitive

Harness-first tools optimize for how a single agent is built. Kitaru optimizes for how many agents are run.

What Kitaru owns vs integrates with

Platform teams rightly push back on tools that try to own everything. What Kitaru actually takes responsibility for:

Concern                                 Kitaru owns?    Kitaru's stance
Checkpoint / replay / resume            Yes             Core product
Flow versioning and invocation routing  Yes             Core product
Execution placement per checkpoint      Yes, as config  @checkpoint(runtime="isolated") today; richer policy evolving
Sandbox implementation                  No              Provide adapters; don't mandate a vendor
Secrets storage                         Partly          Alias-linked secret resolution for kitaru.llm(); integrate with your secret manager
Auth to invoke flows                    Yes             Workspace keys / service accounts; no per-deployment tokens
Enterprise entitlements / RBAC          No              Integrate with your platform
Network egress policy                   No              Determined by the execution target your stack provides; Kitaru does not enforce it
Interceptors / guardrails               No              Harness or your platform owns this
Observability                           Partly          Runtime metadata, logs, artifact lineage; integrate with your tracing
Data compliance policy                  No              Policy stays with your platform; Kitaru does not mandate one
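The "execution placement per checkpoint" row means placement is declared where the boundary is declared. A sketch of that shape, using the @checkpoint(runtime="isolated") form from the table — the function names and bodies here are hypothetical, not part of Kitaru's API:

```python
from kitaru import checkpoint

@checkpoint
def summarize(text: str) -> str:
    # Default placement: runs wherever the flow runs.
    ...

@checkpoint(runtime="isolated")
def run_untrusted_snippet(code: str) -> str:
    # Placement as config: this boundary executes on an isolated
    # target; which sandbox backs that target is your adapter's choice.
    ...
```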

The line to remember:

Durability without execution policy is not enough for production agents — but Kitaru should make policy attachable to execution boundaries, not mandate the policy itself.

Concrete split in code

A Python research agent, with each layer doing its part:

from kitaru import flow, checkpoint, wait

@checkpoint
def plan(question: str) -> dict:
    # Harness (Pydantic AI / raw LLM / whatever) lives INSIDE the checkpoint.
    return pydantic_agent.run_sync(question).output

@checkpoint
def retrieve(plan: dict) -> list[dict]:
    # Any retrieval code; search_docs is a stand-in for your own search.
    return search_docs(plan)

@checkpoint
def synthesize(docs: list[dict]) -> str:
    # A different harness can live inside a different checkpoint.
    return claude_agent.answer(docs)

@flow
def research_agent(question: str) -> str:
    p = plan(question)
    docs = retrieve(p)
    approved = wait(name="approve", question="Looks right?", schema=bool)
    return synthesize(docs) if approved else "rejected"

  • Harness decides how plan, retrieve, synthesize reason.
  • Kitaru runtime decides what is durable, what can replay, what waits, where each checkpoint runs.
  • Your platform decides who can invoke research_agent, which stack it runs on, and what gets logged where.
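The wait() call in the flow above is the runtime's human-in-the-loop boundary: the flow suspends until an answer arrives, then resume replays up to the wait point and continues. A toy stand-in for that mechanic — not Kitaru's API; the Pending exception and answers store are invented for illustration:

```python
class Pending(Exception):
    """Raised when a wait has no answer yet; the flow suspends here."""
    def __init__(self, name: str):
        self.name = name

answers: dict[str, object] = {}  # toy store for out-of-band answers

def toy_wait(name: str):
    # Suspend if unanswered; on resume, the recorded answer is returned.
    if name not in answers:
        raise Pending(name)
    return answers[name]

def research_agent() -> str:
    approved = toy_wait("approve")
    return "synthesized" if approved else "rejected"

try:
    research_agent()            # first run suspends at the wait
    suspended = False
except Pending as p:
    suspended = p.name == "approve"

answers["approve"] = True       # a human answers out-of-band
result = research_agent()       # resume: replay reaches the wait, continues
```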

When Kitaru is the wrong size

  • If your whole org standardizes on LangGraph + LangSmith, Kitaru adds less. Use what you have.
  • If you are building one agent for yourself and never leave your laptop, a harness alone is enough.
  • If you want a hosted, all-in-one agent platform and don't need to self-host anything, a packaged platform is the better buy.

When Kitaru fits

  • Application teams across your org pick different harnesses (Pydantic AI, LangChain's Deep Agents, Claude Agent SDK, internal).
  • Infra must be self-hosted (regulated industry, on-prem requirements, sovereignty).
  • The platform team wants runtime primitives, not a packaged platform that replaces the one they already operate.
  • Deployment must plug into existing Kubernetes, secret manager, observability, and data policy — not live in someone else's control plane.
  • Durable execution needs to be independent of any single framework's worldview.

Shorthand

Harnesses define behavior. Kitaru defines durable execution. Platforms define governance.

Or the even shorter version:

Use a harness to build the agent. Use Kitaru when that agent becomes a durable, versioned, self-hosted production workload.
