Kitaru

Harness, Runtime, Platform

Where Kitaru fits — and doesn't — in an agent stack.

Agent tooling spans four layers, and most "is Kitaru a competitor to X?" questions come from confusing them.

  • Model layer — the LLM itself · your pick: OpenAI, Anthropic, Google, open-weights, fine-tuned in-house
  • Harness layer — the loop around the model: Pydantic AI / Harness, LangGraph, Claude Agent SDK, OpenAI Agents SDK, raw Python
  • Runtime layer — how the agent survives · Kitaru lives here: checkpoints, replay, resume, wait, versions + tag routing, invocation, artifacts + state, isolated execution
  • Platform layer — how the org governs · stays yours, composable (your choice): auth + entitlements, interceptors + guardrails, observability, product UI, policy
Harnesses define behavior. Kitaru defines durable execution. Platforms define governance.
  • Model layer — the LLM itself. A compute unit over a context window, picked per-call or per-agent: OpenAI, Anthropic, Google, open-weights, fine-tuned in-house.
  • Harness layer — the loop around the model. Prompts, tools, model loop, context management, structured outputs, in-turn memory. Picked per-agent or per-team.
  • Runtime layer — how the agent survives and executes over time. Checkpoints, replay, resume, wait states, versioned deployments, invocation routing, artifact + state handling, execution placement.
  • Platform layer — how the organization governs. Auth, entitlements, interceptors, observability, product UI, policy. Usually lives in your existing stack.

Kitaru sits in the runtime layer. It is not a harness and it is not a packaged platform. It gives platform teams durable execution primitives that attach to whatever harness their app teams picked and to the platform their org already runs.

Where Kitaru is — and isn't

Tool                               Primary layer                         What it optimizes for
Pydantic AI / Pydantic AI Harness  Harness                               Typed, ergonomic Python agent logic
Claude Agent SDK                   Harness                               Claude-native autonomous coding / tool loops
OpenAI Agents SDK                  Harness                               Hosted-tool agents on the OpenAI stack
LangGraph                          Harness + runtime (in its own model)  Graph-native agents with built-in checkpointer
Deep Agents                        Harness (on LangGraph)                Opinionated multi-agent pattern
LangSmith Deployment               Runtime + platform (packaged)         Adopting the LangChain-hosted stack
Temporal                           Runtime (general-purpose)             Polyglot, deterministic workflow engine
DBOS                               Runtime (general-purpose)             Postgres-backed durable workflows
Kitaru                             Runtime (Python-agent-shaped)         Framework-agnostic durable execution primitives

The overlap

Several tools in the runtime rows above are real alternatives to Kitaru, so it is worth naming the overlap before drawing the distinction.

  • LangGraph has its own checkpointer, resume, and time-travel — powerful inside its graph/state-machine model. Kitaru's difference is that @checkpoint wraps ordinary Python boundaries independent of any harness.
  • LangSmith Deployment delivers durable execution + sandboxes + auth proxy as a packaged platform. Kitaru ships just the runtime primitives so platform teams bring their own auth, sandbox provider, and governance.
  • Temporal is a battle-tested polyglot durable workflow engine. Kitaru is Python-first, agent-shaped (first-class kitaru.llm(), memory, kitaru.wait(), artifact lineage), with a simpler single-service deployment.
  • DBOS is a Postgres-backed durable workflow library with deterministic workflow bodies. Kitaru flows are plain Python with no determinism requirement; state and artifacts live in your own cloud bucket, not Postgres.
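The distinction the first bullet draws — checkpoints at ordinary Python function boundaries, with replay skipping completed steps — can be sketched with a toy journal. This is a conceptual stand-in, not Kitaru's implementation: the `toy_checkpoint` decorator, the in-memory `journal`, and the example functions are all invented for illustration.

```python
import functools

# Toy stand-in for a durable-execution journal: results of completed
# steps are persisted (here, just a dict) and replay returns them
# instead of re-running the step.
journal: dict[str, object] = {}
calls: list[str] = []  # records which steps actually executed

def toy_checkpoint(fn):
    @functools.wraps(fn)
    def wrapper(*args):
        key = f"{fn.__name__}:{args!r}"
        if key in journal:          # replay path: step already completed
            return journal[key]
        calls.append(key)           # live path: run the step and record it
        result = fn(*args)
        journal[key] = result
        return result
    return wrapper

@toy_checkpoint
def plan(question: str) -> str:
    return f"plan for {question}"

@toy_checkpoint
def retrieve(p: str) -> list[str]:
    return [f"doc about {p}"]

def research(question: str) -> list[str]:
    # Ordinary Python control flow between checkpointed boundaries;
    # no graph or state-machine model required.
    return retrieve(plan(question))

first = research("pricing")
second = research("pricing")  # replays from the journal: no new step runs
```

The point of the sketch is that the boundaries are plain function calls, so any harness (or none) can live inside them.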

Two worldviews

Harness-first
  "Let's give developers a better way to build agents"
    → agent logic → tools → memory → state → deployment

Runtime-first (Kitaru)
  "Agent work is long-running infrastructure work"
    → runtime → checkpoints → execution targets → harness integration

Neither is universally better. They optimize for different buyers.

Individual or small team building one agent · optimize for velocity
  • pick a harness (Pydantic AI / Harness, LangGraph, Claude SDK…)
  • adopt its runtime if it has one
  • Kitaru is probably overkill

Platform team supporting many agent teams · optimize for durability + portability
  • teams pick their own harness
  • durable execution must be harness-independent
  • infra must be self-hosted
  • Kitaru is the right-sized primitive

Harness-first tools optimize for how a single agent is built. Kitaru optimizes for how many agents are run.

What Kitaru owns vs integrates with

Platform teams rightly push back on tools that try to own everything. What Kitaru actually takes responsibility for:

Concern                                 Kitaru owns?    Kitaru's stance
Checkpoint / replay / resume            Yes             Core product
Flow versioning and invocation routing  Yes             Core product
Execution placement per checkpoint      Yes, as config  @checkpoint(runtime="isolated") today; richer policy evolving
Sandbox implementation                  No              Provide adapters; don't mandate a vendor
Secrets storage                         Partly          Alias-linked secret resolution for kitaru.llm(); integrate with your secret manager
Auth to invoke flows                    Yes             Workspace keys / service accounts; no per-deployment tokens
Enterprise entitlements / RBAC          No              Integrate with your platform
Network egress policy                   No              Determined by the execution target your stack provides; Kitaru does not enforce it
Interceptors / guardrails               No              Harness or your platform owns this
Observability                           Partly          Runtime metadata, logs, artifact lineage; integrate with your tracing
Data compliance policy                  No              Policy stays with your platform; Kitaru does not mandate one
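The "execution placement per checkpoint" row means placement is declared where the boundary is declared. A sketch of that shape, using the @checkpoint(runtime="isolated") form from the table — the function names and bodies here are hypothetical, not part of Kitaru's API:

```python
from kitaru import checkpoint

@checkpoint
def summarize(text: str) -> str:
    # Default placement: runs wherever the flow runs.
    ...

@checkpoint(runtime="isolated")
def run_untrusted_snippet(code: str) -> str:
    # Placement as config: this boundary executes on an isolated
    # target; which sandbox backs that target is your adapter's choice.
    ...
```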

The line to remember:

Durability without execution policy is not enough for production agents — but Kitaru should make policy attachable to execution boundaries, not mandate the policy itself.

Concrete split in code

A Python research agent, with each layer doing its part:

from kitaru import flow, checkpoint, wait

@checkpoint
def plan(question: str) -> dict:
    # Harness (Pydantic AI / raw LLM / whatever) lives INSIDE the checkpoint.
    return pydantic_agent.run_sync(question).output

@checkpoint
def retrieve(plan: dict) -> list[dict]:
    # Any retrieval code; search_docs is a stand-in for your own search.
    return search_docs(plan)

@checkpoint
def synthesize(docs: list[dict]) -> str:
    # A different harness can live inside a different checkpoint.
    return claude_agent.answer(docs)

@flow
def research_agent(question: str) -> str:
    p = plan(question)
    docs = retrieve(p)
    approved = wait(name="approve", question="Looks right?", schema=bool)
    return synthesize(docs) if approved else "rejected"

  • Harness decides how plan, retrieve, synthesize reason.
  • Kitaru runtime decides what is durable, what can replay, what waits, where each checkpoint runs.
  • Your platform decides who can invoke research_agent, which stack it runs on, and what gets logged where.
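The wait() call in the flow above is the runtime's human-in-the-loop boundary: the flow suspends until an answer arrives, then resume replays up to the wait point and continues. A toy stand-in for that mechanic — not Kitaru's API; the Pending exception and answers store are invented for illustration:

```python
class Pending(Exception):
    """Raised when a wait has no answer yet; the flow suspends here."""
    def __init__(self, name: str):
        self.name = name

answers: dict[str, object] = {}  # toy store for out-of-band answers

def toy_wait(name: str):
    # Suspend if unanswered; on resume, the recorded answer is returned.
    if name not in answers:
        raise Pending(name)
    return answers[name]

def research_agent() -> str:
    approved = toy_wait("approve")
    return "synthesized" if approved else "rejected"

try:
    research_agent()            # first run suspends at the wait
    suspended = False
except Pending as p:
    suspended = p.name == "approve"

answers["approve"] = True       # a human answers out-of-band
result = research_agent()       # resume: replay reaches the wait, continues
```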

When Kitaru is the wrong size

  • If your whole org standardizes on LangGraph + LangSmith, Kitaru adds less. Use what you have.
  • If you are building one agent for yourself and never leave your laptop, a harness alone is enough.
  • If you want a hosted, all-in-one agent platform and don't need to self-host anything, a packaged platform is the better buy.

When Kitaru fits

  • Application teams across your org pick different harnesses (Pydantic AI, LangChain's Deep Agents, Claude Agent SDK, internal).
  • Infra must be self-hosted (regulated industry, on-prem requirements, sovereignty).
  • The platform team wants runtime primitives, not a packaged platform that replaces the one they already operate.
  • Deployment must plug into existing Kubernetes, secret manager, observability, and data policy — not live in someone else's control plane.
  • Durable execution needs to be independent of any single framework's worldview.

Shorthand

Harnesses define behavior. Kitaru defines durable execution. Platforms define governance.

Or the even shorter version:

Use a harness to build the agent. Use Kitaru when that agent becomes a durable, versioned, self-hosted production workload.
