Restate is a serious durable runtime. Polyglot SDKs across TypeScript, Java, Kotlin, Go, Rust, and Python. BSL 1.1 runtime license, with Apache 2.0 as the change license after the change date. Three handler shapes (services, Virtual Objects with keyed per-entity state and strongly-consistent access, and workflows), journaled execution, awakeables for human-in-the-loop, OpenTelemetry tracing, self-hostable on every major cloud. If your durability problem is polyglot and backend-services-shaped, Restate is the pragmatic answer. That’s the honest thing to say up front.
Kitaru is narrower on purpose. We built it for Python agents, because that’s the workload most teams we talk to are trying to ship right now. Today, it’s increasingly obvious that every company running agents in production needs the same capabilities: a durable llm() call, wait/resume, typed versioned memory, artifact lineage across runs, human/agent approval gates, and a runtime that understands the cloud underneath it. Those don’t come in the box with any general-purpose durable runtime. So we put them there.
Use Kitaru if you are
- Running Python agents and want `kitaru.llm()`, `kitaru.wait()`, `kitaru.memory`, and artifact lineage as primitives, not glue code your platform team maintains forever
- Deploying across Kubernetes, AWS, GCP, or Azure and want an opinionated stack abstraction where one config switches every flow's backend
- Happy with embedded durability (`@flow` and `@checkpoint` on ordinary Python) rather than handlers that register with a separate server
- Designing around dynamic agent loops: tool calls, conditional branches, human/agent approval gates, hours-long waits
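To make the list above concrete, here is a minimal sketch of those primitives in one flow. The exact signatures, and especially the `kitaru.memory` access pattern, are assumptions based on the names above, not documented API:

```python
import kitaru
from kitaru import checkpoint, flow

@checkpoint
def summarize(ticket: str) -> str:
    # Durable LLM call: alias resolution, key injection, and
    # token/latency logging happen in the runtime, not in glue code.
    return kitaru.llm(prompt=f"Summarize: {ticket}", model="fast")

@flow
def triage(ticket: str) -> str:
    summary = summarize(ticket)
    # Hypothetical memory write -- scope and key names are illustrative.
    kitaru.memory["last_summary"] = summary
    # Durable wait: compute is released until a human or agent answers.
    ok = kitaru.wait(name="confirm", question="File this ticket?", schema=bool)
    return summary if ok else "Discarded"
```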
Use Restate if you are
- Running a polyglot services stack (TypeScript, Java, Kotlin, Go, Rust, Python) that needs one durability primitive across all of it
- Doing durable RPC, sagas, or long-running backend workflows, not agent loops with LLM calls
- Modeling keyed per-entity state with strongly-consistent access (chat sessions, orders, devices, users) as a first-class abstraction
- Fine with running `restate-server` as infra and using awakeables / durable promises as the core HITL primitive
Restate adds durability to your services. Kitaru adds durability to your Python agents.
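For contrast, keyed per-entity state in Restate looks roughly like this in its Python SDK (a sketch; check Restate's docs for exact signatures):

```python
import restate

# One Virtual Object instance per key; handlers on the same key run
# single-writer, so reads and writes of its state are strongly consistent.
session = restate.VirtualObject("ChatSession")

@session.handler()
async def append_message(ctx: restate.ObjectContext, msg: str) -> int:
    history = await ctx.get("history") or []
    history.append(msg)
    ctx.set("history", history)
    return len(history)

app = restate.app([session])
```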
Agent-shaped vs app-shaped runtime
Defaults tell you what a tool is actually for. Restate’s defaults are services, Virtual Objects, and workflows. Real durability primitives, shaped for backend app workloads. Kitaru’s defaults are `kitaru.llm()`, `kitaru.wait()`, `kitaru.memory`, and versioned artifacts per checkpoint. Different opinion about the workload, same commitment to durability.
- In-the-box primitives: `kitaru.llm()` resolves the model alias, injects the provider key, and logs prompt, response, latency, tokens, and model on every call. Restate’s equivalent surface is services, Virtual Objects, workflows, and awakeables. Durability primitives, not agent primitives.
- Framework adapters: Kitaru ships a PydanticAI adapter, so wrapping an existing agent harness in `@flow` and `@checkpoint` is a five-minute job. Restate works alongside agent SDKs via generic wrappers.
- Both can be bent to the other’s shape. What ships in the box tells you what the team was watching when they set the defaults.
Artifact lineage vs journal and Virtual Object state
Both tools persist state. The data models are different, and they’re optimized for different shapes of work. Worth saying clearly.
- Kitaru: Every `@checkpoint` output lands as a typed, versioned artifact in your own S3, GCS, or Azure Blob bucket, linked to the execution record. Pick two runs, diff the artifacts at each checkpoint in the same UI. We took this straight from ZenML, where five years of ML work taught us that platform teams live or die on artifact lineage.
- Restate: Handler progress is journaled for deterministic replay. Virtual Objects expose keyed per-entity state with strongly-consistent access and single-writer semantics per object. That’s the right shape for “the chat session”, “the order”, “the device”.
- Honest read: VO-keyed per-entity state and versioned artifacts aren’t two answers to the same question. They’re two shapes of state, fit for different workloads.
Opinionated cloud-stack abstraction vs bring-your-own deployment
We spent five years at ZenML watching ML teams re-solve cloud topology for every new pipeline. Artifacts, secrets, compute, IAM: every cloud has its own version. JetBrains runs their AI globally on ZenML specifically to avoid re-solving this for every team. Same story is playing out for agent teams now. Kitaru’s answer is to ship the stack abstraction in the runtime.
- Kitaru: Configure a stack once (AWS, GCP, Azure, Kubernetes). Every flow gets artifacts in your object store, secrets through the cloud’s provider, orchestration on the chosen compute. Swap the stack to switch clouds without changing flow code.
- Restate: `restate-server` self-hosts on any of those clouds. That’s a real feature. The wedge isn’t where Restate can run, it’s that the runtime doesn’t abstract the cloud backend. You deploy handlers however you already deploy backend services.
- The trade: Polyglot backend teams often want exactly Restate’s model. Python agent teams usually want the cloud topology handled by the runtime, not by the platform team for the tenth time.
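Here is what “configure a stack once” could look like in practice. This config shape is purely illustrative; the field names are assumptions, not Kitaru’s actual schema:

```yaml
# Hypothetical stack config -- field names are assumptions
stack: aws-prod
components:
  artifact_store: s3://my-team-artifacts
  secrets: aws-secrets-manager
  orchestrator: eks
# Switching every flow to GCP would mean pointing at a different stack,
# e.g. `stack: gcp-prod`, with no changes to flow code.
```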
Embedded library vs server-registered handlers
Both runtimes have a server. The wedge is whether your request path has to cross it.
- Kitaru: `pip install kitaru`, add `@flow` and `@checkpoint` to ordinary Python, and your flow runs in your process. The Kitaru server stores execution metadata and powers the UI, CLI, and auth. It’s not a proxy every invocation crosses.
- Restate: Handlers register with `restate-server`, and invocations flow through it. The server journals progress, routes requests, handles retries, and drives replay. That’s not incidental; it’s the durability model.
- Selfishly, the Kitaru shape is what we want when we’re shipping an agent. Decorators on ordinary Python, flow runs where our code runs, metadata on the side. If your team wants server-mediated invocation by design, Restate’s shape fits better.
What makes Kitaru unique
| Feature | Kitaru | Restate |
|---|---|---|
| Durable execution and replay | Yes | Yes |
| Human-in-the-loop waiting with compute released | Yes | Yes |
| Self-hostable (Kitaru: Apache 2.0; Restate: BSL 1.1, Apache 2.0 as the change license) | Yes | Yes |
| OpenTelemetry tracing | Yes | Yes |
| Python-agent-shaped primitives (`kitaru.llm()`, `kitaru.wait()`, `kitaru.memory`) | Yes | Not supported |
| Built-in LLM primitive with alias-resolved secrets and per-call token/latency logging | Yes | Not supported |
| Typed, versioned artifact lineage per checkpoint (cross-run diff) | Yes | Not supported |
| Opinionated stack abstraction for Kubernetes, AWS, GCP, Azure | Yes | Not supported |
| Embedded library (no separate server process in the request path) | Yes | Not supported |
| Polyglot handler SDKs (TypeScript, Java, Kotlin, Go, Rust, Python) | Not supported | Yes |
| Virtual Objects: keyed per-entity state with strongly-consistent access | Not supported | Yes |
| Awakeables and durable promises as a primitive | Not supported | Yes |
How the two surfaces map
| Concept | Restate | Kitaru |
|---|---|---|
| Durable unit | Handler (Service / Virtual Object / Workflow) | `@flow` + `@checkpoint` |
| Durability mechanism | Journaled event log | Versioned artifacts per checkpoint |
| Per-entity state | Virtual Object (keyed, strongly-consistent) | `kitaru.memory` scopes + artifact store |
| Human-in-the-loop | Awakeable / durable promise | `kitaru.wait()` |
| Invocation | Handler call via `restate-server` | `flow.run()`, `kitaru invoke`, CLI, SDK, MCP, curl |
| Deployment model | Self-hosted `restate-server` + registered handlers | Stack abstraction for Kubernetes, AWS, GCP, Azure |
| LLM call | Your code inside a journaled step | `kitaru.llm()` with alias, key injection, token/latency logging |
Code comparison
Kitaru:

```python
import kitaru
from kitaru import checkpoint, flow


@checkpoint
def research(topic: str) -> str:
    return kitaru.llm(
        prompt=f"Research: {topic}. Return a brief.",
        model="fast",
    )


@checkpoint
def draft(brief: str) -> str:
    return kitaru.llm(
        prompt=f"Write a draft from this brief:\n{brief}",
        model="fast",
    )


@flow
def review_flow(topic: str) -> str:
    brief = research(topic)
    text = draft(brief)
    approved = kitaru.wait(
        name="approve_draft",
        question="Approve draft?",
        schema=bool,
    )
    return text if approved else "Rejected"


review_flow.run("Durable agents")
```

Restate:

```python
import restate

review = restate.Workflow("ReviewWorkflow")


@review.main()
async def run(ctx: restate.WorkflowContext, topic: str) -> str:
    # Journaled step. Non-deterministic work lives inside ctx.run().
    brief = await ctx.run(
        "research",
        lambda: call_llm(f"Research: {topic}"),
    )
    text = await ctx.run(
        "draft",
        lambda: call_llm(f"Draft: {brief}"),
    )
    # Awakeable: durable promise resolved by an external caller.
    approval_id, approval = ctx.awakeable()
    send_approval_link(approval_id, text)
    approved = await approval
    return text if approved else "Rejected"


app = restate.app([review])
# Register the workflow with restate-server, deploy the handler,
# and resolve the awakeable from wherever approvals come in.
```

Durable execution, shaped for your Python agent
If your durability problem is shaped like a polyglot backend with keyed per-entity state, Restate is the right tool, and we’d tell any team that. For Python agent work (LLM calls, memory, artifacts, human/agent approval gates), the glue code you’d write on top of a general-purpose durable runtime is the stuff we ship in the box.
We’ve spent five years building the MLOps-ready version of this problem space at ZenML. Kitaru is that team two years into the agent version. Bet on us for agent infrastructure and you’re betting on the group that’s been doing this the whole time.
`pip install kitaru`