Restate is a serious durable runtime. Polyglot SDKs across TypeScript, Java, Kotlin, Go, Rust, and Python. BSL 1.1 runtime license, with Apache 2.0 as the change license after the change date. Three handler shapes (services, Virtual Objects with keyed per-entity state and strongly-consistent access, and workflows), journaled execution, awakeables for human-in-the-loop, OpenTelemetry tracing, self-hostable on every major cloud. If your durability problem is polyglot and backend-services-shaped, Restate is the pragmatic answer. That’s the honest thing to say up front.
Kitaru is narrower on purpose. We built it for Python agents, because that’s the workload most teams we talk to are trying to ship right now. Today, it’s increasingly obvious that every company running agents in production needs the same capabilities: a durable llm() call, wait/resume, typed versioned memory, artifact lineage across runs, human/agent approval gates, and a runtime that understands the cloud underneath it. Those don’t come in the box with any general-purpose durable runtime. So we put them there.
Use Kitaru if you are
- Running Python agents and want `kitaru.llm()`, `kitaru.wait()`, `kitaru.memory`, and artifact lineage as primitives, not glue code your platform team maintains forever
- Deploying across Kubernetes, AWS, GCP, or Azure and want an opinionated stack abstraction where one config switches every flow's backend
- Happy with embedded durability (`@flow` and `@checkpoint` on ordinary Python) rather than handlers that register with a separate server
- Designing around dynamic agent loops: tool calls, conditional branches, human/agent approval gates, hours-long waits
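To make the list above concrete, here is a minimal sketch of those primitives in one flow. The exact signatures, and especially the `kitaru.memory` access pattern, are assumptions based on the names above, not documented API:

```python
import kitaru
from kitaru import checkpoint, flow

@checkpoint
def summarize(ticket: str) -> str:
    # Durable LLM call: alias resolution, key injection, and
    # token/latency logging happen in the runtime, not in glue code.
    return kitaru.llm(prompt=f"Summarize: {ticket}", model="fast")

@flow
def triage(ticket: str) -> str:
    summary = summarize(ticket)
    # Hypothetical memory write -- scope and key names are illustrative.
    kitaru.memory["last_summary"] = summary
    # Durable wait: compute is released until a human or agent answers.
    ok = kitaru.wait(name="confirm", question="File this ticket?", schema=bool)
    return summary if ok else "Discarded"
```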
Use Restate if you are
- Running a polyglot services stack (TypeScript, Java, Kotlin, Go, Rust, Python) that needs one durability primitive across all of it
- Doing durable RPC, sagas, or long-running backend workflows, not agent loops with LLM calls
- Modeling keyed per-entity state with strongly-consistent access (chat sessions, orders, devices, users) as a first-class abstraction
- Fine with running `restate-server` as infra and using awakeables / durable promises as the core HITL primitive
Restate adds durability to your services. Kitaru adds durability to your Python agents.
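For contrast, keyed per-entity state in Restate looks roughly like this in its Python SDK (a sketch; check Restate's docs for exact signatures):

```python
import restate

# One Virtual Object instance per key; handlers on the same key run
# single-writer, so reads and writes of its state are strongly consistent.
session = restate.VirtualObject("ChatSession")

@session.handler()
async def append_message(ctx: restate.ObjectContext, msg: str) -> int:
    history = await ctx.get("history") or []
    history.append(msg)
    ctx.set("history", history)
    return len(history)

app = restate.app([session])
```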
Agent-shaped vs app-shaped runtime
Defaults tell you what a tool is actually for. Restate’s defaults are services, Virtual Objects, and workflows. Real durability primitives, shaped for backend app workloads. Kitaru’s defaults are `kitaru.llm()`, `kitaru.wait()`, `kitaru.memory`, and versioned artifacts per checkpoint. Different opinion about the workload, same commitment to durability.
- In-the-box primitives: `kitaru.llm()` resolves the model alias, injects the provider key, and logs prompt, response, latency, tokens, and model on every call. Restate’s equivalent surface is services, Virtual Objects, workflows, and awakeables. Durability primitives, not agent primitives.
- Framework adapters: Kitaru ships a PydanticAI adapter, so wrapping an existing agent harness in `@flow` and `@checkpoint` is a five-minute job. Restate works alongside agent SDKs via generic wrappers.
- Both can be bent to the other’s shape. What ships in the box tells you what the team was watching when they set the defaults.
Artifact lineage vs journal and Virtual Object state
Both tools persist state. The data models are different, and they’re optimized for different shapes of work. Worth saying clearly.
- Kitaru: Every `@checkpoint` output lands as a typed, versioned artifact in your own S3, GCS, or Azure Blob bucket, linked to the execution record. Pick two runs, diff the artifacts at each checkpoint in the same UI. We took this straight from ZenML, where five years of ML work taught us that platform teams live or die on artifact lineage.
- Restate: Handler progress is journaled for deterministic replay. Virtual Objects expose keyed per-entity state with strongly-consistent access and single-writer semantics per object. That’s the right shape for “the chat session”, “the order”, “the device”.
- Honest read: VO-keyed per-entity state and versioned artifacts aren’t two answers to the same question. They’re two shapes of state, fit for different workloads.
Opinionated cloud-stack abstraction vs bring-your-own deployment
We spent five years at ZenML watching ML teams re-solve cloud topology for every new pipeline. Artifacts, secrets, compute, IAM: every cloud has its own version. JetBrains runs their AI globally on ZenML specifically to avoid re-solving this for every team. Same story is playing out for agent teams now. Kitaru’s answer is to ship the stack abstraction in the runtime.
- Kitaru: Configure a stack once (AWS, GCP, Azure, Kubernetes). Every flow gets artifacts in your object store, secrets through the cloud’s provider, orchestration on the chosen compute. Swap the stack to switch clouds without changing flow code.
- Restate: `restate-server` self-hosts on any of those clouds. That’s a real feature. The wedge isn’t where Restate can run, it’s that the runtime doesn’t abstract the cloud backend. You deploy handlers however you already deploy backend services.
- The trade: Polyglot backend teams often want exactly Restate’s model. Python agent teams usually want the cloud topology handled by the runtime, not by the platform team for the tenth time.
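Here is what “configure a stack once” could look like in practice. This config shape is purely illustrative; the field names are assumptions, not Kitaru’s actual schema:

```yaml
# Hypothetical stack config -- field names are assumptions
stack: aws-prod
components:
  artifact_store: s3://my-team-artifacts
  secrets: aws-secrets-manager
  orchestrator: eks
# Switching every flow to GCP would mean pointing at a different stack,
# e.g. `stack: gcp-prod`, with no changes to flow code.
```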
Embedded library vs server-registered handlers
Both runtimes have a server. The wedge is whether your request path has to cross it.
- Kitaru: `pip install kitaru`, add `@flow` and `@checkpoint` to ordinary Python, and your flow runs in your process. The Kitaru server stores execution metadata and powers the UI, CLI, and auth. It’s not a proxy every invocation crosses.
- Restate: Handlers register with `restate-server`, and invocations flow through it. The server journals progress, routes requests, handles retries, and drives replay. That’s not incidental; it’s the durability model.
- Selfishly, the Kitaru shape is what we want when we’re shipping an agent. Decorators on ordinary Python, flow runs where our code runs, metadata on the side. If your team wants server-mediated invocation by design, Restate’s shape fits better.
What makes Kitaru unique
| Feature | Kitaru | Restate |
|---|---|---|
| Durable execution and replay | Yes | Yes |
| Human-in-the-loop waiting with compute released | Yes | Yes |
| Self-hostable (Kitaru: Apache 2.0; Restate: BSL 1.1, Apache 2.0 as the change license) | Yes | Yes |
| OpenTelemetry tracing | Yes | Yes |
| Python-agent-shaped primitives (`kitaru.llm()`, `kitaru.wait()`, `kitaru.memory`) | Yes | Not supported |
| Built-in LLM primitive with alias-resolved secrets and per-call token/latency logging | Yes | Not supported |
| Typed, versioned artifact lineage per checkpoint (cross-run diff) | Yes | Not supported |
| Opinionated stack abstraction for Kubernetes, AWS, GCP, Azure | Yes | Not supported |
| Embedded library (no separate server process in the request path) | Yes | Not supported |
| Polyglot handler SDKs (TypeScript, Java, Kotlin, Go, Rust, Python) | Not supported | Yes |
| Virtual Objects: keyed per-entity state with strongly-consistent access | Not supported | Yes |
| Awakeables and durable promises as a primitive | Not supported | Yes |
How the two surfaces map
| Concept | Restate | Kitaru |
|---|---|---|
| Durable unit | Handler (Service / Virtual Object / Workflow) | `@flow` + `@checkpoint` |
| Durability mechanism | Journaled event log | Versioned artifacts per checkpoint |
| Per-entity state | Virtual Object (keyed, strongly-consistent) | `kitaru.memory` scopes + artifact store |
| Human-in-the-loop | Awakeable / durable promise | `kitaru.wait()` |
| Invocation | Handler call via `restate-server` | `flow.run()`, `kitaru invoke`, CLI, SDK, MCP, curl |
| Deployment model | Self-hosted `restate-server` + registered handlers | Stack abstraction for Kubernetes, AWS, GCP, Azure |
| LLM call | Your code inside a journaled step | `kitaru.llm()` with alias, key injection, token/latency logging |
Code comparison
Kitaru:

```python
import kitaru
from kitaru import checkpoint, flow


@checkpoint
def research(topic: str) -> str:
    return kitaru.llm(
        prompt=f"Research: {topic}. Return a brief.",
        model="fast",
    )


@checkpoint
def draft(brief: str) -> str:
    return kitaru.llm(
        prompt=f"Write a draft from this brief:\n{brief}",
        model="fast",
    )


@flow
def review_flow(topic: str) -> str:
    brief = research(topic)
    text = draft(brief)
    approved = kitaru.wait(
        name="approve_draft",
        question="Approve draft?",
        schema=bool,
    )
    return text if approved else "Rejected"


review_flow.run("Durable agents")
```

Restate:

```python
import restate

review = restate.Workflow("ReviewWorkflow")


@review.main()
async def run(ctx: restate.WorkflowContext, topic: str) -> str:
    # Journaled step. Non-deterministic work lives inside ctx.run().
    brief = await ctx.run(
        "research",
        lambda: call_llm(f"Research: {topic}"),
    )
    text = await ctx.run(
        "draft",
        lambda: call_llm(f"Draft: {brief}"),
    )
    # Awakeable: durable promise resolved by an external caller.
    approval_id, approval = ctx.awakeable()
    send_approval_link(approval_id, text)
    approved = await approval
    return text if approved else "Rejected"


app = restate.app([review])
# Register the workflow with restate-server, deploy the handler,
# and resolve the awakeable from wherever approvals come in.
```

Durable execution, shaped for your Python agent
If your durability problem is shaped like a polyglot backend with keyed per-entity state, Restate is the right tool, and we’d tell any team that. For Python agent work (LLM calls, memory, artifacts, human/agent approval gates), the glue code you’d write on top of a general-purpose durable runtime is the stuff we ship in the box.
We’ve spent five years building the MLOps-ready version of this problem space at ZenML. Kitaru is that team two years into the agent version. Bet on us for agent infrastructure and you’re betting on the group that’s been doing this the whole time.
`pip install kitaru`