Overview

A runnable reference architecture for building an internal agent harness platform with Kitaru and PydanticAI

When one team builds one agent, the hard parts can hide in a notebook or a script. When several teams start building agents, the same platform questions come back every time:

If an agent crashes halfway through a task, can it resume without paying for the same model call again?
Where do logs live when someone needs to debug what happened yesterday?
How does the agent run shell commands without touching the developer's laptop or the production host?
How do tools call internal services without handing raw credentials to the model or the worker process?
How do you stop an agent before it takes a risky action, wait for a human answer, and then continue from the same point?
How do different teams get different tools, prompts, services, and safety rules without copying a pile of glue code?

This example is about those repeated platform problems. It shows one small internal agent harness platform where the shared machinery is reusable, and each individual agent is mostly configuration.

What "internal agent harness platform" means here

In this example, an internal agent harness platform is not a place where agents magically appear. It is a small platform layer that takes an agent profile and produces a runnable agent with the same safety and operations rails every time.

Concretely, a team describes an agent with a Profile: its name, model, system prompt, allowed tools, allowed services, skill files, sandbox rules, and approval points. The platform code reads that profile and builds the agent around it. Kitaru supplies durable execution, so completed work can be reused after a retry. PydanticAI supplies the agent runtime. The example library supplies the platform seams around tools, sandboxes, credentials, typed service calls, and human approval.
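The replay idea in the paragraph above can be illustrated without Kitaru itself. The sketch below is a hypothetical stand-in, not Kitaru's API: `CheckpointStore`, its `step` decorator, and the JSON file are invented for illustration, and it assumes step arguments and results are JSON-serializable. Each completed step is written to disk, so a retry after a crash replays the model call from the checkpoint instead of paying for it again.

```python
import functools
import json
import tempfile
from pathlib import Path
from typing import Any, Callable


class CheckpointStore:
    """Hypothetical stand-in for durable execution (not Kitaru's API).

    Persists each completed step's result to a JSON file so that a retry
    can replay it instead of running the step again.
    """

    def __init__(self, path: Path) -> None:
        self.path = path
        self._results = json.loads(path.read_text()) if path.exists() else {}

    def step(self, fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Key on the step name plus its JSON-serializable arguments.
            key = f"{fn.__name__}:{json.dumps([args, kwargs], sort_keys=True)}"
            if key in self._results:
                return self._results[key]  # completed earlier: replay, don't re-run
            result = fn(*args, **kwargs)
            self._results[key] = result
            self.path.write_text(json.dumps(self._results))  # checkpoint on success
            return result

        return wrapper


calls = {"model": 0}  # counts how often the "paid" model call really runs


def run(store_path: Path, crash: bool) -> str:
    store = CheckpointStore(store_path)

    @store.step
    def call_model(prompt: str) -> str:
        calls["model"] += 1  # stands in for a paid LLM call
        return f"plan for: {prompt}"

    @store.step
    def act(plan: str) -> str:
        if crash:
            raise RuntimeError("worker died mid-task")
        return f"done: {plan}"

    return act(call_model("triage ticket 42"))


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "checkpoints.json"
    try:
        run(path, crash=True)  # first attempt dies after the model call
    except RuntimeError:
        pass
    result = run(path, crash=False)  # retry replays the model call from disk

print(result, calls["model"])
```

Keying checkpoints on the step name plus arguments is a simplification chosen for this sketch; how Kitaru identifies and stores checkpoints is not shown here and may differ.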

The goal is that Team A can build a support-triage agent and Team B can build a release-notes agent without both teams re-solving durability, logs, secrets, approvals, and safe command execution from scratch.
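A Profile for each of those two teams might look like the following. This is a hypothetical shape: the field names follow the description of a Profile above, but the real class in agent_harness_platform/ may differ, and sandbox and proxy rules are left out to keep it short. The tool names match the ones used later on this page (exec, skill, exec_service, ask_question); the service name, approval point, prompts, and skill paths are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical profile shape; field names follow the prose, not the library."""

    name: str
    model: str
    system_prompt: str
    allowed_tools: frozenset[str] = frozenset()
    allowed_services: frozenset[str] = frozenset()
    skill_files: tuple[str, ...] = ()
    approval_points: frozenset[str] = frozenset()

    def check_tool(self, tool: str) -> None:
        """Gate: refuse any tool the profile did not switch on."""
        if tool not in self.allowed_tools:
            raise PermissionError(f"{self.name}: tool {tool!r} is not enabled")

    def needs_approval(self, action: str) -> bool:
        """True if the platform must pause for a human before `action`."""
        return action in self.approval_points


# Team A and Team B share the same rails; only the configuration differs.
SUPPORT_TRIAGE = Profile(
    name="support-triage",
    model="openai:gpt-4o",
    system_prompt="Triage incoming support tickets.",
    allowed_tools=frozenset({"exec", "skill", "exec_service", "ask_question"}),
    allowed_services=frozenset({"tickets.lookup"}),
    skill_files=("skills/triage.md",),
    approval_points=frozenset({"tickets.close"}),
)

RELEASE_NOTES = Profile(
    name="release-notes",
    model="openai:gpt-4o",
    system_prompt="Draft release notes from merged changes.",
    allowed_tools=frozenset({"exec", "skill"}),
    skill_files=("skills/release_notes.md",),
)

RELEASE_NOTES.check_tool("skill")  # enabled, so this returns None
try:
    RELEASE_NOTES.check_tool("exec_service")  # not enabled for this team
except PermissionError as err:
    denied = str(err)

print(denied)
```

Keeping the profile frozen means the platform can hand it to every component without any of them quietly widening an agent's permissions at runtime.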

What this example is, and what it is not

This is a runnable local reference architecture. You can clone the repo, run the stages, and see each platform pattern in a concrete script. The stages are deliberately small: each one adds one new capability while keeping the older stages valid.

This also shows forkable platform seams. The credential proxy, typed service boundary, profile gates, sandbox, and human-in-the-loop pause are all shown as separate pieces because those are the pieces you would usually want to harden, replace, or connect to your own infrastructure.
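To make the credential proxy seam concrete, here is a minimal sketch of host-side rule matching, assuming a proxy that sees each outbound request's URL and headers. `ProxyRule`, `inject_credentials`, the host patterns, and the secret names are all hypothetical, and how HTTPS traffic actually reaches the proxy is out of scope. The property it shows is that the secrets mapping lives only on the host, and any credential header the sandbox tries to send is stripped before the real one is added.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Mapping
from urllib.parse import urlsplit


@dataclass(frozen=True)
class ProxyRule:
    """Hypothetical rule: requests to hosts matching `host_pattern` get a credential."""

    host_pattern: str
    secret_name: str
    header: str = "Authorization"
    scheme: str = "Bearer"


def inject_credentials(
    url: str,
    headers: Mapping[str, str],
    rules: list[ProxyRule],
    secrets: Mapping[str, str],
) -> dict[str, str]:
    """Runs on the host for each outbound request from the sandbox.

    `secrets` exists only in the proxy process, so neither the model nor
    the sandboxed worker ever sees the raw values.
    """
    host = urlsplit(url).hostname or ""
    # Drop any credential headers the sandbox tried to set on its own.
    managed = {rule.header.lower() for rule in rules}
    out = {k: v for k, v in headers.items() if k.lower() not in managed}
    for rule in rules:
        if fnmatchcase(host, rule.host_pattern):
            out[rule.header] = f"{rule.scheme} {secrets[rule.secret_name]}".strip()
            break  # first matching rule wins
    return out


RULES = [
    ProxyRule("api.github.com", "GITHUB_TOKEN"),
    ProxyRule("*.internal.example.com", "INTERNAL_API_KEY", header="X-Api-Key", scheme=""),
]
HOST_SECRETS = {"GITHUB_TOKEN": "ghp-host-only", "INTERNAL_API_KEY": "k-host-only"}

gh = inject_credentials(
    "https://api.github.com/repos", {"Accept": "application/json"}, RULES, HOST_SECRETS
)
internal = inject_credentials(
    "https://tickets.internal.example.com/v1", {"X-Api-Key": "forged"}, RULES, HOST_SECRETS
)
other = inject_credentials(
    "https://example.org/", {"Authorization": "Bearer leaked"}, RULES, HOST_SECRETS
)
print(gh, internal, other)
```

Because `secrets` is just a mapping, this is also the natural place to plug in your own secret store when hardening the seam.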

This is not a turnkey enterprise platform. It does not ship your identity provider, policy engine, observability stack, deployment system, or production secret store. It is also not a hostile-code security boundary. The sandbox pattern is useful for local isolation and for showing where command execution belongs, but running untrusted code safely requires much more infrastructure and review than this example includes. For the full inventory of which pieces are teaching stand-ins and what to harden first, see Production notes and upgrade paths.
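The sketch below shows the shape of the command path: build a `docker run` invocation with common hardening flags and hand the agent's command to a shell inside the container. The function name, image tag, and resource limits are hypothetical. As the paragraph above says, flags like these reduce accidental damage but do not turn the container into a hostile-code boundary.

```python
import shlex


def sandbox_argv(
    command: str,
    image: str = "agent-sandbox:latest",  # hypothetical image tag
    network: str = "none",
    memory: str = "512m",
    pids_limit: int = 128,
) -> list[str]:
    """Build a `docker run` argv that runs `command` in a throwaway container."""
    return [
        "docker", "run",
        "--rm",                                 # delete the container afterwards
        "--network", network,                   # no network unless one is passed in
        "--read-only",                          # immutable root filesystem
        "--tmpfs", "/workspace:rw,size=64m",    # writable scratch space for the agent
        "--workdir", "/workspace",
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block setuid privilege escalation
        "--memory", memory,
        "--pids-limit", str(pids_limit),        # cap runaway process creation
        "--user", "65534:65534",                # run as nobody, not root
        image,
        "sh", "-c", command,
    ]


argv = sandbox_argv("ls -la /workspace")
print(shlex.join(argv))
```

On a host with Docker, executing it would be `subprocess.run(argv, capture_output=True, text=True, timeout=60)`. A sandbox that needs proxied egress would pass a dedicated Docker network instead of `none`.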

If you only want to make one function durable, start with the Quickstart. Come back here when you want to see how the same Kitaru primitives fit into a larger internal platform shape.

Architecture at a glance

Agent Profile (one per agent; everything below is mostly configuration)
  model + system prompt
  allowed_tools: which capabilities are on
  allowed_services: which typed calls are reachable
  skill files: the editable procedure
  proxy rules: which hosts get which credential
  approval points: where the agent pauses for a human

↓ configures

Agent Harness Platform library (reusable rails, built once, shared by every team)
  builds a PydanticAI agent from the profile
  wraps it in a Kitaru durable flow

↓ produces one durable agent with four capabilities

Kitaru durable flow (Stage 1: completed work survives a crash)
  exec (Stage 2): shell in a Docker sandbox; egress credentials are added by the proxy (Stage 4)
  skill (Stage 3): operator-editable markdown on the host
  exec_service (Stage 5): typed host-side handlers
  ask_question (Stage 6): kitaru.wait() durable human pause

A platform team defines the rails once. Product teams mostly change the Profile — which tools, services, skills, and approval points an agent gets. Each capability is one stage of this tour.
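The `exec_service` capability can be sketched as a host-side registry of typed handlers, where the profile only decides which entries are reachable. This is a stdlib-only illustration: the registry, the request types, and the `tickets.lookup` service are hypothetical, and a PydanticAI-based platform could validate requests with Pydantic models instead. The model supplies a service name and a JSON-like payload; the platform checks the allowlist, builds a typed request, and runs the handler on the host.

```python
from dataclasses import dataclass, fields
from typing import Any, Callable


@dataclass(frozen=True)
class LookupTicket:
    """Typed request the model must satisfy (hypothetical service)."""

    ticket_id: int


@dataclass(frozen=True)
class TicketSummary:
    ticket_id: int
    status: str


def lookup_ticket(req: LookupTicket) -> TicketSummary:
    # Runs on the host; a real handler would call an internal API here.
    return TicketSummary(ticket_id=req.ticket_id, status="open")


# Platform-owned registry: service name -> (request type, handler).
SERVICES: dict[str, tuple[type, Callable[[Any], Any]]] = {
    "tickets.lookup": (LookupTicket, lookup_ticket),
}


def exec_service(allowed: frozenset[str], name: str, payload: dict[str, Any]) -> Any:
    """Validate a model-supplied call and dispatch it to a host-side handler."""
    if name not in allowed:
        raise PermissionError(f"service {name!r} is not allowed for this agent")
    request_type, handler = SERVICES[name]
    request = request_type(**payload)  # unknown or missing fields raise TypeError
    # Check field types (works for plain class annotations such as int or str).
    for f in fields(request):
        if not isinstance(getattr(request, f.name), f.type):
            raise TypeError(f"{name}: field {f.name!r} expects {f.type.__name__}")
    return handler(request)


allowed = frozenset({"tickets.lookup"})
summary = exec_service(allowed, "tickets.lookup", {"ticket_id": 42})

try:
    exec_service(allowed, "tickets.lookup", {"ticket_id": "42"})
except TypeError as err:
    rejected = str(err)

print(summary, rejected)
```

Keeping handlers in a platform-owned registry means a profile can only select from reviewed actions; it cannot define new ones.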

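The six platform patterns build up this picture one capability at a time, in the order a platform team usually adds them: durable agent execution, sandboxed command execution, operator-editable procedures, credential isolation, typed service boundaries, and durable human approval.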
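For the operator-editable procedures pattern (the `skill` tool in Stage 3), a minimal sketch is a loader that reads the markdown file fresh on every call, so an operator's edit takes effect on the next run without a code change. `load_skill`, the `.md` naming convention, and the directory layout are assumptions for illustration, not the example's actual code.

```python
import tempfile
from pathlib import Path


def load_skill(skills_dir: Path, name: str, allowed: frozenset[str]) -> str:
    """Return the current text of a skill file, read fresh on every call."""
    if name not in allowed:
        raise PermissionError(f"skill {name!r} is not enabled for this agent")
    root = skills_dir.resolve()
    path = (root / f"{name}.md").resolve()
    # Refuse names like "../outside" that would escape the skills directory.
    if root not in path.parents:
        raise PermissionError(f"skill {name!r} resolves outside {root}")
    return path.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    skills = Path(tmp)
    (skills / "triage.md").write_text("1. Read the ticket.\n", encoding="utf-8")
    before = load_skill(skills, "triage", frozenset({"triage"}))

    # An operator edits the procedure on the host; no code change or redeploy.
    (skills / "triage.md").write_text(
        "1. Read the ticket.\n2. Tag severity.\n", encoding="utf-8"
    )
    after = load_skill(skills, "triage", frozenset({"triage"}))

    try:
        load_skill(skills, "../outside", frozenset({"../outside"}))
    except PermissionError:
        escaped = False
    else:
        escaped = True

print(repr(before), repr(after), escaped)
```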

Get the code

```shell
git clone https://github.com/zenml-io/kitaru.git
cd kitaru/examples/end_to_end/agent_harness_platform
uv sync
uv run kitaru init
export OPENAI_API_KEY=sk-...
uv run python stage_1_basic_agent.py
```

The full source lives in examples/end_to_end/agent_harness_platform/ on GitHub. It includes the runnable stage files, the reusable agent_harness_platform/ library, mocks, skills, and Dockerfiles.

What you will have at the end

By the end of the tour, you will have seen how to assemble:

one Profile per agent, with clear gates for tools, services, skills, and approval points;
a reusable agent_harness_platform/ library that turns a profile into a durable PydanticAI agent;
a sandboxed command path for shell work;
a credential proxy path for service calls that need secrets;
typed service handlers for approved internal actions; and
durable human-in-the-loop pauses for decisions that should not be left to the model.
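The durable pause in the last item can be illustrated with a file-backed question store. This is not `kitaru.wait()`; `ApprovalStore`, `WaitingForHuman`, the JSON layout, and the release question are hypothetical. The first run records the question and stops; once a human records an answer, rerunning the same flow picks it up and continues. In a real durable flow, the work done before the pause would be replayed from checkpoints rather than repeated.

```python
import json
import tempfile
from pathlib import Path


class WaitingForHuman(Exception):
    """Raised when the run must pause until someone answers."""


class ApprovalStore:
    """Hypothetical file-backed store of questions and answers, keyed by id."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def _load(self) -> dict:
        return json.loads(self.path.read_text()) if self.path.exists() else {}

    def _save(self, state: dict) -> None:
        self.path.write_text(json.dumps(state))

    def ask_question(self, key: str, question: str) -> str:
        state = self._load()
        # Record the question once; later runs find the same entry.
        entry = state.setdefault(key, {"question": question, "answer": None})
        self._save(state)
        if entry["answer"] is None:
            raise WaitingForHuman(f"{key}: {question}")
        return entry["answer"]

    def answer(self, key: str, value: str) -> None:
        state = self._load()
        state[key]["answer"] = value
        self._save(state)


def release_flow(store: ApprovalStore) -> str:
    decision = store.ask_question("deploy-approval", "Ship release 1.4 to production?")
    return "shipped" if decision == "yes" else "held"


with tempfile.TemporaryDirectory() as tmp:
    store = ApprovalStore(Path(tmp) / "approvals.json")
    try:
        first = release_flow(store)
    except WaitingForHuman as pending:
        first = f"paused: {pending}"  # the run stops here and holds no worker
    store.answer("deploy-approval", "yes")  # a human replies later
    second = release_flow(store)  # rerun resumes from the recorded answer

print(first)
print(second)
```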

The demo prompts are intentionally generic. The point is not the toy task each stage performs. The point is the shape of the platform: shared rails once, many profile-driven agents on top.
