GitHub - weakincentives/weakincentives: Tools for developing and optimizing background agents.

Weak Incentives (Is All You Need)

WINK is the agent-definition layer for building unattended/background agents. You define the prompt, tools, policies, and feedback that stay stable while runtimes change. The planning loop, sandboxing, retries, and orchestration live in the execution harness—often a vendor runtime. WINK keeps your agent definition portable.

New to WINK? Read the WINK Guide for a comprehensive introduction—philosophy, quickstart, and practical patterns for building agents.

Definition vs. Harness

A high-quality unattended agent has two parts:

Agent definition (you own):

Prompt structure (context engineering)
Tools + typed I/O contracts (the side-effect boundary)
Policies (gates on tool use and state transitions)
Feedback ("done" criteria, drift detection)

Execution harness (runtime-owned):

Planning/act loop and tool-call sequencing
Sandboxing/permissions (filesystem/shell/network)
Retries/backoff, throttling, and lifecycle management
Scheduling, budgets/deadlines, crash recovery
Multi-agent orchestration (when used)

The harness will keep changing—and increasingly comes from vendor runtimes—but your agent definition should not. WINK makes the definition a first-class artifact you can version, review, test, and port across runtimes via adapters.

The Prompt is the Agent

Most agent frameworks treat prompts as an afterthought—templates glued to separately registered tool lists. WINK inverts this: the prompt is the agent. You define an agent as a single hierarchical document where each section bundles its own instructions and tools together.

PromptTemplate[ReviewResponse]
├── MarkdownSection (guidance)
├── WorkspaceDigestSection     ← auto-generated codebase summary
├── MarkdownSection (reference docs, progressive disclosure)
├── PlanningToolsSection       ← contributes planning_* tools
│   └── (nested planning docs)
├── VfsToolsSection            ← contributes ls/read_file/write_file/...
│   └── (nested filesystem docs)
└── MarkdownSection (user request)

Each section can render instructions, contribute tools, nest child sections, and enable or disable itself based on runtime state. When a section disables, its entire subtree—tools included—vanishes from the prompt.

The result: the prompt fully determines what the agent can think and do. There's no separate tool registry to synchronize, no routing layer to maintain, no configuration that can drift from documentation. You define the agent's capabilities once, in one place, and the definition ports across runtimes.
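
To make the subtree behavior concrete, here is a minimal sketch of a section tree in plain Python. `ToySection` and `render` are hypothetical stand-ins invented for this illustration; they are not WINK's `Section` API, and the real rendering logic may differ.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToySection:
    """Hypothetical stand-in for a prompt section (not WINK's Section class)."""

    title: str
    body: str = ""
    tools: tuple[str, ...] = ()
    children: tuple["ToySection", ...] = ()
    enabled: Callable[[dict], bool] = lambda state: True


def render(section: ToySection, state: dict, depth: int = 1) -> tuple[list[str], list[str]]:
    """Return (markdown lines, tool names) for the enabled part of the tree."""
    if not section.enabled(state):
        return [], []  # a disabled section hides its whole subtree, tools included
    lines = [f"{'#' * depth} {section.title}"]
    if section.body:
        lines.append(section.body)
    tools = list(section.tools)
    for child in section.children:
        child_lines, child_tools = render(child, state, depth + 1)
        lines += child_lines
        tools += child_tools
    return lines, tools


vfs = ToySection(
    "Filesystem",
    "Use read_file to inspect files.",
    tools=("ls", "read_file"),
    children=(ToySection("Writing", tools=("write_file",)),),
    enabled=lambda state: not state.get("shell_available", False),
)
root = ToySection("Code review", "Review the change.", children=(vfs,))

lines_on, tools_on = render(root, {"shell_available": False})
lines_off, tools_off = render(root, {"shell_available": True})
```

With the filesystem section enabled, the rendered prompt and the tool list both include the nested "Writing" section; once the predicate turns false, neither the headings nor the tools appear.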

Why this matters:

Co-location. Instructions and tools live together. The section that explains filesystem navigation is the same section that provides the read_file tool. Documentation can't drift from implementation.
Progressive disclosure. Nest child sections to reveal advanced capabilities only when relevant. The LLM sees numbered, hierarchical headings that mirror your code structure.
Dynamic scoping. Each section has an enabled predicate. Disable a section and its entire subtree—tools included—disappears from the prompt. Swap in a PodmanSandboxSection instead of VfsToolsSection when a shell is available; the prompt adapts automatically.
Typed all the way down. Sections are parameterized with dataclasses. Placeholders are validated at construction time. Tools declare typed params and results. The framework catches mismatches before the request reaches an LLM.

Key Capabilities

Prompts

Typed sections. Build prompts from composable Section objects that bundle instructions and tools together.
Hash-based overrides. Prompt descriptors carry content hashes so overrides apply only to the intended version. Teams iterate on prompts via version-controlled JSON without risking stale edits. See Prompt Optimization.

Tools

Transactional execution. Each tool call runs as an atomic transaction: when a tool fails, WINK rolls session state and filesystem changes back to their pre-call state, so a failed call leaves no trace in mutable state.
Sandboxed virtual filesystem. Agents get an in-memory VFS tracked as session state. Mount host directories read-only when needed; the sandbox prevents accidental writes to the host. See Workspace Tools.
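
The rollback behavior can be approximated with a snapshot-and-restore wrapper. This is a minimal sketch over a plain dict, assuming state small enough to deep-copy; `run_transactionally` is a hypothetical helper, not a WINK function, and WINK's own mechanism is not shown here.

```python
import copy
from typing import Any, Callable


def run_transactionally(
    state: dict[str, Any], tool: Callable[[dict[str, Any]], Any]
) -> tuple[bool, Any]:
    """Run `tool` against `state`; restore the pre-call snapshot if it raises."""
    snapshot = copy.deepcopy(state)
    try:
        return True, tool(state)
    except Exception as exc:
        # Restore in place so every holder of `state` sees the rollback.
        state.clear()
        state.update(snapshot)
        return False, exc


def flaky_write(state: dict[str, Any]) -> None:
    state["files"]["main.py"] = "half-written"
    raise RuntimeError("disk full")


state = {"files": {"main.py": "print('hi')"}}
ok, result = run_transactionally(state, flaky_write)
```

After the failed call, `state` holds the original file contents again and the exception is returned to the caller rather than leaving a partial write behind.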

Policies

Invariants over workflows. Gate tool calls with explicit policies instead of brittle orchestration graphs. Encode constraints like "don't write before you've read" or "don't call tool B until tool A ran." See Policies Over Workflows.
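
As an illustration of the "don't write before you've read" constraint, a policy can be a small object consulted before each tool call. The `ReadBeforeWrite` class, `PolicyViolation` exception, and `call_tool` wrapper are hypothetical; see Policies Over Workflows for the real interfaces.

```python
from dataclasses import dataclass, field


class PolicyViolation(Exception):
    """Raised when a tool call breaks an invariant (hypothetical type)."""


@dataclass
class ReadBeforeWrite:
    """Hypothetical policy: a path may be written only after it was read."""

    read_paths: set[str] = field(default_factory=set)

    def check(self, tool: str, path: str) -> None:
        if tool == "write_file" and path not in self.read_paths:
            raise PolicyViolation(f"write_file({path!r}) before read_file")

    def record(self, tool: str, path: str) -> None:
        if tool == "read_file":
            self.read_paths.add(path)


def call_tool(policy: ReadBeforeWrite, tool: str, path: str) -> str:
    policy.check(tool, path)   # gate first
    policy.record(tool, path)  # then update what the policy has observed
    return f"{tool} ok"


policy = ReadBeforeWrite()
try:
    call_tool(policy, "write_file", "main.py")
    blocked = False
except PolicyViolation:
    blocked = True

call_tool(policy, "read_file", "main.py")
allowed = call_tool(policy, "write_file", "main.py")
```

The policy never prescribes an order of steps; it only rejects calls that violate the invariant, which leaves sequencing to the harness.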

Feedback

Completion resistance. Encode "done means X" checks that run during execution to catch drift and premature termination. See Task Completion Checking and Trajectory Observers (design spec).

State and Adapters

Event-driven state. Every state change flows through pure reducers that process published events. State is immutable and inspectable—you can snapshot at any point. See Session State.
Harness-swappable adapters. Keep the agent definition stable while switching runtimes (OpenAI, LiteLLM, Claude Agent SDK). The Claude Agent SDK adapter is an example of "renting the harness": native tools + OS-level sandboxing, while WINK supplies the definition. See Adapters.

Getting Started

Requirements: Python 3.12+, uv

uv add weakincentives
# optional extras
uv add "weakincentives[openai]"           # OpenAI adapter
uv add "weakincentives[litellm]"          # LiteLLM adapter
uv add "weakincentives[claude-agent-sdk]" # Claude Agent SDK adapter
uv add "weakincentives[podman]"           # Podman sandbox
uv add "weakincentives[wink]"             # debug UI

Debug UI

uv run --extra wink wink debug snapshots/session.jsonl --port 8000

Tutorial: Code Review Agent

Build a code review assistant with structured output, sandboxed file access, and observable state. Full source: code_reviewer_example.py

1. Define structured output

from dataclasses import dataclass

@dataclass(slots=True, frozen=True)
class ReviewResponse:
    summary: str
    issues: list[str]
    next_steps: list[str]

2. Compose the prompt

# `session` and ReviewTurnParams are assumed to exist already; step 4 shows how both are created.
from weakincentives.prompt import MarkdownSection, Prompt, PromptTemplate
from weakincentives.contrib.tools import PlanningToolsSection, VfsToolsSection, WorkspaceDigestSection

template = PromptTemplate[ReviewResponse](
    ns="examples/code-review",
    key="code-review-session",
    name="code_review_agent",
    sections=(
        MarkdownSection(...),                          # guidance
        WorkspaceDigestSection(session=session),       # auto-generated summary
        PlanningToolsSection(session=session),         # planning tools
        VfsToolsSection(session=session, mounts=...),  # sandboxed files
        MarkdownSection[ReviewTurnParams](...),        # user input
    ),
)

prompt = Prompt(template).bind(ReviewTurnParams(request="Review main.py"))

3. Mount files safely

from weakincentives.contrib.tools import HostMount, VfsPath, VfsToolsSection

mounts = (
    HostMount(
        host_path="repo",
        mount_path=VfsPath(("repo",)),
        include_glob=("*.py", "*.md", "*.toml"),
        exclude_glob=("**/*.pickle",),
        max_bytes=600_000,
    ),
)
# SAFE_ROOT is defined by your application; here it is the only allowed host root.
vfs_section = VfsToolsSection(session=session, mounts=mounts, allowed_host_roots=(SAFE_ROOT,))
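
For intuition about what the glob and size settings might do, here is one plausible reading expressed as a standalone filter. This is a guess at the semantics, not WINK's implementation: `select_files` is a hypothetical helper, include patterns are assumed to match file names, exclude patterns are assumed to match full relative paths, and `max_bytes` is assumed to be a total budget.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def select_files(
    files: dict[str, int],
    include_glob: tuple[str, ...],
    exclude_glob: tuple[str, ...],
    max_bytes: int,
) -> list[str]:
    """Pick paths (mapped to sizes in bytes) that pass the filters within the budget."""
    selected: list[str] = []
    used = 0
    for path in sorted(files):
        size = files[path]
        name = PurePosixPath(path).name
        if not any(fnmatchcase(name, pattern) for pattern in include_glob):
            continue  # not an included file type
        if any(fnmatchcase(path, pattern) for pattern in exclude_glob):
            continue  # explicitly excluded
        if used + size > max_bytes:
            continue  # would exceed the byte budget
        selected.append(path)
        used += size
    return selected


repo = {
    "main.py": 1_200,
    "README.md": 800,
    "data/cache.pickle": 5_000,
    "docs/notes.txt": 300,
    "src/big.py": 700_000,
}
mounted = select_files(
    repo,
    include_glob=("*.py", "*.md", "*.toml"),
    exclude_glob=("**/*.pickle",),
    max_bytes=600_000,
)
```

Under these assumptions only the small Python and Markdown files are mounted; the pickle, the text file, and the oversized module are left out.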

4. Run and get typed results

from dataclasses import dataclass
from typing import Any
from weakincentives.runtime import MainLoop, Session
from weakincentives.runtime.events import InProcessDispatcher
from weakincentives.adapters.openai import OpenAIAdapter
from weakincentives.prompt import Prompt, PromptTemplate

# Minimal stand-ins so this snippet is self-contained; in a real application, reuse ReviewResponse from step 1.
@dataclass(frozen=True)
class ReviewTurnParams:
    request: str

@dataclass(frozen=True)
class ReviewResponse:
    summary: str

def build_task_prompt(*, session: Session) -> PromptTemplate[ReviewResponse]:  # type: ignore[type-arg]
    ...  # type: ignore[empty-body]

class ReviewLoop(MainLoop[ReviewTurnParams, ReviewResponse]):
    def __init__(self, adapter: Any, dispatcher: Any) -> None:
        super().__init__(adapter=adapter, dispatcher=dispatcher)
        self._session = Session(dispatcher=dispatcher)
        self._template = build_task_prompt(session=self._session)

    def prepare(self, request: ReviewTurnParams) -> tuple[Prompt[ReviewResponse], Session]:
        return Prompt(self._template).bind(request), self._session

dispatcher = InProcessDispatcher()
loop = ReviewLoop(OpenAIAdapter(model="gpt-4o"), dispatcher)
response, _ = loop.execute(ReviewTurnParams(request="Find bugs in main.py"))
if response.output is not None:
    review: ReviewResponse = response.output  # typed, validated

5. Inspect state

from weakincentives.contrib.tools.planning import Plan

plan = session[Plan].latest()
if plan:
    for step in plan.steps:
        print(f"[{step.status}] {step.title}")

6. Iterate prompts without code changes

from weakincentives.prompt.overrides import LocalPromptOverridesStore

prompt = Prompt(
    template,
    overrides_store=LocalPromptOverridesStore(),
    overrides_tag="assertive-feedback",
).bind(ReviewTurnParams(request="..."))

Overrides live in .weakincentives/prompts/overrides/ and match by namespace, key, and tag.

Renting the Harness: Claude Agent SDK

This is the "rent the harness" path: Claude's runtime drives the agent loop and native tools; WINK provides the portable agent definition and bridges custom tools where needed.

python code_reviewer_example.py --claude-agent

Key differences:

Native tools: Uses Claude Code's built-in tools instead of VFS
Hermetic isolation: Ephemeral home directory prevents access to host config
Network policy: Restricted to specific documentation domains
MCP bridging: Custom WINK tools bridged via MCP
Sandbox: OS-level sandboxing (bubblewrap on Linux, seatbelt on macOS)

from weakincentives.adapters.claude_agent_sdk import (
    ClaudeAgentSDKAdapter,
    ClaudeAgentSDKClientConfig,
    ClaudeAgentWorkspaceSection,
    HostMount,
    IsolationConfig,
    NetworkPolicy,
    SandboxConfig,
)

workspace = ClaudeAgentWorkspaceSection(
    session=session,
    mounts=(HostMount(host_path="src", mount_path="src"),),
    allowed_host_roots=("/path/to/project",),
)

adapter = ClaudeAgentSDKAdapter(
    model="claude-sonnet-4-5-20250929",
    client_config=ClaudeAgentSDKClientConfig(
        permission_mode="bypassPermissions",
        cwd=str(workspace.temp_dir),
        isolation=IsolationConfig(
            network_policy=NetworkPolicy(allowed_domains=("docs.python.org",)),
            sandbox=SandboxConfig(enabled=True),
        ),
    ),
)

try:
    response = adapter.evaluate(prompt, session=session)
finally:
    workspace.cleanup()  # clean up the workspace even if evaluate() raises

See Claude Agent SDK Adapter for full configuration.

Development

uv sync && ./install-hooks.sh

Key targets:

make format / make lint / make typecheck
make test (100% coverage enforced)
make check (all of the above plus Bandit, Deptry, pip-audit)

Quality gates:

Pyright strict mode enforced
Design-by-contract decorators (@require, @ensure, @invariant)
100% test coverage required
Security scanning on every build

Integration tests (require OPENAI_API_KEY):

export OPENAI_API_KEY="sk-..."
make integration-tests

Documentation

AGENTS.md — contributor workflow
llms.md — agent-friendly API overview (also the PyPI README)
specs/ — design documents
ROADMAP.md — upcoming features

License

Apache 2.0 • Status: Alpha (APIs may change)