Agent Harness

Problem Statement

How might we give a solo OSS maintainer an always-on AI co-maintainer that handles triage, dependency updates, and releases across multiple repos — composable from specialized agents, runnable anywhere, without writing integration glue each time?

Build a minimal Python harness with a well-defined agent protocol, framed from day one as an AI co-maintainer with its own GitHub identity and opinions.

The harness is deliberately thin: it handles process lifecycle, credential injection, message routing, and scheduling, and nothing else. All intelligence lives in agents. Agents communicate with the harness over a simple wire protocol, an architecture inspired by the Language Server Protocol (LSP): small core, large ecosystem.
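As a rough illustration of how thin that core could be, the sketch below has a harness start an agent as a child process, inject a credential through the environment, send one JSON task on stdin, and read one JSON reply from stdout. Everything named here is an assumption made for the example: the function run_task, the environment variable AGENT_GITHUB_TOKEN, the message fields, and the choice of newline-delimited JSON over stdin/stdout (the transport is still an open question).

```python
import json
import os
import subprocess
import sys

# Hypothetical stand-in agent: reads one JSON task per line on stdin and
# answers with one JSON line on stdout. A real agent would be its own program.
ECHO_AGENT = """
import json, os, sys
for line in sys.stdin:
    task = json.loads(line)
    reply = {
        "id": task["id"],
        "status": "ok",
        "has_token": "AGENT_GITHUB_TOKEN" in os.environ,
    }
    print(json.dumps(reply), flush=True)
"""


def run_task(agent_cmd, task, credentials=None, timeout=10.0):
    """Start an agent process, send it one task, and return its first reply."""
    # Credential injection: the agent only sees secrets the harness passes in.
    env = {**os.environ, **(credentials or {})}
    proc = subprocess.Popen(
        agent_cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        env=env,
    )
    try:
        # communicate() writes the task, closes stdin, and waits for exit.
        out, _ = proc.communicate(json.dumps(task) + "\n", timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.communicate()
        raise
    lines = out.splitlines()
    if not lines:
        raise RuntimeError("agent exited without replying")
    return json.loads(lines[0])


if __name__ == "__main__":
    reply = run_task(
        [sys.executable, "-c", ECHO_AGENT],
        {"id": 1, "type": "triage_issue"},
        credentials={"AGENT_GITHUB_TOKEN": "placeholder-token"},
    )
    print(reply)
```

A long-running agent would keep the pipe open and exchange many messages; the one-shot form here only shows the lifecycle, injection, and routing responsibilities side by side.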

The co-maintainer framing matters architecturally. Each agent isn't a "script" — it's a worker with a defined role, a persona, and a contract. This shapes how agents are configured, how they report decisions, and how the user overrides them. It also makes the system feel coherent rather than a bag of automations.

LLM communication is an injectable dependency. The same agent runs against OpenAI, Claude, or a local Ollama instance — the harness doesn't care. This makes local-first deployment a configuration choice, not a redesign.
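One way to picture the injectable dependency is a small interface that agents depend on, with concrete backends supplied from outside. The sketch below is an assumption-laden illustration, not a settled design: LLMBackend, CannedBackend, TriageAgent, the complete method, and the label set are all hypothetical names, and the canned backend stands in for a real provider so the example runs offline.

```python
from typing import Protocol


class LLMBackend(Protocol):
    """Hypothetical interface: anything with complete(prompt) -> str."""

    def complete(self, prompt: str) -> str: ...


class CannedBackend:
    """Offline stand-in for a real provider, so the sketch runs without a network."""

    def __init__(self, reply: str) -> None:
        self.reply = reply

    def complete(self, prompt: str) -> str:
        return self.reply


class TriageAgent:
    """Agent logic depends only on the interface, never on a specific provider."""

    LABELS = ("bug", "feature", "question")

    def __init__(self, llm: LLMBackend) -> None:
        self.llm = llm

    def suggest_label(self, title: str, body: str) -> str:
        prompt = (
            f"Pick one label from {', '.join(self.LABELS)} for this issue.\n"
            f"Title: {title}\n\n{body}"
        )
        answer = self.llm.complete(prompt).strip().lower()
        # Fall back to a neutral label when the model answers off-menu.
        return answer if answer in self.LABELS else "needs-triage"


if __name__ == "__main__":
    agent = TriageAgent(CannedBackend("bug"))
    print(agent.suggest_label("Crash on startup", "Traceback attached."))
```

Swapping providers then means constructing TriageAgent with a different object that has the same complete method; the agent code itself does not change.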

The Archon-style workflow engine is a runtime plugin, one agent among many, not the core. Start without it, and add it when multi-step coordination across agents is needed.

Key Assumptions to Validate

  • A simple message protocol (JSON over stdin/stdout or a Unix socket) is sufficient for agent↔harness communication. Test by sketching protocol messages for three different agent types before writing code.
  • An LLM abstraction layer (e.g., LiteLLM) doesn't impose unacceptable latency or capability loss for triage and release tasks. Test by running the same triage prompt against two or three backends through the same interface.
  • A solo maintainer can get from install to a first useful action (one issue triaged) in under 30 minutes. Test against a real repo during early development.
  • GitHub webhooks plus a lightweight scheduler cover the triggering needs without a heavy event bus. Test this before adding any event infrastructure.
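To make the last assumption concrete, the triggering layer could be as small as a lookup table for webhook events plus a list of interval jobs, with no event bus involved. The sketch below is only that: Triggers, ScheduledJob, and the agent names are hypothetical, and "issues" is used as the webhook event name GitHub sends for issue activity.

```python
from dataclasses import dataclass, field


@dataclass
class ScheduledJob:
    """Run an agent at most once per interval (hypothetical shape)."""

    agent: str
    every_seconds: float
    last_run: float = float("-inf")  # never run yet, so due on first check


@dataclass
class Triggers:
    """Maps webhook event names and time intervals to agent names."""

    on_event: dict[str, list[str]] = field(default_factory=dict)
    jobs: list[ScheduledJob] = field(default_factory=list)

    def agents_for_event(self, event: str) -> list[str]:
        # Unknown events trigger nothing.
        return list(self.on_event.get(event, []))

    def due_agents(self, now: float) -> list[str]:
        # Return agents whose interval has elapsed and mark them as run.
        due = []
        for job in self.jobs:
            if now - job.last_run >= job.every_seconds:
                job.last_run = now
                due.append(job.agent)
        return due


if __name__ == "__main__":
    triggers = Triggers(
        on_event={"issues": ["issue-triage"]},
        jobs=[ScheduledJob(agent="close-stale", every_seconds=3600)],
    )
    print(triggers.agents_for_event("issues"))
    print(triggers.due_agents(now=0.0))
```

If a table like this turns out to be too weak for real repos (for example, events that need fan-out, retries, or ordering), that would be the evidence for heavier infrastructure.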

MVP Scope

One harness, one agent, one repo.

In:

  • Harness core: process management, credential injection, message routing, webhook listener
  • Agent protocol v0: JSON message contract (task in, decision out, action taken)
  • One agent: Issue Triage (label, respond, close stale)
  • LLM backend abstraction with at least two providers (e.g., Anthropic + Ollama)
  • Config file to define repos, agent assignments, and LLM backend
  • GitHub identity for the co-maintainer (dedicated bot account)
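The "task in, decision out, action taken" contract from the list above could be sketched as three message types that serialize to one JSON object each. The field names, the kind discriminator, and the encode/decode helpers are assumptions for illustration; the actual v0 contract is still to be defined.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class Task:
    """Harness -> agent: work to do."""

    id: str
    agent: str
    repo: str
    payload: dict[str, Any] = field(default_factory=dict)
    kind: str = "task"


@dataclass
class Decision:
    """Agent -> harness: what the agent intends to do, and why."""

    task_id: str
    action: str
    rationale: str
    kind: str = "decision"


@dataclass
class ActionTaken:
    """Agent -> harness: what actually happened."""

    task_id: str
    action: str
    ok: bool
    detail: str = ""
    kind: str = "action_taken"


_KINDS = {"task": Task, "decision": Decision, "action_taken": ActionTaken}


def encode(msg) -> str:
    """Serialize a message to a single JSON line."""
    return json.dumps(asdict(msg), sort_keys=True)


def decode(line: str):
    """Parse a JSON line back into the matching message type."""
    data = json.loads(line)
    cls = _KINDS.get(data.get("kind"))
    if cls is None:
        raise ValueError(f"unknown message kind: {data.get('kind')!r}")
    return cls(**data)


if __name__ == "__main__":
    task = Task(id="t1", agent="issue-triage", repo="example-owner/example-repo",
                payload={"issue": 42})
    decision = Decision(task_id="t1", action="add_label:bug",
                        rationale="Stack trace in the report.")
    done = ActionTaken(task_id="t1", action="add_label:bug", ok=True)
    for msg in (task, decision, done):
        print(encode(msg))
```

Keeping the decision separate from the action taken gives the maintainer a place to review or override what the agent intended before, or after, it acts.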

Out of MVP, but the protocol must allow:

  • Multi-agent workflows / DAG engine
  • Dependency update and release agents
  • Plugin/sharing registry
  • Web UI

Not Doing (and Why)

  • DAG workflow engine in v1: adds complexity before the protocol is proven; the Archon integration is a plugin, not the foundation
  • Community marketplace: needs trust and sandboxing infrastructure; defer until the protocol is stable and at least three first-party agents exist to show the pattern
  • Web UI: CLI plus config file is sufficient for a solo maintainer; a UI is scope inflation at this stage
  • Multi-model routing/fallback logic: the LLM abstraction makes swapping backends a config change; smart routing is a future optimization
  • Self-hosted infra automation: assume the user handles deployment; the harness is just a Python process

Open Questions

  • What is the minimal viable agent protocol? (stdin/stdout JSON vs. unix socket vs. HTTP — pick one and commit)
  • Does the co-maintainer need a persistent memory store in v1, or is per-run context sufficient for triage?
  • What's the right config format — TOML, YAML, Python DSL? (affects how composable workflows feel before the DAG engine exists)
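To help weigh the config question, here is what the Python DSL option might look like for the MVP's three concerns (repos, agent assignments, LLM backend). This is not a recommendation of that option, just a sketch for comparison; HarnessConfig, RepoConfig, LLMConfig, and all values are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMConfig:
    provider: str  # e.g. "anthropic" or "ollama"
    model: str     # placeholder; real model names depend on the provider


@dataclass(frozen=True)
class RepoConfig:
    name: str                 # "owner/repo"
    agents: tuple[str, ...]   # agent names assigned to this repo


@dataclass(frozen=True)
class HarnessConfig:
    llm: LLMConfig
    repos: tuple[RepoConfig, ...] = ()

    def agents_for(self, repo: str) -> tuple[str, ...]:
        """Return the agents assigned to a repo, or an empty tuple."""
        for entry in self.repos:
            if entry.name == repo:
                return entry.agents
        return ()


# What a user-written config module might contain.
CONFIG = HarnessConfig(
    llm=LLMConfig(provider="ollama", model="example-model"),
    repos=(
        RepoConfig(name="example-owner/example-repo", agents=("issue-triage",)),
    ),
)

if __name__ == "__main__":
    print(CONFIG.agents_for("example-owner/example-repo"))
```

The same three sections map directly onto TOML or YAML tables, so the choice mostly comes down to whether users should be able to compute config values in code or only declare them.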