Foreman , Project Specification¶
Status: In progress , scaffolding complete, implementation underway
1. Objective¶
Build a minimal Python harness that acts as an always-on AI co-maintainer for OSS repositories. The harness manages process lifecycle, credential injection, message routing, and scheduling. All intelligence lives in agents. The system must be composable from specialized agents and runnable anywhere without writing integration glue each time.
Target user: A solo OSS maintainer who wants automated triage, dependency updates, and releases across multiple repos, with a co-maintainer who has their own GitHub identity and opinions.
Success criterion for MVP: A maintainer can install the harness, configure one repo, and have an issue triaged (labeled, responded to, or closed) without writing any code — in under 30 minutes.
2. MVP Scope¶
In¶
| Component | Description |
|---|---|
| Harness core | Process management, credential injection, HTTP message routing, GitHub event polling |
| Agent protocol v0 | JSON over HTTP , task in, decision out, action taken |
| Issue Triage agent | Label, respond, close stale issues |
| LLM backend abstraction | At least Anthropic and Ollama backends via a common interface (LiteLLM or thin adapter) |
| Config file | YAML , defines repos, agent assignments, LLM backend |
| Persistent action memory | Per-repo summary store of past agent decisions to inform future ones |
| GitHub bot identity | Dedicated bot account; harness authenticates as the bot |
| Polling trigger | GitHub API polling as primary trigger (no public URL required) |
Out of MVP (protocol must not block these)¶
- Multi-agent DAG workflow engine
- Dependency update agent
- Release agent
- Plugin/sharing registry
- Web UI
- Webhook listener (defer until public URL is a supported deployment mode)
- Multi-model routing/fallback logic
3. Agent Protocol v0¶
Transport¶
HTTP (JSON over REST). The harness runs an internal HTTP server. Each agent runs in a Docker container and exposes an HTTP endpoint. The harness POSTs tasks to agents; agents return decisions synchronously.
Message Contract¶
Task (harness → agent):
{
"task_id": "uuid4",
"type": "issue.triage",
"repo": "owner/repo",
"payload": { /* GitHub event payload */ },
"context": {
"memory_summary": "string , prior actions on this issue/repo",
"llm_backend": { "provider": "anthropic", "model": "claude-sonnet-4-6" }
}
}
Decision (agent → harness):
{
"task_id": "uuid4",
"decision": "label_and_respond | close | escalate | skip",
"rationale": "string",
"actions": [
{ "type": "add_label", "label": "bug" },
{ "type": "comment", "body": "string" }
]
}
Constraint: The harness executes all GitHub API calls. Agents produce decisions and action lists , they never call GitHub directly. This keeps credentials out of agent containers.
4. Project Structure¶
foreman/
├── foreman/
│ ├── __init__.py
│ ├── config.py # YAML config loader and validator (runtime repos/agents/LLM config)
│ ├── settings.py # App-level operational settings via pydantic-settings (env vars)
│ ├── server.py # FastAPI app , dispatch loop: fetch memory → build task → POST to agent → execute
│ ├── poller.py # GitHub API polling loop (asyncio + semaphore)
│ ├── executor.py # Execute actions returned by agents (GitHub API calls)
│ ├── memory.py # Persistent action memory (SQLite , action_log + memory_summary)
│ ├── credentials.py # Credential injection and secret management (${VAR} resolution)
│ ├── protocol.py # Pydantic models: TaskMessage, DecisionMessage, ActionItem
│ ├── containers.py # Docker container lifecycle manager for agent containers
│ ├── logging_info.py # structlog configuration (dev: console, prod: JSON)
│ ├── middleware.py # Correlation ID middleware for request tracing
│ ├── otel.py # OpenTelemetry configuration and FastAPI instrumentation
│ ├── routers/
│ │ ├── __init__.py
│ │ ├── health.py # GET /healthcheck and GET /healthcheck/ready endpoints
│ │ └── agent.py # Event → agent routing logic (repo+event_type → RouteTarget)
│ └── llm/
│ ├── __init__.py
│ ├── base.py # Abstract LLMBackend ABC + from_config() factory
│ ├── anthropic.py # Anthropic backend (wraps LiteLLM)
│ └── ollama.py # Ollama backend (wraps LiteLLM)
├── agents/
│ └── issue-triage/
│ ├── Dockerfile
│ ├── agent.py # FastAPI: POST /task, GET /health
│ ├── prompts/
│ │ └── triage.py
│ └── pyproject.toml
├── tests/
│ ├── fixtures/ # Recorded LLM response fixtures (replayed in CI)
│ ├── test_config.py
│ ├── test_router.py
│ ├── test_executor.py
│ ├── test_memory.py
│ ├── test_protocol.py
│ ├── test_poller.py
│ ├── test_containers.py
│ ├── test_server.py
│ ├── test_main.py
│ ├── test_logging_info.py
│ ├── test_middleware.py
│ ├── test_otel.py
│ └── test_agent_triage.py
├── config.example.yaml
├── pyproject.toml
├── SPEC.md
└── CHANGELOG.md
Scaffolded vs. planned¶
The following modules exist in the scaffolding and are ready for Foreman-specific implementation:
| Module | State | Notes |
|---|---|---|
server.py |
Scaffolded (template) | Generic FastAPI app with CORS, GZip, middleware; needs dispatch loop added |
settings.py |
Scaffolded | pydantic-settings env-var loading for operational settings (log level, OTEL); distinct from YAML config |
logging_info.py |
Scaffolded | structlog setup with dev/prod renderers and OTel trace injection |
middleware.py |
Scaffolded | Correlation ID bound to each request's log context |
otel.py |
Scaffolded | OpenTelemetry with optional OTLP exporter and debug console exporter |
routers/health.py |
Scaffolded | /healthcheck and /healthcheck/ready endpoints |
| All other modules | Not yet created | Per the implementation plan |
Distinction between settings.py and config.py:
settings.py, operational settings loaded from environment variables at startup (log level, OTEL connection, environment name). Usespydantic-settings. No YAML.config.py, runtime harness config loaded fromconfig.yaml(repos, agents, LLM backend, polling interval). Uses PyYAML + Pydantic. Secrets resolved via${VAR}substitution.
5. Configuration (YAML)¶
# config.yaml
identity:
github_token: "${GITHUB_TOKEN}" # bot account PAT
github_user: "my-bot"
llm:
provider: anthropic # anthropic | ollama
model: claude-sonnet-4-6
api_key: "${ANTHROPIC_API_KEY}" # omit for ollama
polling:
interval_seconds: 60
repos:
- owner: my-org
name: my-repo
agents:
- type: issue-triage
config:
stale_days: 30
labels:
bug: ["crash", "exception", "error"]
question: ["how do I", "how to"]
All secrets are environment variables (${VAR}).
The config file itself never contains raw secrets.
6. Persistent Memory¶
Storage: SQLite, local file (~/.agent-harness/memory.db by default, overridable in config).
Schema:
CREATE TABLE action_log (
id INTEGER PRIMARY KEY,
repo TEXT NOT NULL, -- "owner/repo"
issue_id INTEGER NOT NULL,
task_type TEXT NOT NULL,
decision TEXT NOT NULL,
rationale TEXT,
actions TEXT, -- JSON array
timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE memory_summary (
repo TEXT NOT NULL,
issue_id INTEGER NOT NULL,
summary TEXT NOT NULL, -- LLM-generated summary of prior actions
updated_at DATETIME,
PRIMARY KEY (repo, issue_id)
);
The memory_summary table is updated after each action.
When dispatching a new task, the harness fetches the relevant summary and injects it into the task context.
7. Tech Stack¶
| Concern | Choice |
|---|---|
| Language | Python 3.12+ |
| Build system | Hatchling |
| Package manager | uv |
| HTTP (harness server) | FastAPI |
| HTTP (agent client) | httpx |
| GitHub API | PyGithub or raw httpx calls |
| LLM abstraction | LiteLLM (validate with triage prompt against both backends) |
| Agent packaging | Docker containers |
| YAML config parsing | PyYAML + Pydantic for validation |
| App/env settings | pydantic-settings (env var loading for operational config) |
| Memory store | SQLite via stdlib sqlite3 |
| Structured logging | structlog (dev: console, prod: JSON; with OTel trace injection) |
| Observability | OpenTelemetry (FastAPI instrumentation; optional OTLP export) |
| Container management | Docker SDK for Python (docker package) |
| Versioning | bump-my-version |
8. Code Style¶
Matches the project's established toolchain:
- Formatter/linter: ruff (line length 119, Google docstring convention)
- Type checking: mypy (
--no-strict-optional --ignore-missing-imports) - Docstring coverage: interrogate (≥90%, Google style, validated by pydoclint)
- Pre-commit hooks: ruff-format, ruff-check, mypy, pydoclint, interrogate, detect-secrets, pyupgrade, check-yaml, check-toml, check-github-actions, check-github-workflows
- Type hints: Required on all public functions and methods;
--keep-runtime-typing - Python minimum: 3.12
9. Testing Strategy¶
- Framework: pytest + pytest-cov (branch coverage)
- LLM calls: Recorded fixtures , real LLM responses captured once and replayed.
No live LLM calls in CI.
Fixture files live in
tests/fixtures/. - GitHub API calls: pytest-mock to mock PyGithub/httpx calls at the boundary
- Agent protocol: Integration tests that spin up the agent container locally and send real HTTP task messages; record expected response fixtures
- Coverage target: ≥85% line coverage, ≥80% branch coverage
- What is NOT mocked: SQLite memory store (use a real in-memory or temp-file DB in tests)
10. Boundaries¶
Always do automatically¶
- Poll configured repos on the set interval without user intervention
- Inject credentials from the environment; never log or expose them
- Write every decision and action to the action log before executing
- Validate config file on startup; fail fast with a clear error if invalid
Ask before doing (require explicit config opt-in)¶
- Closing an issue (default: label + comment only; closing requires
allow_close: truein agent config) - Acting on issues created by repo owners or maintainers (default: skip)
- Posting more than one comment on the same issue within 24 hours
Never do¶
- Call GitHub API as anything other than the configured bot identity
- Store raw secrets in config files, logs, or the memory database
- Execute shell commands or arbitrary code from agent decision payloads
- Implement a DAG workflow engine in v1
- Build a web UI in v1
- Assume a public-facing URL exists for webhook delivery
11. Open Questions (resolved)¶
| Question | Decision |
|---|---|
| Wire protocol | HTTP (JSON over REST) |
| Persistent memory in v1? | Yes , per-issue action summary injected into task context |
| Config format | YAML |
| Agent packaging | Docker containers |
| Webhook vs. polling | Polling only in v1; no public URL assumed |
| LLM test strategy | Recorded fixtures |
| Code style | ruff + mypy + pydoclint + interrogate (existing toolchain) |
12. Known Scaffolding Issues to Fix¶
The following issues were identified in the boilerplate scaffolding and must be fixed before or during implementation:
pyproject.tomlmissing CLI entry point: The[project.scripts]section is commented out. Uncomment and point to the Foreman CLI onceforeman/__main__.pyis implemented.pyproject.tomlmissing runtime dependencies:PyYAML,PyGithub,litellm,httpx,docker(Python SDK) are not yet listed. Add these during the relevant implementation tasks.server.pyis a template, not Foreman's dispatch loop: The scaffoldedserver.pyis a generic FastAPI app. It must be extended to implement the task dispatch loop described in §3 while retaining the existing middleware/CORS/logging setup.
13. Key Assumptions to Validate Before Coding¶
- HTTP round-trip latency between harness and agent container (localhost) is acceptable for a polling-based system: sketch one full triage flow end-to-end before building the full protocol
- LiteLLM abstraction works transparently for both Anthropic and Ollama without capability gaps on triage prompts: test same prompt against both backends before committing
- SQLite is sufficient for memory at single-maintainer scale: revisit only if concurrent multi-repo polling causes locking issues
- Docker is available in the target deployment environment: document as a hard prerequisite