Token-Efficient Markdown Test Reports
Problem Statement
LLM-based TDD agents running pytest receive console output filled with ANSI escape codes, Unicode symbols, decorative separators, and verbose formatting optimized for human terminal readability — not token efficiency. This forces agents to spend unnecessary tokens parsing noise, increases API costs, and reduces the signal-to-noise ratio for agents trying to act on test results. There is no standard way to get structured, LLM-optimized test output from pytest without writing custom reporters from scratch.
Goals
- Reduce token consumption of pytest output for LLM agents by eliminating ANSI codes, Unicode symbols, and decorative formatting.
- Enable agents to receive test results as clean Markdown via terminal output (replacing default console output) or as a written `.md` file.
- Support both output modes simultaneously so agents and developers can use the tool together.
- Integrate with pytest's existing flag conventions (`-v`, `--tb`) so no new mental model is required.
- Provide a zero-config path to useful output with sensible defaults (default filename, default traceback style).
Non-Goals
- HTML or JSON report formats — other plugins (pytest-html, pytest-json-report) serve this need; this plugin focuses exclusively on Markdown.
- Custom test runners or test collection changes — this plugin only affects output, not how tests are discovered or executed.
- Real-time streaming output — reports are assembled at session end, not streamed incrementally during the run.
- LLM API integration — the plugin produces Markdown for agents to consume; it does not call any LLM API itself.
- Replacing pytest-cov or other reporting plugins — coverage and other metrics are out of scope for v1.
User Stories
LLM TDD Agent
- As an LLM TDD agent, I want pytest output in Markdown format so that I can parse test results without wasting tokens on ANSI codes and decorative formatting.
- As an LLM TDD agent, I want failures surfaced first with short tracebacks so that I can identify what to fix without reading through passing tests.
- As an LLM TDD agent, I want a summary line with counts for passed, failed, skipped, and xfail so that I can assess overall test health in a single line.
- As an LLM TDD agent reading a file, I want the report written to a predictable path so that I can read it after the test run without parsing stdout.
Developer / Human Using the Plugin
- As a developer, I want to pass `--agent-digest=file` to write a Markdown report without losing normal console output so that I can inspect what an agent will receive while still seeing my usual test feedback.
- As a developer, I want to pass `--agent-digest=term` to replace console output with Markdown so that I can verify the report format looks correct.
- As a developer, I want to use both `--agent-digest=term` and `--agent-digest=file` in the same run so that I can see the Markdown output and capture it.
- As a developer, I want to configure the output filename in `pyproject.toml` so that I don't have to pass it on every invocation.
- As a developer, I want the plugin to respect `-v` and `--tb` flags so that I don't need to learn a new set of options.
Requirements
Must-Have (P0)
`--agent-digest` CLI flag accepting `term` and/or `file` values
- `--agent-digest=term`: replaces pytest's default terminal output with Markdown-formatted output
- `--agent-digest=file`: writes the Markdown report to a file; default console output is unaffected
- Both values may be used in the same invocation: `--agent-digest=term --agent-digest=file`
- Acceptance criteria:
    - Given `--agent-digest=term`, when the test session ends, the terminal shows Markdown output and no default pytest console output
    - Given `--agent-digest=file`, when the test session ends, a `.md` file is written and the default pytest console output is unchanged
    - Given both values, both behaviors occur
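A minimal sketch of how the repeatable flag could be registered, assuming pytest's `pytest_addoption` hook and its `parser.getgroup(...).addoption(...)` call, which accepts argparse-style keyword arguments. The group name, the `dest` name `agent_digest`, and the helper `resolve_modes` are hypothetical and not part of this spec.

```python
def pytest_addoption(parser):
    """Register --agent-digest as a repeatable option (sketch only)."""
    group = parser.getgroup("agent-digest")  # hypothetical group name
    group.addoption(
        "--agent-digest",
        action="append",            # permits --agent-digest=term --agent-digest=file
        choices=("term", "file"),   # any other value is rejected at parse time
        dest="agent_digest",        # hypothetical dest name
        default=None,
        help="Emit a Markdown report to the terminal (term) and/or a file (file).",
    )


def resolve_modes(values):
    """Collapse the raw option values into a set of active output modes."""
    # `values` is None when the flag was never passed; duplicates are harmless.
    return frozenset(values or ())
```

With this shape, `resolve_modes(config.getoption("agent_digest"))` would yield an empty set when the plugin is inactive, so the rest of the plugin can test membership with `"term" in modes` or `"file" in modes`.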
Token-efficient Markdown formatting
- No ANSI escape codes in any output
- Status labels use text: `PASSED`, `FAILED`, `SKIPPED`, `XFAIL`, `XPASS`; no Unicode symbols
- Metadata fields formatted as `**Label:**` (colon inside bold)
- Summary counts separated by `, ` (comma-space)
- xfail shown as its own count, separate from skipped
- Acceptance criteria:
    - Given any output mode, the Markdown output contains no ANSI sequences
    - Given a test summary, xfail count is distinct from skipped count
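The formatting rules above could be enforced with two small helpers, sketched here under the assumption that captured text may still carry terminal escape sequences. The names `strip_ansi` and `format_field` are hypothetical.

```python
import re

# Matches CSI sequences such as "\x1b[1;31m" plus two-character escapes.
ANSI_RE = re.compile(r"\x1b(?:\[[0-?]*[ -/]*[@-~]|[@-Z\\-_])")


def strip_ansi(text: str) -> str:
    """Remove ANSI escape sequences so only plain text reaches the report."""
    return ANSI_RE.sub("", text)


def format_field(label: str, value: str) -> str:
    """Render a metadata field with the colon inside the bold markers."""
    return f"**{label}:** {value}"
```

For example, `strip_ansi("\x1b[1;31mFAILED\x1b[0m")` leaves just `FAILED`, and `format_field("Duration", "0.42s")` produces `**Duration:** 0.42s`.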
Report section structure
- `## Failures`: appears first; contains FAILED, XFAIL, and XPASS results
- `## Skipped`: contains tests marked skip or conditionally skipped
- `## Passes`: only present when the `-v` flag is active
- Each failure entry includes: test node ID, short traceback (default `--tb=short`), and failure reason
- Acceptance criteria:
    - Given failures exist, `## Failures` is the first section
    - Given `-v` is not set, the `## Passes` section is omitted
    - Given `-v` is set, the `## Passes` section lists all passing test node IDs
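The section ordering can be illustrated with a small renderer. This is a sketch only: the input shape (a list of `(node_id, status)` pairs) and the one-line entry layout are assumptions, and tracebacks and failure reasons are omitted for brevity.

```python
FAILURE_STATUSES = ("FAILED", "XFAIL", "XPASS")


def render_sections(results, verbose=False):
    """Assemble report sections in the required order from (node_id, status) pairs."""
    failing = [(n, s) for n, s in results if s in FAILURE_STATUSES]
    skipped = [n for n, s in results if s == "SKIPPED"]
    passed = [n for n, s in results if s == "PASSED"]

    lines = []
    if failing:
        # Failures always come first so an agent sees them immediately.
        lines.append("## Failures")
        lines += [f"- {status} {node_id}" for node_id, status in failing]
    if skipped:
        lines.append("## Skipped")
        lines += [f"- {node_id}" for node_id in skipped]
    if verbose and passed:
        # Passing tests are listed only when -v is active.
        lines.append("## Passes")
        lines += [f"- {node_id}" for node_id in passed]
    return "\n".join(lines)
```

Omitting empty sections entirely, as this sketch does, is one possible reading of the requirement; the spec only fixes the order and the `-v` condition.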
Summary line
- Session summary formatted as: `X passed, X failed, X skipped, X xfailed` (omit zero-count categories)
- Acceptance criteria:
    - Given 3 passed and 1 failed with no skipped or xfail, summary reads `3 passed, 1 failed`
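The summary rule is small enough to sketch directly. The function name `format_summary` and the dictionary input are assumptions; the category order follows the format string given above.

```python
SUMMARY_ORDER = ("passed", "failed", "skipped", "xfailed")


def format_summary(counts):
    """Join non-zero counts with comma-space, in a fixed category order."""
    parts = [f"{counts[key]} {key}" for key in SUMMARY_ORDER if counts.get(key, 0) > 0]
    return ", ".join(parts)
```

Calling `format_summary({"passed": 3, "failed": 1, "skipped": 0})` gives `3 passed, 1 failed`, which matches the acceptance criterion. An all-zero session yields an empty string here; what the plugin should print in that case is still an open question.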
Default filename
- When `--agent-digest=file` is used without `--agent-digest-file`, the output file is `test-results.md` in the current working directory
- Acceptance criteria:
    - Given `--agent-digest=file` with no filename option, `test-results.md` is created in cwd
`--agent-digest-file` CLI option and `llm_report_file` ini option
- Sets the path/name of the output file
- Configurable in `pyproject.toml` under `[tool.pytest.ini_options]`
- Acceptance criteria:
    - Given `--agent-digest-file=output/results.md`, the report is written to that path
    - Given `llm_report_file = "output/results.md"` in `pyproject.toml`, the report is written to that path without a CLI flag
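A sketch of how the output path might be resolved from the two sources and the default. Two things here are assumptions rather than stated requirements: that the CLI option takes precedence over the ini option, and that relative paths resolve against the current working directory. The function name is hypothetical.

```python
from pathlib import Path

DEFAULT_REPORT_NAME = "test-results.md"


def resolve_report_path(cli_value, ini_value, cwd):
    """Pick the report path: CLI option, then ini option, then the default."""
    # Assumed precedence: an explicit CLI value wins over pyproject.toml.
    chosen = cli_value or ini_value or DEFAULT_REPORT_NAME
    # Joining with an absolute `chosen` returns `chosen` unchanged (pathlib semantics).
    return Path(cwd) / chosen
```

In a real plugin the ini value would presumably be declared with `parser.addini("llm_report_file", ...)` and read with `config.getini("llm_report_file")`, with the result passed in as `ini_value`.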
Respect `-v` and `--tb` flags
- `-v` controls whether the `## Passes` section appears
- `--tb=short` is the effective default for report tracebacks; `--tb=line`, `--tb=long`, `--tb=no` all work as expected
- Acceptance criteria:
    - Given `--tb=no`, failure entries contain no traceback
    - Given `--tb=long`, failure entries contain the full traceback
    - Given `-v`, passes section is included in the report
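One way to read these settings is through pytest's existing option names: `--tb` is stored as `tbstyle` (pytest's own default is `"auto"`) and `-v` as the integer `verbose`. The sketch below assumes the report maps `"auto"` to `"short"`; the helper names are hypothetical.

```python
def effective_tbstyle(tbstyle: str) -> str:
    """Map pytest's --tb value to the traceback style used in the report."""
    # Assumption: when the user did not pass --tb, pytest reports "auto",
    # and the report falls back to "short". Explicit values pass through.
    return "short" if tbstyle == "auto" else tbstyle


def include_passes(verbose: int) -> bool:
    """The Passes section appears only with at least one -v."""
    return verbose > 0


def report_settings(config):
    """Read both settings from a pytest Config-like object."""
    return {
        "tbstyle": effective_tbstyle(config.getoption("tbstyle")),
        "passes": include_passes(config.getoption("verbose")),
    }
```

Because `-q` lowers `verbose` below zero, the `verbose > 0` check also keeps the passes section out of quiet runs.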
Nice-to-Have (P1)
Duration metadata per test
- Include elapsed time for each test entry (especially failures) to help agents identify slow tests
- Format: `**Duration:** 0.42s`
`--agent-digest=both` shorthand
- Single value that activates both `term` and `file` modes without requiring two flags
Warning section
- `## Warnings` section capturing pytest warnings for agent visibility
Configurable section inclusion
- `llm_report_sections` ini option to control which sections appear (e.g., exclude passes even with `-v`)
Future Considerations (P2)
- Machine-readable frontmatter: YAML frontmatter block at top of file with structured summary metadata (total, passed, failed, duration) for agents that parse frontmatter
- Per-test metadata: capture markers, parametrize IDs, and custom metadata attached via `pytest.mark`
- Multi-run diffing: compare current report against a previous run and surface regressions
Success Metrics
Leading indicators (days to weeks post-launch):
| Metric | Target | Method |
|---|---|---|
| Plugin installs | 100 installs within 30 days of first PyPI release | PyPI download stats |
| CI passes on plugin's own test suite | 100% | GitHub Actions |
| Zero open P0 bugs at launch | 0 critical issues | GitHub Issues |
Lagging indicators (weeks to months):
| Metric | Target | Method |
|---|---|---|
| GitHub stars | 50 within 90 days | GitHub |
| Reported token reduction | ≥20% fewer tokens vs. raw pytest output on a representative test suite | Manual benchmark using claudeutils tokens |
| User-reported issues citing format regression | 0 in first 60 days | GitHub Issues |
Open Questions
| Question | Owner | Blocking? |
|---|---|---|
| Should `--agent-digest=term` suppress all pytest plugins' terminal output (e.g., pytest-cov summary), or only pytest's own reporter? | Engineering | Yes; affects plugin hook implementation |
| When both `term` and `file` are active, are they guaranteed to produce identical content, or can they differ (e.g., file gets more detail)? | Product | No; can decide during implementation |
| Should the output file be overwritten on each run, or appended? | Product | No; default to overwrite, append can be P1 |
| Does `--agent-digest=term` need to print anything for a completely empty test session (no tests collected)? | Engineering | No |
Timeline Considerations
- No hard deadlines identified
- Suggested phase 1 scope: all P0 requirements (CLI flags, Markdown formatting, section structure, file output, ini config)
- P1 features (duration, `both` shorthand, warnings section) deferred to a follow-up release after real-world agent feedback
- The plugin is currently in scaffold phase; the entry point in `pyproject.toml` needs to be registered before any plugin hooks will fire