Workflows
Seven workflows are registered today (src/workflows/registry.ts). Each has a single label, a single comment-mention verb, and produces one Markdown artifact that becomes the body of the tracking comment.
| Workflow | Label | Surfaces | What it does | Detail |
|---|---|---|---|---|
| `triage` | `bot:triage` | Issue label or comment | Decides whether an issue is actionable, with structural or runtime evidence | `TRIAGE.md` |
| `plan` | `bot:plan` | Issue label or comment, after triage | Writes an implementation plan | `PLAN.md` |
| `implement` | `bot:implement` | Issue label or comment, after plan | Opens a PR with code, tests, and a filled-out PR template | `IMPLEMENT.md` |
| `review` | `bot:review` | PR label or comment | Reads the diff in full, posts findings as inline comments | `REVIEW.md` |
| `resolve` | `bot:resolve` | PR label or comment | Fixes failing CI, replies to review threads, pushes new commits | `RESOLVE.md` |
| `ship` | `bot:ship` | PR comment, label, or natural-language mention | Shepherds an open PR to merge-ready: probe → fix → reply → wait, until clean | tracking comment |
| `remember` | `bot:remember` | Issue or PR (any comment) | Captures a maintainer review-policy directive into `review_learnings` for future reviews | tracking comment |
How they relate
triage, plan, and implement are the issue-side cascade for new work. review and resolve are the PR-side pair: review proactively reads a diff and posts findings; resolve reactively answers existing feedback and fixes failing CI. The split is deliberate: conflating "look at this PR" with "fix this PR" was the design mistake the verb-rename corrected.
ship is the PR shepherding lifecycle (its own state machine, its own database tables). It does not run the cascade above; it drives an open PR through the merge-readiness probe ladder until a human can hit merge. See ship.md.
Common rules across all workflows
- The bot never merges. No workflow calls `pulls.merge` or posts an `APPROVE`/`REQUEST_CHANGES` review. Static guard at `scripts/check-no-destructive-actions.ts`.
- Always-rebase semantics. PR-side workflows (`review`, `resolve`, `ship`) rebase the branch onto base before reading the diff if it is behind, then `git push --force-with-lease`. Fork PRs cannot be force-pushed by the bot: it asks the contributor to rebase and proceeds against the stale head.
- One Markdown artifact, one tracking comment. Each run captures `<NAME>.md` from the working tree before cleanup and embeds it verbatim in the tracking comment.
- Tracking comments are idempotent. Every tracking comment carries a hidden `<!-- workflow-run:{id} -->` marker. `setState()` in `src/workflows/tracking-mirror.ts` scans for the marker before posting, adopts any pre-existing comment found (e.g. after an octokit retry that silently duplicated a `POST`, or a pod restart between create and CAS reservation), and reconciles duplicates after create, keeping a single canonical comment per run regardless of transient API failures.
- Cost is visible. Every workflow records `cost_usd`, `turns`, and `wall_clock_ms` on the run row. The shepherding lifecycle exposes cumulative spend in the tracking comment header.
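The marker-based idempotency rule can be illustrated with a minimal sketch. The `IssueComment` shape, the `trackingMarker` and `reconcile` helpers, and the choice to keep the oldest marked comment as canonical are all assumptions made for illustration; the actual logic in `src/workflows/tracking-mirror.ts` is not shown in this document and may differ.

```typescript
// Hypothetical comment shape; the real type in tracking-mirror.ts may differ.
interface IssueComment {
  id: number;
  body: string;
}

// Build the hidden marker that identifies a run's tracking comment.
function trackingMarker(runId: string): string {
  return `<!-- workflow-run:${runId} -->`;
}

// Split a thread into the canonical tracking comment for a run and any duplicates.
// Assumption: the marked comment with the lowest id is kept as canonical.
function reconcile(
  comments: IssueComment[],
  runId: string,
): { canonical: IssueComment | undefined; duplicates: IssueComment[] } {
  const marker = trackingMarker(runId);
  const marked = comments
    .filter((c) => c.body.includes(marker)) // filter() copies, so the input is not reordered
    .sort((a, b) => a.id - b.id);
  const [canonical, ...duplicates] = marked;
  return { canonical, duplicates };
}
```

Under these assumptions, a caller would update `canonical` in place when it exists, create a new comment only when it is `undefined`, and delete everything in `duplicates`.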
Maintainer comments steer the workflow
The five structured workflows (triage, plan, implement, review, resolve) are comment-aware. Before each run, src/workflows/discussion-digest.ts distills the issue/PR comment thread into a guidance digest that the workflow prompt consumes in place of the raw thread:
- Later owner comments override the body. Comments by `ALLOWED_OWNERS` authors become authoritative directives; where one conflicts with the issue/PR body, the directive wins. So you can run `bot:plan`, comment a correction, run `bot:plan` again, and the second run honours the correction (the issue body alone no longer pins the result).
- Non-owner comments are context only. They appear in the digest labelled as untrusted discussion the agent must account for but never obey.
- The bot's own prior output is context. A reply to the bot's earlier plan/review is interpretable because that prior output is summarised into the digest.
- PR review-thread comments count. On a PR, inline review comments and review summary bodies feed the digest too, with their `path:line` anchors preserved.
- No comment-count limit. A large thread is summarised via map-reduce; no comment is dropped. The step is fail-open: any LLM or fetch error falls back to body-only / raw-comment context.
Re-running a workflow also removes that workflow's previous tracking comment before posting the new one, so the thread does not pile up stale bot output.
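The trust rules above amount to a partition of the thread by author. The following is a rough sketch of that partition only; `ThreadComment`, `DigestEntry`, `labelComments`, and the `botLogin` parameter are hypothetical names, and the LLM summarisation that `src/workflows/discussion-digest.ts` performs is deliberately left out.

```typescript
// Hypothetical types; the real digest module is not shown in this document.
interface ThreadComment {
  author: string;
  body: string;
}

type DigestKind = "directive" | "bot-output" | "untrusted";

interface DigestEntry {
  kind: DigestKind;
  author: string;
  body: string;
}

// Label each comment by trust level:
// - the bot's own prior output is context,
// - comments by allowed owners are authoritative directives,
// - everything else is untrusted discussion.
function labelComments(
  comments: ThreadComment[],
  allowedOwners: ReadonlySet<string>,
  botLogin: string,
): DigestEntry[] {
  return comments.map((c): DigestEntry => {
    let kind: DigestKind = "untrusted";
    if (c.author === botLogin) kind = "bot-output";
    else if (allowedOwners.has(c.author)) kind = "directive";
    return { kind, author: c.author, body: c.body };
  });
}
```

Thread order is preserved, which matters for the "later owner comments override" rule: a consumer can resolve conflicts by letting the last `directive` entry win.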
Trigger-comment intent classifier
A comment that mentions the trigger phrase is routed through src/workflows/intent-classifier.ts: a single-turn Haiku call that returns { workflow, confidence, rationale }.
- `confidence < INTENT_CONFIDENCE_THRESHOLD` (default `0.75`) → the dispatcher posts a clarification reply and stops.
- `workflow` not in registry → refusal reply.
- `workflow` in registry → same dispatch as the label path.
The classifier prompt distinguishes review (proactive, find bugs, post inline findings) from resolve (reactive, fix CI, answer feedback). Tune the threshold per environment with INTENT_CONFIDENCE_THRESHOLD.
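The three outcomes can be read as a small routing function. This is a sketch under assumptions: the verdict fields come from the text above, but the `Route` type, the `routeVerdict` helper, the hard-coded `REGISTERED` set, and checking confidence before registry membership are illustrative choices, not the contents of `src/workflows/intent-classifier.ts` or the dispatcher.

```typescript
// Verdict fields are taken from the text; the rest is a hypothetical stand-in.
interface IntentVerdict {
  workflow: string;
  confidence: number;
  rationale: string;
}

type Route =
  | { action: "clarify" }
  | { action: "refuse" }
  | { action: "dispatch"; workflow: string };

// Stand-in for the registry; the real one lives in src/workflows/registry.ts.
const REGISTERED: ReadonlySet<string> = new Set([
  "triage", "plan", "implement", "review", "resolve", "ship", "remember",
]);

// Assumption: the confidence check runs first, matching the order of the list above.
function routeVerdict(verdict: IntentVerdict, threshold: number = 0.75): Route {
  if (verdict.confidence < threshold) return { action: "clarify" };
  if (!REGISTERED.has(verdict.workflow)) return { action: "refuse" };
  return { action: "dispatch", workflow: verdict.workflow };
}
```

Note that a verdict exactly at the threshold dispatches, because the rule is a strict `<` comparison.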
Label-path dispatch
Both the label trigger and the in-registry classifier verdict run the same seven-step sequence in src/workflows/dispatcher.ts: registry lookup → context check → prior-output requirement → label mutex → idempotency insert → job enqueue → return. Prior-output is checked before the mutex, so refusing a workflow that lacks its prerequisite (e.g. bot:implement before any bot:plan) does not strip unrelated bot:* labels. The idempotency insert is the durable in-flight guard: a redelivered or concurrent label event for the same workflow and target is rejected at the database, not just at the best-effort Valkey claim.
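Why the ordering matters can be shown with a toy short-circuiting sequence. The step names follow the text above, but the `Step` and `StepResult` types and the `runSequence` helper are hypothetical, and each step body is a stub rather than the real logic in `src/workflows/dispatcher.ts`.

```typescript
// Hypothetical step model; only the step names come from the document.
type StepResult = { ok: true } | { ok: false; reason: string };

interface Step {
  name: string;
  run: () => StepResult;
}

// Run steps in order and stop at the first refusal, so later steps have no side effects.
function runSequence(steps: Step[]): { completed: string[]; refusedAt?: string; reason?: string } {
  const completed: string[] = [];
  for (const step of steps) {
    const result = step.run();
    if (!result.ok) {
      return { completed, refusedAt: step.name, reason: result.reason };
    }
    completed.push(step.name);
  }
  return { completed }; // the seventh step: return
}

// Example: bot:implement requested before any plan exists.
const sideEffects: string[] = [];
const pass = (): StepResult => ({ ok: true });
const outcome = runSequence([
  { name: "registry lookup", run: pass },
  { name: "context check", run: pass },
  { name: "prior-output requirement", run: () => ({ ok: false, reason: "no plan found" }) },
  { name: "label mutex", run: () => { sideEffects.push("labels changed"); return { ok: true }; } },
  { name: "idempotency insert", run: pass },
  { name: "job enqueue", run: pass },
]);
```

Because the prior-output step refuses first, the label-mutex stub never runs and `sideEffects` stays empty, which mirrors the guarantee that unrelated `bot:*` labels are left alone.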
Conversational chat-thread (sub-threshold fallback)
When the intent classifier verdict is below INTENT_CONFIDENCE_THRESHOLD AND the conversational backend (DATABASE_URL) is configured, the dispatcher routes the comment to src/workflows/ship/scoped/chat-thread.ts instead of refusing: a freeform exchange entry point for review threads, PR replies, and issue comments. Output modes the executor can return are validated by Zod (answer, decline, execute-workflow, propose-workflow, propose-action, approve-pending, decline-pending, replace-proposal).
Tool surface (PR conversations only)
On PR events, chat-thread and the orchestrator-side triage engine drive Anthropic's tool-use loop via src/ai/llm-client.ts runWithTools. Both share the github-state tool set defined in src/github/state-fetchers.ts:
| Tool | Purpose |
|---|---|
| `get_pr_state_check_rollup` | Head-commit CI rollup + per-check rows + `is_required` |
| `get_check_run_output` | Single check run summary + truncated text + `html_url` |
| `get_workflow_run` | Workflow run conclusion, `logs_url`, `html_url` |
| `get_branch_protection` | Required checks list, reviewers, `protected: false` on 404 |
| `get_pr_diff` | Unified diff (capped ~50 KB) |
| `get_pr_files` | File list with status + per-file additions/deletions |
| `list_pr_comments` | Paginated issue comments on the PR (30/page) |
The same surface is exposed to Agent SDK callers via src/mcp/servers/github-state.ts (registered when enableGithubState is true in the runPipeline overrides).
Caps and operator switches:
- `runWithTools` enforces a per-turn iteration cap (default 8 for `chat-thread`, 2 for `triage`) and a per-turn fan-out cap (`DEFAULT_MAX_TOOL_USES_PER_TURN` = 4). Excess `tool_use` blocks get `is_error: true` `tool_result` feedback so the model adjusts on the next turn rather than triggering silent truncation.
- `CHAT_THREAD_TOOLS_ENABLED` (default `true`): when `false`, `chat-thread` stays single-turn and answers only from the cached snapshot.
- `TRIAGE_TOOLS_ENABLED` (default `true`): when `false`, `triage` classifies from text alone even on PR events. Hot-path latency escape hatch.
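The fan-out cap can be pictured as follows. This is a simplified, synchronous sketch: the `ToolUse` and `ToolResult` shapes, the `applyFanOutCap` helper, and the wording of the error message are assumptions, and the real `runWithTools` in `src/ai/llm-client.ts` is presumably asynchronous and more involved.

```typescript
// Hypothetical block shapes, loosely modelled on tool-use messages.
interface ToolUse {
  id: string;
  name: string;
}

interface ToolResult {
  tool_use_id: string;
  content: string;
  is_error: boolean;
}

// Constant name and value are from the text above.
const DEFAULT_MAX_TOOL_USES_PER_TURN = 4;

// Execute at most maxPerTurn tool calls; answer the rest with an explicit error result
// so every tool_use block still receives a tool_result and nothing is silently dropped.
function applyFanOutCap(
  toolUses: ToolUse[],
  execute: (call: ToolUse) => string,
  maxPerTurn: number = DEFAULT_MAX_TOOL_USES_PER_TURN,
): ToolResult[] {
  return toolUses.map((call, index): ToolResult => {
    if (index < maxPerTurn) {
      return { tool_use_id: call.id, content: execute(call), is_error: false };
    }
    return {
      tool_use_id: call.id,
      content: `Skipped: at most ${maxPerTurn} tool calls run per turn.`,
      is_error: true,
    };
  });
}
```

Returning one result per requested call keeps the request/response pairing intact, which is what lets the model see the cap and adjust on its next turn.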
The deterministic merge-readiness path (`src/workflows/ship/probe.ts`, `src/workflows/ship/verdict.ts`) is intentionally NOT tool-driven: its GraphQL probe (`PROBE_QUERY`, now centralised in `src/github/queries.ts`) is correctness-invariant for the merge gate.