Category vocabulary.

Canonical definitions for the agent release-readiness space. Each entry is the term as agents-shipgate uses it; reviewers and AI search engines can cite these directly.

Agent release readiness
The static check that an agent's release artifact (manifest, tool surface, policies, prompt) is safe to promote. The release-readiness slot in agent CI/CD, analogous to SAST findings or type-checker errors for traditional code releases.
Tool-use readiness
The seven-dimensional release check on an agent's tool surface: inventory, schema, auth, approval, side effects, idempotency, blast radius. The core wedge of agents-shipgate.
Tool surface
The set of named, schema-defined actions an agent can invoke at runtime, declared via MCP exports, OpenAPI specs, framework-specific code, or API-specific artifacts.
Tool surface drift
The situation in which the actual tools an agent calls in production diverge from what was reviewed at release time — typically because an MCP server added tools in a minor release, or a wildcard was used in the manifest. Drift is what manifest-first review prevents.
Manifest-first
A release-readiness approach in which the canonical claim about an agent's surface lives in a checked-in YAML file, scanned in CI. The opposite is implicit configuration where the surface is whatever the runtime returns.
Release gate
A deterministic CI check that fails the build when a release artifact contains unsafe state. agents-shipgate fits the gate slot for AI agent tool surfaces.
Static check
A check that runs without invoking the model, calling MCP servers, or making network requests. Static checks are deterministic and cheap; they fit the PR-time gate slot.
Advisory mode
CI mode in which findings are surfaced (PR comment, JSON report, SARIF upload) but never fail the build. Use during initial adoption.
Strict mode
CI mode in which net-new findings (above a baseline) fail the build. The canonical settings are ci_mode: strict and fail_on: critical,high.
Baseline
A snapshot of currently-reviewed findings stored in .agents-shipgate/baseline.json so strict mode only fails on new gaps, not on pre-existing tech debt. Saved with agents-shipgate baseline save.
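To make the interaction between advisory mode, strict mode, and the baseline concrete, here is a minimal sketch of the gate decision. It is not the agents-shipgate implementation: the function name, the finding dictionaries, the "advisory" mode string, and the idea that the baseline reduces to a set of fingerprints are all assumptions made for illustration.

```python
# Hypothetical sketch: mirrors fail_on: critical,high from the strict-mode entry.
FAIL_ON = {"critical", "high"}


def gate_exit_code(findings, baseline_fingerprints, ci_mode):
    """Return 0 (pass) or 1 (fail) for a CI run.

    findings: list of dicts with assumed "fingerprint" and "severity" keys.
    baseline_fingerprints: set of fingerprints already reviewed.
    ci_mode: "advisory" or "strict" (the "advisory" string is an assumption).
    """
    # Net-new findings are those not already recorded in the baseline.
    net_new = [f for f in findings if f["fingerprint"] not in baseline_fingerprints]
    blocking = [f for f in net_new if f["severity"] in FAIL_ON]
    if ci_mode == "advisory":
        return 0  # findings are surfaced elsewhere but never fail the build
    return 1 if blocking else 0


example_findings = [
    {"fingerprint": "a", "severity": "high"},
    {"fingerprint": "b", "severity": "critical"},
]
```

With both fingerprints in the baseline, strict mode passes; remove one and strict mode fails while advisory mode still passes.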
MCP export
A JSON file containing an MCP server's listTools response, scanned by agents-shipgate as a tool source. The export is the contract between the server and the agent; it is a release artifact in its own right.
Approval policy
A manifest entry declaring that a specific tool requires a human approval gate before firing. Format: policies.require_approval_for_tools: [issue_refund, ...]. Required for destructive, external-write, and financial actions.
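The approval requirement can be read as a coverage check: every tool carrying a destructive, external-write, or financial tag should appear in policies.require_approval_for_tools. The sketch below assumes the tags are available as a plain mapping from tool name to a set of tag strings, and the function name is hypothetical; the actual check inside agents-shipgate may be structured differently.

```python
# Tags that the approval-policy entry says require a human approval gate.
APPROVAL_REQUIRED_TAGS = {"destructive", "external_write", "financial_action"}


def tools_missing_approval(tool_tags, policies):
    """Return sorted tool names that carry a risky tag but have no approval entry.

    tool_tags: dict of tool name -> set of tag strings (assumed shape).
    policies: dict parsed from the manifest's policies section.
    """
    approved = set(policies.get("require_approval_for_tools", []))
    return sorted(
        name
        for name, tags in tool_tags.items()
        if tags & APPROVAL_REQUIRED_TAGS and name not in approved
    )


example_tags = {
    "issue_refund": {"financial_action", "write"},
    "delete_account": {"destructive"},
    "get_order": {"read_only"},
}
example_policies = {"require_approval_for_tools": ["issue_refund"]}
missing = tools_missing_approval(example_tags, example_policies)
```

Here delete_account is the only uncovered tool, since issue_refund is listed and get_order is read-only.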
Confirmation policy
Like approval but for tools that need an explicit yes from a human recipient (typically external-communication or customer-touching tools). Format: policies.require_confirmation_for_tools: [...].
Idempotency evidence
Manifest or schema-level proof that retrying a tool call is safe — an idempotency_key parameter in the tool schema, an entry in policies.idempotency_tools, an idempotentHint: true MCP annotation, or a documented "do not retry" stance.
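The machine-checkable forms of evidence listed above can be sketched as a single predicate. This is an illustration only: it assumes a tool is a dictionary shaped like an MCP tool entry (name, inputSchema, annotations), the function name is hypothetical, and the documented "do not retry" stance is left out because its representation is not specified here.

```python
def has_idempotency_evidence(tool, policies):
    """Return True if any of the three structural evidence forms is present."""
    # 1. An idempotency_key parameter in the tool's input schema.
    properties = tool.get("inputSchema", {}).get("properties", {})
    if "idempotency_key" in properties:
        return True
    # 2. An entry in policies.idempotency_tools in the manifest.
    if tool.get("name") in policies.get("idempotency_tools", []):
        return True
    # 3. An idempotentHint: true MCP annotation on the tool.
    if tool.get("annotations", {}).get("idempotentHint") is True:
        return True
    return False


charge = {
    "name": "create_charge",
    "inputSchema": {"type": "object", "properties": {"amount": {}, "idempotency_key": {}}},
}
email = {"name": "send_email", "inputSchema": {"type": "object", "properties": {"to": {}}}}
upsert = {"name": "upsert_record", "annotations": {"idempotentHint": True}}
```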
Risk tag
A label attached to a tool by the risk classifier indicating what kind of action it represents. Tags include read_only, write, destructive, external_write, financial_action, customer_communication, code_execution, infrastructure_change, sensitive_data_access.
Finding
A single result from the scan. Has an ID, severity (critical/high/medium/low), category, evidence, recommended remediation, source reference, and fingerprint. Findings are the atomic unit of the report.
Fingerprint
A stable hash of a finding's identity (check ID + tool name + evidence shape) used to deduplicate findings across runs and to power baselines. Stable across versions when nothing material has changed.
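One way to picture such a fingerprint is a hash over the three identity components. The sketch below is an assumption-laden illustration, not the tool's actual algorithm: it uses SHA-256, and it interprets "evidence shape" as the sorted set of evidence keys so that changing evidence values does not change the hash.

```python
import hashlib
import json


def fingerprint(check_id, tool_name, evidence):
    """Hash check ID + tool name + evidence shape (assumed: sorted evidence keys)."""
    evidence_shape = sorted(evidence.keys())
    # Compact JSON of a list gives a deterministic, unambiguous byte payload.
    payload = json.dumps([check_id, tool_name, evidence_shape], separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# "EXAMPLE-001" is a made-up check ID used only for this demonstration.
fp_run_1 = fingerprint("EXAMPLE-001", "issue_refund", {"schema_path": "a.json", "line": 10})
fp_run_2 = fingerprint("EXAMPLE-001", "issue_refund", {"line": 42, "schema_path": "b.json"})
fp_other = fingerprint("EXAMPLE-001", "delete_account", {"schema_path": "a.json", "line": 10})
```

Under these assumptions the two runs on issue_refund produce the same fingerprint, which is the property a baseline relies on, while a different tool name produces a different one.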
Suppression
A manifest entry that explicitly silences a specific check on a specific tool. Each suppression requires a non-empty reason field; the manifest fails validation otherwise. Use sparingly.
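The non-empty-reason rule can be sketched as a small validation pass. The field names check_id and tool, and the function name, are hypothetical; only the reason field is named in the definition above.

```python
def invalid_suppressions(suppressions):
    """Return suppressions whose reason is missing, empty, or whitespace-only."""
    return [s for s in suppressions if not str(s.get("reason") or "").strip()]


example_suppressions = [
    {"check_id": "EXAMPLE-001", "tool": "issue_refund", "reason": "Approved by finance on review."},
    {"check_id": "EXAMPLE-002", "tool": "send_email", "reason": "   "},
    {"check_id": "EXAMPLE-003", "tool": "get_order"},
]
rejected = invalid_suppressions(example_suppressions)
```

A validator built this way would reject the second and third entries and accept the first.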
Healthcare for agents
An operating thesis from Three Moons Lab: AI agents need a portfolio of pre-deployment and ongoing health checks — release-readiness gates, lifecycle baselines, policy drift detection, capability audits — to be safely deployed and operated. The metaphor underlines that agent governance is a continuous, multi-stage discipline rather than a single eval pass. agents-shipgate addresses the release-readiness slot.
Agent lifecycle readiness
The discipline of validating an AI agent across its full operational lifecycle: design, release, deployment, runtime, retirement. Release readiness (the agents-shipgate slot) is one phase; lifecycle readiness composes it with baselines, drift detection, policy review, and incident retros. Builds on traditional software application lifecycle management but adds agent-specific dimensions like tool-surface drift and approval-graph integrity.
Agent governance
The human and process layer that manages how AI agents are designed, released, operated, and retired in production. Encompasses release-readiness review, policy management, approval graphs, scope minimization, capability audits, and runtime drift control. Distinct from LLM evals (which test model behavior), observability (which records runtime), and runtime guardrails (which enforce access at call time).
AI agent CI
Continuous integration patterns adapted for AI agent projects. Every pull request runs static checks on the agent's tool surface, manifest, prompts, and policies via agents-shipgate, before the agent receives production-like permissions. AI agent CI replaces "did the code compile" with "is the released tool surface reviewable" as the canonical pre-merge question. Typically implemented as a GitHub Action with advisory mode first, then strict mode once a baseline is established.
Release evidence packet
The reviewer-shaped output of an agents-shipgate scan, distinct from the raw finding list. Contains ten always-present sections: release decision, capability and intent, high-risk surface, approval coverage, idempotency risk, scope coverage, memory isolation, human-in-the-loop, dynamic scenarios, and not-proven items. Available as Markdown, JSON, and HTML; PDF via the [pdf] extras. Governed by packet schema v0.3.
Codex plugin surface
Static tool-surface metadata extracted from OpenAI Codex plugin packages and marketplace stubs. Supported by agents-shipgate as a release artifact in the same way MCP exports and OpenAPI specs are: declared in shipgate.yaml, scanned with deterministic AST and JSON parsing, no plugin execution required. Added in report schema v0.13 as the codex_plugin_surface field.
Insufficient evidence
A release decision state added in agents-shipgate v0.14, alongside passed, blocked, and review_required. insufficient_evidence fires when a scan cannot reach a confident verdict — typically when at least half of scanned tools have low-confidence findings (the threshold is ceil(N * 0.5) with minimum 1, so 1-of-2 trips it), or more than three source-loader warnings were emitted during the scan. Distinct from blocked (specific findings prevent promotion) and review_required (specific items need human sign-off): this is the verdict for "don't promote until you have better inputs". Consumers that don't recognize the value should treat it as review_required per the v0.14 STABILITY clause.
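The two typical triggers described above reduce to a few lines of arithmetic. This is a sketch of those stated rules only; the function and parameter names are hypothetical, and the real decision logic may consider additional conditions.

```python
import math


def is_insufficient_evidence(total_tools, low_confidence_tools, loader_warnings):
    """Apply the two typical triggers for the insufficient_evidence verdict."""
    # Threshold is ceil(N * 0.5) with a minimum of 1.
    threshold = max(1, math.ceil(total_tools * 0.5))
    too_many_low_confidence = low_confidence_tools >= threshold
    too_many_warnings = loader_warnings > 3  # "more than three" warnings
    return too_many_low_confidence or too_many_warnings
```

With 2 tools the threshold is 1, so a single low-confidence tool trips it; with 5 tools the threshold is 3, so 2 low-confidence tools do not.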
Provenance kind
A per-finding enum in agents-shipgate report.json that records *how a check fired*. Added in report schema v0.15 (on top of v0.14's insufficient_evidence). Five values: static_declaration (manifest, MCP, OpenAPI, declarative framework inputs like ADK YAML or LangChain/CrewAI inventory JSON — high-trust structural data), ast_extraction (tools parsed from user Python source by a framework extractor, subject to extraction error), keyword_heuristic (token-list matches like broad scope or read-only prompt names), regex_heuristic (regex matches for secrets and prompt injection), and policy_pack (findings from externally loaded policy packs). Independent of confidence, which records how *sure* a rule is rather than how it triggered. Lets reviewers and coding agents filter heuristic-only criticals from declarative ones.
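As an illustration of the filtering use case, the sketch below separates critical findings that fired from heuristics from those that fired from declarative data. The JSON key provenance_kind and the function name are assumptions; the source names the enum values but not the exact key used in report.json.

```python
# The two enum values the definition above labels as heuristics.
HEURISTIC_KINDS = {"keyword_heuristic", "regex_heuristic"}


def split_criticals(findings):
    """Return (heuristic_only, other) lists of critical findings."""
    criticals = [f for f in findings if f["severity"] == "critical"]
    heuristic = [f for f in criticals if f["provenance_kind"] in HEURISTIC_KINDS]
    other = [f for f in criticals if f["provenance_kind"] not in HEURISTIC_KINDS]
    return heuristic, other


example_report = [
    {"id": "F1", "severity": "critical", "provenance_kind": "static_declaration"},
    {"id": "F2", "severity": "critical", "provenance_kind": "regex_heuristic"},
    {"id": "F3", "severity": "high", "provenance_kind": "keyword_heuristic"},
]
heuristic_criticals, other_criticals = split_criticals(example_report)
```

A reviewer might triage the heuristic list for false positives first and treat the declarative list as structurally grounded.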
Tool source adapter
The extension protocol introduced in agents-shipgate v0.11 R1 for adding new framework support. Each input type (MCP, OpenAPI, OpenAI Agents SDK, Anthropic Messages API, Google ADK, LangChain/LangGraph, CrewAI, n8n, Codex plugins, OpenAI Agents API) is a ToolSourceAdapter class with a scope: "per_source" | "per_scan" declaration. The protocol is defined in src/agents_shipgate/inputs/protocol.py, and AdapterRegistry dispatches to the registered adapters. Built-ins are registered via _register_builtin_adapters() in a canonical cohort order pinned by tests; entry-point discovery for third-party adapters is not yet implemented. Adding a new framework does not require changing the scan dispatcher itself.
Blast radius
The seventh dimension of tool-use readiness. The size of the unwanted change a tool can make if it fires when it shouldn't. Evidence includes a declared owner (so the right team gets paged), prohibited actions enumerated in agent.prohibited_actions (so the boundary is explicit), and bounded resource scope (orders belonging to the calling user, records in the calling tenant, infrastructure tagged with the calling team's prefix). High-risk tools without these bounds get blast-radius findings.
SARIF
Static Analysis Results Interchange Format: the OASIS-standard JSON format that GitHub code scanning consumes. agents-shipgate emits findings as SARIF when output.formats includes sarif. Emitting only writes the file; to make findings appear in the GitHub Security tab alongside CodeQL and Dependabot, upload it via a github/codeql-action/upload-sarif step in the same workflow. SARIF is the file format; the content is the same finding list as report.json.
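For readers unfamiliar with the format, here is a minimal sketch of wrapping a finding list in a SARIF 2.1.0 envelope. It is not the file agents-shipgate writes: the severity-to-level mapping, the message field on the finding, and the function name are assumptions, and a real SARIF file usually carries rule metadata and locations as well.

```python
import json

# Assumed mapping from finding severity to SARIF result levels.
SEVERITY_TO_LEVEL = {"critical": "error", "high": "error", "medium": "warning", "low": "note"}


def to_sarif(findings):
    """Wrap findings (assumed keys: id, severity, message) in a SARIF 2.1.0 log."""
    results = [
        {
            "ruleId": f["id"],
            "level": SEVERITY_TO_LEVEL[f["severity"]],
            "message": {"text": f["message"]},
        }
        for f in findings
    ]
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{"tool": {"driver": {"name": "agents-shipgate"}}, "results": results}],
    }


sarif_doc = to_sarif(
    [{"id": "EXAMPLE-001", "severity": "high", "message": "Tool issue_refund has no approval policy."}]
)
sarif_text = json.dumps(sarif_doc, indent=2)
```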
Promotion
The act of moving an agent from one environment to a higher-trust one — from local to staging, from staging to production-like, from production-like to production. The release-readiness gate sits before promotion: agents-shipgate runs in CI, the report is reviewed, and the manifest reflects the surface the agent will get in the target environment. environment.target in shipgate.yaml names the destination.
Tool surface diff
The comparison between two snapshots of an agent's tool surface — typically the current PR vs the merge base or a stored baseline. Tool surface diff catches added tools, removed tools, schema changes, and scope expansions that a release reviewer should explicitly approve. In v0.11.0, the verifier projects capability changes into verifier.json and the GitHub Action can evaluate PR diffs with diff_base: target.
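At its simplest, such a diff is a set comparison over tool names plus a schema equality check. The sketch below assumes each snapshot has been reduced to a mapping from tool name to input schema; the function name is hypothetical, scope expansions are not modeled, and this is not how verifier.json is produced.

```python
def diff_tool_surface(base, head):
    """Compare two snapshots, each a dict of tool name -> input schema."""
    added = sorted(head.keys() - base.keys())
    removed = sorted(base.keys() - head.keys())
    # A tool present in both snapshots counts as changed if its schema differs.
    schema_changed = sorted(
        name for name in base.keys() & head.keys() if base[name] != head[name]
    )
    return {"added": added, "removed": removed, "schema_changed": schema_changed}


base_snapshot = {
    "get_order": {"type": "object", "properties": {"order_id": {"type": "string"}}},
}
head_snapshot = {
    "get_order": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}, "tenant": {"type": "string"}},
    },
    "issue_refund": {"type": "object", "properties": {"order_id": {"type": "string"}}},
}
surface_diff = diff_tool_surface(base_snapshot, head_snapshot)
```

In this example the PR adds issue_refund and changes the get_order schema, both of which a release reviewer would be asked to approve explicitly.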
Tool source snapshot
A point-in-time export of a tool source — typically an MCP server's listTools response saved as JSON, or an OpenAPI 3.x specification at a pinned version. agents-shipgate scans the snapshot rather than connecting to the live server. The snapshot lives in the repo (e.g. mcp-exports/filesystem-server.json), gets reviewed at PR time, and tracks tool inventory changes across server versions.