Agents Shipgate Tool-Use Readiness

Your coding agent changed what your AI agent can do.
Agents Shipgate tells you whether it can merge.

The deterministic merge gate for AI-generated agent capability changes.

When Claude Code, Codex, or Cursor changes an agent's tools, scopes, prompts, or policies, Agents Shipgate turns that PR diff into a deterministic merge verdict — before the agent gets production-like permissions.

Local-first and static by default. No agent execution, tool calls, LLM calls, or network access.

Release decision: passed · review_required · insufficient_evidence · blocked
merge check · PR #482 · feature/refunds-mcp-tool
Tool-surface diff
v0.11.0
Capability delta
+ stripe.create_refund · new tool
~ shipgate.yaml · release policy changed
Merge verdict: blocked

Why: money-moving action added without approval or idempotency evidence.

When a PR changes what an AI agent can do.

Run Agents Shipgate when Codex, Claude Code, Cursor, or a human changes agent tools, MCP exports, OpenAPI specs, prompts, permission scopes, approval policies, confirmation policies, release gates, or shipgate.yaml.

The question is no longer just whether the tool surface can be scanned. The PR question is whether the capability change has enough evidence to merge.

From capability change to merge verdict, in three steps.

Merge-time, not runtime. Shipgate answers the release question before promotion — where evals (behavior), code review (code), and observability (runtime) don't.

01

A capability changes

Your coding agent — Claude Code, Codex, or Cursor — or a human changes what the agent can do: its tools, scopes, prompts, or policies, in a PR.

02

Shipgate reads the diff

agents-shipgate verify reads the diff and the declared policies — statically. No agent execution, tool calls, LLM calls, or network access.

03

You get a verdict

A deterministic merge verdict — with the capability delta and the next safe action. Same diff, same verdict, every time.

Every run resolves to one release_decision.decision value: passed · review_required · insufficient_evidence · blocked

One review surface for every tool source.

Adapters normalize each framework's tool declarations into the same Tool-Use Readiness review — statically, with no code import or execution.

MCP exports: Model Context Protocol tool manifests
OpenAPI 3.x: REST surfaces read as agent tools
OpenAI Agents SDK: Python @function_tool, static AST
Anthropic Messages API: tool-use definitions
Google ADK: Python and YAML config
LangChain / LangGraph: static Python tool inputs
CrewAI: static Python tool inputs
OpenAI Agents API: hosted agent tool definitions
Codex plugins: plugin packages & marketplace stubs
n8n workflows: workflow JSON tool nodes

Then it writes a Tool-Use Readiness Report: verifier.json, report.json, pr-comment.md, and optional SARIF.

Install Shipgate where your coding agents already work.

Add Shipgate instructions, skills, PR template guidance, and the GitHub Action to the same repo surfaces Codex, Claude Code, and Cursor already read.

Install agent workflow
AGENTS.md · Codex skill · Claude Code skill · Cursor rules · PR template · GitHub Action

Installs the repo instructions your coding agents already read, plus the advisory PR gate reviewers use.

$ agents-shipgate init --workspace . --write --ci \
  --agent-instructions=all
Verify current PR
$ pipx install agents-shipgate
$ agents-shipgate verify --preview --json
$ agents-shipgate init --workspace . --write --ci --agent-instructions=all
$ agents-shipgate verify --workspace . --config shipgate.yaml --base origin/main --head HEAD --ci-mode advisory --format json

Use when a PR changes agent tools, prompts, MCP/OpenAPI surfaces, permissions, policy, CI, or shipgate.yaml.

GitHub Action
- uses: ThreeMoonsLab/agents-shipgate@v0.11.0
  with:
    config: shipgate.yaml
    ci_mode: advisory
    diff_base: target
    pr_comment: "true"

Runs the verifier on pull requests and posts the PR comment artifact without blocking while your team adopts the gate.

Verifier artifacts: agents-shipgate-reports/verifier.json · agents-shipgate-reports/report.json · agents-shipgate-reports/pr-comment.md

SARIF remains available through the GitHub Action or output format configuration.

Don’t let AI edit the rules that review it.

Shipgate flags PRs that touch shipgate.yaml, AGENTS.md, Claude/Codex skills, policy packs, baselines, waivers, suppressions, or the GitHub Action that runs Shipgate. AI-generated policy weakening cannot self-approve.

Rule files: human review required
Release policy: cannot self-weaken
Verifier output: merge verdict
AI may fix mechanically: Missing paths, report ignores, stale generated entries, and safe high-confidence config patches.
AI must not assert: Approval, confirmation, idempotency, broad-scope safety, prohibited-action enforcement, or runtime-trace proof.
Humans own authority: Business policy, waivers, suppressions, release gates, and acceptance of human-review requirements.
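
The trust-root rule above can be pictured as a path match over the files a PR changes. The following is a minimal sketch, not Shipgate's implementation: the pattern list, the function names, and the example paths are all hypothetical, and the real set of protected files is whatever Shipgate itself defines.

```python
from fnmatch import fnmatch

# Hypothetical trust-root patterns, for illustration only.
TRUST_ROOT_PATTERNS = (
    "shipgate.yaml",
    "AGENTS.md",
    ".github/workflows/*",
    "policy-packs/*",
)


def trust_root_edits(changed_paths):
    """Return the changed paths that match any trust-root pattern, sorted."""
    return sorted(
        path
        for path in changed_paths
        if any(fnmatch(path, pattern) for pattern in TRUST_ROOT_PATTERNS)
    )


def needs_human_review(changed_paths):
    """A PR that touches any trust-root file cannot approve itself."""
    return bool(trust_root_edits(changed_paths))


# Hypothetical list of files changed in a PR.
changed = [
    "src/tools/refunds.py",
    "shipgate.yaml",
    ".github/workflows/shipgate.yml",
]
print(trust_root_edits(changed))
print(needs_human_review(changed))
```

The point of the sketch is that the decision depends only on which files changed, not on who or what authored the change.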

What coding agents know after install.

Codex reads AGENTS.md. Claude Code can use repo instructions, skills, slash commands, and local hooks. Cursor reads rules. Shipgate installs the verifier instructions into those surfaces so generated PRs know the definition of done.

Codex

Make the PR completion rule explicit in AGENTS.md and the repo-scoped Codex skill.

$ agents-shipgate init --workspace . --write --ci \
  --agent-instructions=agents-md,codex-skill

Claude Code

Use Claude instructions and skills for local early warning; CI remains the merge authority.

$ agents-shipgate init --workspace . --write --ci \
  --agent-instructions=claude-md,claude-code-skill

Definition of done for agent capability PRs

Before claiming completion on any PR that changes agent tools, MCP exports,
OpenAPI specs, prompts, permissions, policies, CI gates, or shipgate.yaml, run:

    agents-shipgate verify --base origin/main --head HEAD --format json

Read agents-shipgate-reports/verifier.json first.
Do not claim completion when the merge verdict is blocked,
insufficient_evidence, or human_review_required unless the user explicitly
accepts the human review requirement.

Never weaken shipgate.yaml, Shipgate CI, AGENTS.md, skills, policy packs,
baselines, waivers, or suppressions merely to make Shipgate pass.

Turn an agent PR into a capability-level merge verdict.

Agents Shipgate reads declared local tool sources and policy files, compares PR capability changes, runs deterministic checks, and writes verifier artifacts that tell reviewers whether the change can merge.

Company: Three Moons Lab
Product: Agents Shipgate
CLI / package / repo: agents-shipgate
Primary output: Merge verdict
AI engineers: PR-time feedback on capability changes before review
Platform engineers: Deterministic merge gates for agent repositories
Security & GRC teams: Evidence for agent capability approval and policy review
Inputs: MCP exports · OpenAPI specs · SDK/framework metadata (OpenAI Agents SDK, Anthropic Messages API, Google ADK, LangChain/LangGraph, CrewAI, OpenAI Agents API, Codex plugins, n8n)
agents-shipgate: static · local · deterministic
Outputs: verifier.json · pr-comment.md · GitHub verdict
report.json and SARIF stay available for deeper release review.
Agent builders: See which tools, prompts, scopes, and policies changed in the PR.
Platform teams: Gate generated agent changes without delegating release authority to the generator.
Security reviewers: Get static merge evidence without running agents or importing user code.

What it checks before release.

The landing page shows the release-review categories; the full check catalog stays in the repo.

01 · approval

Approval gaps

Write, destructive, financial, or external communication tools without declared approval policies.

02 · surface

Wildcard tool sources

Wildcard MCP or inventory sources that expose an unreviewable tool surface.

03 · auth

Broad scopes

Manifest or tool permissions that rely on wildcard or overly broad authorization scopes.

04 · schema

Free-form action fields

Fields such as body, command, action, or updates that let the model control too much.

05 · bounds

Missing bounds

Unbounded arrays, objects, strings, or numeric fields on side-effecting operations.
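
As an illustration of what a missing-bounds check could look at, here is a minimal sketch that walks a JSON Schema and lists fields with no upper bound. It is an assumption about the general technique, not Shipgate's actual rule: the function name, the path notation, and the sample refund schema are hypothetical, while the keywords (`maxLength`, `maxItems`, `maximum`, `exclusiveMaximum`, `maxProperties`, `additionalProperties`, `enum`, `const`) are standard JSON Schema.

```python
def unbounded_fields(schema, path="$"):
    """Return paths of fields in a JSON Schema that have no upper bound."""
    found = []
    kind = schema.get("type")
    closed = "enum" in schema or "const" in schema  # a fixed value set is bounded
    if not closed:
        if kind == "string" and "maxLength" not in schema:
            found.append(path)
        elif kind == "array" and "maxItems" not in schema:
            found.append(path)
        elif kind in ("integer", "number") and not (
            "maximum" in schema or "exclusiveMaximum" in schema
        ):
            found.append(path)
        elif kind == "object" and not (
            schema.get("additionalProperties", True) is False
            or "maxProperties" in schema
        ):
            found.append(path)
    # Recurse into declared properties and array item schemas.
    for name, child in schema.get("properties", {}).items():
        found.extend(unbounded_fields(child, f"{path}.{name}"))
    if isinstance(schema.get("items"), dict):
        found.extend(unbounded_fields(schema["items"], f"{path}[]"))
    return found


# Hypothetical input schema for a refund tool.
refund_schema = {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "amount": {"type": "integer", "minimum": 1},
        "currency": {"type": "string", "enum": ["usd", "eur"]},
        "reason": {"type": "string"},
        "line_items": {"type": "array", "items": {"type": "string", "maxLength": 64}},
    },
}
print(unbounded_fields(refund_schema))
```

In this sample, `amount`, `reason`, and `line_items` are reported, while `currency` is treated as bounded because it is an enum.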

06 · retry

Idempotency gaps

Write actions where retry behavior could duplicate refunds, updates, messages, or deletes.

07 · static

Dynamic surfaces

Framework toolsets that cannot be statically reviewed without explicit inventory evidence.

08 · review

Owner and policy evidence

Tools missing reviewer-friendly ownership, scope, approval, or prohibited-action coverage.

09 · baseline

Baseline drift

New, matched, and resolved findings when a reviewed baseline is present.
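
The new / matched / resolved split is ordinary set arithmetic over finding identities. A minimal sketch follows, assuming each finding can be reduced to a stable fingerprint; the `(check_id, location)` tuple used here is a hypothetical fingerprint, not Shipgate's baseline format.

```python
def classify_against_baseline(baseline, current):
    """Split findings into new, matched, and resolved relative to a baseline."""
    baseline, current = set(baseline), set(current)
    return {
        "new": sorted(current - baseline),       # present now, absent from baseline
        "matched": sorted(current & baseline),   # present in both
        "resolved": sorted(baseline - current),  # in baseline, gone now
    }


# Hypothetical fingerprints: (check_id, location).
baseline = [
    ("SHIP-SOURCE-WILDCARD", "shipgate.yaml"),
    ("SHIP-SCHEMA-MISSING-BOUND", "openapi/tickets.yaml"),
]
current = [
    ("SHIP-SOURCE-WILDCARD", "shipgate.yaml"),
    ("SHIP-POLICY-APPROVAL-MISSING", "openapi/billing.yaml"),
]
drift = classify_against_baseline(baseline, current)
for bucket, findings in drift.items():
    print(bucket, findings)
```

Sorting each bucket keeps the output stable across runs, which matters for a gate that promises the same verdict for the same diff.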

A PR comment reviewers can act on.

Shipgate turns the PR diff into a verifier artifact, a reviewer-ready PR comment, and a release decision. Findings still include severity, evidence, source references, confidence, and next actions for deeper review.

agents-shipgate-reports / pr-comment.md
Agents Shipgate: blocked
Merge blocked
Verdict: blocked · Blockers: 2 · Review: required · Changed actions: 3
Capability changes
#01 Critical openapi/billing.yaml:142
stripe.create_refund lacks a declared approval policy
evidence: financial_action · external_write · POST /refunds
recommend: Add approval policy or remove from this release.
confidence: high · check_id: SHIP-POLICY-APPROVAL-MISSING
#02 Critical openapi/billing.yaml:142
stripe.create_refund lacks idempotency evidence
evidence: write action · amount/currency/payment_id schema · retry behavior unknown
recommend: Add idempotency key or document retry policy.
confidence: high · check_id: SHIP-IDEMPOTENCY-MISSING
#03 High shipgate.yaml:18
wildcard_mcp_tools.* exposes an unreviewable tool surface
evidence: wildcard tool source in shipgate.yaml
recommend: Replace wildcard with explicit allowlist.
confidence: high · check_id: SHIP-SOURCE-WILDCARD
#04 High mcp_tools.json:#/tools/send_email
support.send_email accepts free-form 'body' field
evidence: external_communication · no template binding
recommend: Constrain to template IDs or require human confirmation.
confidence: medium · check_id: SHIP-SCHEMA-FREEFORM-ACTION
#05 Medium openapi/tickets.yaml:88
tickets.update is missing maximum bound on 'fields'
evidence: unbounded object · broad write
recommend: Document or enforce a field allowlist.
confidence: medium · check_id: SHIP-SCHEMA-MISSING-BOUND
verifier_schema_version 0.1 · report_schema_version 0.21 generated on PR verification

What drives the verdict

The verifier combines capability diffs with static release evidence.

New actions · Approval gaps · Broad scopes · Prompt changes · Schema bounds · Idempotency gaps · Trust-root edits · Baseline changes

Merge verdicts: passed · review · evidence · blocked
Primary artifact: verifier.json
Reviewer artifact: pr-comment.md
Gate signal: release_decision.decision

Proof from public tool surfaces, without turning the homepage into docs.

Four public examples show the verifier on realistic SDK/framework code and larger API surfaces. Each card keeps one representative capability issue visible and leaves full output in GitHub or docs.

OpenAI Agents SDK · 2 tools · High findings

Airline customer service agent

Static AST extraction finds a write-capable update_seat tool without enough release-review evidence.

Representative finding: update_seat changes customer state and needs explicit scope and policy coverage.
Config excerpt:
tool_sources:
  - id: openai_agents_sdk
    type: openai_agents_sdk
    path: main.py
environment:
  target: production_like

Anthropic Messages API · 3 tools · Critical

Cookbook customer service agent

A real published tool-use example includes cancel_order, a destructive action that needs approval evidence.

Representative finding: cancel_order is destructive and ships without a declared approval policy.
Config excerpt:
tool_sources:
  - id: anthropic_tools
    type: anthropic_messages
    path: tools.json
policies:
  approval_required_for: [destructive]

OpenAPI · 591 tools · Stress test

DigitalOcean public API as agent tools

A cloud infrastructure API reframed as a broad agent surface exposes irreversible droplet, database, and Kubernetes operations.

Representative finding: Destructive infrastructure actions are present without explicit approval policies.
Config excerpt:
tool_sources:
  - id: digitalocean_openapi
    type: openapi
    path: openapi.yaml
permissions:
  scopes: ["*"]

OpenAPI · 167 tools · Stress test

Twilio Messaging API purpose mismatch

A read-only manifest pointed at a messaging API still exposes DELETE-capable tools that contradict the declared purpose.

Representative finding: Read-only release intent conflicts with message and phone-number deletion operations.
Config excerpt:
agent:
  declared_purpose:
    - read messaging inventory
tool_sources:
  - id: twilio_openapi
    type: openapi
    path: messaging.yaml
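
A purpose mismatch like the one in the last example can be pictured as a scan of the OpenAPI `paths` object for operations that are not read-only. This is a minimal sketch under stated assumptions: the function name is hypothetical, the sample document is a made-up fragment rather than the real Twilio spec, and treating GET, HEAD, and OPTIONS as the read-only set is a simplification.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}
READ_ONLY_METHODS = {"get", "head", "options"}


def write_operations(openapi_doc):
    """List (METHOD, path, operationId) for every non-read-only operation."""
    operations = []
    for path, path_item in openapi_doc.get("paths", {}).items():
        for method, operation in path_item.items():
            # Path items also hold non-operation keys such as "parameters".
            if method in HTTP_METHODS and method not in READ_ONLY_METHODS:
                operations.append(
                    (method.upper(), path, operation.get("operationId", ""))
                )
    return sorted(operations)


# Made-up OpenAPI fragment, already parsed into a dict.
spec = {
    "openapi": "3.0.1",
    "paths": {
        "/Messages": {"get": {"operationId": "ListMessage"}},
        "/Messages/{Sid}": {
            "parameters": [{"name": "Sid", "in": "path", "required": True}],
            "get": {"operationId": "FetchMessage"},
            "delete": {"operationId": "DeleteMessage"},
        },
    },
}
print(write_operations(spec))
```

An empty result would be consistent with a read-only declared purpose; any entry is a candidate contradiction for a reviewer to look at.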

Verify locally, comment on PRs, then tighten when ready.

CI is advisory by default. Strict mode can fail after the team has reviewed the baseline, trust-root policy, and what must remain human-approved.

Local verifier
$ pipx install agents-shipgate
$ agents-shipgate verify --preview --json
$ agents-shipgate init --workspace . --write --ci --agent-instructions=all
$ agents-shipgate verify --workspace . --config shipgate.yaml --base origin/main --head HEAD --ci-mode advisory --format json

Requires Python 3.12+. Use python -m pip install agents-shipgate if pipx is not available.

.github/workflows/shipgate.yml
name: Agents Shipgate

on:
  pull_request:

permissions:
  contents: read
  pull-requests: write

jobs:
  shipgate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@<pinned-sha>
        with:
          fetch-depth: 0
      - uses: ThreeMoonsLab/agents-shipgate@v0.11.0
        with:
          config: shipgate.yaml
          ci_mode: advisory
          diff_base: target
          pr_comment: "true"
          output_dir: agents-shipgate-reports

The Action runs the verifier on pull-request diffs using the pinned v0.11.0 release.

Local verifier

Fast feedback while the PR is still being generated or edited.

PR comment

Post the merge verdict and capability delta without blocking while adopting.

Strict CI

Fail only after the team accepts the baseline and trust-root policy.

Human authority

Keep approval, idempotency, suppressions, and policy weakening out of auto-fix.

Built for generated PRs that need deterministic review.

Local-first and static by default. No agent execution, tool calls, LLM calls, or network access. It does not import user code, connect to MCP servers, or collect telemetry by default.

Designed for PR-time evidence before an AI-generated capability change lands. Open-source core. Transparent checks. Suppressions require reasons.

Inspect the checks. Verify a PR. Open an issue with a false positive.

Default verifier guarantees
 Static by default
  • No agent execution
  • No user-code import by default
  • No tool calls
  • No LLM calls
  • No MCP server connections
  • No verifier network calls by default
  • No verifier telemetry by default
  • Apache-2.0 open source

Not AI code review. Not evals. Not a gateway.

AI code review asks whether an implementation is correct. Shipgate asks whether a PR gave an agent new real-world capabilities, and whether that capability has release evidence.

Not AI code review. It reviews implementation quality. Shipgate reviews capability expansion.
Not evals. They test behavior. Shipgate reviews release evidence.
Not a security scanner. It finds code and dependency risks. Shipgate asks what the agent can now do.
Shipgate is a merge verifier. A PR gate for agent capability changes. Deterministic verdict. Evidence attached.
Side-by-side
Category | What it answers | What Shipgate answers
AI code review | Is this diff idiomatic, correct, or buggy? | Did this diff expand agent capability?
Code security scanning | Are dependencies or code patterns risky? | Are agent tools, scopes, schemas, and policies reviewable?
Evals | Did the model behave on curated inputs? | Can this tool/action surface merge?
Runtime gateway | Should this call be allowed now? | Should this capability be released at all?

Common merge-verdict questions.

Short answers for developers, platform teams, and security reviewers before trying the verifier.

What is agents-shipgate?

agents-shipgate is an open-source CLI and GitHub Action that verifies AI-generated agent capability changes and returns a deterministic merge verdict. The display name is Agents Shipgate; the package, CLI, repo, and action name are agents-shipgate.

What should trigger a Shipgate run?

Run it when a PR changes agent tools, MCP exports, OpenAPI specs, prompts, permissions, approval policies, confirmation policies, CI gates, or shipgate.yaml. Existing opted-in repos can run it on every PR.

What is the merge verdict?

The wire signal is still release_decision.decision. Public verdict labels are mergeable, human_review_required, insufficient_evidence, and blocked.

What does it actually check?

Agents Shipgate checks declared tools, schemas, scopes, approval policies, side effects, idempotency evidence, prompt/policy surfaces, trust-root files, baselines, and suppressions. It asks whether the capability change has enough release evidence to merge.

How is this different from LLM evals?

Evals validate behavior on inputs you wrote. Agents Shipgate validates the static release artifact — manifest, tool schemas, scopes, and policies — without running the model. Use both.

How is this different from observability or runtime guardrails?

Observability records what happened at runtime, and guardrails enforce access at runtime. Agents Shipgate runs earlier: it turns declared tool surfaces into static release-review evidence before promotion.

Does it call my agent or send my data anywhere?

No. The verifier is static by default: no agent execution, no user-code import by default, no tool calls, no LLM calls, no MCP server connections, no verifier network calls by default, and no verifier telemetry by default.

What inputs does it support?

It supports MCP exports, OpenAPI 3.x specs, OpenAI Agents SDK Python entrypoints, Anthropic Messages API artifacts, Google ADK Python and YAML config, LangChain/LangGraph, CrewAI, OpenAI Agents API artifacts, Codex plugin packages, and n8n workflow artifacts.

What output formats does it write?

The PR verifier writes agents-shipgate-reports/verifier.json, report.json, and pr-comment.md. SARIF remains available through the GitHub Action or output format configuration.

What version does this site advertise?

This site advertises the published v0.11.0 release. Install from PyPI with pipx install agents-shipgate and pin GitHub Actions to ThreeMoonsLab/agents-shipgate@v0.11.0.

How do I add it to GitHub Actions?

Add ThreeMoonsLab/agents-shipgate@v0.11.0 to a pull-request workflow with fetch-depth: 0, diff_base: target, pr_comment: "true", and ci_mode: advisory.

Does it certify my agent as safe?

No. Agents Shipgate is not a safety certification, runtime gateway, or behavioral eval. It produces deterministic findings from tool definitions, schemas, scopes, and declared policies so release owners have evidence to review.

Which JSON field is the release gating signal?

Read release_decision.decision from report.json or the same block projected into verifier.json. Values are passed, blocked, review_required, and insufficient_evidence.
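
A script or coding agent could read the gating signal along these lines. This is a minimal sketch: only the `release_decision.decision` nesting and the four values come from this page; the rest of the artifact layout, the helper names, and the rule that only `passed` counts as done are assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path

# Decision values listed on this page.
DECISIONS = {"passed", "review_required", "insufficient_evidence", "blocked"}


def read_decision(artifact_path):
    """Return release_decision.decision from a verifier.json or report.json file."""
    data = json.loads(Path(artifact_path).read_text(encoding="utf-8"))
    decision = data["release_decision"]["decision"]
    if decision not in DECISIONS:
        raise ValueError(f"unexpected decision value: {decision!r}")
    return decision


def can_claim_done(decision):
    """Assumed rule: only a passed decision lets an agent claim completion unaided."""
    return decision == "passed"


# Stand-in artifact; a real verifier.json carries more fields than this.
with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "verifier.json"
    artifact.write_text(
        json.dumps({"release_decision": {"decision": "blocked"}}), encoding="utf-8"
    )
    decision = read_decision(artifact)
    print(decision, can_claim_done(decision))
```

Rejecting unknown values, rather than treating them as a pass, keeps the helper from silently approving an artifact it does not understand.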

What may coding agents fix automatically?

They may install the workflow, add ignored local report directories, run verify, summarize artifacts, and apply high-confidence mechanical patches. They must not assert approval, confirmation, idempotency, broad-scope safety, prohibited-action enforcement, or runtime-trace proof without human review.

What is a release gate for AI agents?

A release gate is a deterministic CI check that runs on every pull request and fails the build when the agent's release artifact contains unsafe state. For AI agents specifically, the release gate inspects the manifest, tool surface declarations, scopes, and policies — not the model itself or runtime behavior. Agents Shipgate fits this slot.

What is a shipgate.yaml manifest?

shipgate.yaml is a checked-in YAML file that declares an agent's release context: tool sources (MCP, OpenAPI, SDK), permissions, approval and idempotency policies, risk overrides, and CI settings. Agents Shipgate reads it as the single source of truth for what should be reviewed at release time. The minimal valid manifest requires version, project, agent, and environment blocks plus at least one entry under tool_sources.
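
The minimal-manifest rule can be sketched as a structural check on the parsed YAML. This is an illustration, not Shipgate's validator: the function name and every value in the sample manifest are hypothetical, and the manifest is shown as an already-parsed dict (for example, the result of a YAML loader) to keep the sketch dependency-free.

```python
# Blocks the minimal valid manifest requires, per the description above.
REQUIRED_BLOCKS = ("version", "project", "agent", "environment")


def manifest_problems(manifest):
    """Return human-readable problems with a parsed shipgate.yaml mapping."""
    problems = [
        f"missing block: {name}" for name in REQUIRED_BLOCKS if name not in manifest
    ]
    sources = manifest.get("tool_sources")
    if not isinstance(sources, list) or not sources:
        problems.append("tool_sources needs at least one entry")
    return problems


# Hypothetical manifest with placeholder values; it omits the environment block.
manifest = {
    "version": "0.1",
    "project": {"name": "support-agent"},
    "agent": {"name": "refund-helper"},
    "tool_sources": [
        {"id": "billing_openapi", "type": "openapi", "path": "openapi/billing.yaml"}
    ],
}
print(manifest_problems(manifest))
```

An empty list means the structural minimum is met; it says nothing about whether the declared policies are adequate.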

What is a Tool-Use Readiness Report?

A Tool-Use Readiness Report is the deterministic output of an agents-shipgate scan. It contains a release decision (passed, blocked, review_required, or insufficient_evidence), a finding list with severities and recommended remediation, and seven dimensions of evidence coverage. It is written as Markdown for human review, JSON for tools and coding agents, and SARIF for GitHub code scanning.

What does "blast radius" mean for an AI agent?

Blast radius is the seventh dimension of tool-use readiness. It asks: if a tool fires unexpectedly, how bounded is the damage? Evidence includes a declared owner so the right team gets paged, an enumerated list of prohibited actions in agent.prohibited_actions, and resource-scope bounds (per-tenant, per-customer, per-resource-prefix). High-risk tools without these bounds get blast-radius findings.

How is this different from unit tests for my agent?

Unit tests verify specific behaviors on inputs the team wrote — the model returns the right tool call on this input, a helper processes this fixture correctly. Agents Shipgate verifies the static release artifact: which tools are exposed, what schemas they accept, what policies gate them. Unit tests cannot catch a write tool shipping without an approval policy; the release gate cannot catch a behavior regression on a curated test. Use both at different stages of CI.

Does Agents Shipgate work with LangChain, CrewAI, Google ADK, and other frameworks?

Yes. Agents Shipgate supports OpenAI Agents SDK, Anthropic Messages API, Google ADK (Python and YAML), LangChain/LangGraph, CrewAI, MCP exports, OpenAPI 3.x specs, OpenAI Agents API artifacts, Codex plugins, and n8n workflows. Static AST extraction means no code import or execution. The seven dimensions of tool-use readiness apply regardless of framework — adapters normalize each framework's tool declarations into the same review surface.

How does Agents Shipgate fit into my CI/CD pipeline?

Add the GitHub Action to your pull-request workflow in advisory mode. It runs on every PR, scans the manifest plus tool sources, and posts findings as a PR comment without failing the build. Triage and baseline existing findings, then switch to strict mode (fail_on: critical,high) to block net-new gaps. Typical CI order: dependency install, release-readiness scan, unit tests, eval suite, deploy. See the full CI/CD tutorial.

When should I use strict mode vs advisory mode?

Use advisory mode when first adopting Agents Shipgate or when you have a backlog of findings to triage: the scan runs and reports but never fails the build. Switch to strict mode after you have saved a baseline (agents-shipgate baseline save) and reviewed the existing findings. Strict mode fails only on net-new findings at critical or high severity. The adoption path is: advisory, then review and baseline, then strict plus baseline.
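
The difference between the two modes can be sketched as a small decision function. This is a toy model under stated assumptions, not the tool's logic: the finding shape, the fingerprint strings, and the function name are hypothetical, and the fail set mirrors the critical and high severities mentioned for strict mode.

```python
# Severities that fail a strict run in this sketch.
FAIL_ON = {"critical", "high"}


def should_fail_build(findings, baseline_ids, mode="advisory"):
    """Advisory never fails; strict fails on net-new critical or high findings."""
    if mode == "advisory":
        return False
    return any(
        finding["id"] not in baseline_ids and finding["severity"] in FAIL_ON
        for finding in findings
    )


# Hypothetical findings, identified by a made-up fingerprint string.
findings = [
    {"id": "SHIP-POLICY-APPROVAL-MISSING:stripe.create_refund", "severity": "critical"},
    {"id": "SHIP-SOURCE-WILDCARD:wildcard_mcp_tools", "severity": "high"},
]
baseline_ids = {"SHIP-SOURCE-WILDCARD:wildcard_mcp_tools"}

print(should_fail_build(findings, baseline_ids, mode="advisory"))
print(should_fail_build(findings, baseline_ids, mode="strict"))
```

With the baseline in place, the wildcard finding no longer fails the build, but the new approval gap does once the mode is strict.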

Start with PR merge verdicts. Add release evidence over time.

Today, Agents Shipgate makes AI-generated capability changes reviewable before merge. The next layers are baselines, suppressions, release history, policy drift, re-review triggers, and runtime evidence integrations.

01 · now

AI-generated PR verifier

CLI + GitHub Action + PR comment.

02

Release evidence

Reports, baselines, history, exceptions.

03

Runtime integrations

Trace evidence without replacing static review.

Get started

Make Shipgate part of your coding-agent definition of done.

$ agents-shipgate init --workspace . --write --ci --agent-instructions=all
$ agents-shipgate verify --base origin/main --head HEAD --format json

Have an AI-generated agent PR? Bring one and we'll review the capability delta, trust root, and merge verdict with your team.