Agents Shipgate Tool-Use Readiness

Your coding agent changed what your AI agent can do.
Agents Shipgate tells you whether it can merge.

Q: What is the merge verdict?

The wire signal is release_decision.decision. Public verdict labels are mergeable, human_review_required, insufficient_evidence, and blocked.

Q: How is this different from LLM evals?

Evals validate behavior on curated inputs. Agents Shipgate validates whether the PR expanded agent capability and whether that capability has release evidence. Use both.

Q: Does it call my agent or send my data anywhere?

No. The verifier reads local manifest and tool-source files, runs static checks, and writes local artifacts. No agent execution, no user-code import by default, no model invocation, no MCP server connections, no LLM calls, no verifier telemetry by default, and no verifier network calls by default.

The deterministic merge gate for AI-generated agent capability changes.

When Claude Code, Codex, or Cursor changes an agent's tools, scopes, prompts, or policies, Agents Shipgate turns that PR diff into a deterministic merge verdict — before the agent gets production-like permissions.

Local-first and static by default. No agent execution, tool calls, LLM calls, or network access.

Status: pre-1.0 (beta). The decision engine is deterministic and stable, but Shipgate’s real-world detection accuracy is still being validated against a labeled corpus of agent PRs — no precision/recall numbers are published yet. On heavily dynamic tool surfaces, Shipgate deliberately returns insufficient_evidence rather than guess. Treat it as an advisory gate while that accuracy work is in progress.

Release decision: passed · review_required · insufficient_evidence · blocked

merge check · PR #482 · feature/refunds-mcp-tool

Tool-surface diff

v0.14.0

Capability delta

+ stripe.create_refund · new tool

~ shipgate.yaml · release policy changed

Merge verdict: blocked

Why: money-moving action added without approval or idempotency evidence.

Research · prepublication

Sandbox controls "can". Evals measure "does". Agents Shipgate reviews "should" vs. "can" before merge.

Sandboxes constrain runtime capability. Evals observe behavior in tested scenarios. Agents Shipgate compares the static tool authority introduced by a change with the purpose, scopes, and controls reviewers intended to approve.

Static only: Agents Shipgate does not execute the agent, replace runtime sandboxing, or prove runtime behavior.

Runtime

Sandbox controls "can".

Constrains runtime capability.

Scenarios

Evals measure "does".

Observes behavior in tested scenarios.

Merge time

Shipgate reviews "should" vs. "can".

Compares static authority with intended purpose, scopes, and controls.

2026 open-source Agent tool-surface study · 14 fixed snapshots · no ranking · evidence before conclusions

Explore the preregistration

The problem

When a PR changes what an AI agent can do.

Run Agents Shipgate when Codex, Claude Code, Cursor, or a human changes agent tools, MCP exports, OpenAPI specs, prompts, permission scopes, approval policies, confirmation policies, release gates, or shipgate.yaml.

The question is no longer just whether the tool surface can be scanned. The PR question is whether the capability change has enough evidence to merge.

How it works

From capability change to merge verdict, in three steps.

Merge-time, not runtime. Shipgate answers the release question before promotion — where evals (behavior), code review (code), and observability (runtime) don't.

A capability changes

Your coding agent — Claude Code, Codex, or Cursor — or a human changes what the agent can do: its tools, scopes, prompts, or policies, in a PR.

Shipgate reads the diff

agents-shipgate verify reads the diff and the declared policies — statically. No agent execution, tool calls, LLM calls, or network access.

You get a verdict

A deterministic merge verdict — with the capability delta and the next safe action. Same diff, same verdict, every time.

Every run resolves to one release_decision.decision value: passed · review_required · insufficient_evidence · blocked

What it scans

One review surface for every tool source.

Adapters normalize each framework's tool declarations into the same Tool-Use Readiness review — statically, with no code import or execution.

MCP exports: Model Context Protocol tool manifests

OpenAPI 3.x: REST surfaces read as agent tools

OpenAI Agents SDK: Python @function_tool, static AST

Anthropic Messages API: tool-use definitions

Google ADK: Python and YAML config

LangChain / LangGraph: static Python tool inputs

CrewAI: static Python tool inputs

OpenAI API: hosted agent tool definitions

Codex repo config: .codex/config.toml & hooks.json

Codex plugins: plugin packages & marketplace stubs

n8n workflows: workflow JSON tool nodes

Then it writes a Tool-Use Readiness Report — verifier.json, report.json, pr-comment.md, and optional SARIF.
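To make "static AST, no code import or execution" concrete, here is a minimal Python sketch of how decorated tools could be discovered from source text alone. It is an illustration under stated assumptions, not Shipgate's adapter: the find_function_tools helper and the lookup_flight sample tool are hypothetical, and only the @function_tool decorator and the update_seat name come from this page.

```python
import ast


def find_function_tools(source: str) -> list[str]:
    """Return names of functions decorated with @function_tool.

    The source is only parsed, never imported or executed.
    (Hypothetical helper for illustration; not a Shipgate API.)
    """
    tree = ast.parse(source)
    names = []
    for node in ast.walk(tree):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for dec in node.decorator_list:
            # Accept both @function_tool and @function_tool(...)
            target = dec.func if isinstance(dec, ast.Call) else dec
            # Accept a bare name or an attribute such as agents.function_tool
            if isinstance(target, ast.Name):
                name = target.id
            else:
                name = getattr(target, "attr", None)
            if name == "function_tool":
                names.append(node.name)
                break
    return sorted(names)


# Sample module text; the "agents" import is never resolved because
# nothing here is executed.
SAMPLE = '''
from agents import function_tool

@function_tool
def update_seat(confirmation: str, seat: str) -> str:
    return "ok"

@function_tool()
async def lookup_flight(number: str) -> str:
    return "ok"

def helper():
    pass
'''

print(find_function_tools(SAMPLE))
```

Because the module is parsed rather than imported, a missing dependency or a side effect at import time cannot affect the result. The trade-off is that tools registered dynamically at runtime are invisible to this kind of scan.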

Quickstart

Install Shipgate where your coding agents already work.

Add Shipgate instructions, the manifest, and the advisory GitHub Action to the same repo surfaces Codex, Claude Code, and Cursor already read — with skill bundles and a PR template available as opt-ins.

Install agent workflow

AGENTS.md · Cursor rule · Claude command · Local contract · shipgate.yaml · GitHub Action

The default kit installs the repo instructions your coding agents already read, the manifest, and the advisory PR gate. Add the Codex and Claude Code skill bundles, CLAUDE.md, and a PR template with --agent-instructions=all.

$ agents-shipgate init --workspace . --write --ci \
  --agent-instructions=default --json

Verify current PR

$ pipx install agents-shipgate
$ agents-shipgate verify --preview --json
$ agents-shipgate init --workspace . --write --ci --agent-instructions=default --json
$ agents-shipgate verify --workspace . --config shipgate.yaml --base origin/main --head HEAD --ci-mode advisory --format json

Use when a PR changes agent tools, prompts, MCP/OpenAPI surfaces, permissions, policy, CI, or shipgate.yaml.

GitHub Action

- uses: ThreeMoonsLab/agents-shipgate@v0.14.0
  with:
    config: shipgate.yaml
    ci_mode: advisory
    diff_base: target
    pr_comment: "true"

Runs the verifier on pull requests and posts the PR comment artifact without blocking while your team adopts the gate.

Verifier artifacts:

agents-shipgate-reports/agent-handoff.json
agents-shipgate-reports/verifier.json
agents-shipgate-reports/verify-run.json
agents-shipgate-reports/report.json
agents-shipgate-reports/pr-comment.md

Capability lock diffs and SARIF remain available through the GitHub Action or output format configuration.

Trust-root protection

Don’t let AI edit the rules that review it.

Shipgate flags PRs that touch shipgate.yaml, AGENTS.md, Claude/Codex skills, policy packs, baselines, waivers, suppressions, or the GitHub Action that runs Shipgate. AI-generated policy weakening cannot self-approve.

Rule files: human review required

Release policy: cannot self-weaken

Verifier output: merge verdict

AI may fix mechanically: Missing paths, report ignores, stale generated entries, and safe high-confidence config patches.

AI must not assert: Approval, confirmation, idempotency, broad-scope safety, prohibited-action enforcement, or runtime-trace proof.

Humans own authority: Business policy, waivers, suppressions, release gates, and acceptance of human-review requirements.
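The trust-root idea can be sketched as a path match over the files a PR changes. The Python below is a rough illustration only: the pattern list and the trust_root_touched helper are assumptions, not Shipgate's rule set, and they cover just a few of the surfaces named on this page.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns for illustration. The file names come from this
# page; the glob patterns themselves are assumptions.
TRUST_ROOT_PATTERNS = [
    "shipgate.yaml",
    "AGENTS.md",
    "CLAUDE.md",
    ".claude/skills/*",
    ".github/workflows/*",
]


def trust_root_touched(changed_files: list[str]) -> list[str]:
    """Return the changed paths that match a trust-root pattern, sorted."""
    return sorted(
        path
        for path in changed_files
        if any(fnmatchcase(path, pattern) for pattern in TRUST_ROOT_PATTERNS)
    )


changed = ["src/tools.py", "shipgate.yaml", ".github/workflows/shipgate.yml"]
print(trust_root_touched(changed))
```

A non-empty result would be a reason to route the PR to a human, whatever else the diff contains.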

AI coding workflows

What coding agents know after install.

Codex reads AGENTS.md. Claude Code can use repo instructions, skills, slash commands, and local hooks. Cursor reads rules. Shipgate installs the verifier instructions into those surfaces so generated PRs know the definition of done.

Codex

Make the PR completion rule explicit in AGENTS.md and the repo-scoped Codex skill.

$ agents-shipgate init --workspace . --write \
  --agent-instructions=agents-md,codex-skill --json

Now GA: add the Codex plugin marketplace with codex plugin marketplace add ThreeMoonsLab/agents-shipgate, then invoke $agents-shipgate in a thread to run verify from Codex.

Claude Code

One flag wires the CLAUDE.md managed block, the skill, hooks, and a verify alias.

$ agents-shipgate init --workspace . --write \
  --claude-code

The hooks run a cheap trigger check after edits and the full verifier at Stop — Claude Code re-checks capability changes before reporting work complete. Inside Claude Code, agent mode auto-enables and a zero-flag agents-shipgate verify prints the full verifier artifact as JSON. CI remains the merge authority.

Definition of done for agent capability PRs

Before claiming completion on any PR that changes agent tools, MCP exports,
OpenAPI specs, prompts, permissions, policies, CI gates, or shipgate.yaml, run:

    agents-shipgate verify --base origin/main --head HEAD --format json

Read agents-shipgate-reports/agent-handoff.json first (gate.merge_verdict,
then controller); verifier.json is the authoritative controller substrate.
Do not claim completion when the merge verdict is blocked,
insufficient_evidence, or human_review_required unless the user explicitly
accepts the human review requirement.

Never weaken shipgate.yaml, Shipgate CI, AGENTS.md, skills, policy packs,
baselines, waivers, or suppressions merely to make Shipgate pass.

Product

Turn an agent PR into a capability-level merge verdict.

Agents Shipgate reads declared local tool sources and policy files, compares PR capability changes, runs deterministic checks, and writes verifier artifacts that tell reviewers whether the change can merge.

Company: Three Moons Lab

Product: Agents Shipgate

CLI / package / repo: agents-shipgate

Primary output: Merge verdict

Built for

AI engineers: PR-time feedback on capability changes before review

Platform engineers: Deterministic merge gates for agent repositories

Security & GRC teams: Evidence for agent capability approval and policy review

Inputs

MCP exports

OpenAPI specs

SDK/framework metadata

OpenAI Agents SDK · Anthropic Messages API · Google ADK · LangChain/LangGraph · CrewAI · OpenAI API · Codex config & plugins · n8n

agents-shipgate

static · local · deterministic

Outputs

verifier.json

pr-comment.md

GitHub verdict

report.json and SARIF stay available for deeper release review.

Agent builders: See which tools, prompts, scopes, and policies changed in the PR.

Platform teams: Gate generated agent changes without delegating release authority to the generator.

Security reviewers: Get static merge evidence without running agents or importing user code.

Checks

What it checks before release.

The landing page shows the release-review categories; the full check catalog stays in the repo.

01 · approval

Approval gaps

Write, destructive, financial, or external communication tools without declared approval policies.

02 · surface

Wildcard tool sources

Wildcard MCP or inventory sources that expose an unreviewable tool surface.

03 · auth

Broad scopes

Manifest or tool permissions that rely on wildcard or overly broad authorization scopes.

04 · schema

Free-form action fields

Fields such as body, command, action, or updates that let the model control too much.

05 · bounds

Missing bounds

Unbounded arrays, objects, strings, or numeric fields on side-effecting operations.

06 · retry

Idempotency gaps

Write actions where retry behavior could duplicate refunds, updates, messages, or deletes.

07 · static

Dynamic surfaces

Framework toolsets that cannot be statically reviewed without explicit inventory evidence.

08 · review

Owner and policy evidence

Tools missing reviewer-friendly ownership, scope, approval, or prohibited-action coverage.

09 · baseline

Baseline drift

New, matched, and resolved findings when a reviewed baseline is present.
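As one example of what a category like "Missing bounds" can mean in practice, the Python sketch below walks a JSON Schema object and lists properties with no upper bound. It is a simplified illustration, not the check shipped in the repo: the unbounded_fields helper and the sample schema are hypothetical, and it relies only on the standard JSON Schema keywords maxLength, maxItems, maximum, exclusiveMaximum, enum, and additionalProperties.

```python
def unbounded_fields(schema: dict, path: str = "") -> list[str]:
    """List properties of a JSON Schema object that lack an upper bound.

    Hypothetical helper for illustration; real checks would also need to
    handle $ref, oneOf/anyOf, and array item schemas.
    """
    findings = []
    for name, prop in schema.get("properties", {}).items():
        here = f"{path}.{name}" if path else name
        kind = prop.get("type")
        if kind == "string":
            # An enum bounds a string as effectively as maxLength does.
            if "maxLength" not in prop and "enum" not in prop:
                findings.append(here)
        elif kind == "array":
            if "maxItems" not in prop:
                findings.append(here)
        elif kind in ("integer", "number"):
            if "maximum" not in prop and "exclusiveMaximum" not in prop:
                findings.append(here)
        elif kind == "object":
            if "properties" in prop:
                findings.extend(unbounded_fields(prop, here))
            elif prop.get("additionalProperties", True) is not False:
                # Free-form object: any keys, any values.
                findings.append(here)
    return findings


# Hypothetical request schema for a refund-style tool.
refund_schema = {
    "type": "object",
    "properties": {
        "amount": {"type": "integer", "minimum": 1},
        "currency": {"type": "string", "enum": ["usd", "eur"]},
        "payment_id": {"type": "string", "maxLength": 64},
    },
}

print(unbounded_fields(refund_schema))
```

Here only amount is reported: it has a minimum but no maximum, which is the kind of gap that matters on a money-moving action.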

Merge verdict

A PR comment reviewers can act on.

Shipgate turns the PR diff into a verifier artifact, a reviewer-ready PR comment, and a release decision. Findings still include severity, evidence, source references, confidence, and next actions for deeper review.

agents-shipgate-reports / pr-comment.md

Agents Shipgate: blocked

Merge blocked

Verdict

blocked

Blockers

Review

required

Changed actions

Capability changes

#01 Critical openapi/billing.yaml:142

stripe.create_refund lacks a declared approval policy

evidence: financial_action · external_write · POST /refunds

recommend: Add approval policy or remove from this release.

confidence: high · check_id: SHIP-POLICY-APPROVAL-MISSING

#02 Critical openapi/billing.yaml:142

stripe.create_refund lacks idempotency evidence

evidence: write action · amount/currency/payment_id schema · retry behavior unknown

recommend: Add idempotency key or document retry policy.

confidence: high · check_id: SHIP-IDEMPOTENCY-MISSING

#03 High shipgate.yaml:18

wildcard_mcp_tools.* exposes an unreviewable tool surface

evidence: wildcard tool source in shipgate.yaml

recommend: Replace wildcard with explicit allowlist.

confidence: high · check_id: SHIP-SOURCE-WILDCARD

#04 High mcp_tools.json:#/tools/send_email

support.send_email accepts free-form 'body' field

evidence: external_communication · no template binding

recommend: Constrain to template IDs or require human confirmation.

confidence: medium · check_id: SHIP-SCHEMA-FREEFORM-ACTION

#05 Medium openapi/tickets.yaml:88

tickets.update is missing maximum bound on 'fields'

evidence: unbounded object · broad write

recommend: Document or enforce a field allowlist.

confidence: medium · check_id: SHIP-SCHEMA-MISSING-BOUND

verifier_schema_version 0.1 · report_schema_version 0.28 · generated on PR verification

terminal

$ agents-shipgate verify --base origin/main --head HEAD --format json
evaluating PR trigger catalog ................. run
diff: origin/main...HEAD ...................... ok
changed files ................................. 6
running head scan ............................. done

Merge verdict: blocked
Blockers: 2  Review items: 1

Capability changes
+ stripe.create_refund · blocks release
~ shipgate.yaml · human review required
~ support.send_email · schema widened

→ wrote agents-shipgate-reports/verifier.json
→ wrote agents-shipgate-reports/report.json
→ wrote agents-shipgate-reports/pr-comment.md
→ exit 0  (advisory mode)

verifier.json

// agents-shipgate-reports/verifier.json — read top-down
{
  "verifier_schema_version": "0.1",
  "merge_verdict": "blocked",
  "can_merge_without_human": false,
  "capability_review": {
    "top_changes": [
      { "id": "stripe.create_refund", "impact": "blocks_release" }
    ],
    "trust_root_touched": true
  },
  "fix_task": { "actor": "human", "safe_to_attempt": false },
  "release_decision": { "decision": "blocked", "reason": "2 active findings block release." },
  "artifacts": { "verifier": "…/verifier.json", "report": "…/report.json", "pr_comment": "…/pr-comment.md" }
}
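As a rough illustration of reading the artifact top-down, the Python sketch below summarizes a dictionary shaped like the excerpt above. It assumes only the fields shown in that excerpt and leaves out the elided artifact paths; the summarize helper is hypothetical and not part of agents-shipgate, and the real schema may carry more fields.

```python
import json

# Mirrors the verifier.json excerpt above, minus the elided artifact paths.
SAMPLE = json.loads("""
{
  "verifier_schema_version": "0.1",
  "merge_verdict": "blocked",
  "can_merge_without_human": false,
  "capability_review": {
    "top_changes": [
      {"id": "stripe.create_refund", "impact": "blocks_release"}
    ],
    "trust_root_touched": true
  },
  "fix_task": {"actor": "human", "safe_to_attempt": false},
  "release_decision": {"decision": "blocked", "reason": "2 active findings block release."}
}
""")


def summarize(verifier: dict) -> str:
    """Build a one-line summary (hypothetical helper, not a Shipgate API)."""
    review = verifier.get("capability_review", {})
    changes = ", ".join(
        f"{c['id']} ({c['impact']})" for c in review.get("top_changes", [])
    )
    parts = [f"verdict={verifier['merge_verdict']}"]
    if changes:
        parts.append(f"changes={changes}")
    if review.get("trust_root_touched"):
        parts.append("trust root touched")
    # Treat a missing flag as "human needed" so the summary fails closed.
    if not verifier.get("can_merge_without_human", False):
        parts.append("human needed")
    return "; ".join(parts)


print(summarize(SAMPLE))
```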

What drives the verdict

The verifier combines capability diffs with static release evidence.

New actions · Approval gaps · Broad scopes · Prompt changes · Schema bounds · Idempotency gaps · Trust-root edits · Baseline changes

Merge verdicts: passed · review · evidence · blocked

Primary artifact: verifier.json

Reviewer artifact: pr-comment.md

Gate signal: release_decision.decision

Agent-native merge contract

A protocol your coding agent can act on — not just a report.

Underneath the verdict is one rule: release_decision.decision is the only gate, and no agent-facing field decides independently of it. Everything below is a deterministic projection of that one decision — eight contracts, each mapped to the artifact that already implements it.

In the published release: The agent_controller projection, applicability, agents-shipgate attest, and the one-command agent protocol are all in the published v0.14.0 release, installed by pipx install agents-shipgate with no branch build required.

Contract	The question it answers	What the agent reads
Trigger	Should I run at all?	`first_next_action` · triggers.json
Capability change	What can the agent now do?	`capability_review.top_changes`
Merge verdict	May it merge?	`merge_verdict` · `can_merge_without_human`
Repair	What may be fixed, and by whom?	`fix_task` · `verification_command`
Forbidden action	What must never be done to pass?	`agent_controller.forbidden_actions`
Human authority	What only a human can grant?	`agent_controller.stop_reason` · `human_review`
Trust root	Can the judged weaken the judge?	`trust_root_touched` · `policy_weakened`
Attestation	What shipped, under which verdict?	`agents-shipgate attest` → attestation.json

The agent control loop

Those eight collapse into one block — verifier.json.agent_controller — that an autonomous agent switches on directly. completion_allowed is locked to can_merge_without_human, so step one can never contradict the gate.

completion_allowed

If true, the capability change is done — merge, and keep the verifier artifacts with the PR record.

must_stop

Else if true, surface stop_reason to a human. Never edit a forbidden_file_edits path or take a forbidden_actions shortcut to get past the gate.

fix_task

Otherwise apply the mechanical fix_task, re-run its verification_command, and read the fresh verdict.
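The three branches above can be sketched as a small Python function. This is an illustration under assumptions rather than a reference client: the next_step helper is hypothetical, and it assumes fix_task (with its verification_command) is reachable from the same agent_controller dictionary, which this page does not spell out.

```python
from typing import Optional, Tuple


def next_step(controller: dict) -> Tuple[str, Optional[str]]:
    """Map an agent_controller-shaped dict to (action, detail).

    Hypothetical helper. Field names follow this page; their exact
    nesting inside verifier.json is an assumption.
    """
    # 1. completion_allowed: the capability change is done.
    if controller.get("completion_allowed") is True:
        return ("complete", None)
    # 2. must_stop: hand the stop reason to a human, never work around it.
    if controller.get("must_stop") is True:
        return ("stop", controller.get("stop_reason"))
    # 3. Otherwise: apply the mechanical fix, then re-run verification.
    fix_task = controller.get("fix_task") or {}
    return ("fix", fix_task.get("verification_command"))


example = {
    "completion_allowed": False,
    "must_stop": True,
    "stop_reason": "approval evidence missing",
}
print(next_step(example))
```

The order of the checks matters: completion is tested first, and a stop always wins over a repair, so an agent cannot pick the fix branch while a human decision is pending.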

All of it, in one command

A coding agent asks for a verdict and gets one stdout JSON object back — schema_version: "shipgate.codex_boundary_result/v1" — and switches on decision, completion_allowed, must_stop, first_next_action, human_review, repair, and policy (with policy_snapshot_sha256). The same verdict is served by shipgate.check on the optional read-only MCP server (agents-shipgate mcp-serve, five tools: shipgate.check, shipgate.preflight, shipgate.explain, shipgate.capabilities, shipgate.handoff).

$ shipgate check --agent claude-code --workspace . --format codex-boundary-json
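A caller that consumes this stdout object may want to pin the schema version and fail closed on anything unexpected. The Python sketch below shows one way to do that; it does not invoke the CLI, the helper names are hypothetical, and the sample payload values are invented for illustration (only the field names and the schema_version string come from this page).

```python
import json

EXPECTED_SCHEMA = "shipgate.codex_boundary_result/v1"


def read_boundary_result(stdout_text: str) -> dict:
    """Parse the stdout JSON object and check its schema_version."""
    result = json.loads(stdout_text)
    if not isinstance(result, dict) or result.get("schema_version") != EXPECTED_SCHEMA:
        raise ValueError("unexpected boundary result schema")
    return result


def may_complete(stdout_text: str) -> bool:
    """Fail closed: unparseable or unexpected output means 'do not complete'."""
    try:
        result = read_boundary_result(stdout_text)
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        return False
    return result.get("completion_allowed") is True and result.get("must_stop") is not True


# Invented sample payload for illustration.
sample = json.dumps({
    "schema_version": EXPECTED_SCHEMA,
    "decision": "blocked",
    "completion_allowed": False,
    "must_stop": True,
})
print(may_complete(sample))
```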

Full field map: the agent-native merge contract in the repository.

Real-world examples

Proof from public tool surfaces, without turning the homepage into docs.

Four public examples show the verifier on realistic SDK/framework code and larger API surfaces. Each card keeps one representative capability issue visible and leaves full output in GitHub or docs.

OpenAI Agents SDK · 2 tools · High findings

Airline customer service agent

Static AST extraction finds a write-capable update_seat tool without enough release-review evidence.

Representative finding: update_seat changes customer state and needs explicit scope and policy coverage.

Config excerpt:

tool_sources:
  - id: openai_agents_sdk
    type: openai_agents_sdk
    path: main.py
environment:
  target: production_like

Anthropic Messages API · 3 tools · Critical

Cookbook customer service agent

A real published tool-use example includes cancel_order, a destructive action that needs approval evidence.

Representative finding: cancel_order is destructive and ships without a declared approval policy.

Config excerpt:

tool_sources:
  - id: anthropic_tools
    type: anthropic_messages
    path: tools.json
policies:
  approval_required_for: [destructive]

OpenAPI · 591 tools · Stress test

DigitalOcean public API as agent tools

A cloud infrastructure API reframed as a broad agent surface exposes irreversible droplet, database, and Kubernetes operations.

Representative finding: Destructive infrastructure actions are present without explicit approval policies.

Config excerpt:

tool_sources:
  - id: digitalocean_openapi
    type: openapi
    path: openapi.yaml
permissions:
  scopes: ["*"]

OpenAPI · 167 tools · Stress test

Twilio Messaging API purpose mismatch

A read-only manifest pointed at a messaging API still exposes DELETE-capable tools that contradict the declared purpose.

Representative finding: Read-only release intent conflicts with message and phone-number deletion operations.

Config excerpt:

agent:
  declared_purpose:
    - read messaging inventory
tool_sources:
  - id: twilio_openapi
    type: openapi
    path: messaging.yaml
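A purpose mismatch of this kind can be illustrated by listing every operation in an OpenAPI document whose HTTP method is not read-only. The Python sketch below works on an already-parsed spec dictionary; the non_read_operations helper and the tiny sample spec are hypothetical and do not reproduce the Twilio API or Shipgate's own check.

```python
# HTTP method keys allowed on an OpenAPI 3.x path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}
READ_ONLY_METHODS = {"get", "head", "options"}


def non_read_operations(spec: dict) -> list[str]:
    """Return 'METHOD /path' for every operation that is not read-only."""
    found = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            method = key.lower()
            # Skip non-operation keys such as "parameters" or "summary".
            if method in HTTP_METHODS and method not in READ_ONLY_METHODS:
                found.append(f"{method.upper()} {path}")
    return sorted(found)


# Hypothetical, heavily trimmed spec for illustration.
sample_spec = {
    "openapi": "3.0.0",
    "paths": {
        "/messages": {"get": {"operationId": "listMessages"}},
        "/messages/{sid}": {
            "parameters": [],
            "get": {"operationId": "fetchMessage"},
            "delete": {"operationId": "deleteMessage"},
        },
    },
}

print(non_read_operations(sample_spec))
```

For a manifest whose declared purpose is "read messaging inventory", any entry in this list is a candidate conflict for a reviewer to resolve.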

Developer workflow

Verify locally, comment on PRs, then tighten when ready.

CI is advisory by default. Strict mode can fail the build once the team has reviewed the baseline, the trust-root policy, and what must remain human-approved.

Modes: Local verifier · PR advisory · Strict CI · Human review

Local verifier

$ pipx install agents-shipgate
$ agents-shipgate verify --preview --json
$ agents-shipgate init --workspace . --write --ci --agent-instructions=default --json
$ agents-shipgate verify --workspace . --config shipgate.yaml --base origin/main --head HEAD --ci-mode advisory --format json

Requires Python 3.12+. Use python -m pip install agents-shipgate if pipx is not available.

.github/workflows/shipgate.yml

name: Agents Shipgate

on:
  pull_request:

permissions:
  contents: read
  pull-requests: write

jobs:
  shipgate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@<pinned-sha>
        with:
          fetch-depth: 0
      - uses: ThreeMoonsLab/agents-shipgate@v0.14.0
        with:
          config: shipgate.yaml
          ci_mode: advisory
          diff_base: target
          pr_comment: "true"
          output_dir: agents-shipgate-reports

The Action runs the verifier on pull-request diffs using the pinned v0.14.0 release.

Local verifier

Fast feedback while the PR is still being generated or edited.

PR comment

Post the merge verdict and capability delta without blocking while adopting.

Strict CI

Fail only after the team accepts the baseline and trust-root policy.

Human authority

Keep approval, idempotency, suppressions, and policy weakening out of auto-fix.

Trust model · open source

Built for generated PRs that need deterministic review.

Local-first and static by default. No agent execution, tool calls, LLM calls, or network access. It does not import user code, connect to MCP servers, or collect telemetry by default.

Designed for PR-time evidence before an AI-generated capability change lands. Open-source core. Transparent checks. Suppressions require reasons.

Inspect the checks. Verify a PR. Open an issue with a false positive.

Default verifier guarantees

Static by default

No agent execution
No user-code import by default
No tool calls
No LLM calls
No MCP server connections
No verifier network calls by default
No verifier telemetry by default
Apache-2.0 open source

Differentiation

Not AI code review. Not evals. Not a gateway.

AI code review asks whether an implementation is correct. Shipgate asks whether a PR gave an agent new real-world capabilities, and whether those capabilities have release evidence.

Not

AI code review

It reviews implementation quality.

Shipgate reviews capability expansion.

Not

Evals

They test behavior.

Shipgate reviews release evidence.

Not

Security scanner

It finds code and dependency risks.

Shipgate asks what the agent can now do.

Shipgate

Merge verifier

A PR gate for agent capability changes.

Deterministic verdict. Evidence attached.

Side-by-side

Category	What it answers	What Shipgate answers
AI code review	Is this diff idiomatic, correct, or buggy?	Did this diff expand agent capability?
Code security scanning	Are dependencies or code patterns risky?	Are agent tools, scopes, schemas, and policies reviewable?
Evals	Did the model behave on curated inputs?	Can this tool/action surface merge?
Runtime gateway	Should this call be allowed now?	Should this capability be released at all?

Detailed comparisons: Agents Shipgate vs LLM evals · vs agent observability · vs MCP gateways · vs LLM gateways · vs runtime guardrails.

FAQ

Common merge-verdict questions.

Short answers for developers, platform teams, and security reviewers before trying the verifier.

What is agents-shipgate?

agents-shipgate is an open-source CLI and GitHub Action that verifies AI-generated agent capability changes and returns a deterministic merge verdict. The display name is Agents Shipgate; the package, CLI, repo, and action name are agents-shipgate.

What should trigger a Shipgate run?

Run it when a PR changes agent tools, MCP exports, OpenAPI specs, prompts, permissions, approval policies, confirmation policies, CI gates, or shipgate.yaml. Existing opted-in repos can run it on every PR.

What is the merge verdict?

The wire signal is still release_decision.decision. Public verdict labels are mergeable, human_review_required, insufficient_evidence, and blocked.

What does it actually check?

Agents Shipgate checks declared tools, schemas, scopes, approval policies, side effects, idempotency evidence, prompt/policy surfaces, trust-root files, baselines, and suppressions. It asks whether the capability change has enough release evidence to merge.

How is this different from LLM evals?

Evals validate behavior on inputs you wrote. Agents Shipgate validates the static release artifact — manifest, tool schemas, scopes, and policies — without running the model. Use both.

How is this different from observability or runtime guardrails?

Observability records what happened at runtime, and guardrails enforce access at runtime. Agents Shipgate runs earlier: it turns declared tool surfaces into static release-review evidence before promotion.

Does it call my agent or send my data anywhere?

No. The verifier is static by default: no agent execution, no user-code import by default, no tool calls, no LLM calls, no MCP server connections, no verifier network calls by default, and no verifier telemetry by default.

What inputs does it support?

It supports MCP exports, OpenAPI 3.x specs, OpenAI Agents SDK Python entrypoints, Anthropic Messages API artifacts, Google ADK Python and YAML config, LangChain/LangGraph, CrewAI, OpenAI API artifacts, Codex repo config (.codex/config.toml, hooks.json), Codex plugin packages, and n8n workflow artifacts.

What output formats does it write?

The PR verifier writes agents-shipgate-reports/verifier.json, report.json, and pr-comment.md. SARIF remains available through the GitHub Action or output format configuration.

What version does this site advertise?

This site advertises the published v0.14.0 release. Install from PyPI with pipx install agents-shipgate and pin GitHub Actions to ThreeMoonsLab/agents-shipgate@v0.14.0.

How do I add it to GitHub Actions?

Add ThreeMoonsLab/agents-shipgate@v0.14.0 to a pull-request workflow with fetch-depth: 0, diff_base: target, pr_comment: "true", and ci_mode: advisory.

Does it certify my agent as safe?

No. Agents Shipgate is not a safety certification, runtime gateway, or behavioral eval. It produces deterministic findings from tool definitions, schemas, scopes, and declared policies so release owners have evidence to review.

Which JSON field is the release gating signal?

Read release_decision.decision from report.json or the same block projected into verifier.json. Values are passed, blocked, review_required, and insufficient_evidence.
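For readers who see both vocabularies on this page, the Python sketch below reads the gate field and maps the wire value to a public verdict label. The pairing is an assumption inferred from the order in which the two lists appear here, not a documented table, and the public_label helper is hypothetical.

```python
# Assumed pairing of wire values to public labels (inferred, not documented).
WIRE_TO_LABEL = {
    "passed": "mergeable",
    "review_required": "human_review_required",
    "insufficient_evidence": "insufficient_evidence",
    "blocked": "blocked",
}


def public_label(report: dict) -> str:
    """Read release_decision.decision and return the assumed public label."""
    decision = report["release_decision"]["decision"]
    if decision not in WIRE_TO_LABEL:
        # Unknown values are rejected rather than guessed at.
        raise ValueError(f"unknown release decision: {decision!r}")
    return WIRE_TO_LABEL[decision]


report = {"release_decision": {"decision": "review_required"}}
print(public_label(report))
```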

What may coding agents fix automatically?

They may install the workflow, add ignored local report directories, run verify, summarize artifacts, and apply high-confidence mechanical patches. They must not assert approval, confirmation, idempotency, broad-scope safety, prohibited-action enforcement, or runtime-trace proof without human review.

What is a release gate for AI agents?

A release gate is a deterministic CI check that runs on every pull request and fails the build when the agent's release artifact contains unsafe state. For AI agents specifically, the release gate inspects the manifest, tool surface declarations, scopes, and policies — not the model itself or runtime behavior. Agents Shipgate fits this slot.

What is a shipgate.yaml manifest?

shipgate.yaml is a checked-in YAML file that declares an agent's release context: tool sources (MCP, OpenAPI, SDK), permissions, approval and idempotency policies, risk overrides, and CI settings. Agents Shipgate reads it as the single source of truth for what should be reviewed at release time. The minimal valid manifest requires version, project, agent, and environment blocks plus at least one entry under tool_sources.

What is a Tool-Use Readiness Report?

A Tool-Use Readiness Report is the deterministic output of an agents-shipgate scan. It contains a release decision (one of passed, blocked, review_required, or insufficient_evidence), a finding list with severities and recommended remediation, and seven dimensions of evidence coverage. It is written as Markdown for human review, JSON for tools and coding agents, and SARIF for GitHub code scanning.

What does "blast radius" mean for an AI agent?

Blast radius is the seventh dimension of tool-use readiness. It asks: if a tool fires unexpectedly, how bounded is the damage? Evidence includes a declared owner so the right team gets paged, an enumerated list of prohibited actions in agent.prohibited_actions, and resource-scope bounds (per-tenant, per-customer, per-resource-prefix). High-risk tools without these bounds get blast-radius findings.

How is this different from unit tests for my agent?

Unit tests verify specific behaviors on inputs the team wrote — the model returns the right tool call on this input, a helper processes this fixture correctly. Agents Shipgate verifies the static release artifact: which tools are exposed, what schemas they accept, what policies gate them. Unit tests cannot catch a write tool shipping without an approval policy; the release gate cannot catch a behavior regression on a curated test. Use both at different stages of CI.

Does Agents Shipgate work with LangChain, CrewAI, Google ADK, and other frameworks?

Yes. Agents Shipgate supports OpenAI Agents SDK, Anthropic Messages API, Google ADK (Python and YAML), LangChain/LangGraph, CrewAI, MCP exports, OpenAPI 3.x specs, OpenAI API artifacts, Codex plugins, and n8n workflows. Static AST extraction means no code import or execution. The seven dimensions of tool-use readiness apply regardless of framework — adapters normalize each framework's tool declarations into the same review surface.

How does Agents Shipgate fit into my CI/CD pipeline?

Add the GitHub Action to your pull-request workflow in advisory mode. It runs on every PR, scans the manifest plus tool sources, and posts findings as a PR comment without failing the build. Triage and baseline existing findings, then switch to strict mode (fail_on: critical,high) to block net-new gaps. Typical CI order: dependency install, release-readiness scan, unit tests, eval suite, deploy. See the full CI/CD tutorial.

When should I use strict mode vs advisory mode?

Use advisory mode when first adopting Agents Shipgate or when you have a backlog of findings to triage: the scan runs and reports but never fails the build. Switch to strict mode after you have saved a baseline (agents-shipgate baseline save) and reviewed the existing findings. Strict mode fails only on net-new findings at critical or high severity. The adoption path is: advisory, then review and baseline, then strict plus baseline.
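The difference between the two modes can be sketched in a few lines of Python. This is a conceptual illustration, not Shipgate's implementation: the finding identity (check_id plus tool), the "strict" mode string, and the non-zero strict exit code are all assumptions, while the advisory exit code of 0 and the check IDs are taken from the samples on this page.

```python
# Severities that block in strict mode, mirroring fail_on: critical,high.
BLOCKING_SEVERITIES = {"critical", "high"}


def finding_key(finding: dict) -> tuple:
    """Hypothetical identity for matching a finding against a baseline."""
    return (finding["check_id"], finding["tool"])


def strict_failures(findings: list, baseline: list) -> list:
    """Findings that are net-new relative to the baseline and blocking."""
    known = {finding_key(f) for f in baseline}
    return [
        f for f in findings
        if f["severity"] in BLOCKING_SEVERITIES and finding_key(f) not in known
    ]


def exit_code(findings: list, baseline: list, ci_mode: str) -> int:
    """Advisory never fails; strict fails on net-new blocking findings."""
    if ci_mode == "advisory":
        return 0
    return 1 if strict_failures(findings, baseline) else 0


baseline = [
    {"check_id": "SHIP-POLICY-APPROVAL-MISSING", "tool": "stripe.create_refund", "severity": "critical"},
]
findings = baseline + [
    {"check_id": "SHIP-SOURCE-WILDCARD", "tool": "wildcard_mcp_tools", "severity": "high"},
    {"check_id": "SHIP-SCHEMA-MISSING-BOUND", "tool": "tickets.update", "severity": "medium"},
]

print(exit_code(findings, baseline, "advisory"), exit_code(findings, baseline, "strict"))
```

In this example the baselined critical finding and the medium finding do not fail strict mode; only the net-new high-severity wildcard finding does.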

What is the agent-native merge contract?

It is the protocol beneath the verdict: eight contracts — trigger, capability change, merge verdict, repair, forbidden action, human authority, trust root, and attestation — each mapped to the artifact that implements it. release_decision.decision is the only gate; every agent-facing field is a deterministic projection of it. A coding agent reads agent-handoff.json first (gate.merge_verdict, then controller); verifier.json.agent_controller is the authoritative substrate for whether it may continue, must repair, or must stop for a human. The named contract and the agent_controller projection are in the published v0.14.0 release. See the contract map.

What is agents-shipgate attest?

agents-shipgate attest derives a deterministic, local attestation from verifier.json and report.json: the base/head SHAs, the merge verdict, the capability delta, the declared human-acknowledgement state, a policy-snapshot hash, and content hashes of every verify artifact. It is content-addressed, does not gate, and leaves a durable record of what shipped under which verdict. Output: attestation.json. It is in the published v0.14.0 release, alongside registry ingest/registry query for a local capability-release ledger.
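
To make "content-addressed" concrete, here is a minimal Python sketch of the idea. It is not the real attestation.json schema: the field names (base_sha, head_sha, merge_verdict, artifact_sha256, attestation_id) and the canonical-JSON step are assumptions chosen for illustration.

```python
from __future__ import annotations

import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


def build_attestation(
    artifacts: dict[str, bytes], verdict: str, base_sha: str, head_sha: str
) -> dict:
    """Build a record whose id is a hash of its own canonical content.

    All field names here are hypothetical, not Shipgate's schema.
    """
    record = {
        "base_sha": base_sha,
        "head_sha": head_sha,
        "merge_verdict": verdict,
        # One content hash per verify artifact, keyed by file name.
        "artifact_sha256": {name: sha256_hex(body) for name, body in sorted(artifacts.items())},
    }
    # Canonical serialization: sorted keys, no whitespace, so equal inputs give equal bytes.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {**record, "attestation_id": sha256_hex(canonical)}
```

Because the id is derived only from the record's content, the same artifacts and verdict always produce the same id, and changing any byte of any artifact changes it.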

Can I install Agents Shipgate from Codex?

Yes. The Codex plugin is generally available. Run codex plugin marketplace add ThreeMoonsLab/agents-shipgate, then invoke $agents-shipgate in a thread to run verify, init, and report-reading workflows. The plugin supplies the Codex workflows; install or upgrade the CLI with pipx install agents-shipgate in the environment where Codex runs commands.

What is the one-command agent protocol?

shipgate check --agent codex|claude-code|cursor --workspace . --format codex-boundary-json is the canonical one-command path for a coding agent. It emits one stdout JSON object with schema_version: "shipgate.codex_boundary_result/v1"; the agent switches on decision, completion_allowed, must_stop, first_next_action, human_review, repair, and policy (which includes policy_snapshot_sha256). The former agent_result_v1 contract and its --format agent-json flag were removed in v0.14.0. The optional MCP server mode exposes the same verdict as a read-only shipgate.check tool that accepts caller-provided diff text.
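
A coding-agent harness might consume that stdout object roughly as follows. This is a hedged sketch, not an official client: it assumes completion_allowed and must_stop are JSON booleans and that repair is present when a repair is expected, none of which the text above specifies. The function name and the "stop"/"complete"/"repair" return values are invented for the example.

```python
import json

EXPECTED_SCHEMA = "shipgate.codex_boundary_result/v1"


def next_step(stdout_text: str) -> str:
    """Map one boundary-result JSON object to "stop", "complete", or "repair".

    Fails closed: anything unrecognized or incomplete resolves to "stop".
    """
    result = json.loads(stdout_text)
    if result.get("schema_version") != EXPECTED_SCHEMA:
        return "stop"  # unknown contract version: do not guess
    if result.get("must_stop") is True:
        return "stop"  # a human has to take over
    if result.get("completion_allowed") is True:
        return "complete"
    if result.get("repair"):
        return "repair"  # assumed: a non-empty repair field means a fix is expected
    return "stop"
```

Checking must_stop before completion_allowed is a deliberate choice in this sketch, so that a contradictory payload resolves to the safer outcome.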

How does Agents Shipgate work with Claude Code?

Run agents-shipgate init --workspace . --write --claude-code once. It writes the CLAUDE.md managed block, the auto-discoverable .claude/skills/agents-shipgate/ skill, a verify alias in Makefile or package.json scripts, and Claude Code hooks that run a cheap trigger check after edits and the full verifier at Stop. Inside Claude Code, agent mode auto-enables (the harness exports CLAUDECODE=1), so a zero-flag agents-shipgate verify prints the full verifier artifact as JSON on stdout. CI stays authoritative; the hooks are the local feedback loop.

What is host-grant drift detection?

agents-shipgate audit --host --save-baseline records the current coding-agent host grants — MCP servers, Claude Code permission rules and hooks, workflow scopes, Codex config presence — as an acknowledged baseline. audit --host --drift then deterministically diffs current grants against it, naming authority-broadening shapes such as a new or changed MCP server, a wildcard allow added, a deny or ask rule removed, or a workflow write scope gained. Advisory by default; --fail-on-drift exits 20 for scheduled CI gates. It catches authority changes that land outside PR review, where diff-time checks cannot see them.
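
The drift check is at heart a set difference with a direction. The Python sketch below shows that idea under a simplified, hypothetical grant model (category names mapped to sets of strings). It is not how audit --host stores or compares grants, and it does not model a changed MCP server, only added or removed entries. The exit code 20 is the one stated above.

```python
from __future__ import annotations

# Hypothetical categories, invented for this example.
WIDEN_ON_ADD = ("mcp_servers", "allow_rules", "workflow_write_scopes")
WIDEN_ON_REMOVE = ("deny_rules", "ask_rules")
DRIFT_EXIT_CODE = 20  # from the text: --fail-on-drift exits 20


def detect_drift(baseline: dict[str, set[str]], current: dict[str, set[str]]) -> list[str]:
    """List authority-broadening changes between a baseline and current grants."""
    drift: list[str] = []
    # Additions broaden authority for permissive categories.
    for cat in WIDEN_ON_ADD:
        for item in sorted(current.get(cat, set()) - baseline.get(cat, set())):
            drift.append(f"{cat}: added {item}")
    # Removals broaden authority for restrictive categories.
    for cat in WIDEN_ON_REMOVE:
        for item in sorted(baseline.get(cat, set()) - current.get(cat, set())):
            drift.append(f"{cat}: removed {item}")
    return drift


def exit_code(drift: list[str], fail_on_drift: bool = False) -> int:
    """Advisory by default; non-zero only when drift exists and the flag is set."""
    return DRIFT_EXIT_CODE if (drift and fail_on_drift) else 0
```

Sorting the differences keeps the output stable between runs, which matters when the result feeds a scheduled CI gate.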

Long-term direction

Start with PR merge verdicts. Add release evidence over time.

Today, Agents Shipgate makes AI-generated capability changes reviewable before merge. The next layers are baselines, suppressions, release history, policy drift, re-review triggers, and runtime evidence integrations.

01 · now

AI-generated PR verifier

CLI + GitHub Action + PR comment.

Release evidence

Reports, baselines, history, exceptions.

Runtime integrations

Trace evidence without replacing static review.

Get started

Make Shipgate part of your coding-agent definition of done.

$ agents-shipgate init --workspace . --write --ci --agent-instructions=default --json
$ agents-shipgate verify --base origin/main --head HEAD --format json

Install workflow

Have an AI-generated agent PR? Bring one and we'll review the capability delta, trust root, and merge verdict with your team.

Bring us one AI-generated PR: help@threemoonslab.com
