AI agent CI/CD: add a release gate to your GitHub Actions pipeline
Adding agents-shipgate to your GitHub Actions workflow in four stages: advisory mode, baseline, strict mode, governance. Real YAML for each step.
Most teams adopting an AI agent release gate get stuck at the same point: the scanner finds 12 things on the first run, the team can’t clean them up in a sprint, and the gate quietly goes “advisory forever.” The result is a check that runs in CI without changing behavior — the worst version of a CI gate.
This post walks through the four-stage adoption pattern that gets an AI agent project past “advisory forever”: what to wire up at each stage, what YAML to commit, and what the success criteria are.
The four stages:
- Advisory mode — report on every PR, never fail the build.
- Baseline — snapshot existing findings as accepted, so strict mode only fails on net-new gaps.
- Strict mode — fail the build on net-new critical/high findings.
- Governance — required check, SARIF upload, branch protection.
Wiring up each stage takes 1-3 days of part-time work; Stage 2 triage and Stage 3 normalization can run longer, as noted under each stage. The whole adoption is typically two weeks plus a backlog burn-down between stages.
Why this belongs in CI specifically
The argument for putting agent release-readiness in CI alongside tests and SAST is covered in from CI/CD to agent release readiness — it’s the same pattern that brought SAST into the build pipeline. The short version: a tool surface change is a release artifact change, release artifact changes deserve PR-time review, and PR-time review needs to be deterministic and unblocking by default.
Stage 1: Advisory mode
The minimal workflow that gets you a PR comment with findings on every relevant change:
# .github/workflows/agents-shipgate.yml
name: Agent release readiness
on:
  pull_request:
    paths:
      - 'shipgate.yaml'
      - 'mcp-exports/**'
      - 'openapi/**'
      - 'agent/**'
      - '.github/workflows/agents-shipgate.yml'
permissions:
  contents: read
  pull-requests: write
jobs:
  release-readiness:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ThreeMoonsLab/agents-shipgate@v0.8.0
        with:
          config: shipgate.yaml
          ci_mode: advisory
          pr_comment: "true"
What this does: on every PR that touches the agent’s release artifact, the scanner runs, posts a PR comment with the finding list, and exits with status 0 regardless of findings. The team sees what’s there without the gate blocking merges.
What to watch:
- The PR comment appears within ~30 seconds for most agents.
- Findings have severities (critical/high/medium/low) and check IDs.
- The team should triage the first run together so the categories are familiar — see walking a release-readiness report for what a real report looks like.
The paths: filter matters. Without it, the workflow runs on every
PR (slow, noisy). With it, the gate runs only when something the
gate could find is being changed.
Success criteria for stage 1: every PR touching the agent surface gets a PR comment. The team has read at least one report end-to-end and understands what each finding category means.
Duration: 1-3 days.
Stage 2: Baseline existing findings
If your first scan finds 12 things and your team can’t fix them all this sprint, you need a baseline — a checked-in snapshot of the findings you’ve explicitly accepted as not-blocking. Strict mode then only fails on net-new findings.
Generate the baseline locally and commit it:
agents-shipgate baseline save
# Writes .agents-shipgate/baseline.json based on shipgate.yaml
git add .agents-shipgate/baseline.json
git commit -m "shipgate: baseline current findings"
Each finding in the baseline is keyed by a fingerprint (check ID + tool name + evidence shape) so cosmetic changes don’t re-flag the finding, but a tool surface change does.
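To make the fingerprint idea concrete, here is a minimal Python sketch of why a key built from those three parts survives cosmetic edits but not surface changes. The `fingerprint` function, the use of SHA-256, and the reading of "evidence shape" as the sorted set of evidence keys are all assumptions for illustration; the real agents-shipgate fingerprint format may differ.

```python
import hashlib
import json


def fingerprint(check_id: str, tool: str, evidence: dict) -> str:
    """Hypothetical fingerprint: hash of check ID, tool name, and evidence shape.

    "Shape" is assumed to mean the sorted evidence keys, not their values,
    so rewording a value does not change the result.
    """
    shape = sorted(evidence.keys())
    payload = json.dumps([check_id, tool, shape], separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


before = fingerprint(
    "SHIP-DOC-MISSING-DESCRIPTION", "legacy_search",
    {"schema_path": "tools/legacy_search", "note": "no description"},
)
# Cosmetic change: same keys, different wording in one value.
cosmetic = fingerprint(
    "SHIP-DOC-MISSING-DESCRIPTION", "legacy_search",
    {"schema_path": "tools/legacy_search", "note": "description missing"},
)
# Surface change: the finding now concerns a different tool.
renamed = fingerprint(
    "SHIP-DOC-MISSING-DESCRIPTION", "search_v2",
    {"schema_path": "tools/search_v2", "note": "no description"},
)

print(before == cosmetic)  # True
print(before == renamed)   # False
```

Hashing only the keys is one possible trade-off: it keeps the baseline stable across wording changes, at the cost of not noticing when a value changes in a way that matters.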
For each baselined finding, decide:
- Defer and fix soon. Leave in the baseline; track in your backlog.
- Won’t fix, the check doesn’t apply here. Move from the baseline to a suppression with a written reason:
# in shipgate.yaml
checks:
  ignore:
    - check_id: SHIP-DOC-MISSING-DESCRIPTION
      tool: legacy_search
      reason: tool deprecated 2026-Q2; deletion tracked in JIRA-1234
The baseline carries “we’ll fix later” findings; suppressions carry “this check doesn’t apply here” with a written justification. Both prevent strict mode from failing the build on a known finding; only suppressions express team intent in the manifest.
Success criteria for stage 2: a baseline file is committed; the
team has triaged what’s in it (defer vs intentional vs to-fix-now);
every intentional entry has a suppression with a reason:.
Duration: 1-7 days depending on backlog size.
Stage 3: Strict mode
With a baseline in place, switch the workflow to strict mode:
      - uses: ThreeMoonsLab/agents-shipgate@v0.8.0
        with:
          config: shipgate.yaml
          ci_mode: strict
          fail_on: critical,high
          baseline: .agents-shipgate/baseline.json
          pr_comment: "true"
Now the gate fails the build when:
- A net-new finding at critical or high severity appears, AND
- That finding’s fingerprint isn’t in the baseline.
Medium and low findings are still reported but don’t fail the build.
Tune fail_on: to your team’s appetite. critical alone is the most
permissive setting; critical,high,medium is the strictest.
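The strict-mode rule can be sketched in a few lines of Python. This is an illustration of the decision described above, not the scanner's implementation: the `blocking_findings` helper and the shape of the finding dicts (`fingerprint` and `severity` keys) are assumptions.

```python
SEVERITIES = ("critical", "high", "medium", "low")


def blocking_findings(findings, baseline, fail_on):
    """Return the findings that would fail the build.

    findings: list of dicts with "fingerprint" and "severity" keys (assumed shape)
    baseline: set of fingerprints already accepted in the baseline file
    fail_on:  comma-separated severities, e.g. "critical,high"
    """
    levels = {s.strip() for s in fail_on.split(",") if s.strip()}
    unknown = levels - set(SEVERITIES)
    if unknown:
        raise ValueError(f"unknown severities: {sorted(unknown)}")
    return [
        f for f in findings
        if f["severity"] in levels and f["fingerprint"] not in baseline
    ]


findings = [
    {"fingerprint": "a1", "severity": "critical"},  # already baselined
    {"fingerprint": "b2", "severity": "high"},      # net-new, blocks
    {"fingerprint": "c3", "severity": "medium"},    # net-new, below threshold
]
blocked = blocking_findings(findings, baseline={"a1"}, fail_on="critical,high")
print([f["fingerprint"] for f in blocked])  # ['b2']
```

Widening `fail_on` to `critical,high,medium` in this sketch would also block `c3`, which is the tuning knob the paragraph above describes.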
When a PR fails because of a net-new finding, the developer has three options:
- Fix the finding. Preferred — the manifest gains a policy entry. See the AI agent deployment checklist for what each finding type wants.
- Suppress with a reason. If the finding genuinely doesn’t apply (e.g., a tool flagged as destructive that’s actually read-only despite its name), add an entry to checks.ignore with a written justification.
- Update the baseline. Should be rare and require code-owner review: moving a finding into the baseline says “this is acceptable tech debt,” and that decision deserves visibility.
Success criteria for stage 3: at least one PR fails on a net-new finding; the team agrees the failure was correct; the PR ships after a fix or a justified suppression.
Duration: 1-2 weeks before normalization. Expect some friction in the first week as the team learns which finding types are easy fixes (missing approval policy, missing owner) versus harder ones (idempotency evidence for a tool that wasn’t designed for retry).
Stage 4: Governance
Wire the release-readiness check into the broader release process.
Required check on protected branches
In GitHub Settings → Branches → Branch protection for main:
- Require status checks to pass before merging.
- Search for the release-readiness job name and select it as required.
Now no PR can merge without the gate passing. Combine with required review and you have a release-readiness check that doesn’t get bypassed by an enthusiastic merge.
SARIF upload to GitHub Security
The SARIF upload step needs an additional workflow-level permission
beyond the Stage 1 set. Update the permissions: block at the top
of the workflow:
permissions:
  contents: read
  pull-requests: write
  security-events: write  # required to upload SARIF
  # actions: read         # also required if the repo is private
Then add the upload step after the scan:
      - uses: ThreeMoonsLab/agents-shipgate@v0.8.0
        id: shipgate
        with:
          config: shipgate.yaml
          ci_mode: strict
          fail_on: critical,high
          baseline: .agents-shipgate/baseline.json
          pr_comment: "true"
          upload_artifact: "true"
      - name: Upload SARIF
        if: always()  # upload even when the scan fails the build
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: agents-shipgate-reports/report.sarif
          category: agents-shipgate
This puts findings in the GitHub Security tab alongside CodeQL,
Dependabot, and any other SARIF-emitting
scanners. Security and GRC reviewers can audit at the org level
without leaving GitHub. The if: always() is important — without it,
strict-mode failures skip the upload and you lose the audit trail
for the exact PRs that mattered most.
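For readers who haven't opened a SARIF file before, the sketch below builds a minimal SARIF 2.1.0 log in Python to show roughly what the upload step consumes. The `to_sarif` helper, the finding dict keys (`check_id`, `severity`, `message`, `file`), and the severity-to-level mapping are assumptions for illustration; the report.sarif that agents-shipgate actually writes may be structured differently.

```python
import json

# Assumed mapping from scanner severities to SARIF result levels.
LEVEL = {"critical": "error", "high": "error", "medium": "warning", "low": "note"}


def to_sarif(findings, tool_name="agents-shipgate"):
    """Build a minimal SARIF 2.1.0 log from a list of finding dicts."""
    rule_ids = sorted({f["check_id"] for f in findings})
    results = []
    for f in findings:
        results.append({
            "ruleId": f["check_id"],
            "level": LEVEL[f["severity"]],
            "message": {"text": f["message"]},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": f["file"]},
                    "region": {"startLine": f.get("line", 1)},
                }
            }],
        })
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": tool_name,
                                "rules": [{"id": rid} for rid in rule_ids]}},
            "results": results,
        }],
    }


log = to_sarif([{
    "check_id": "SHIP-DOC-MISSING-DESCRIPTION",
    "severity": "medium",
    "message": "Tool legacy_search has no description.",
    "file": "shipgate.yaml",
}])
print(json.dumps(log, indent=2))
```

Each entry in `results` becomes one alert in the Security tab, tied to the file and line in `locations`.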
Tool surface diff against a stored snapshot (preview, post-v0.8.0)
Tool surface diff is on
agents-shipgate main and will land in the next release after
v0.8.0. When you bump your pin, the GitHub Action will accept a
diff_from: input pointing at a stored snapshot, so PRs that add a
new MCP tool, remove one, or change a schema surface a dedicated
tool-surface-diff section in the PR comment — the reviewer sees
exactly which tools changed, not just which findings changed.
Especially valuable when an MCP server minor-version bump adds a
new tool you didn’t intend to ship. Skip this subsection while
pinning to v0.8.0; come back when you’ve bumped the pin to a
release that includes it.
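Conceptually, a tool surface diff compares two snapshots of the tool list. The Python sketch below shows the idea under simple assumptions: each snapshot is a dict mapping tool name to its input schema, and the `diff_tool_surface` helper and the example tool names are hypothetical. It says nothing about how the upcoming diff_from: input is implemented.

```python
def diff_tool_surface(old: dict, new: dict) -> dict:
    """Compare two snapshots mapping tool name -> input schema (assumed shape)."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(
        name for name in old.keys() & new.keys() if old[name] != new[name]
    )
    return {"added": added, "removed": removed, "changed": changed}


snapshot = {
    "search_orders": {"type": "object",
                      "properties": {"query": {"type": "string"}}},
    "refund_order": {"type": "object",
                     "properties": {"order_id": {"type": "string"}}},
}
current = {
    "search_orders": {"type": "object",
                      "properties": {"query": {"type": "string"},
                                     "limit": {"type": "integer"}}},
    "delete_customer": {"type": "object",
                        "properties": {"id": {"type": "string"}}},
}

print(diff_tool_surface(snapshot, current))
# {'added': ['delete_customer'], 'removed': ['refund_order'], 'changed': ['search_orders']}
```

The `added` list is the part that catches an unintended tool arriving with a server version bump.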
Success criteria for stage 4: agents-shipgate is a required check; findings appear in the Security tab; tool surface diff is enabled where appropriate.
Duration: 1-2 days.
Common patterns
Multiple agents in one repo
If your repo has multiple agents (a customer-support agent and a billing agent, say), use a matrix:
    strategy:
      fail-fast: false
      matrix:
        agent:
          - name: support
            config: agents/support/shipgate.yaml
          - name: billing
            config: agents/billing/shipgate.yaml
    steps:
      - uses: actions/checkout@v4
      - uses: ThreeMoonsLab/agents-shipgate@v0.8.0
        with:
          config: ${{ matrix.agent.config }}
          ci_mode: strict
          fail_on: critical,high
          baseline: .agents-shipgate/${{ matrix.agent.name }}-baseline.json
Each agent gets its own baseline; one agent’s failure doesn’t block
the other agent’s PR. fail-fast: false means a failure in support
doesn’t cancel the billing scan.
Pre-commit hook for faster feedback
For local feedback before pushing, add a pre-commit hook:
# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: agents-shipgate
        name: agents-shipgate scan
        entry: agents-shipgate scan -c shipgate.yaml --format json
        language: system
        pass_filenames: false
        files: ^(shipgate\.yaml|mcp-exports/.*|openapi/.*|agent/.*)$
This runs the scan whenever any of those files change, so developers catch trivial fixes before pushing. The pre-commit hook is a convenience, not a replacement for the CI gate — CI is the authoritative check because it runs on every PR regardless of developer environment.
Ordering with other checks
Typical CI for an agent project:
| Step | Catches |
|---|---|
| Dependency install | nothing — setup |
| Lint / type-check | code regressions |
| agents-shipgate scan | tool surface, manifest, policy gaps |
| Unit tests | code behavior |
| Eval suite | model behavior on test cases |
| Build / package | distribution issues |
| Deploy | runtime configuration |
The release-readiness scan runs early so the team isn’t paying for eval suite time before catching a manifest error. It’s also one of the cheapest checks — seconds, not minutes — because it’s static analysis with no model invocation.
What this doesn’t replace
Adding a CI release gate doesn’t replace:
- LLM evals for behavior testing on curated inputs.
- Agent observability for runtime evidence.
- MCP gateways and LLM gateways for runtime enforcement.
- Code review for everything else in the PR that isn’t agent-surface.
Each guard catches something the others can’t. For the full picture of how these fit together, see why evals are not release gates.
Quick reference
Copy-paste the whole workflow:
# .github/workflows/agents-shipgate.yml
name: Agent release readiness
on:
  pull_request:
    paths:
      - 'shipgate.yaml'
      - 'mcp-exports/**'
      - 'openapi/**'
      - 'agent/**'
      - '.github/workflows/agents-shipgate.yml'
permissions:
  contents: read
  pull-requests: write
  # security-events: write  # uncomment if you add SARIF upload (stage 4)
  # actions: read           # also required for SARIF upload on private repos
jobs:
  release-readiness:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ThreeMoonsLab/agents-shipgate@v0.8.0
        with:
          config: shipgate.yaml
          ci_mode: advisory  # → strict (stage 3)
          # fail_on: critical,high  # → uncomment with strict
          # baseline: .agents-shipgate/baseline.json
          pr_comment: "true"
          upload_artifact: "true"
Start with the above; uncomment the strict-mode lines after Stage 2,
and the security-events: write permission if you add SARIF upload
in Stage 4.
The full check catalog lists every check the scanner can emit. The AI agent deployment checklist covers the 18 questions each finding maps onto. The GitHub Action quickstart covers more workflow-level options (annotations, custom output directories, multi-config setups).
This is what AI agent CI/CD looks like in 2026 — same shape as SAST and type-checking for traditional services, different artifacts in the gate slot.