AI agent CI/CD: add a release gate to your GitHub Actions pipeline

Most teams adopting an AI agent release gate get stuck at the same point: the scanner finds 12 things on the first run, the team can’t clean them up in a sprint, and the gate quietly goes “advisory forever.” The result is a check that runs in CI without changing behavior — the worst version of a CI gate.

This post walks through the four-stage adoption pattern that gets an AI agent project past “advisory forever”: what to wire up at each stage, what YAML to commit, and what success looks like before you move on.

The four stages:

Advisory mode — report on every PR, never fail the build.
Baseline — snapshot existing findings as accepted, so strict mode only fails on net-new gaps.
Strict mode — fail the build on net-new critical/high findings.
Governance — required check, SARIF upload, branch protection.

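Wiring up a stage usually takes 1-3 days of part-time work. The exceptions are stage 2, which can stretch to a week with a large backlog, and stage 3, which needs 1-2 weeks to settle. The whole adoption is typically two weeks plus a backlog burn-down between stages.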

Why this belongs in CI specifically

The argument for putting agent release-readiness in CI alongside tests and SAST is covered in “from CI/CD to agent release readiness”; it’s the same pattern that brought SAST into the build pipeline. The short version: a tool surface change is a release artifact change, release artifact changes deserve PR-time review, and PR-time review needs to be deterministic and non-blocking by default.

Stage 1: Advisory mode

The minimal workflow that gets you a PR comment with findings on every relevant change:

# .github/workflows/agents-shipgate.yml
name: Agent release readiness

on:
  pull_request:
    paths:
      - 'shipgate.yaml'
      - 'mcp-exports/**'
      - 'openapi/**'
      - 'agent/**'
      - '.github/workflows/agents-shipgate.yml'

permissions:
  contents: read
  pull-requests: write

jobs:
  release-readiness:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ThreeMoonsLab/agents-shipgate@v0.8.0
        with:
          config: shipgate.yaml
          ci_mode: advisory
          pr_comment: "true"

What this does: on every PR that touches the agent’s release artifact, the scanner runs, posts a PR comment with the finding list, and exits with status 0 regardless of findings. The team sees what’s there without the gate blocking merges.

What to watch:

The PR comment appears within ~30 seconds for most agents.
Findings have severities (critical/high/medium/low) and check IDs.
The team should triage the first run together so the categories are familiar — see walking a release-readiness report for what a real report looks like.

The paths: filter matters. Without it, the workflow runs on every PR (slow, noisy). With it, the gate runs only when a PR changes something the scanner could flag.

Success criteria for stage 1: every PR touching the agent surface gets a PR comment. The team has read at least one report end-to-end and understands what each finding category means.

Duration: 1-3 days.

Stage 2: Baseline existing findings

If your first scan finds 12 things and your team can’t fix them all this sprint, you need a baseline — a checked-in snapshot of the findings you’ve explicitly accepted as not-blocking. Strict mode then only fails on net-new findings.

Generate the baseline locally and commit it:

agents-shipgate baseline save
# Writes .agents-shipgate/baseline.json based on shipgate.yaml
git add .agents-shipgate/baseline.json
git commit -m "shipgate: baseline current findings"

Each finding in the baseline is keyed by a fingerprint (check ID + tool name + evidence shape) so cosmetic changes don’t re-flag the finding, but a tool surface change does.
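To make the fingerprint idea concrete, here is a minimal Python sketch. It is not the scanner’s actual algorithm, which this post doesn’t specify: the evidence_shape and fingerprint helpers, the hashing scheme, and the sample evidence fields are all assumptions made for illustration. The only point is that hashing the structure of the evidence, rather than its values, keeps the key stable across cosmetic edits, while a different tool name or a different evidence structure yields a new key.

```python
import hashlib
import json


def evidence_shape(evidence):
    """Reduce evidence to its structure: keys and value types, never values."""
    if isinstance(evidence, dict):
        return {key: evidence_shape(value) for key, value in sorted(evidence.items())}
    if isinstance(evidence, list):
        # Deduplicate element shapes so the list length does not matter.
        shapes = {json.dumps(evidence_shape(item), sort_keys=True) for item in evidence}
        return sorted(shapes)
    return type(evidence).__name__


def fingerprint(check_id, tool, evidence):
    """Hash check ID + tool name + evidence shape into a short stable key."""
    payload = json.dumps(
        {"check_id": check_id, "tool": tool, "shape": evidence_shape(evidence)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


CHECK = "SHIP-DOC-MISSING-DESCRIPTION"

# Hypothetical evidence payloads; the real scanner's fields may differ.
original = fingerprint(CHECK, "legacy_search", {"field": "description", "length": 0})
cosmetic = fingerprint(CHECK, "legacy_search", {"field": "summary", "length": 3})
other_tool = fingerprint(CHECK, "search_v2", {"field": "description", "length": 0})
new_shape = fingerprint(
    CHECK, "legacy_search", {"field": "description", "length": 0, "required": True}
)

print("cosmetic change keeps fingerprint:", original == cosmetic)
print("different tool changes fingerprint:", original != other_tool)
print("different evidence shape changes fingerprint:", original != new_shape)
```

In this sketch a baselined finding stays matched when only values change, and falls out of the baseline when the tool or the evidence structure changes.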

For each baselined finding, decide:

Defer and fix soon. Leave in the baseline; track in your backlog.
Won’t fix, the check doesn’t apply here. Move from the baseline to a suppression with a written reason:

# in shipgate.yaml
checks:
  ignore:
    - check_id: SHIP-DOC-MISSING-DESCRIPTION
      tool: legacy_search
      reason: tool deprecated 2026-Q2; deletion tracked in JIRA-1234

The baseline carries “we’ll fix later” findings; suppressions carry “this check doesn’t apply here” with a written justification. Both prevent strict mode from failing the build on a known finding; only suppressions express team intent in the manifest.

Success criteria for stage 2: a baseline file is committed; the team has triaged every entry in it (defer, fix now, or intentional); every intentional entry has been moved to a suppression with its reason: field filled in.

Duration: 1-7 days depending on backlog size.

Stage 3: Strict mode

With a baseline in place, switch the workflow to strict mode:

- uses: ThreeMoonsLab/agents-shipgate@v0.8.0
  with:
    config: shipgate.yaml
    ci_mode: strict
    fail_on: critical,high
    baseline: .agents-shipgate/baseline.json
    pr_comment: "true"

Now the gate fails the build when:

A finding at critical or high severity appears, AND
That finding’s fingerprint isn’t in the baseline (that is, the finding is net-new).

Medium and low findings are still reported but don’t fail the build. Tune fail_on: to the team’s appetite. Setting it to critical alone is the most permissive; critical,high,medium is the strictest.
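The failure rule can be sketched in a few lines of Python. This is an illustration of the logic described above, not the scanner’s implementation: the Finding class, the blocking_findings helper, the check IDs, and the fingerprints are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    check_id: str
    tool: str
    severity: str      # "critical", "high", "medium", or "low"
    fingerprint: str


def blocking_findings(findings, baseline, fail_on):
    """Return findings whose severity is in fail_on and that are not baselined."""
    return [
        finding
        for finding in findings
        if finding.severity in fail_on and finding.fingerprint not in baseline
    ]


# Hypothetical scan output; check IDs and fingerprints are made up.
findings = [
    Finding("EXAMPLE-CHECK-A", "refund_payment", "critical", "aaa111"),
    Finding("EXAMPLE-CHECK-B", "legacy_search", "high", "bbb222"),
    Finding("EXAMPLE-CHECK-C", "list_orders", "medium", "ccc333"),
]
baseline = {"bbb222"}            # fingerprints accepted in stage 2
fail_on = {"critical", "high"}   # mirrors fail_on: critical,high

blockers = blocking_findings(findings, baseline, fail_on)
exit_code = 1 if blockers else 0

for finding in blockers:
    print(f"BLOCKING {finding.severity}: {finding.check_id} on {finding.tool}")
print("exit code:", exit_code)
```

Here only the critical finding on refund_payment blocks: the high finding is already in the baseline, and the medium finding is below the fail_on threshold. Adding "medium" to fail_on would make the third finding block as well.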

When a PR fails because of a net-new finding, the developer has three options:

Fix the finding. Preferred — the manifest gains a policy entry. See the AI agent deployment checklist for what each finding type wants.
Suppress with a reason. If the finding genuinely doesn’t apply (e.g., a tool flagged as destructive that’s actually read-only despite its name), add an entry to checks.ignore with a written justification.
Update the baseline. Should be rare and require code-owner review — moving a finding into the baseline says “this is acceptable tech debt,” and that decision deserves visibility.

Success criteria for stage 3: at least one PR fails on a net-new finding; the team agrees the failure was correct; the PR ships after a fix or a justified suppression.

Duration: 1-2 weeks before normalization. Expect some friction in the first week as the team learns which finding types are easy fixes (missing approval policy, missing owner) versus harder ones (idempotency evidence for a tool that wasn’t designed for retry).

Stage 4: Governance

Wire the release-readiness check into the broader release process.

Required check on protected branches

In GitHub Settings → Branches → Branch protection for main:

Require status checks to pass before merging.
Search for the release-readiness job name and select it as required.

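Now no PR can merge without the gate passing. Combine with required review and you have a release-readiness check that doesn’t get bypassed by an enthusiastic merge. One caveat: the workflow uses a paths: filter, and GitHub leaves a required check pending when path filtering skips its workflow, which blocks PRs that never touch the agent surface. Once the check is required, either remove the paths: filter or let the workflow run on every PR and do the path check inside the job.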

SARIF upload to GitHub Security

The SARIF upload step needs an additional workflow-level permission beyond the Stage 1 set. Update the permissions: block at the top of the workflow:

permissions:
  contents: read
  pull-requests: write
  security-events: write   # required to upload SARIF
  # actions: read          # also required if the repo is private

Then add the upload step after the scan:

- uses: ThreeMoonsLab/agents-shipgate@v0.8.0
  id: shipgate
  with:
    config: shipgate.yaml
    ci_mode: strict
    fail_on: critical,high
    baseline: .agents-shipgate/baseline.json
    pr_comment: "true"
    upload_artifact: "true"

- name: Upload SARIF
  if: always()  # upload even when the scan fails the build
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: agents-shipgate-reports/report.sarif
    category: agents-shipgate

This puts findings in the GitHub Security tab alongside Dependabot alerts, CodeQL results, and findings from any other scanner that uploads SARIF. Security and GRC reviewers can audit at the org level without leaving GitHub. The if: always() is important: without it, strict-mode failures skip the upload and you lose the audit trail for the exact PRs that mattered most.
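Before relying on the Security tab, it can help to look at what a SARIF file contains. The sketch below assumes only the standard SARIF 2.1.0 layout (runs, each with a results list whose entries carry ruleId and level); the summarize_sarif helper and the sample data are hypothetical, and the rule IDs are not real agents-shipgate check IDs.

```python
from collections import Counter


def summarize_sarif(sarif):
    """Count results per (ruleId, level) across all runs in a SARIF log."""
    counts = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            rule_id = result.get("ruleId", "<no ruleId>")
            # Simplification: treat a missing level as "warning".
            level = result.get("level", "warning")
            counts[(rule_id, level)] += 1
    return counts


# Hypothetical, hand-written SARIF content for illustration only.
sample = {
    "version": "2.1.0",
    "runs": [
        {
            "tool": {"driver": {"name": "example-scanner"}},
            "results": [
                {"ruleId": "EXAMPLE-CHECK-A", "level": "error",
                 "message": {"text": "first occurrence"}},
                {"ruleId": "EXAMPLE-CHECK-A", "level": "error",
                 "message": {"text": "second occurrence"}},
                {"ruleId": "EXAMPLE-CHECK-B",
                 "message": {"text": "no explicit level"}},
            ],
        }
    ],
}

summary = summarize_sarif(sample)
for (rule_id, level), count in sorted(summary.items()):
    print(f"{rule_id} [{level}]: {count}")
```

To run it against a real report, load the file with json.load and pass the result in, for example the agents-shipgate-reports/report.sarif path used in the upload step.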

Tool surface diff against a stored snapshot (preview, post-v0.8.0)

Tool surface diff is on agents-shipgate main and will land in the next release after v0.8.0. When you bump your pin, the GitHub Action will accept a diff_from: input pointing at a stored snapshot. PRs that add an MCP tool, remove one, or change a schema will then get a dedicated tool-surface-diff section in the PR comment, so the reviewer sees exactly which tools changed, not just which findings changed. This is especially valuable when an MCP server minor-version bump adds a new tool you didn’t intend to ship. Skip this subsection while pinning to v0.8.0; come back when you’ve bumped the pin to a release that includes it.
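Conceptually, a tool surface diff compares two snapshots of the tool list. The Python sketch below shows the idea only. The snapshot format (a mapping from tool name to input schema), the diff_tool_surface helper, and the tool names are assumptions; the format the feature will actually use is not described here.

```python
def diff_tool_surface(old, new):
    """Compare two {tool_name: schema} snapshots and classify the changes."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(
        name for name in old.keys() & new.keys() if old[name] != new[name]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots before and after an MCP server version bump.
old_snapshot = {
    "search_orders": {"properties": {"query": {"type": "string"}}},
    "refund_payment": {"properties": {"order_id": {"type": "string"}}},
    "legacy_search": {"properties": {"q": {"type": "string"}}},
}
new_snapshot = {
    "search_orders": {"properties": {"query": {"type": "string"}}},
    "refund_payment": {
        "properties": {
            "order_id": {"type": "string"},
            "amount": {"type": "number"},
        }
    },
    "delete_customer": {"properties": {"customer_id": {"type": "string"}}},
}

diff = diff_tool_surface(old_snapshot, new_snapshot)
for kind in ("added", "removed", "changed"):
    print(f"{kind}: {', '.join(diff[kind]) or 'none'}")
```

In this example the diff reports delete_customer as added, legacy_search as removed, and refund_payment as changed, which is the kind of summary a reviewer would want to see before approving the bump.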

Success criteria for stage 4: agents-shipgate is a required check; findings appear in the Security tab; tool surface diff is enabled once you are pinned to a release that includes it.

Duration: 1-2 days.

Common patterns

Multiple agents in one repo

If your repo has multiple agents (a customer-support agent and a billing agent, say), use a matrix:

strategy:
  fail-fast: false
  matrix:
    agent:
      - name: support
        config: agents/support/shipgate.yaml
      - name: billing
        config: agents/billing/shipgate.yaml

steps:
  - uses: actions/checkout@v4
  - uses: ThreeMoonsLab/agents-shipgate@v0.8.0
    with:
      config: ${{ matrix.agent.config }}
      ci_mode: strict
      fail_on: critical,high
      baseline: .agents-shipgate/${{ matrix.agent.name }}-baseline.json

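Each agent gets its own baseline; one agent’s failure doesn’t block the other agent’s PR. fail-fast: false means a failure in support doesn’t cancel the billing scan. If you reuse the Stage 1 trigger, extend its paths: filter to cover agents/**, because the root-level 'shipgate.yaml' and 'agent/**' patterns don’t match agents/support/shipgate.yaml.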

Pre-commit hook for faster feedback

For local feedback before pushing, add a pre-commit hook:

# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: agents-shipgate
        name: agents-shipgate scan
        entry: agents-shipgate scan -c shipgate.yaml --format json
        language: system
        pass_filenames: false
        files: ^(shipgate\.yaml|mcp-exports/.*|openapi/.*|agent/.*)$

This runs the scan whenever a commit touches any of those files, so developers catch easy-to-fix findings before pushing. The pre-commit hook is a convenience, not a replacement for the CI gate: CI is the authoritative check because it runs regardless of developer environment.

Ordering with other checks

Typical CI for an agent project:

Step	Catches
Dependency install	nothing — setup
Lint / type-check	code regressions
agents-shipgate scan	tool surface, manifest, policy gaps
Unit tests	code behavior
Eval suite	model behavior on test cases
Build / package	distribution issues
Deploy	runtime configuration

The release-readiness scan runs early so the team isn’t paying for eval suite time before catching a manifest error. It’s also one of the cheapest checks — seconds, not minutes — because it’s static analysis with no model invocation.

What this doesn’t replace

Adding a CI release gate doesn’t replace:

LLM evals for behavior testing on curated inputs.
Agent observability for runtime evidence.
MCP gateways and LLM gateways for runtime enforcement.
Code review for everything else in the PR that isn’t agent-surface.

Each guard catches something the others can’t. For the full picture of how these fit together, see why evals are not release gates.

Quick reference

Copy-paste the whole workflow:

# .github/workflows/agents-shipgate.yml
name: Agent release readiness

on:
  pull_request:
    paths:
      - 'shipgate.yaml'
      - 'mcp-exports/**'
      - 'openapi/**'
      - 'agent/**'
      - '.github/workflows/agents-shipgate.yml'

permissions:
  contents: read
  pull-requests: write
  # security-events: write   # uncomment if you add SARIF upload (stage 4)
  # actions: read            # also required for SARIF upload on private repos

jobs:
  release-readiness:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ThreeMoonsLab/agents-shipgate@v0.8.0
        with:
          config: shipgate.yaml
          ci_mode: advisory             # → strict (stage 3)
          # fail_on: critical,high      # → uncomment with strict
          # baseline: .agents-shipgate/baseline.json
          pr_comment: "true"
          upload_artifact: "true"

Start with the above. After Stage 2, change ci_mode: to strict and uncomment the fail_on: and baseline: lines. If you add SARIF upload in Stage 4, uncomment the security-events: write permission, plus actions: read on private repos.

The full check catalog lists every check the scanner can emit. The AI agent deployment checklist covers the 18 questions each finding maps onto. The GitHub Action quickstart covers more workflow-level options (annotations, custom output directories, multi-config setups).

This is what AI agent CI/CD looks like in 2026 — same shape as SAST and type-checking for traditional services, different artifacts in the gate slot.