What is Three Moons Lab?

Three Moons Lab builds merge-verdict infrastructure for teams reviewing AI-generated agent capability changes.

What we build

The first Three Moons Lab product is Agents Shipgate, an open-source CLI and GitHub Action that verifies AI-generated agent capability changes and returns deterministic merge verdicts before agent code lands.

The package, CLI, repository, and GitHub Action are named agents-shipgate. Agents Shipgate reads a checked-in shipgate.yaml manifest plus local tool sources such as MCP exports, OpenAPI specs, and supported SDK/framework metadata.

The problem

Once an agent can refund, email, cancel, deploy, or modify records, every tool-surface change becomes a release event. Evals test behavior, observability records runtime activity, and gateways enforce access at call time. Release owners still need a deterministic pre-merge answer: did this PR expand capability, and can it merge?
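The pre-merge question can be pictured as a set comparison between the tool surface on the base branch and the one on the PR branch. The following is a minimal sketch of that idea only, not Agents Shipgate's implementation: the function names, the verdict strings, and the example tool names are all hypothetical.

```python
# Illustrative sketch: names and verdict strings are assumptions,
# not the agents-shipgate API or output format.

def capability_delta(base_tools: set[str], head_tools: set[str]) -> dict[str, list[str]]:
    """Compare the declared tool names on the base branch and the PR branch."""
    return {
        "added": sorted(head_tools - base_tools),
        "removed": sorted(base_tools - head_tools),
    }


def merge_verdict(base_tools: set[str], head_tools: set[str]) -> str:
    """Return the same answer for the same inputs: any new tool needs review."""
    delta = capability_delta(base_tools, head_tools)
    return "needs_review" if delta["added"] else "pass"


base = {"lookup_order"}
head = {"lookup_order", "refund_order"}  # the PR adds a refund tool
print(capability_delta(base, head))
print(merge_verdict(base, head))
```

Because the comparison uses only checked-in declarations, the result depends on nothing but the two inputs, which is what makes the verdict deterministic.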

How we are different

Agents Shipgate is static by default. It does not run agents, call tools, invoke LLMs, connect to MCP servers, or make verifier network calls, and by default it does not import user code or collect verifier telemetry. It produces local verifier output, PR comments, JSON, and SARIF as evidence for human release review.
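For readers unfamiliar with SARIF, the sketch below shows how a list of static findings could be wrapped in a minimal SARIF 2.1.0 log. It illustrates the format only and makes no claim about the fields Agents Shipgate actually emits: the finding keys (`rule`, `message`, `file`), the rule id, and the tool name `example-verifier` are assumptions.

```python
import json

# Illustrative sketch: the input finding shape and tool name are hypothetical.

def to_sarif(findings: list[dict], tool_name: str = "example-verifier") -> dict:
    """Wrap findings in a minimal SARIF 2.1.0 log with a single run."""
    results = []
    for finding in findings:
        results.append({
            "ruleId": finding["rule"],
            "level": finding.get("level", "warning"),
            "message": {"text": finding["message"]},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": finding["file"]},
                },
            }],
        })
    return {
        "version": "2.1.0",
        "runs": [{"tool": {"driver": {"name": tool_name}}, "results": results}],
    }


log = to_sarif([{
    "rule": "HYPOTHETICAL-001",  # made-up rule id
    "message": "New tool can modify records",
    "file": "shipgate.yaml",
}])
print(json.dumps(log, indent=2))
```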

Healthcare for agents

Agents Shipgate is the first instalment of a longer thesis we call healthcare for agents: tool-using AI agents need a portfolio of pre-deployment and ongoing health checks, not a single eval pass. Today that means a deterministic PR merge verifier with reviewable baselines; capability audits and policy-drift detection come next.

Concretely, the agent lifecycle readiness slot we work in looks like this:

  • Release readiness — static review of the tool surface being promoted. Shipped today by Agents Shipgate.
  • Baselines — snapshots of reviewed findings so strict CI only fails on net-new gaps. Shipped.
  • Capability audits — periodic reviews of what an agent can actually do across its declared surface. In design.
  • Policy drift — detection when production behavior diverges from the reviewed manifest. In design.
  • Lifecycle retros — post-incident structured review of which layer (release, runtime, model) failed. Planned.
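The baseline item in the list above can be sketched as a filter: findings that were already reviewed are fingerprinted and stored, and strict CI fails only on findings whose fingerprint is not in that snapshot. This is a minimal sketch under stated assumptions, not the tool's actual baseline format: the finding keys (`rule`, `tool`) and the hashing scheme are hypothetical.

```python
import hashlib

# Illustrative sketch: finding shape and fingerprint scheme are assumptions.

def fingerprint(finding: dict) -> str:
    """Stable identifier for a finding, independent of message wording."""
    key = f'{finding["rule"]}|{finding["tool"]}'
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def net_new(findings: list[dict], baseline: set[str]) -> list[dict]:
    """Keep only findings that are absent from the reviewed baseline."""
    return [f for f in findings if fingerprint(f) not in baseline]


reviewed = {"rule": "HYPOTHETICAL-001", "tool": "refund_order"}
fresh = {"rule": "HYPOTHETICAL-001", "tool": "cancel_subscription"}

baseline = {fingerprint(reviewed)}           # snapshot of accepted findings
gaps = net_new([reviewed, fresh], baseline)  # only the unreviewed finding remains
exit_code = 1 if gaps else 0                 # strict CI fails only on net-new gaps
print(gaps, exit_code)
```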

Most of the AI agent stack — evals, observability, runtime guardrails — focuses on the model and the request. The release artifact has historically had no named slot. agents-shipgate names it; the longer agent governance roadmap fills out the rest.

For the full thesis behind this framing, read Healthcare for agents on the blog.

Where to start

If you ship a tool-using agent today, the right entry point is Agents Shipgate's quickstart. If you are mapping the broader category, the glossary collects the canonical vocabulary we use across docs, blog, and reports.