LLMs made generating code cheap. The real bottleneck is verification: checking that a change is correct, follows repo conventions, and is something you’d actually merge. The verification stack is a layered set of automated verifiers designed to catch different kinds of mistakes at different stages, so problems are caught early, cheaply, and without human intervention.
Architecture
The verification stack consists of two layers today, with an architecture designed to support more:
Layer 1 — Trajectory-Level Verifier (Critic Model): A small, fast critic model that scores the agent’s trajectory before code is pushed. If the score falls below a confidence threshold, the agent’s work is gated — preventing obviously broken or off-track changes from ever reaching a pull request. See Enabling Layer 1 below.
Layer 2 — Repo-Level Verifier (ReviewBot): An automated code reviewer and QA agent that triggers on every pull request. It reviews the diff for correctness, security, and style, then optionally runs the software to verify behavior. See Enabling Layer 2 below.
Together, these layers form a pipeline: the critic prevents obviously broken work from being pushed, and the ReviewBot catches the subtler issues that require repository context.
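To make the pipeline idea concrete, here is a minimal sketch of layers run in order with early exit. It is not the OpenHands implementation: `Verdict`, `critic_gate`, `review_bot`, `run_stack`, the `critic_score` and `review_issues` fields, and the 0.5 threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Verdict:
    layer: str
    passed: bool
    detail: str = ""


# A verifier takes a description of the change and returns a Verdict.
Verifier = Callable[[dict], Verdict]


def critic_gate(change: dict, threshold: float = 0.5) -> Verdict:
    """Layer 1 stand-in: gate on a precomputed critic score (hypothetical field)."""
    score = change["critic_score"]
    return Verdict("critic", score >= threshold, f"score={score:.2f}")


def review_bot(change: dict) -> Verdict:
    """Layer 2 stand-in: pass only if the review raised no blocking issues."""
    issues = change.get("review_issues", [])
    return Verdict("reviewbot", not issues, f"{len(issues)} blocking issue(s)")


def run_stack(change: dict, layers: List[Verifier]) -> List[Verdict]:
    """Run layers in order and stop at the first failure, so later
    (more expensive) layers never see work an earlier layer rejected."""
    verdicts = []
    for layer in layers:
        verdict = layer(change)
        verdicts.append(verdict)
        if not verdict.passed:
            break
    return verdicts
```

Because `run_stack` takes a plain list of callables, adding a third layer in this sketch would only mean appending another function to the list.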
How Effective Is It?
We’ve been running the verification stack on the OpenHands/software-agent-sdk repository for several months. Key findings:
- Faster approvals: As ReviewBot adoption increased, time to first approval dropped significantly, with the largest gains on medium-to-large PRs.
- Improving accuracy: Bot review precision and recall have improved consistently over time. Human reviewers are generally more precise, but the bot catches issues humans miss, so the two are complementary.
- Code quality maintained: Static analysis (radon, bandit, ruff) shows no degradation in cyclomatic complexity, security violations, or code smells for bot-reviewed PRs compared to human-only PRs.
- Test coverage improving: Since the ReviewBot was introduced, test coverage across the repository has trended upward.
- Review rounds decreasing: PRs with ReviewBot initially required more review rounds, but that gap has been closing as the skill improves.
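The page does not say how review precision and recall were measured. As a reminder of the standard definitions only, the sketch below computes both from a set of flagged items and a set of real issues. The function name, the comment labels, and the choice to return 0.0 for empty sets are assumptions made for illustration, and the inputs are toy values, not data from the repository.

```python
from typing import Set, Tuple


def precision_recall(flagged: Set[str], real_issues: Set[str]) -> Tuple[float, float]:
    """Precision: share of flagged items that are real issues.
    Recall: share of real issues that were flagged.
    Empty denominators yield 0.0 (a convention chosen for this sketch)."""
    true_positives = len(flagged & real_issues)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(real_issues) if real_issues else 0.0
    return precision, recall


# Toy labels only: a reviewer who flags little but is always right,
# and one who flags more, with some false alarms.
real = {"a", "b", "e"}
careful = precision_recall({"a"}, real)
broad = precision_recall({"a", "b", "c", "d"}, real)
```

In this toy case the first reviewer has higher precision and the second has higher recall, which is the sense in which two reviewers can be complementary.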
Enabling Layer 1: Trajectory-Level Verifier
The trajectory-level verifier uses a critic model to evaluate agent work before code is pushed. It is currently available through:
- OpenHands CLI: The critic is automatically enabled when using the OpenHands LLM Provider. See the Critic documentation for configuration options including iterative refinement thresholds.
- Software Agent SDK: Programmatic access via the SDK Critic Guide.
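The idea behind an iterative refinement threshold can be sketched as a retry loop: re-run the agent until the critic score reaches the threshold or an iteration budget runs out. This is an illustration under stated assumptions, not the SDK's API. `refine_until_accepted`, its parameters, and the `attempt` and `score` callables are hypothetical; consult the Critic documentation for the real configuration options.

```python
from typing import Callable, Tuple


def refine_until_accepted(
    attempt: Callable[[int], str],
    score: Callable[[str], float],
    threshold: float,
    max_iterations: int,
) -> Tuple[str, float, bool]:
    """Re-run the agent until the critic score reaches the threshold or the
    iteration budget is spent. Returns (best work, best score, accepted)."""
    best_work, best_score = "", float("-inf")
    for i in range(max_iterations):
        work = attempt(i)      # hypothetical: produce the i-th attempt
        s = score(work)        # hypothetical: critic score for that attempt
        if s > best_score:
            best_work, best_score = work, s
        if s >= threshold:
            return best_work, best_score, True
    return best_work, best_score, False
```

Returning the best attempt even on failure lets a caller decide whether to surface the work to a human instead of discarding it.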
Enabling Layer 2: Repo-Level Verifier
The repo-level verifier consists of two components, a code review agent and a QA agent, both available as plugins in the OpenHands/extensions repository.
Option A: GitHub Actions
Add the ReviewBot directly to your GitHub Actions workflow. This runs the code review (and optionally QA) as part of your CI pipeline on every PR.
Create a bot account
Create a bot account under your GitHub organization (e.g., your-org-bot). This account will post review comments and approve or request changes on PRs. Grant it write access and the ability to approve PRs on your repository.
Configure secrets
In your repository’s Settings → Secrets and variables → Actions, add:
- LLM_API_KEY: Your LLM API key
- BOT_GITHUB_TOKEN: A GitHub token for the bot account (if you want reviews posted by the bot rather than the default GITHUB_TOKEN)
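The fallback between the two GitHub tokens can be sketched as follows. This is a hypothetical helper, not part of the ReviewBot plugin. It assumes the workflow maps the secrets into environment variables with the same names (GitHub Actions does not expose secrets to a step unless the workflow passes them in), and `resolve_review_credentials` and its return keys are names invented for this example.

```python
import os
from typing import Mapping, Optional


def resolve_review_credentials(env: Optional[Mapping[str, str]] = None) -> dict:
    """Pick the credentials a review step would use.

    Prefers BOT_GITHUB_TOKEN so reviews are posted by the bot account,
    and falls back to the default GITHUB_TOKEN otherwise.
    """
    env = os.environ if env is None else env
    llm_key = env.get("LLM_API_KEY")
    if not llm_key:
        raise RuntimeError("LLM_API_KEY is not set")
    bot_token = env.get("BOT_GITHUB_TOKEN")
    token = bot_token or env.get("GITHUB_TOKEN")
    if not token:
        raise RuntimeError("Neither BOT_GITHUB_TOKEN nor GITHUB_TOKEN is set")
    return {
        "llm_api_key": llm_key,
        "github_token": token,
        "posts_as_bot": bool(bot_token),
    }
```

Failing fast on a missing key keeps a misconfigured workflow from running a review that cannot post its results.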
Option B: OpenHands Automations (Beta)
The alternative is using OpenHands Automations, our event-triggered automation system. With Automations, you define the trigger once and it covers all repositories the bot account has access to, with no per-repo workflow files needed.
Create a bot account on OpenHands Cloud
Log in to OpenHands Cloud with your organization’s bot GitHub account.
Connect GitHub
Connect the bot account’s GitHub to OpenHands Cloud via the GitHub installation flow.
Closing the Loop: The Iterate Skill
Setting up the verification stack is only half the story. The other half is acting on it: reading CI results, parsing review comments, fixing code, pushing again, and repeating until everything is green. The iterate skill (OpenHands/extensions) turns the agent into an orchestration loop that drives a pull request from first push to merge-ready:
- Push and open a draft PR: the PR starts as a draft to prevent premature automation triggers.
- Poll each verification layer: the agent checks CI, the ReviewBot’s verdict, and the QA agent’s report. It only polls layers that actually exist in the repo.
- Decide and act: if CI failed, it reads the logs and fixes the code. If the ReviewBot requested changes, it addresses the inline comments. If QA found regressions, it debugs and fixes.
- Push and re-poll: after every fix, the agent commits, pushes, re-requests review, and loops back. A push is never the end; the loop only exits when all present layers pass on the current SHA.
- Mark ready: once every verification layer is green, the agent converts the draft PR to ready for review.
Invoke it with /iterate in any OpenHands conversation with the skill loaded.
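The loop described in this section can be sketched roughly as below. It is a simplification for illustration, not the skill's actual code: `iterate`, `poll`, `fix_and_push`, and the `max_rounds` budget are hypothetical, and real polling would also need to handle checks that are still pending.

```python
from typing import Callable, Dict, Optional

# Status of one layer on a given commit: True = pass, False = fail,
# None = the layer does not exist in this repo and is skipped.
LayerStatus = Optional[bool]


def iterate(
    head_sha: str,
    poll: Callable[[str], Dict[str, LayerStatus]],
    fix_and_push: Callable[[str, str], str],
    max_rounds: int = 10,
) -> Optional[str]:
    """Drive a draft PR until every present layer passes on the current SHA.

    Returns the SHA that is ready for review, or None if the round budget
    (an assumption of this sketch) runs out first.
    """
    for _ in range(max_rounds):
        statuses = poll(head_sha)
        present = {name: ok for name, ok in statuses.items() if ok is not None}
        failing = [name for name, ok in present.items() if not ok]
        if not failing:
            return head_sha  # all present layers green: mark ready
        # Address the first failing layer, push, and re-poll on the new SHA.
        head_sha = fix_and_push(head_sha, failing[0])
    return None
```

Statuses are always polled against the newest SHA, which mirrors the rule that a push is never the end: a pass recorded on an older commit does not count.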
Related Resources
- Automated Code Review — Detailed setup for the code review agent (Layer 2)
- QA Changes — Detailed setup for the QA agent (Layer 2)
- Critic Documentation — Trajectory-level verifier configuration (Layer 1)
- OpenHands Automations — Event-triggered automation setup
- The Verification Stack (blog post) — Detailed effectiveness metrics and analysis
- Learning to Verify AI-Generated Code (blog post) — Layer 1 deep dive

