Automated QA testing goes beyond code review and CI: instead of reading diffs or running the test suite, the QA agent actually runs the software and verifies that changes work as claimed. It sets up the environment, exercises changed behavior as a real user would (browser, CLI, API requests), and posts a structured report with evidence. This is Layer 2 of the Verification Stack, complementing the code review agent.

Overview

The QA agent follows a four-phase methodology:
  1. Understand — Reads the PR diff, title, and description. Classifies changes (new feature, bug fix, refactor, config) and identifies entry points (CLI commands, API endpoints, UI pages).
  2. Setup — Bootstraps the repository: installs dependencies, builds the project, notes CI status.
  3. Exercise — The core phase: spins up servers, opens browsers, runs CLI commands, makes HTTP requests — testing the changed behavior as a real user would. For bug fixes, it reproduces the bug on the base branch and verifies the fix on the PR branch.
  4. Report — Posts a structured QA report as a PR comment, with evidence (commands run, outputs, screenshots) and a verdict (PASS / FAIL / PARTIAL).
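For bug fixes, the Exercise phase comes down to a before/after comparison. The sketch below shows that decision logic. It assumes a hypothetical `reproduces_bug(ref)` hook that checks out a ref, runs the reproduction steps, and returns True if the bug appears. The function name and the outcome labels are illustrative and are not the plugin's API.

```python
from typing import Callable


def verify_bug_fix(reproduces_bug: Callable[[str], bool],
                   base_ref: str, pr_ref: str) -> str:
    """Run the same reproduction on the base branch and the PR branch.

    `reproduces_bug` is a hypothetical hook: it checks out `ref`, runs the
    reproduction steps, and returns True if the bug shows up.
    """
    bug_on_base = reproduces_bug(base_ref)
    bug_on_pr = reproduces_bug(pr_ref)
    if not bug_on_base:
        # Without a failing "before", a passing "after" proves nothing.
        return "NOT_REPRODUCED"
    if bug_on_pr:
        return "NOT_FIXED"
    return "FIX_VERIFIED"


# Simulated run: the bug reproduces only on the base branch.
print(verify_bug_fix(lambda ref: ref == "main", "main", "fix-branch"))  # FIX_VERIFIED
```

Checking the base branch first matters: if the bug cannot be reproduced there, a clean run on the PR branch says nothing about the fix.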
The QA agent knows when to give up: if an approach fails after three materially different attempts, it switches strategy. If two fundamentally different strategies fail, it reports what it tried and stops — rather than spinning endlessly.
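This give-up policy is a simple retry budget. Here is a minimal sketch, assuming each attempt can be modeled as a callable that returns True on success. The constants mirror the numbers above, while `run_with_budget` and the strategy names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

MAX_ATTEMPTS_PER_STRATEGY = 3  # materially different attempts before switching
MAX_STRATEGIES = 2             # fundamentally different strategies before stopping

# Each attempt is a zero-argument callable returning True on success.
Attempt = Callable[[], bool]


def run_with_budget(
    strategies: Sequence[Tuple[str, Sequence[Attempt]]]
) -> Tuple[bool, List[str]]:
    """Try each strategy's attempts in order, within the budget above.

    Returns (succeeded, log). The log records what was tried so it can
    go into the report if every strategy fails.
    """
    log: List[str] = []
    for name, attempts in list(strategies)[:MAX_STRATEGIES]:
        tried = list(attempts)[:MAX_ATTEMPTS_PER_STRATEGY]
        for i, attempt in enumerate(tried, start=1):
            if attempt():
                log.append(f"{name}: attempt {i} succeeded")
                return True, log
            log.append(f"{name}: attempt {i} failed")
        log.append(f"{name}: all {len(tried)} attempts failed")
    # Budget exhausted: stop and report the log instead of retrying.
    return False, log
```

With this budget, any attempts past the third in a strategy are never run, and neither is a third strategy. The returned log is what ends up in the report.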

What It Does (and Doesn’t)

QA Agent Does

  • Run the actual application and interact with it
  • Make real HTTP requests, run real CLI commands
  • Open browsers and verify UI changes
  • Reproduce bugs and verify fixes end-to-end
  • Report with evidence (commands, outputs, screenshots)

QA Agent Does NOT

  • Run the test suite (that’s CI’s job)
  • Analyze code for style or structure (that’s code review’s job)
  • Run linters, formatters, or type checkers
  • Substitute --help or --dry-run for real execution

Quick Start

GitHub Actions

Create .github/workflows/qa-changes.yml in your repository:
```yaml
name: QA Changes

on:
  pull_request:
    types: [opened, ready_for_review, labeled]

permissions:
  contents: read
  pull-requests: write
  issues: write

jobs:
  qa:
    if: |
      (github.event.action == 'opened' && github.event.pull_request.draft == false) ||
      github.event.action == 'ready_for_review' ||
      github.event.label.name == 'qa-this'
    runs-on: ubuntu-latest
    steps:
      - name: Run QA Changes
        uses: OpenHands/extensions/plugins/qa-changes@main
        with:
          llm-model: anthropic/claude-sonnet-4-5-20250929
          llm-api-key: ${{ secrets.LLM_API_KEY }}
          github-token: ${{ secrets.GITHUB_TOKEN }}
```
Add your LLM_API_KEY to your repository’s Settings → Secrets and variables → Actions.
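The job's `if:` expression decides which `pull_request` events actually start a QA run. The following Python mirror of that condition makes the cases explicit. `should_run_qa` is a hypothetical helper for reasoning about triggers. It assumes a dict shaped like the GitHub `pull_request` webhook payload (`action`, `pull_request.draft`, `label.name`).

```python
def should_run_qa(event: dict) -> bool:
    """Mirror of the workflow's `if:` condition (illustrative only)."""
    action = event.get("action")
    pr = event.get("pull_request", {})
    label = event.get("label") or {}  # only present on labeled events
    return (
        (action == "opened" and not pr.get("draft", False))
        or action == "ready_for_review"
        or label.get("name") == "qa-this"
    )


# A draft PR is skipped when opened, but adding the `qa-this` label runs QA.
print(should_run_qa({"action": "opened", "pull_request": {"draft": True}}))  # False
print(should_run_qa({"action": "labeled", "pull_request": {"draft": True},
                     "label": {"name": "qa-this"}}))                          # True
```

Because any label fires a `labeled` event, the label check is what filters out unrelated labels. Adding the `qa-this` label is therefore the way to request QA on a draft PR.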

In a Conversation

You can also trigger QA manually in any OpenHands conversation by invoking the skill:
```
/qa-changes
```
The agent will ask for the PR to test, or you can provide context directly:
```
/qa-changes — Please QA PR #42 on the my-org/my-repo repository.
Focus on the new dashboard page and verify it renders correctly.
```

QA Report Format

The QA agent posts a structured report as a PR comment:
```markdown
## QA Report

**Status: PASS** ✅

### Changes Tested
- New `/api/health` endpoint returns 200 with version info
- Dashboard page renders at `/dashboard` with correct data

### Evidence
1. Started server with `npm run dev`
2. `curl http://localhost:3000/api/health` → 200 OK, body: {"status":"ok","version":"1.2.0"}
3. Navigated to http://localhost:3000/dashboard — page renders correctly
   [screenshot attached]

### Edge Cases
- Empty database state: dashboard shows "No data" placeholder ✅
- Invalid auth token: returns 401 as expected ✅
```
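To see how the report's sections map onto structured data, the sketch below renders a comment in the same layout from a small dataclass. `QAReport` and `render_report` are hypothetical names. The ❌ and ⚠️ icons for FAIL and PARTIAL are assumptions, since the example above only shows ✅ for PASS.

```python
from dataclasses import dataclass, field
from typing import List

# Only the PASS icon appears in the documented example; the others are assumed.
STATUS_ICONS = {"PASS": "✅", "FAIL": "❌", "PARTIAL": "⚠️"}


@dataclass
class QAReport:
    status: str                  # "PASS", "FAIL", or "PARTIAL"
    changes_tested: List[str]
    evidence: List[str]          # commands run, outputs, screenshot notes
    edge_cases: List[str] = field(default_factory=list)


def render_report(report: QAReport) -> str:
    """Render a QA report as a markdown PR comment."""
    lines = ["## QA Report", "",
             f"**Status: {report.status}** {STATUS_ICONS[report.status]}", "",
             "### Changes Tested"]
    lines += [f"- {change}" for change in report.changes_tested]
    lines += ["", "### Evidence"]
    lines += [f"{i}. {item}" for i, item in enumerate(report.evidence, start=1)]
    if report.edge_cases:  # omit the section when nothing extra was probed
        lines += ["", "### Edge Cases"]
        lines += [f"- {case}" for case in report.edge_cases]
    return "\n".join(lines) + "\n"


print(render_report(QAReport(
    "PASS",
    ["New `/api/health` endpoint returns 200 with version info"],
    ["Started server with `npm run dev`"],
)))
```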

Customization

Change Types

The QA agent adapts its approach based on the type of change:
| Change Type | QA Approach |
|---|---|
| Frontend / UI | Starts dev server, opens browser, verifies visual changes, tests interactions |
| CLI | Runs commands with realistic arguments, verifies output, tests edge cases |
| API / Backend | Starts server, makes HTTP requests, verifies responses and side effects |
| Bug fix | Reproduces bug on base branch, verifies fix on PR branch (before/after) |
| Library / SDK | Writes and runs a short script that imports and calls changed functions |
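A single PR can span several of these change types, so the plan is effectively a union of approaches. The sketch below illustrates that routing with hypothetical path heuristics. The actual agent classifies changes from the PR title, description, and diff, not from file extensions, so treat `classify_path` as a stand-in.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

APPROACHES = {
    "frontend": "Start dev server, open browser, verify visual changes and interactions",
    "cli": "Run commands with realistic arguments, check output and edge cases",
    "api": "Start server, send HTTP requests, check responses and side effects",
    "library": "Write a short script that imports and calls the changed functions",
}


def classify_path(path: str) -> str:
    """Hypothetical heuristic mapping a changed file to a change type."""
    p = PurePosixPath(path)
    if p.suffix in {".tsx", ".jsx", ".css", ".html", ".vue"}:
        return "frontend"
    if "cli" in p.parts or p.name in {"__main__.py", "cli.py"}:
        return "cli"
    if "api" in p.parts or "routes" in p.parts:
        return "api"
    return "library"


def plan_qa(changed_paths: Iterable[str], is_bug_fix: bool = False) -> List[str]:
    """Collect one approach per detected change type."""
    kinds = sorted({classify_path(p) for p in changed_paths})
    plan = [APPROACHES[kind] for kind in kinds]
    if is_bug_fix:
        # Bug fixes add a before/after check on top of the usual approach.
        plan.insert(0, "Reproduce the bug on the base branch, then verify the fix on the PR branch")
    return plan


for step in plan_qa(["server/api/health.py", "web/Dashboard.tsx"], is_bug_fix=True):
    print("-", step)
```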

Repository-Specific QA Guidelines

Add repo-specific QA instructions by creating .agents/skills/custom-qa-guide.md:
```markdown
---
name: custom-qa-guide
description: Custom QA guidelines for this repository
triggers:
- /qa-changes
---

# QA Guidelines for [Your Project]

## Environment Setup
- Run `make setup` to initialize the development environment
- The dev server runs on port 8080

## Key Test Scenarios
- Always verify the admin dashboard at /admin after backend changes
- For API changes, test with both authenticated and unauthenticated requests

## Known Limitations
- The payment module requires a Stripe test key — skip payment flow testing
```
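The `triggers` list in the frontmatter ties the guide to the `/qa-changes` skill. The parser below illustrates that link for this flat frontmatter. It is not OpenHands' skill loader, it handles only the simple `key: value` and `- item` subset shown above rather than general YAML, and the names `parse_frontmatter` and `guide_applies` are hypothetical.

```python
from typing import Dict, List, Optional, Union

Meta = Dict[str, Union[str, List[str]]]


def parse_frontmatter(text: str) -> Meta:
    """Parse flat `key: value` / `- item` frontmatter between `---` lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: Meta = {}
    current_key: Optional[str] = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # end of frontmatter; the markdown body follows
            break
        value_list = meta.get(current_key) if current_key else None
        if stripped.startswith("- ") and isinstance(value_list, list):
            value_list.append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            # An empty value (e.g. `triggers:`) starts a list of items.
            meta[current_key] = value.strip() or []
    return meta


def guide_applies(text: str, command: str) -> bool:
    """True if `command` is listed under the guide's `triggers`."""
    triggers = parse_frontmatter(text).get("triggers", [])
    return isinstance(triggers, list) and command in triggers
```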

Integration with the Verification Stack

The QA agent is most powerful when used alongside the code review agent and the iterate skill as part of the full Verification Stack:
  1. Code review catches issues by reading the diff (style, security, data structures)
  2. QA catches issues by running the software (behavioral regressions, UI bugs)
  3. Iterate orchestrates the loop — fixing issues flagged by either verifier and re-polling until the PR is clean
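The iterate loop amounts to repeatedly polling both verifiers and fixing what they report. A minimal sketch follows, assuming each verifier can be modeled as a callable that returns its open findings (an empty list means clean). The `max_rounds` cap is an added safety assumption and not a documented setting.

```python
from typing import Callable, List

Verifier = Callable[[], List[str]]  # returns open findings; [] means clean


def poll(verifiers: List[Verifier]) -> List[str]:
    """Collect findings from every verifier (e.g. code review and QA)."""
    return [finding for verify in verifiers for finding in verify()]


def iterate_until_clean(verifiers: List[Verifier],
                        fix: Callable[[List[str]], None],
                        max_rounds: int = 5) -> bool:
    """Fix flagged issues and re-poll until no verifier reports anything."""
    findings = poll(verifiers)
    rounds = 0
    while findings and rounds < max_rounds:
        fix(findings)               # address what either verifier flagged
        rounds += 1
        findings = poll(verifiers)  # re-poll after every fix
    return not findings             # True means the PR is clean
```

Re-polling after every fix matters because a change made to satisfy one verifier can introduce something the other one flags.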

Troubleshooting

If the agent cannot set up the environment, make sure your repository’s setup instructions are documented in README.md or AGENTS.md; the agent follows them to bootstrap the environment. If setup requires special steps, add them to a custom QA guide.
PARTIAL means some scenarios passed and others failed or couldn’t be tested. Read the report details — it will explain what worked and what didn’t. Common causes: missing environment variables, external service dependencies, or insufficient permissions.
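The PARTIAL definition above can be expressed as a small aggregation over per-scenario outcomes. The sketch below follows that definition. Treating "nothing passed" as FAIL is an assumption, and `Outcome` and `verdict` are hypothetical names.

```python
from enum import Enum
from typing import Dict


class Outcome(Enum):
    PASSED = "passed"
    FAILED = "failed"
    UNTESTED = "untested"  # e.g. blocked by a missing env var or external service


def verdict(results: Dict[str, Outcome]) -> str:
    """Aggregate per-scenario outcomes into PASS / PARTIAL / FAIL."""
    outcomes = set(results.values())
    if outcomes == {Outcome.PASSED}:
        return "PASS"
    if Outcome.PASSED in outcomes:
        # Some scenarios passed; others failed or could not be tested.
        return "PARTIAL"
    return "FAIL"  # assumption: nothing passed (or nothing was tested)


print(verdict({"health endpoint": Outcome.PASSED,
               "payment flow": Outcome.UNTESTED}))  # PARTIAL
```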
For large PRs with many changed entry points, the agent may need more time. Consider splitting large PRs into smaller, focused changes. You can also add a custom QA guide that prioritizes the most important scenarios.