mk-qa-master · AI Testing Master — let's make QA simple

Scope

What this is not

mk-qa-master sits between your AI client and your test framework. It's not the framework, the LLM, a CI runner, a source analyzer, or a SaaS UI.

A test framework

→ Bring pytest / Jest / Cypress / Go test / Maestro — qa-master drives them

An LLM

→ Reasoning lives in your AI client (Claude / Cursor / Codex / Gemini). qa-master only exposes tools

A CI runner

→ Runs locally + reports JUnit XML / HTML. Wire JUnit into GitHub Actions / Jenkins / GitLab yourself

A source-code analyzer

→ Analyzes the live DOM (web) / view hierarchy (mobile), not your repo source

A SaaS dashboard

→ MCP-native: lives in your AI client. The HTML report is one self-contained file
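Wiring the JUnit XML into CI yourself can be as small as a gate script that fails the job when the report contains failures. The following is a minimal sketch, assuming the report follows the common JUnit layout (a `testsuite` root, or a `testsuites` wrapper, carrying `tests` / `failures` / `errors` / `skipped` attributes). The function names `junit_totals` and `gate` and the sample XML are illustrative and not part of mk-qa-master.

```python
import xml.etree.ElementTree as ET


def junit_totals(xml_text: str) -> dict:
    """Sum tests / failures / errors / skipped across all <testsuite> elements."""
    root = ET.fromstring(xml_text)
    # The root is either a single <testsuite> or a <testsuites> wrapper.
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    totals = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}
    for suite in suites:
        for key in totals:
            # Missing attributes count as zero.
            totals[key] += int(suite.get(key, 0))
    return totals


def gate(xml_text: str) -> int:
    """Return a process exit code: 0 when nothing failed, 1 otherwise."""
    totals = junit_totals(xml_text)
    return 1 if totals["failures"] or totals["errors"] else 0


# Made-up report used only to exercise the functions.
SAMPLE = (
    '<testsuites>'
    '<testsuite name="web" tests="3" failures="1" errors="0" skipped="0"/>'
    '<testsuite name="api" tests="2" failures="0" errors="0" skipped="1"/>'
    '</testsuites>'
)
print(junit_totals(SAMPLE), gate(SAMPLE))
```

In a CI step, the script would read the report file that the run produced and pass the result of `gate` to `sys.exit`. The report path depends on your project and is not specified here.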

What it does

Three jobs. One server. Web + mobile.

Ranked by how often you actually use them.

Run tests, web or mobile

Switch runners with a single QA_RUNNER env var: pytest / Jest / Cypress / Go for web, maestro for iOS Simulator, Android Emulator, real devices, and BlueStacks. Auto-retry, JUnit XML, screenshots, Playwright trace.zip / Maestro recordings — out of the box.

Write tests from a URL or a screen

analyze_url probes the DOM; analyze_screen dumps the live mobile hierarchy. Both surface form / cta / nav / tab-bar modules with real selectors, then generate_test emits runnable pytest or Maestro YAML — not # TODO placeholders.

Your QA optimization advisor

Every run archives a snapshot and writes a new optimization-plan.md. Flaky vs. broken vs. slow-regression — ranked by evidence, not by gut. Same loop works for web and mobile.

The pipeline

A self-correcting loop

Every run feeds the optimizer; the optimizer points at the weakest link; the next run attacks it first. Without this loop, AI is just a faster monkey tester.

Analyze

analyze_url / analyze_screen

Probe the DOM (web) or live mobile hierarchy → form / cta / nav modules + selectors.

Generate

generate_test / auto_generate_tests

Emit runnable pytest .py or Maestro .yaml against the detected modules — not # TODO.

Run

run_tests / run_failed

Drive the native runner, capture JUnit XML, screenshots, trace.zip / Maestro recordings.

Report

get_test_report / get_failure_details

Outcome strings + error signatures + history snapshot. Feeds the advisor next.

Advise

get_optimization_plan

Three lenses (suite / MCP / AI) → ranked next-run action list. The loop closes here.

↻ next run attacks the weakest link

QA knowledge in three layers

Domain context, not just DOM

A DOM-only analyzer produces 'empty field should error' — monkey testing in a new wrapper. We layer real QA knowledge on top.

Built-in

ISTQB's seven principles, equivalence partitioning, decision tables, state transitions, the test pyramid, shift-left, mobile checklists, QA metrics — baked into the server.

Your file

Drop a qa-knowledge.md in your project root: business rules, historical bugs, standard assertion copy, user journeys, technical constraints. Run init_qa_knowledge to scaffold one.

Per-test inline

Pass a business_context slice into generate_test; it gets printed as a # Business context: block inside the test, so reviewers see why without leaving the file.
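To make the per-test inline layer concrete, here is a rough sketch of how a free-text slice could be rendered as a `# Business context:` comment block. The exact layout that generate_test emits is not documented here, so the indentation, the helper name `business_context_block`, and the sample rules are all assumptions.

```python
def business_context_block(context: str) -> str:
    """Render free text as a '# Business context:' comment block."""
    lines = ["# Business context:"]
    for raw in context.strip().splitlines():
        text = raw.strip()
        # Blank lines become a bare '#' so the block stays contiguous.
        lines.append(f"#   {text}" if text else "#")
    return "\n".join(lines)


# Hypothetical business rules, for illustration only.
print(business_context_block(
    "Coupons are single-use per account.\n\nA refund reopens the coupon."
))
```

Keeping the context as comments at the top of the test file means a reviewer sees the business reason next to the assertions, which is the point of this layer.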

The advisor

Three lenses on every run

After each run, the advisor reads history/ and telemetry, then writes a ranked action list. Three perspectives:

Suite quality

Per-test outcome strings like PFPFP feed a flake score. Cross-reference error signatures: three consecutive fails with the same signature → marked broken (a real bug, not flake).

MCP usability

Tool telemetry surfaces top tools, error rate, repeated args, and common A→B chains. Tells you where to ship a meta-tool or cache.

AI effectiveness

Did the tests that generate_test wrote show up in the next run? Did the modules that analyze_url detected get matching test files? Adoption rate vs. coverage gap — tracked.
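The suite-quality lens can be illustrated with a small sketch. The real scoring formula is not stated on this page; the version below assumes the flake score is simply the fraction of failing runs in the outcome string, and it applies the stated rule that three consecutive fails with the same error signature mean broken. The names `flake_score` and `classify` and the signature strings are hypothetical.

```python
from typing import Optional, Sequence


def flake_score(outcomes: str) -> float:
    """Fraction of failing runs in an outcome string such as 'PFPFP'."""
    if not outcomes:
        return 0.0
    return outcomes.count("F") / len(outcomes)


def classify(outcomes: str, signatures: Sequence[Optional[str]]) -> str:
    """Label a test 'broken', 'flaky', or 'stable' from its recent history.

    signatures[i] is the error signature of run i, or None when it passed.
    """
    last3 = list(signatures[-3:])
    # Three consecutive fails with one shared signature: a real bug, not flake.
    if len(last3) == 3 and None not in last3 and len(set(last3)) == 1:
        return "broken"
    # Any other history that contains failures is treated as flaky here.
    if "F" in outcomes:
        return "flaky"
    return "stable"


print(flake_score("PFPFP"))
print(classify("PFPFP", [None, "AssertionError", None, "AssertionError", None]))
print(classify("PPFFF", [None, None, "Timeout", "Timeout", "Timeout"]))
```

Under this assumed definition, an alternating history is flaky while a run of identical failures is broken, so the two cases get different follow-up actions.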

Runners

7 runners, one tool surface

Switch via the QA_RUNNER env var. Seven frameworks, one MCP surface — web on four, mobile on Maestro, API on Schemathesis (OpenAPI / Swagger, since v0.6.0) or Newman (Postman collections, since v0.6.1). Pre-existing API tests in pytest + httpx / Jest + supertest / Cypress cy.request() / Go httptest still ride their respective runners — no migration. Pact provider verification is on the v0.7.0 roadmap.

pytest-playwright

env: QA_RUNNER=pytest
since 0.1.0

jest

env: QA_RUNNER=jest
since 0.2.0

cypress

env: QA_RUNNER=cypress
since 0.2.0

go test

env: QA_RUNNER=go
since 0.2.0

maestro (iOS + Android + BlueStacks)

env: QA_RUNNER=maestro (+ optional QA_ANDROID_HOST for BlueStacks)
since 0.3.0

schemathesis (OpenAPI / Swagger)

env: QA_RUNNER=schemathesis + QA_OPENAPI_URL (http(s):// or file://); install with mk-qa-master[api]
since 0.6.0

newman (Postman collections)

env: QA_RUNNER=newman + QA_POSTMAN_COLLECTION (path); system prereq: npm install -g newman
since 0.6.1
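Since each runner is selected through environment variables, a quick pre-flight check can catch a missing variable before the client starts the server. This is a sketch built only from the variable names listed above; `check_runner_env` is a hypothetical helper, and treating an unset QA_RUNNER as a problem is an assumption, because the server's default is not stated here.

```python
from typing import List, Mapping

# Runner names and the extra variables each one requires, as listed above.
REQUIRED_ENV = {
    "pytest": [],
    "jest": [],
    "cypress": [],
    "go": [],
    "maestro": [],  # QA_ANDROID_HOST is optional (BlueStacks only)
    "schemathesis": ["QA_OPENAPI_URL"],
    "newman": ["QA_POSTMAN_COLLECTION"],
}


def check_runner_env(env: Mapping[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the env looks usable."""
    runner = env.get("QA_RUNNER")
    if runner is None:
        return ["QA_RUNNER is not set"]
    if runner not in REQUIRED_ENV:
        return [f"unknown QA_RUNNER {runner!r}"]
    return [
        f"{name} is required for {runner}"
        for name in REQUIRED_ENV[runner]
        if not env.get(name)
    ]


print(check_runner_env({"QA_RUNNER": "pytest"}))
print(check_runner_env({"QA_RUNNER": "newman"}))
```

Passing `os.environ` instead of a literal dict checks the current shell.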

Tool surface

16 tools across 5 roles

Grouped by role. Each group is one layer in the analyze → generate → run → report → advise loop. README's prompting cookbook has natural-language phrasings — you rarely name a tool yourself.

Discover — orientation + scan

get_runner_info — Which runner is active + all available. Call this first so the AI picks the right test template (Playwright .py vs Maestro .yaml).
list_tests — Enumerate every collectable test under the active runner — pytest --collect-only, jest --listTests, cypress glob, go test -list, maestro YAML walk.
analyze_url — Web: probe a live URL — form / nav / dialog / cta modules + selectors + API endpoints the page hits + layout-overflow warnings + candidate TCs.
analyze_screen — Mobile: dump maestro hierarchy → form / cta / tab_bar modules + candidate TCs, noise-filtered (status bar + asset names stripped).

Generate — modules → runnable tests

generate_test — Test skeleton; with module from analyze_url/analyze_screen, a *runnable* Playwright .py or Maestro .yaml with concrete selectors — not # TODO stubs.
auto_generate_tests — One-shot: analyze_url → generate_test per module. Hand it a URL, get a tests/ folder back.
codegen — Launch Playwright codegen interactively (web) / hint to maestro studio (mobile). Good for baseline happy-path recording.
init_qa_knowledge — Scaffold qa-knowledge.md in the project root — business rules / past bugs / standard assertions / user journeys / technical constraints.
get_qa_context — Read qa-knowledge.md (built-in ISTQB fallback). Feed a slice into generate_test.business_context for domain-aware tests.

Run — execute the suite

run_tests — Execute under the active runner; writes report.json + JUnit XML, snapshots into history/, auto-refreshes optimization-plan.md. Optional filter.
run_failed — Re-run only last failures — pytest --lf, jest --onlyFailures, cypress/go reverse-lookup, maestro nodeid → .yaml. Way faster than re-running the suite.

Report — read what just happened

get_test_report — Summary: passed / failed / skipped / flaky_in_run / duration. Cheap — use it between actions instead of re-running.
get_failure_details — Per-failure message + screenshot + Playwright trace.zip + video paths + parsed step sequence. The "why did it fail" tool.
generate_html_report — Render the latest run as one self-contained HTML — base64 screenshots, trend sparkline, collapsed Passed, expanded Failed cards. Slack-able.
get_test_history — Last N archived run summaries — flake / duration regression / pass-rate trend. Pair with get_optimization_plan for action items.

Advisor — the self-improvement coach

get_optimization_plan — Three-lens prioritized plan: suite quality (flake / broken / slow_regression) + MCP usability (top tools, repeat args, error rate) + AI effectiveness (generate_test adoption, coverage gaps). Writes optimization-plan.md every run.
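The MCP-usability lens (top tools, common A→B chains, error rate) boils down to counting over a call log. Below is a minimal sketch, assuming telemetry is available as an ordered list of `(tool_name, ok)` pairs; the real telemetry format is not described here, and `summarize_calls` and the sample log are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def summarize_calls(calls: List[Tuple[str, bool]]) -> dict:
    """Summarize an ordered log of (tool_name, ok) pairs."""
    names = [name for name, _ in calls]
    usage = Counter(names)
    # Count each adjacent pair A -> B in call order.
    chains = Counter(zip(names, names[1:]))
    errors = sum(1 for _, ok in calls if not ok)
    return {
        "top_tools": usage.most_common(3),
        "top_chain": chains.most_common(1),
        "error_rate": errors / len(calls) if calls else 0.0,
    }


# Made-up log, for illustration only.
LOG = [
    ("analyze_url", True),
    ("generate_test", True),
    ("run_tests", True),
    ("analyze_url", False),
    ("generate_test", True),
]
print(summarize_calls(LOG))
```

A chain that dominates the counts is a candidate for a one-shot meta-tool, in the way auto_generate_tests wraps analyze_url followed by generate_test.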

Workflows

Four prompts cover ~90% of real use

One sentence to the AI client; the tools chain automatically.

1. "Test https://your-site/login — analyze the page, write tests for every module, run them, then tell me what to fix."

analyze_url → generate_test (×N modules) → run_tests → get_failure_details → get_optimization_plan

2. "I just added three new feature pages — auto-generate tests for everything the analyzer finds and run them."

auto_generate_tests(url=...) → run_tests → get_test_report → get_optimization_plan

3. "What's wrong with my test suite this week — give me a ranked plan, not gut feel."

get_test_history(limit=30) → get_optimization_plan(history_limit=30, telemetry_limit=2000)

4. "Test the barcode button on my mobile app on the iOS Simulator and tell me if it's flaky."

analyze_screen(app_id='com.example.app', launch_app=true) → generate_test(module=<cta>) → run_tests → get_optimization_plan
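Normally the AI client picks the tools, but a chain can also be written down as data, which is handy for scripting or for checking what a prompt should trigger. The sketch below encodes workflow 3 with the arguments shown above. `run_chain` and `fake_call_tool` are hypothetical; the fake stands in for whatever MCP client call you actually use and only echoes the request.

```python
from typing import Any, Callable, Dict, List, Tuple

Step = Tuple[str, Dict[str, Any]]

# Workflow 3 from the list above, as (tool name, arguments) steps.
WEEKLY_REVIEW: List[Step] = [
    ("get_test_history", {"limit": 30}),
    ("get_optimization_plan", {"history_limit": 30, "telemetry_limit": 2000}),
]


def run_chain(steps: List[Step], call_tool: Callable[[str, Dict[str, Any]], Any]) -> list:
    """Call each tool in order and collect the results."""
    return [call_tool(name, args) for name, args in steps]


def fake_call_tool(name: str, args: Dict[str, Any]) -> str:
    # Stand-in for a real MCP client call; it only formats the request.
    rendered = ", ".join(f"{key}={value}" for key, value in args.items())
    return f"{name}({rendered})"


print(run_chain(WEEKLY_REVIEW, fake_call_tool))
```

Injecting `call_tool` keeps the chain definition independent of any particular client library.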

Sample output

What you actually get

Same shape as mk-spec-master's plan — markdown, ready to paste into Slack / JIRA / a sprint planning doc. Auto-written after every run.

get_optimization_plan

# Optimization Plan — 2026-05-12T14:03:40

_Based on 6 archived runs._

## Prioritized Actions

### 1. 🔴 HIGH — flaky
- **Target**: `tests/test_login.py::test_invalid_credentials`
- **Evidence**: flake_score=0.4, outcomes=PFPFP, rerun_count=1
- **Suggestion**: Add an explicit wait (wait_for_response / locator wait)
- **auto_action_hint**: `get_failure_details(test_id="test_invalid_credentials")`

### 2. 🟡 MEDIUM — coverage_gap
- **Target**: `register_form` (module detected on /register)
- **Evidence**: analyze_url found this module; no matching test_*.py in repo
- **Suggestion**: `generate_test(description="...", filename="test_register_form.py")`

### 3. 🟡 MEDIUM — slow_regression
- **Target**: `tests/test_checkout.py::test_full_flow`
- **Evidence**: median duration 1.8× baseline across last 6 runs
- **Suggestion**: profile network waits; pin fixture data; consider parallel mark

## MCP usability
- Top tool: `run_tests` (38%) · `analyze_url` (22%) · `get_failure_details` (14%)
- Common chain: `analyze_url → generate_test` (17 occurrences)
- Error rate: 2.3% (1 timeout in analyze_url against slow staging)

## AI effectiveness
- generate_test adoption: 9 / 11 generated tests appeared in the next run (82%)
- coverage gap: 1 module from analyze_url has no matching test file (`register_form`)

get_test_report + get_failure_details

# Test Report — pytest-playwright

- total: 23
- passed: 19
- failed: 3
- flaky_in_run: 1   ← auto-retry rescued
- skipped: 0
- duration: 31.4s

## Failures
1. `tests/test_login.py::test_invalid_credentials`
   - message: `AssertionError: expected error text not visible`
   - screenshot: `test-results/.../test-failed-1.png`
   - trace:      `test-results/.../trace.zip`
   - video:      `test-results/.../video.webm`

2. `tests/test_coupon.py::test_idempotency`
   - message: `Timeout waiting for /api/coupon (5000ms)`
   - last step: `Page.waitForResponse('/api/coupon')`

claude · mk-qa-master

you ▸ Test https://your-site/login — one case per module

# Claude calls the MCP server →

→ analyze_url ✓ 4 modules · 12 endpoints · 18 candidate cases

→ generate_test ✓ tests/test_login.py (4 cases)

→ run_tests ⚠ 3 passed, 1 failed

→ get_optimization_plan ✓ Next: flaky checkout-flow, broken coupon-rule

you ▸ What's the next thing I should fix?

Add to your MCP client config

Restart your client, then talk to the AI like you always do.

{
  "mcpServers": {
    "mk-qa-master": {
      "command": "uvx",
      "args": ["mk-qa-master"],
      "env": {
        "QA_RUNNER": "pytest",
        "QA_PROJECT_ROOT": "/path/to/your/project"
      }
    }
  }
}
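If you switch runners often, the same config can be generated and not hand-edited. This is a sketch that reuses only the keys shown above plus the per-runner variables from the Runners section. `mcp_config` and both paths are hypothetical, and the schemathesis runner may need a different install target (mk-qa-master[api]) than the plain `args` used here.

```python
import json


def mcp_config(runner: str, project_root: str, **extra_env: str) -> str:
    """Build the mcpServers entry for mk-qa-master as a JSON string."""
    env = {"QA_RUNNER": runner, "QA_PROJECT_ROOT": project_root, **extra_env}
    config = {
        "mcpServers": {
            "mk-qa-master": {
                "command": "uvx",
                "args": ["mk-qa-master"],
                "env": env,
            }
        }
    }
    return json.dumps(config, indent=2)


# Example: the Newman runner with a placeholder collection path.
print(mcp_config(
    "newman",
    "/path/to/your/project",
    QA_POSTMAN_COLLECTION="/path/to/your/collection.json",
))
```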

Read the docs → Sibling → mk-spec-master

Let's make QA simple.