Member 1 · Execution side of the loop

Let's make QA simple.

AI Testing Master · MK QA MASTER

mk-qa-master is an MCP server that drives web tests (pytest / Jest / Cypress / Go), mobile tests (Maestro on iOS and Android, including BlueStacks), and API tests (anything your existing pytest / Jest / Cypress / Go suite already hits). It writes the next round of tests from a URL or a live screen, and acts as your data-driven QA advisor on every run.

Drives any of these test frameworks
pytest + Playwright · Jest · Cypress · go test · Maestro (iOS + Android)
Scope

What this is not

mk-qa-master sits between your AI client and your test framework. It's not the framework, the LLM, a CI runner, a source analyzer, or a SaaS UI.

A test framework
→ Bring pytest / Jest / Cypress / Go test / Maestro — qa-master drives them
An LLM
→ Reasoning lives in your AI client (Claude / Cursor / Codex / Gemini). qa-master only exposes tools
A CI runner
→ Runs locally + reports JUnit XML / HTML. Wire JUnit into GitHub Actions / Jenkins / GitLab yourself
A source-code analyzer
→ Analyzes the live DOM (web) / view hierarchy (mobile), not your repo source
A SaaS dashboard
→ MCP-native: lives in your AI client. The HTML report is one self-contained file
What it does

Three jobs. One server. Web + mobile.

Ranked by how often you actually use them.

Run tests, web or mobile

Switch runners with a single QA_RUNNER env var: pytest / Jest / Cypress / Go for web, maestro for iOS Simulator, Android Emulator, real devices, and BlueStacks. Auto-retry, JUnit XML, screenshots, Playwright trace.zip / Maestro recordings — out of the box.

Write tests from a URL or a screen

analyze_url probes the DOM; analyze_screen dumps the live mobile hierarchy. Both surface form / cta / nav / tab-bar modules with real selectors, then generate_test emits runnable pytest or Maestro YAML — not # TODO placeholders.

Your QA optimization advisor

Every run archives a snapshot and writes a new optimization-plan.md. Flaky vs. broken vs. slow-regression — ranked by evidence, not by gut. Same loop works for web and mobile.
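To make the slow-regression idea concrete, here is a minimal sketch that compares the median of recent durations against the median of earlier ones. The page only reports evidence such as "median duration 1.8× baseline"; how the baseline is chosen and what ratio triggers a flag are not stated, so `baseline_runs` and `threshold` below are assumptions.

```python
from statistics import median


def slow_regression_ratio(durations, baseline_runs=3):
    """Recent median duration divided by baseline median duration.

    durations: seconds per archived run, oldest first.
    baseline_runs: how many of the oldest runs form the baseline (assumed).
    Returns None when there is not enough history to compare.
    """
    if len(durations) <= baseline_runs:
        return None
    baseline = median(durations[:baseline_runs])
    recent = median(durations[baseline_runs:])
    if baseline <= 0:
        return None
    return recent / baseline


def is_slow_regression(durations, threshold=1.5, baseline_runs=3):
    """True when the recent median is at least `threshold` times the baseline."""
    ratio = slow_regression_ratio(durations, baseline_runs)
    return ratio is not None and ratio >= threshold


# Six archived runs: three around 2.0s, then three around 3.6s.
history = [2.0, 2.1, 1.9, 3.6, 3.5, 3.7]
ratio = slow_regression_ratio(history)
flagged = is_slow_regression(history)
```

Using medians rather than means keeps one unusually slow run from flagging a test on its own.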

The pipeline

A self-correcting loop

Every run feeds the optimizer; the optimizer points at the weakest link; the next run attacks it first. Without this loop, AI is just a faster monkey tester.

Analyze
analyze_url / analyze_screen
Probe the DOM (web) or live mobile hierarchy → form / cta / nav modules + selectors.
Generate
generate_test / auto_generate_tests
Emit runnable pytest .py or Maestro .yaml against the detected modules — not # TODO.
Run
run_tests / run_failed
Drive the native runner, capture JUnit XML, screenshots, trace.zip / Maestro recordings.
Report
get_test_report / get_failure_details
Outcome strings + error signatures + history snapshot. Feeds the advisor next.
Advise
get_optimization_plan
Three lenses (suite / MCP / AI) → ranked next-run action list. The loop closes here.
↻ next run attacks the weakest link
QA knowledge in three layers

Domain context, not just DOM

A DOM-only analyzer produces 'empty field should error' — monkey testing in a new wrapper. We layer real QA knowledge on top.

Built-in
ISTQB's seven principles, equivalence partitioning, decision tables, state transitions, the test pyramid, shift-left, mobile checklists, QA metrics — baked into the server.
Your file
Drop a qa-knowledge.md in your project root: business rules, historical bugs, standard assertion copy, user journeys, technical constraints. Run init_qa_knowledge to scaffold one.
Per-test inline
Pass a business_context slice into generate_test; it gets printed as a # Business context: block inside the test, so reviewers see why without leaving the file.
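The inline layer can be pictured as a small formatter that turns a context slice into a comment block at the top of the generated test. This is only a sketch: the `business_context_block` helper and the exact indentation are assumptions, and the only detail taken from the page is the `# Business context:` header.

```python
def business_context_block(context: str) -> str:
    """Format a business-context slice as a Python comment block.

    Blank lines are dropped so every emitted line is a comment.
    Returns an empty string when there is nothing to print.
    """
    lines = [line.strip() for line in context.splitlines() if line.strip()]
    if not lines:
        return ""
    body = [f"#   {line}" for line in lines]
    return "\n".join(["# Business context:"] + body) + "\n"


# Hypothetical slice, as it might be copied out of qa-knowledge.md.
context_slice = """
Accounts lock after 5 failed logins.
Error copy must read: Invalid email or password.
"""

block = business_context_block(context_slice)
print(block)
```

Because every emitted line starts with `#`, the block can be prepended to any generated .py test (or a Maestro .yaml flow, which uses the same comment marker) without changing its behavior.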
The advisor

Three lenses on every run

After each run, the advisor reads history/ and telemetry, then writes a ranked action list. Three perspectives:

Suite quality

Per-test outcome strings like PFPFP feed a flake score. Cross-reference error signatures: three consecutive fails with the same signature → marked broken (a real bug, not flake).
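The flaky-versus-broken rule can be sketched in a few lines. Two assumptions are made here: the flake score is taken as the share of failing runs (which happens to give 0.4 for PFPFP, but the server may compute it differently), and "three consecutive fails" is read as the three most recent runs. The `failing` and `stable` labels are hypothetical additions to cover the remaining cases.

```python
def flake_score(outcomes: str) -> float:
    """Share of failing runs in an outcome string such as 'PFPFP'."""
    if not outcomes:
        return 0.0
    return outcomes.count("F") / len(outcomes)


def classify(outcomes: str, signatures: list, broken_streak: int = 3) -> str:
    """Label a test from its history.

    outcomes: one letter per archived run, oldest first (P = pass, F = fail).
    signatures: error signature per run, None for a pass, same order.
    """
    tail = signatures[-broken_streak:]
    recent_all_failed = outcomes[-broken_streak:] == "F" * broken_streak
    # Same signature on every recent failure points at a real bug.
    if recent_all_failed and tail[0] is not None and len(set(tail)) == 1:
        return "broken"
    if "F" in outcomes and "P" in outcomes:
        return "flaky"
    if "F" in outcomes:
        return "failing"  # hypothetical label: fails, but signatures differ
    return "stable"


score = flake_score("PFPFP")
flaky = classify("PFPFP", [None, "Timeout", None, "Timeout", None])
broken = classify("PPFFF", [None, None, "AssertionError", "AssertionError", "AssertionError"])
```

Checking the signature, not just the outcome, is what separates a consistent failure from three unrelated ones.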

MCP usability

Tool telemetry surfaces top tools, error rate, repeated args, and common A→B chains. Tells you where to ship a meta-tool or cache.
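A telemetry summary of this kind can be approximated with counters over the call log. The record shape used below, a list of `(tool_name, ok)` pairs in call order, is an assumption; the page does not describe how telemetry is stored.

```python
from collections import Counter


def summarize_telemetry(calls):
    """Summarize tool calls given as (tool_name, ok) pairs in call order."""
    tools = [name for name, _ in calls]
    total = len(calls)
    if total == 0:
        return {"top_tools": [], "error_rate": 0.0, "top_chain": None}
    errors = sum(1 for _, ok in calls if not ok)
    # Adjacent pairs approximate "A then B" chains.
    chains = Counter(zip(tools, tools[1:]))
    return {
        "top_tools": [(n, c / total) for n, c in Counter(tools).most_common(3)],
        "error_rate": errors / total,
        "top_chain": chains.most_common(1)[0] if chains else None,
    }


# Hypothetical call log for one session.
calls = [
    ("analyze_url", True),
    ("generate_test", True),
    ("run_tests", True),
    ("analyze_url", False),
    ("generate_test", True),
    ("get_failure_details", True),
    ("run_tests", True),
    ("run_tests", True),
]
summary = summarize_telemetry(calls)
```

A chain that shows up often, such as analyze_url followed by generate_test, is the kind of signal that could justify a combined tool like auto_generate_tests.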

AI effectiveness

Did the test generate_test wrote show up in the next run? Did the modules analyze_url detected get matching test files? Adoption rate vs. coverage gap — tracked.
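Both questions reduce to set comparisons. The sketch below assumes generated tests are identified by test ID and that a module named `register_form` is covered by a file called `test_register_form.py`; that naming rule is inferred from the sample plan on this page, not from documented behavior.

```python
def ai_effectiveness(generated, ran_next, detected_modules, test_files):
    """Return (adoption_rate, coverage_gaps).

    generated: test IDs written by generate_test.
    ran_next: test IDs that appeared in the following run.
    detected_modules: module names found by the analyzer.
    test_files: file names present in the tests folder.
    """
    generated = set(generated)
    adopted = generated & set(ran_next)
    adoption_rate = len(adopted) / len(generated) if generated else 0.0
    existing = set(test_files)
    # Assumed convention: module "x" is covered by "test_x.py".
    gaps = sorted(m for m in detected_modules if f"test_{m}.py" not in existing)
    return adoption_rate, gaps


rate, gaps = ai_effectiveness(
    generated=["test_a", "test_b", "test_c", "test_d"],
    ran_next=["test_a", "test_b", "test_d", "test_legacy"],
    detected_modules=["login_form", "register_form"],
    test_files=["test_login_form.py"],
)
```

A low adoption rate suggests generated tests are being discarded; a non-empty gap list names the modules to generate for next.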

Runners

7 runners, one tool surface

Switch via the QA_RUNNER env var. Seven frameworks, one MCP surface: web on four, mobile on Maestro, API on Schemathesis (OpenAPI / Swagger, since v0.6.0) or Newman (Postman collections, since v0.6.1). Pre-existing API tests in pytest + httpx / Jest + supertest / Cypress cy.request() / Go httptest still ride their respective runners, with no migration needed. Pact provider verification is on the v0.7.0 roadmap.

pytest-playwright
env: QA_RUNNER=pytest
since 0.1.0
jest
env: QA_RUNNER=jest
since 0.2.0
cypress
env: QA_RUNNER=cypress
since 0.2.0
go test
env: QA_RUNNER=go
since 0.2.0
maestro (iOS + Android + BlueStacks)
env: QA_RUNNER=maestro (+ optional QA_ANDROID_HOST for BlueStacks)
since 0.3.0
schemathesis (OpenAPI / Swagger)
env: QA_RUNNER=schemathesis + QA_OPENAPI_URL (http(s):// or file://); install with mk-qa-master[api]
since 0.6.0
newman (Postman collections)
env: QA_RUNNER=newman + QA_POSTMAN_COLLECTION (path); system prereq: npm install -g newman
since 0.6.1
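The environment requirements listed above can be checked before starting a client. This is a sketch of a pre-flight check, not something the server is documented to do. The runner names and variable names come from the list above; the `check_runner_env` helper is hypothetical, and because the page does not say what happens when QA_RUNNER is unset, the sketch simply reports that case as a problem.

```python
# Extra variables each runner needs, taken from the runner list above.
REQUIRED_ENV = {
    "pytest": [],
    "jest": [],
    "cypress": [],
    "go": [],
    "maestro": [],  # QA_ANDROID_HOST is optional (BlueStacks)
    "schemathesis": ["QA_OPENAPI_URL"],
    "newman": ["QA_POSTMAN_COLLECTION"],
}


def check_runner_env(env: dict) -> list:
    """Return a list of human-readable problems; empty means the env looks OK."""
    runner = env.get("QA_RUNNER", "")
    if runner not in REQUIRED_ENV:
        return [f"unknown QA_RUNNER {runner!r}; expected one of {sorted(REQUIRED_ENV)}"]
    problems = [
        f"{runner} needs {name}" for name in REQUIRED_ENV[runner] if not env.get(name)
    ]
    url = env.get("QA_OPENAPI_URL", "")
    if runner == "schemathesis" and url:
        if not url.startswith(("http://", "https://", "file://")):
            problems.append("QA_OPENAPI_URL must start with http(s):// or file://")
    return problems


ok = check_runner_env({"QA_RUNNER": "pytest"})
missing = check_runner_env({"QA_RUNNER": "newman"})
bad_url = check_runner_env({"QA_RUNNER": "schemathesis", "QA_OPENAPI_URL": "ftp://spec"})
```

In practice the function could be called with `dict(os.environ)` or with the `env` block of an MCP client config.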
Tool surface

16 tools across 5 roles

Grouped by role. Each group is one layer in the analyze → generate → run → report → advise loop. The README's prompting cookbook lists natural-language phrasings, so you rarely need to name a tool yourself.

Discover — orientation + scan
  • get_runner_info: Which runner is active, plus all available runners. Call this first so the AI picks the right test template (Playwright .py vs Maestro .yaml).
  • list_tests: Enumerate every collectable test under the active runner: pytest --collect-only, jest --listTests, cypress glob, go test -list, maestro YAML walk.
  • analyze_url (web): Probe a live URL and return form / nav / dialog / cta modules, selectors, the API endpoints the page hits, layout-overflow warnings, and candidate test cases.
  • analyze_screen (mobile): Dump maestro hierarchy into form / cta / tab_bar modules and candidate test cases, noise-filtered (status bar and asset names stripped).
Generate — modules → runnable tests
  • generate_test: Test skeleton. Given a module from analyze_url / analyze_screen, it emits a *runnable* Playwright .py or Maestro .yaml with concrete selectors, not # TODO stubs.
  • auto_generate_tests: One-shot: analyze_url, then generate_test per module. Hand it a URL, get a tests/ folder back.
  • codegen: Launch Playwright codegen interactively (web), or get a hint to use maestro studio (mobile). Good for recording a baseline happy path.
  • init_qa_knowledge: Scaffold qa-knowledge.md in the project root: business rules / past bugs / standard assertions / user journeys / technical constraints.
  • get_qa_context: Read qa-knowledge.md (with the built-in ISTQB fallback). Feed a slice into generate_test.business_context for domain-aware tests.
Run — execute the suite
  • run_tests: Execute under the active runner; writes report.json + JUnit XML, snapshots into history/, and auto-refreshes optimization-plan.md. Optional filter.
  • run_failed: Re-run only the last failures: pytest --lf, jest --onlyFailures, cypress/go reverse-lookup, maestro nodeid → .yaml. Much faster than re-running the whole suite.
Report — read what just happened
  • get_test_report: Summary: passed / failed / skipped / flaky_in_run / duration. Cheap, so use it between actions instead of re-running.
  • get_failure_details: Per-failure message + screenshot + Playwright trace.zip + video paths + parsed step sequence. The "why did it fail" tool.
  • generate_html_report: Render the latest run as one self-contained HTML file: base64 screenshots, trend sparkline, collapsed Passed, expanded Failed cards. Slack-able.
  • get_test_history: Last N archived run summaries: flake / duration regression / pass-rate trend. Pair with get_optimization_plan for action items.
Advisor — the self-improvement coach
  • get_optimization_plan: Three-lens prioritized plan: suite quality (flake / broken / slow_regression) + MCP usability (top tools, repeat args, error rate) + AI effectiveness (generate_test adoption, coverage gaps). Writes optimization-plan.md every run.
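Several tools in the list above wrap flags the native runners already provide. The sketch below collects the pytest and Jest flags named for list_tests and run_failed into one lookup. How the server actually builds its command lines is not documented here, so the `native_command` helper and the table layout are assumptions. Cypress, Go, and Maestro are left out because the list describes lookups for them, not a single flag.

```python
# Flags named in the tool list above, keyed by runner and action.
NATIVE_COMMANDS = {
    "pytest": {
        "list": ["pytest", "--collect-only"],
        "rerun_failed": ["pytest", "--lf"],
    },
    "jest": {
        "list": ["jest", "--listTests"],
        "rerun_failed": ["jest", "--onlyFailures"],
    },
}


def native_command(runner: str, action: str) -> list:
    """Return the argv for a runner/action pair, or raise ValueError."""
    try:
        # Copy so callers can append paths or filters safely.
        return list(NATIVE_COMMANDS[runner][action])
    except KeyError:
        raise ValueError(f"no single native flag for {runner!r} / {action!r}") from None


rerun = native_command("pytest", "rerun_failed")
listing = native_command("jest", "list")
```

The returned list can be passed to `subprocess.run` directly, which avoids shell quoting issues.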
Workflows

Four prompts cover ~90% of real use

One sentence to the AI client; the tools chain automatically.

1.

"Test https://your-site/login — analyze the page, write tests for every module, run them, then tell me what to fix."

analyze_url → generate_test (×N modules) → run_tests → get_failure_details → get_optimization_plan
2.

"I just added three new feature pages — auto-generate tests for everything the analyzer finds and run them."

auto_generate_tests(url=...) → run_tests → get_test_report → get_optimization_plan
3.

"What's wrong with my test suite this week — give me a ranked plan, not gut feel."

get_test_history(limit=30) → get_optimization_plan(history_limit=30, telemetry_limit=2000)
4.

"Test the barcode button on my mobile app on the iOS Simulator and tell me if it's flaky."

analyze_screen(app_id='com.example.app', launch_app=true) → generate_test(module=<cta>) → run_tests → get_optimization_plan
Sample output

What you actually get

Same shape as spec-master's plan — markdown, ready to paste into Slack / JIRA / a sprint planning doc. Auto-written after every run.

get_optimization_plan

# Optimization Plan — 2026-05-12T14:03:40

_Based on 6 archived runs._

## Prioritized Actions

### 1. 🔴 HIGH — flaky
- **Target**: `tests/test_login.py::test_invalid_credentials`
- **Evidence**: flake_score=0.4, outcomes=PFPFP, rerun_count=1
- **Suggestion**: add an explicit wait (wait_for_response / locator wait)
- **auto_action_hint**: `get_failure_details(test_id="test_invalid_credentials")`

### 2. 🟡 MEDIUM — coverage_gap
- **Target**: `register_form` (module detected on /register)
- **Evidence**: analyze_url found this module; no matching test_*.py in repo
- **Suggestion**: `generate_test(description="...", filename="test_register_form.py")`

### 3. 🟡 MEDIUM — slow_regression
- **Target**: `tests/test_checkout.py::test_full_flow`
- **Evidence**: median duration 1.8× baseline across last 6 runs
- **Suggestion**: profile network waits; pin fixture data; consider parallel mark

## MCP usability
- Top tool: `run_tests` (38%) · `analyze_url` (22%) · `get_failure_details` (14%)
- Common chain: `analyze_url → generate_test` (17 occurrences)
- Error rate: 2.3% (1 timeout in analyze_url against slow staging)

## AI effectiveness
- generate_test adoption: 9 / 11 generated tests appeared in the next run (82%)
- coverage gap: 1 module from analyze_url has no matching test file (`register_form`)

get_test_report + get_failure_details

# Test Report — pytest-playwright

- total: 23
- passed: 19
- failed: 3
- flaky_in_run: 1   ← auto-retry rescued
- skipped: 0
- duration: 31.4s

## Failures
1. `tests/test_login.py::test_invalid_credentials`
   - message: `AssertionError: expected error text not visible`
   - screenshot: `test-results/.../test-failed-1.png`
   - trace:      `test-results/.../trace.zip`
   - video:      `test-results/.../video.webm`

2. `tests/test_coupon.py::test_idempotency`
   - message: `Timeout waiting for /api/coupon (5000ms)`
   - last step: `Page.waitForResponse('/api/coupon')`

Add to your MCP client config

Restart your client, then talk to the AI like you always do.

{
  "mcpServers": {
    "mk-qa-master": {
      "command": "uvx",
      "args": ["mk-qa-master"],
      "env": {
        "QA_RUNNER": "pytest",
        "QA_PROJECT_ROOT": "/path/to/your/project"
      }
    }
  }
}