CI Testing with test-spec.json

Define mobile tests as code and run them automatically in CI/CD pipelines. The test-spec.json format lets AI agents and automation tools describe tests that VibeView executes on real simulators.

Quick Start

Create .vibeview/test-spec.json in your repository root:

{
  "version": 1,
  "test_name": "Login Flow",
  "steps": [
    "Launch the app",
    "Tap Sign In",
    "Enter test credentials",
    "Verify home screen loads"
  ]
}

Copy the workflow template templates/vibeview-test.yml (provided by VibeView) to .github/workflows/vibeview-test.yml in your repo. See Workflow Template below.

Or run manually with the CLI:

# Install CLI from GitHub Packages
npm install -g @scriptx-com/vibeview-cli

vibeview run-test <test-id> --output json

Schema Reference

Field	Type	Required	Description
`version`	number	Yes	Schema version. Must be `1`.
`steps`	string[]	Yes	Array of natural language test steps for the AI agent.
`test_name`	string	No	Name for the test case. Defaults to `"CI Test"`.
`context`	string	No	Additional context for the AI agent (e.g., login credentials, app state assumptions).

Field Details

version — Must be 1. This field is mandatory from day one for forward compatibility. Future schema versions will increment this number and maintain backward compatibility.

steps — Each string is a natural language instruction for the AI agent. Write steps as you would explain them to a human tester. The AI agent interprets each step, interacts with the app UI, and reports pass/fail.

test_name — Optional label for the test case. Appears in PR comments, the VibeView dashboard, and CLI output.

context — Optional background information the AI agent uses when executing steps. Useful for providing test credentials, describing expected app state, or clarifying ambiguous UI elements.

Note: app_path and platform are not in the test spec. The app build artifact path is configured in the workflow template’s APP_PATH env var, and platform is auto-detected from the binary when uploading.

File Location

The file must be at .vibeview/test-spec.json in the repository root:

your-repo/
  .vibeview/
    test-spec.json
  src/
  build/
  ...

The workflow template looks for this path by default. If the file is missing or invalid, the workflow fails with a clear error message.
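
Because the spec is plain JSON, an automation tool can generate it instead of a person writing it by hand. The following is a minimal Python sketch that assumes only the schema fields documented above; write_spec is a hypothetical helper name and is not part of VibeView.

```python
import json
from pathlib import Path


def write_spec(repo_root, steps, test_name=None, context=None):
    """Write .vibeview/test-spec.json under repo_root and return its path.

    write_spec is a hypothetical helper, not part of the VibeView CLI.
    """
    if not steps:
        raise ValueError("steps must contain at least one step")
    spec = {"version": 1, "steps": list(steps)}
    # Optional fields are written only when the caller provides them.
    if test_name is not None:
        spec["test_name"] = test_name
    if context is not None:
        spec["context"] = context
    path = Path(repo_root) / ".vibeview" / "test-spec.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(spec, indent=2) + "\n", encoding="utf-8")
    return path
```

Calling write_spec(".", ["Launch the app", "Verify the home screen is visible"]) from the repository root produces the minimal spec at the path the workflow template expects.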

Examples

Minimal (required fields only)

{
  "version": 1,
  "steps": [
    "Launch the app",
    "Verify the home screen is visible"
  ]
}

With context and test name

{
  "version": 1,
  "test_name": "Login Flow",
  "steps": [
    "Launch the app",
    "Tap the Sign In button",
    "Enter 'testuser@example.com' in the email field",
    "Enter 'password123' in the password field",
    "Tap Submit",
    "Verify the dashboard screen loads with a welcome message"
  ],
  "context": "The app has a test account pre-configured: testuser@example.com / password123. The Sign In button is on the initial landing screen."
}

Onboarding example

{
  "version": 1,
  "test_name": "Onboarding Flow",
  "steps": [
    "Launch the app",
    "Swipe left through the onboarding screens",
    "Tap Get Started",
    "Verify the main screen appears"
  ],
  "context": "First launch after fresh install. The app shows 3 onboarding slides before the Get Started button."
}

Using with GitHub Actions

Copy the workflow template into your repo and configure secrets. The workflow handles app upload, test creation, execution, and PR reporting automatically.

Minimal workflow

# Copy templates/vibeview-test.yml to .github/workflows/vibeview-test.yml
# Then add your build step and configure secrets (see Workflow Template section)

The workflow:

1. Validates required secrets (VIBEVIEW_API_TOKEN, VIBEVIEW_NPM_TOKEN).
2. Installs @scriptx-com/vibeview-cli from GitHub Packages.
3. Reads .vibeview/test-spec.json and validates all required fields.
4. Uploads the app artifact to VibeView.
5. Creates a test case from the spec steps and context.
6. Runs the test with --commit-sha set to the PR head SHA.
7. Posts a detailed PR comment with pass/fail status, step results, and a link to the full run in VibeView.

Pinning a specific build

By default, CI runs install the app’s latest uploaded build. To run against an older build — for example, to reproduce a regression on the last release before landing a fix — pass --build <id> to the CLI or set build_id on the CI API payload.

The id is the numeric AppBuild id. You can find it on the app detail page under Build History, or in the JSON response of POST /api/v1/apps/{app_id}/builds.

CLI

vibeview run-test <test-id> --build 142 --commit-sha "$(git rev-parse HEAD)"
vibeview run-suite <suite-id> --build 142 --commit-sha "$(git rev-parse HEAD)"

GitHub Actions

- name: Run VibeView suite against a specific build
  env:
    VIBEVIEW_API_TOKEN: ${{ secrets.VIBEVIEW_API_TOKEN }}
  run: |
    vibeview run-suite ${{ secrets.SUITE_ID }} \
      --build ${{ vars.BUILD_ID }} \
      --commit-sha ${{ github.sha }} \
      --output junit > results.xml

If --build is omitted, the run falls back to the app’s latest build.

CI HTTP API

The CLI ultimately calls POST /api/v1/ci/run-test and POST /api/v1/ci/run-suite. Both endpoints accept an optional build_id integer in the JSON body. When omitted, the server resolves the latest build for the attached app. When the case or suite has no app attached, sending build_id returns HTTP 400.
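
For callers that bypass the CLI, the sketch below shows one way to attach build_id only when a build is pinned. It builds the request without sending it. Only the endpoint paths and the build_id field come from this guide; the other body fields are not listed here, so the caller supplies them, and bearer-token authentication on these endpoints is an assumption.

```python
import json
import urllib.request


def ci_run_request(base_url, token, endpoint, body, build_id=None):
    """Build (but do not send) a POST request for a CI run endpoint.

    endpoint is "run-test" or "run-suite". `body` holds the remaining
    request fields, which this guide does not list. Bearer auth is assumed.
    """
    if endpoint not in ("run-test", "run-suite"):
        raise ValueError(f"unknown endpoint: {endpoint}")
    payload = dict(body)
    if build_id is not None:
        # Leave the key out entirely to let the server pick the latest build.
        payload["build_id"] = int(build_id)
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/ci/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to urllib.request.urlopen sends it; an HTTP 400 response, such as the one for a build_id on a case with no app attached, surfaces as urllib.error.HTTPError.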

Workflow Template

The VibeView GitHub Actions integration is distributed as a workflow template that you copy into your repository, rather than as an action referenced with uses:.

Setup:

1. Copy the workflow template templates/vibeview-test.yml (provided by VibeView) to .github/workflows/vibeview-test.yml in your repo.
2. Add these secrets to your repository (Settings > Secrets and variables > Actions):
   - VIBEVIEW_API_TOKEN: your API token from VibeView Settings > API Tokens
   - VIBEVIEW_NPM_TOKEN: a GitHub personal access token (classic) with the read:packages scope on the vibeview org
   - VIBEVIEW_BASE_URL (optional): your VibeView instance URL if not using the default
3. Add your app build step in the workflow where indicated by the comment.
4. Create .vibeview/test-spec.json with your test definition.
5. Open a PR. The workflow runs automatically.

The template validates your test spec, uploads the app, runs AI tests, and posts results as a PR comment.

Using with Generic CI

For non-GitHub CI systems (GitLab CI, CircleCI, Jenkins, etc.), install the CLI and call commands directly:

# Install CLI from GitHub Packages
npm install -g @scriptx-com/vibeview-cli

# Parse the spec
STEPS=$(jq -c .steps .vibeview/test-spec.json)
CONTEXT=$(jq -r '.context // ""' .vibeview/test-spec.json)
TEST_NAME=$(jq -r '.test_name // "CI Test"' .vibeview/test-spec.json)

# Upload the app (platform is auto-detected from the binary)
APP_PATH="./build/MyApp.app"  # Set to your build output path
APP_ID=$(vibeview upload-app "$APP_PATH" --output json | jq -r .app_id)

# Create and run the test
TEST_ID=$(vibeview create-test \
  --app "$APP_ID" \
  --steps "$STEPS" \
  --context "$CONTEXT" \
  --test-name "$TEST_NAME" \
  --output json | jq -r .test_id)

vibeview run-test "$TEST_ID" \
  --commit-sha "$(git rev-parse HEAD)" \
  --output json
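
If your CI job is scripted in Python rather than shell, the same spec parsing can be done without jq. This sketch builds the argument list for vibeview create-test using only the flags shown above; create_test_args is a hypothetical helper name.

```python
import json


def _or_default(value, default):
    # Mirrors jq's `//` operator: fall back only on null (None) or false.
    return default if value is None or value is False else value


def create_test_args(spec_text, app_id):
    """Return the argv list for `vibeview create-test` from a spec's JSON text."""
    spec = json.loads(spec_text)
    return [
        "vibeview", "create-test",
        "--app", str(app_id),
        # Compact JSON array, like `jq -c .steps`.
        "--steps", json.dumps(spec["steps"], separators=(",", ":"), ensure_ascii=False),
        "--context", _or_default(spec.get("context"), ""),
        "--test-name", _or_default(spec.get("test_name"), "CI Test"),
        "--output", "json",
    ]
```

Passing the list to subprocess.run(args, check=True, capture_output=True, text=True) avoids shell quoting problems when a step contains quotes or spaces.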

Environment variables

Set these in your CI environment:

Variable	Description
`VIBEVIEW_API_TOKEN`	Your VibeView API token (from Settings > API Tokens)
`VIBEVIEW_BASE_URL`	Your VibeView instance URL (default: `https://vibeview.io`)

Validation

The workflow template validates test-spec.json before proceeding. If any required field is missing or invalid, it fails immediately with a clear error:

- Missing version: “test-spec.json: 'version' field is required”
- Wrong version: “test-spec.json: unsupported version 2, expected 1”
- Missing steps: “test-spec.json: 'steps' field is required”
- Empty steps: “test-spec.json: 'steps' must contain at least one step”
- App file not found: “App artifact not found at: $APP_PATH. Set the APP_PATH env var or build your app in a prior step.”

Validation runs before any API calls, so you get fast feedback without waiting for uploads or test execution.
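
To catch the same problems locally before pushing, the spec checks can be approximated in a few lines. This is a sketch of equivalent checks, not the template's actual implementation. The messages for the four documented cases follow the list above; the remaining messages are illustrative.

```python
import json

SUPPORTED_VERSION = 1


def validate_spec(text):
    """Return a list of error strings for a test-spec.json document."""
    try:
        spec = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"test-spec.json: invalid JSON ({exc.msg})"]  # illustrative wording
    if not isinstance(spec, dict):
        return ["test-spec.json: top level must be an object"]  # illustrative wording
    errors = []
    if "version" not in spec:
        errors.append("test-spec.json: 'version' field is required")
    elif spec["version"] != SUPPORTED_VERSION:
        errors.append(
            f"test-spec.json: unsupported version {spec['version']}, expected 1"
        )
    if "steps" not in spec:
        errors.append("test-spec.json: 'steps' field is required")
    elif not isinstance(spec["steps"], list):
        errors.append("test-spec.json: 'steps' must be an array")  # illustrative wording
    elif not spec["steps"]:
        errors.append("test-spec.json: 'steps' must contain at least one step")
    return errors
```

An empty list means the spec passed these checks. The app artifact check is left out because it depends on the build output rather than on the spec.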

CLI Output with Run URLs

When a test completes, the CLI output includes a URL to view the full results in VibeView:

Human format:

  PASS  Login Flow  12.3s
  4/4 cases passed
  https://vibeview.io/tests/runs/abc123

JSON format:

{
  "run_id": "abc123",
  "status": "passed",
  "url": "https://vibeview.io/tests/runs/abc123",
  "duration_ms": 12340,
  "test_runs": [...]
}

The URL is useful for PR comments, Slack notifications, or linking from other CI tools.
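
As an illustration, the JSON output can be reduced to a one-line summary for a chat message or PR comment. The sketch uses only the fields shown in the JSON example above; summarize_run is a hypothetical helper name.

```python
import json


def summarize_run(cli_json):
    """Turn `--output json` text into a one-line summary."""
    run = json.loads(cli_json)
    seconds = run["duration_ms"] / 1000
    return (
        f"VibeView run {run['run_id']}: {run['status']} "
        f"in {seconds:.1f}s ({run['url']})"
    )
```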

Reading run results

Run commands (run-test, run-suite, run-suites) are async: the server accepts the request and returns {run_id, status} immediately. The CLI then polls GET /api/v1/ci/runs/{run_id} every 3 seconds until the run completes. This section documents the poll payload for programmatic consumers who call the endpoint directly.
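
A consumer that polls the endpoint directly might follow the same pattern as the CLI. The sketch below takes the HTTP call as a parameter so it stays independent of any client library. Only "passed" and "failed" appear as statuses in this guide's examples; the other terminal names are assumptions inferred from the run-suites exit codes, and the 30-minute timeout is an arbitrary choice.

```python
import time

# "passed" and "failed" come from this guide; the rest are assumed names.
TERMINAL_STATUSES = {"passed", "failed", "error", "cancelled", "timeout"}


def wait_for_run(fetch, run_id, interval_s=3.0, timeout_s=1800.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until the run is terminal and return the final payload.

    fetch(run_id) must return the decoded JSON body of
    GET /api/v1/ci/runs/{run_id}.
    """
    deadline = clock() + timeout_s
    while True:
        payload = fetch(run_id)
        if payload.get("status") in TERMINAL_STATUSES:
            return payload
        if clock() >= deadline:
            raise TimeoutError(f"run {run_id} still {payload.get('status')!r}")
        sleep(interval_s)
```

Injecting sleep and clock keeps the loop testable without real waiting.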

Response shapes

The poll response has two shapes depending on whether the run is a single test or a suite.

type: "test_run" — a single test case run. Top-level run fields plus a steps[] array (present once the run reaches a terminal status):

{
  "run_id": "tr_abc123",
  "type": "test_run",
  "status": "failed",
  "case_name": "Login Flow",
  "duration_ms": 8420,
  "run_url": "https://vibeview.io/tests/runs/tr_abc123",
  "model_used": "claude-4-sonnet",
  "progress": null,
  "steps": [
    {
      "step": "Tap the Sign In button",
      "status": "failed",
      "error": "Element not found: Sign In button",
      "error_kind": "element_not_found",
      "error_kind_description": "The element the step needed was not present or could not be located on screen.",
      "duration_ms": 3100,
      "screenshot_url": "/api/v1/tests/test-runs/tr_abc123/screenshots/step_1.jpg",
      "screenshot_signed_url": "https://vibeview.io/api/v1/public/screenshots/tr_abc123/step_1.jpg?exp=1746835200&sig=abc123",
      "iteration_screenshots": []
    }
  ]
}

While the run is still in flight, steps is absent and progress carries the live step-progress events instead; progress is null once the run is terminal.
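
A consumer has to tolerate both states of a test_run payload. This sketch extracts the failing steps and returns an empty list while the run is in flight. It relies on error_kind being null for passed steps; failed_steps is a hypothetical helper name.

```python
def failed_steps(payload):
    """Return (step, error_kind, error) tuples for a test_run payload.

    Returns an empty list while the run is in flight, when `steps` is absent.
    """
    if payload.get("type") != "test_run":
        raise ValueError("expected a test_run payload")
    return [
        (s["step"], s["error_kind"], s.get("error"))
        for s in payload.get("steps", [])
        # error_kind is populated only for failed or errored steps.
        if s.get("error_kind") is not None
    ]
```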

type: "suite_run" — a suite run. Top-level run fields plus a test_runs[] array of per-case entries. total_cases is the suite’s full case count — test_runs[] only contains cases that have started, so use total_cases as the denominator for “N/M complete”. Note the run link: the top-level run_url points at the suite detail page (/tests/{suite_public_id}), where suite runs are shown as expandable rows; each per-case entry’s url points at that case run’s own page.

{
  "run_id": "sr_def456",
  "type": "suite_run",
  "status": "failed",
  "suite_public_id": "suite_abc123",
  "suite_name": "Login Suite",
  "duration_ms": 24800,
  "run_url": "https://vibeview.io/tests/suite_abc123",
  "total_cases": 4,
  "test_runs": [
    {
      "public_id": "tr_ghi789",
      "case_name": "Login Flow",
      "status": "failed",
      "duration_ms": 8200,
      "error_message": "Visual mismatch on home screen",
      "error_kind": "visual_regression",
      "error_kind_description": "The screen differed from its visual baseline beyond the allowed threshold.",
      "screenshot_signed_url": "https://vibeview.io/api/v1/public/screenshots/tr_ghi789/step_3.jpg?exp=1746835200&sig=xyz789",
      "url": "https://vibeview.io/tests/runs/tr_ghi789"
    }
  ]
}

Both shapes also carry started_at, finished_at, and error_message at the top level. While a run is waiting for a device (status: "queued"), the payload includes a queue block with entry_id, position, and specific_device_id.
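
The "N/M complete" calculation for a suite run can be sketched as follows. The set of per-case statuses that count as finished is an assumption: only "passed" and "failed" appear in this guide, so extend the tuple if your runs report others.

```python
def suite_progress(payload, finished=("passed", "failed")):
    """Return "N/M complete" for a suite_run payload.

    total_cases is the denominator because test_runs only lists cases
    that have already started.
    """
    done = sum(
        1 for run in payload.get("test_runs", [])
        if run.get("status") in finished
    )
    return f"{done}/{payload['total_cases']} complete"
```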

screenshot_url vs screenshot_signed_url

Each failed step or suite case includes two screenshot fields:

screenshot_url — a relative API path (/api/v1/tests/test-runs/{run_id}/screenshots/{file}). Resolve it against your VibeView API host and send an Authorization: Bearer <token> header to fetch it.
screenshot_signed_url — a signed, time-limited public URL. No auth header needed. Returns image/jpeg directly (suitable for embedding in Slack messages, PR comments, or email). Expires after 7 days; returns HTTP 403 if expired or the signature is invalid.
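
The two fetch paths can be sketched as below. The first function builds an authenticated request for the relative path. The second guesses whether a signed URL is still valid by reading its exp query parameter, on the assumption that it is a Unix timestamp in seconds; the example URLs suggest this, but this guide does not state it.

```python
import time
import urllib.request
from urllib.parse import parse_qs, urljoin, urlsplit


def authed_screenshot_request(api_host, screenshot_url, token):
    """Build a GET request for the relative screenshot_url with bearer auth."""
    return urllib.request.Request(
        urljoin(api_host, screenshot_url),
        headers={"Authorization": f"Bearer {token}"},
    )


def signed_url_expired(signed_url, now=None):
    """Guess whether a signed URL has expired (assumes `exp` is Unix seconds)."""
    exp = parse_qs(urlsplit(signed_url).query).get("exp")
    if not exp:
        return False  # no expiry information to judge by
    current = time.time() if now is None else now
    return current >= int(exp[0])
```

The server remains the authority on expiry: treat an HTTP 403 on the signed URL as the signal to fall back to the authenticated path.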

error_kind values

The error_kind field classifies why a step or suite case failed. It is populated for failed or errored steps/cases and null for passed ones. Every payload that carries error_kind also carries error_kind_description — a human-readable sentence explaining the classification.

AI-supplied — the AI agent picks these when it calls step_failed:

Value	Description
`element_not_found`	The element the step needed was not present or could not be located on screen.
`app_unresponsive`	The screen was frozen or no UI rendered, so the step could not proceed.
`app_bug`	The app crashed or behaved incorrectly (e.g. an error dialog or clearly wrong screen).
`wrong_state`	The app was on a different screen than the step expected.
`assertion_mismatch`	A condition the agent verified was not true.
`blocked`	The step could not run because an external precondition was unmet (e.g. missing test data, login wall, paywall).
`needs_human`	The failure was genuinely ambiguous and needs a person to judge.

System-supplied — set in code when the cause is known:

Value	Description
`assertion_failed`	An explicit assertion in the test did not hold.
`max_iterations`	The agent could not complete the step within the allowed number of attempts; the step may be too complex — try breaking it into smaller steps.
`aborted`	The run was halted by a safety guard (e.g. token budget exceeded, or a replay step permanently failed).
`visual_regression`	The screen differed from its visual baseline beyond the allowed threshold.

Fallback:

Value	Description
`unknown`	The failure could not be classified (older run, or an unrecognized category).

For webhook payloads that also carry error_kind, see the error_kind table in the Webhooks & Slack guide.
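
When routing failures, for example to send AI-judged failures to a person and system-detected ones to a retry queue, the three tables above can be encoded directly. This is a sketch; error_kind_source is a hypothetical helper name, and values outside the tables are treated the same as unknown.

```python
AI_SUPPLIED = {
    "element_not_found", "app_unresponsive", "app_bug", "wrong_state",
    "assertion_mismatch", "blocked", "needs_human",
}
SYSTEM_SUPPLIED = {
    "assertion_failed", "max_iterations", "aborted", "visual_regression",
}


def error_kind_source(kind):
    """Return "ai", "system", "fallback", or None for a passed step or case."""
    if kind is None:
        return None
    if kind in AI_SUPPLIED:
        return "ai"
    if kind in SYSTEM_SUPPLIED:
        return "system"
    # Covers "unknown" and any value not listed in the tables above.
    return "fallback"
```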

Filtering suites by tag

Tag suites in the VibeView UI (suite create/edit), then have CI run a tag-filtered subset instead of a single hard-coded suite ID. This is the recommended pattern for monorepos where different teams own different suites.

Discovering tags

vibeview list-suites --tag profiles

Lists every suite tagged profiles so you can confirm what would run.

Running by tag

vibeview run-suites --tag profiles --commit-sha "$GIT_SHA"

Runs every matching suite sequentially, one device session per suite, and aggregates the result. Exit codes: 0 means the aggregate passed, 1 means at least one suite failed, and 2 means no suites matched or the run timed out, errored, or was cancelled.
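
A wrapper script can turn the exit code into a readable outcome. The sketch below uses only the flags and exit codes documented above and takes the process runner as a parameter so it can be tested without the CLI installed; run_suites_by_tag is a hypothetical helper name.

```python
import subprocess

EXIT_MEANINGS = {
    0: "aggregate passed",
    1: "at least one suite failed",
    2: "no matches, timeout, error, or cancelled",
}


def run_suites_by_tag(tag, commit_sha, runner=subprocess.run):
    """Run `vibeview run-suites` and return (exit_code, meaning)."""
    result = runner(
        ["vibeview", "run-suites", "--tag", tag, "--commit-sha", commit_sha]
    )
    meaning = EXIT_MEANINGS.get(result.returncode, "undocumented exit code")
    return result.returncode, meaning
```

Separating code 1 from code 2 lets a pipeline treat a genuine test failure differently from a run that never produced a verdict.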

To pin a tagged run to a specific device, first list registered devices with vibeview list-devices, then pass --device.

PR-driven workflow

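Trigger different suites based on what changed in the PR. Note that github.event.pull_request.changed_files is the number of changed files, not a list of paths, so it cannot be matched against a directory with contains(). Diff against the base branch instead (this requires the checkout step to have fetched the base branch, for example with fetch-depth: 0):

- name: Detect changes under profiles/
  id: changes
  run: git diff --quiet "origin/${{ github.base_ref }}...HEAD" -- profiles/ || echo "profiles=true" >> "$GITHUB_OUTPUT"
- name: Run profile tests if profiles/ changed
  if: steps.changes.outputs.profiles == 'true'
  run: vibeview run-suites --tag profiles --commit-sha ${{ github.event.pull_request.head.sha }}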

Use --match all if you need a suite to have every listed tag (e.g., --tag smoke --tag ios --match all runs only the iOS smoke suites).
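
The difference between the two matching modes can be stated in a few lines. This sketch models the selection rule only. It assumes that the behaviour without --match all is to match any listed tag, which the sentence above implies but does not state.

```python
def suite_matches(suite_tags, wanted_tags, match="any"):
    """Return True if a suite's tags satisfy the tag filter.

    match="all" requires every wanted tag; "any" (assumed default)
    requires at least one.
    """
    suite, wanted = set(suite_tags), set(wanted_tags)
    if match == "all":
        return wanted <= suite
    if match == "any":
        return bool(wanted & suite)
    raise ValueError(f"unknown match mode: {match}")
```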

In a build matrix where each job builds a single platform, scope the run to that platform with --platform, so that a shared tag like prod only runs the suites you built an app for. Suites for other platforms are skipped and reported as excluded:

# matrix.platform is one of ios / android / tvos / androidtv
- run: vibeview run-suites --tag prod --platform ${{ matrix.platform }} --commit-sha ${{ github.sha }}

This is cleaner than encoding the platform as a tag, because the filter uses the suite’s own platform (or its linked app’s). See run-suites for details.
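
The platform rule can be modelled as a partition into selected and excluded suites. The dictionary shape used here (name, platform, app) is hypothetical and chosen for illustration; it is not the VibeView API. Excluding a suite that has no platform at all is also an assumption.

```python
def partition_by_platform(suites, platform):
    """Split suite names into (selected, excluded) for one platform.

    A suite's own platform wins; otherwise its linked app's platform is used.
    """
    selected, excluded = [], []
    for suite in suites:
        effective = suite.get("platform") or (suite.get("app") or {}).get("platform")
        target = selected if effective == platform else excluded
        target.append(suite["name"])
    return selected, excluded
```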

Version History

Version	Date	Changes
1	2026-03-30	Initial schema: version, steps, context, test_name