nixmac E2E Report

18 passed, 6 assertion failed, 0 infra/not-run; 24/24 selected scenarios produced reports

Evidence lanes

Hosted assertions

Deterministic hosted app-flow assertions. Screenshot cursor markers are synthetic proof overlays.

Full-Mac recording

Real macOS desktop automation with full-screen recording evidence.

Real-Mac companion

Real-Mac proof for the same product surface, not a replay of hosted assertions.

Auto-evolve basic package

auto_evolve_basic_package

tauri-wdio on GitHub Actions 1000009375 (github-hosted)

Hosted assertions - Deterministic webview proof

Status: passed
Commit: 33f37e696bb5d5f1217b458e722d756ffb3d1c64
Duration: 34s
Replay: bun run test:wdio:basic-prompts

Hosted webview lane: deterministic app-flow assertions with screenshot and frame-timeline proof. Action-cursor markers in screenshots are synthetic WDIO proof overlays, not real cursor captures. This lane does not claim to be a full desktop screen recording.

What this checks

Submits a mocked AI prompt and verifies nixmac reaches review with a generated Nix diff.

Coverage
  • Main prompt input and send path
  • Mock provider response plumbing
  • Evolve review screen with non-empty diff
  • Generated diff includes the Darwin fonts module
Known gaps / not covered
  • Uses mocked model responses; it does not call live AI providers or apply/rebuild the generated config.
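The "mock provider response plumbing" above is not described further in the report. A minimal sketch, assuming the mock returns canned responses keyed by the exact prompt text; `MockProvider`, `CannedResponse`, and the sample diff are hypothetical names, not nixmac's actual API.

```typescript
// Hypothetical deterministic mock provider; nixmac's real mock wiring is not shown in this report.
interface CannedResponse {
  // Unified diff that the mocked model "produces" for this prompt.
  diff: string;
}

class MockProvider {
  private readonly canned: Map<string, CannedResponse>;
  private readonly calls: string[] = [];

  constructor(canned: Map<string, CannedResponse>) {
    this.canned = canned;
  }

  // Returns the canned response, or throws so a missing fixture is never mistaken for an empty diff.
  complete(prompt: string): CannedResponse {
    this.calls.push(prompt);
    const response = this.canned.get(prompt);
    if (response === undefined) {
      throw new Error(`no canned response for prompt: ${prompt}`);
    }
    return response;
  }

  // Number of prompts sent to the mock, including ones that had no fixture.
  callCount(): number {
    return this.calls.length;
  }
}
```

Throwing on an unknown prompt keeps a missing fixture from passing as a quiet, empty generated diff.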
Proof: full-app screenshot for "submits a basic prompt and reaches evolve review with diff".

Adjacent real-Mac evidence

This recording exercises the same product surface on a real Mac. It does not re-run this scenario's deterministic hosted assertions.

Visual timeline

4 meaningful frames from wdio source frames. Deterministic checks only; scripted assertions remain the gate.
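The report does not say how the Change, Contrast, and Detail scores are computed. One plausible reading, sketched here purely as an assumption: each score is scaled to 0..1 over 8-bit grayscale pixels, with Change as the mean absolute difference from the previously selected frame, Contrast as the luminance standard deviation, and Detail as the mean difference between horizontally adjacent pixels. `scoreFrame`, `GrayFrame`, and the note thresholds are hypothetical; "late-flow frame" depends on a frame's position in the run and is not modeled.

```typescript
// Hypothetical frame scoring; the report does not document its formulas or thresholds.
interface GrayFrame {
  width: number;
  height: number;
  pixels: Uint8Array; // 8-bit luminance, row-major, length = width * height
}

interface FrameScore {
  change: number;
  contrast: number;
  detail: number;
  notes: string[];
}

function scoreFrame(frame: GrayFrame, previous?: GrayFrame): FrameScore {
  const px = frame.pixels;
  const n = px.length;

  // Change: mean absolute difference from the previous selected frame (0 for the first frame).
  let change = 0;
  if (previous !== undefined && n > 0 && previous.pixels.length === n) {
    let sum = 0;
    for (let i = 0; i < n; i++) sum += Math.abs(px[i] - previous.pixels[i]);
    change = sum / n / 255;
  }

  // Contrast: standard deviation of luminance, scaled to 0..1.
  let total = 0;
  for (let i = 0; i < n; i++) total += px[i];
  const mean = n === 0 ? 0 : total / n;
  let variance = 0;
  for (let i = 0; i < n; i++) variance += (px[i] - mean) ** 2;
  const contrast = n === 0 ? 0 : Math.sqrt(variance / n) / 255;

  // Detail: mean absolute difference between horizontal neighbours, scaled to 0..1.
  let gradient = 0;
  let pairs = 0;
  for (let y = 0; y < frame.height; y++) {
    for (let x = 1; x < frame.width; x++) {
      const i = y * frame.width + x;
      gradient += Math.abs(px[i] - px[i - 1]);
      pairs++;
    }
  }
  const detail = pairs === 0 ? 0 : gradient / pairs / 255;

  // Illustrative thresholds for the human-readable notes.
  const notes: string[] = [];
  if (contrast < 0.03) notes.push("mostly blank or single-color frame");
  else if (contrast < 0.045) notes.push("low-contrast frame");
  if (detail < 0.01) notes.push("low-detail frame");
  if (notes.length === 0) notes.push("stable visual state");
  return { change, contrast, detail, notes };
}
```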

Selected at 0.0s, label "start"; stable visual state.
0.0s start
Change: 0.000, Contrast: 0.066, Detail: 0.035
  • stable visual state

Selected at 3.3s, label "step"; stable visual state.
3.3s step
Change: 0.040, Contrast: 0.048, Detail: 0.019
  • stable visual state

Selected at 4.2s, label "after-click-Diff"; stable visual state.
4.2s after-click-Diff
Change: 0.037, Contrast: 0.046, Detail: 0.022
  • stable visual state

Selected at 5.5s, label "step"; low-contrast frame; late-flow frame.
5.5s step
Change: 0.000, Contrast: 0.042, Detail: 0.020
  • low-contrast frame
  • late-flow frame

Phase | Status | Duration
submits a basic prompt and reaches evolve review with diff | passed | 4s

Discard and restore state

discard_and_restore_state

tauri-wdio on GitHub Actions 1000009365 (github-hosted)

Hosted assertions - Deterministic webview proof

Status: passed
Commit: 33f37e696bb5d5f1217b458e722d756ffb3d1c64
Duration: 48s
Replay: bun run test:wdio:discard

What this checks

Exercises discard confirm/cancel behavior from the evolve review screen.

Coverage
  • Prompt submission reaches evolve review
  • Discard confirmation returns to the initial prompt screen
  • Discard cancellation leaves the review state intact
Known gaps / not covered
  • Does not verify every rollback/history path or rebuild behavior after discard.
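The confirm and cancel paths can be summarized as a small state machine. This is a sketch under the assumption of three screens; `Screen`, `nextScreen`, `replay`, and the screen names are hypothetical, and the real app may model discard differently.

```typescript
// Hypothetical model of the two discard paths; screen names are illustrative, not nixmac's.
type Screen = "prompt" | "review" | "confirm-discard";
type Action = "discard" | "confirm" | "cancel";

function nextScreen(screen: Screen, action: Action): Screen {
  if (screen === "review" && action === "discard") return "confirm-discard";
  if (screen === "confirm-discard" && action === "confirm") return "prompt"; // changes dropped, back to the initial prompt
  if (screen === "confirm-discard" && action === "cancel") return "review"; // review state left intact
  return screen; // any other action leaves the screen unchanged
}

// Applies a sequence of user actions starting from a given screen.
function replay(start: Screen, actions: Action[]): Screen {
  return actions.reduce<Screen>((screen, action) => nextScreen(screen, action), start);
}
```

The two test phases in this card correspond to `replay("review", ["discard", "confirm"])` ending on the prompt screen and `replay("review", ["discard", "cancel"])` staying on review.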
Proof: full-app screenshot for "submits a prompt, reaches evolve review, then discards and returns to initial state".

Adjacent real-Mac evidence

This recording exercises the same product surface on a real Mac. It does not re-run this scenario's deterministic hosted assertions.

Visual timeline

5 meaningful frames from wdio source frames. Deterministic checks only; scripted assertions remain the gate.

Selected at 0.0s, label "start"; stable visual state.
0.0s start
Change: 0.000, Contrast: 0.066, Detail: 0.035
  • stable visual state

Selected at 3.9s, label "step"; stable visual state.
3.9s step
Change: 0.041, Contrast: 0.048, Detail: 0.018
  • stable visual state

Selected at 5.1s, label "after-click-Diff"; stable visual state.
5.1s after-click-Diff
Change: 0.037, Contrast: 0.046, Detail: 0.022
  • stable visual state

Selected at 7.0s, label "after-click-confirm-dialog-confirm"; low-contrast frame; late-flow frame.
7.0s after-click-confirm-dialog-confirm
Change: 0.035, Contrast: 0.042, Detail: 0.021
  • low-contrast frame
  • late-flow frame

Selected at 7.6s, label "final-proof"; late-flow frame.
7.6s final-proof
Change: 0.057, Contrast: 0.069, Detail: 0.038
  • late-flow frame

Visual timeline

4 meaningful frames from wdio source frames. Deterministic checks only; scripted assertions remain the gate.

Selected at 0.0s, label "start"; stable visual state.
0.0s start
Change: 0.000, Contrast: 0.068, Detail: 0.039
  • stable visual state

Selected at 2.1s, label "step"; stable visual state.
2.1s step
Change: 0.041, Contrast: 0.050, Detail: 0.021
  • stable visual state

Selected at 5.1s, label "after-click-confirm-dialog-cancel"; low-contrast frame.
5.1s after-click-confirm-dialog-cancel
Change: 0.035, Contrast: 0.042, Detail: 0.021
  • low-contrast frame

Selected at 7.6s, label "step"; low-contrast frame; late-flow frame.
7.6s step
Change: 0.000, Contrast: 0.042, Detail: 0.020
  • low-contrast frame
  • late-flow frame

Phase | Status | Duration
submits a prompt, reaches evolve review, then discards and returns to initial state | passed | 7s
submits a prompt, reaches evolve review, then cancels discard and stays on review | passed | 7s

Manual evolve existing changes

manual_evolve_existing_changes

tauri-wdio on GitHub Actions 1000009374 (github-hosted)

Hosted assertions - Deterministic webview proof

Status: passed
Commit: 33f37e696bb5d5f1217b458e722d756ffb3d1c64
Duration: 38s
Replay: bun run test:wdio:modify

What this checks

Runs sequential prompts and verifies the second change preserves the first generated diff.

Coverage
  • Follow-up prompt from evolve review
  • Existing uncommitted change preservation
  • Generated diff contains both first and second package edits
Known gaps / not covered
  • Uses mocked model responses and does not run the final apply/rebuild step.
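The preservation check amounts to "every edit from an earlier prompt is still an added line in the cumulative diff". A sketch assuming a unified diff and simple substring tokens; `addedLines`, `missingEdits`, and the sample package names are hypothetical.

```typescript
// Added lines of a unified diff, excluding the "+++" file header.
function addedLines(unifiedDiff: string): string[] {
  return unifiedDiff
    .split("\n")
    .filter((line) => line.startsWith("+") && !line.startsWith("+++"));
}

// Returns the expected edits (e.g. package attribute names) missing from the cumulative diff
// after the follow-up prompt; an empty result means earlier changes were preserved.
function missingEdits(cumulativeDiff: string, expected: string[]): string[] {
  const added = addedLines(cumulativeDiff);
  return expected.filter((token) => !added.some((line) => line.includes(token)));
}
```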
Proof: full-app screenshot for "submits sequential prompts on the evolve review screen".

Adjacent real-Mac evidence

This recording exercises the same product surface on a real Mac. It does not re-run this scenario's deterministic hosted assertions.

Visual timeline

3 meaningful frames from wdio source frames. Deterministic checks only; scripted assertions remain the gate.

Selected at 0.0s, label "start"; stable visual state.
0.0s start
Change: 0.000, Contrast: 0.066, Detail: 0.035
  • stable visual state

Selected at 3.0s, label "step"; stable visual state.
3.0s step
Change: 0.041, Contrast: 0.050, Detail: 0.021
  • stable visual state

Selected at 7.6s, label "final-proof"; low-contrast frame; late-flow frame.
7.6s final-proof
Change: 0.000, Contrast: 0.043, Detail: 0.020
  • low-contrast frame
  • late-flow frame

Phase | Status | Duration
submits sequential prompts on the evolve review screen | passed | 7s

Question answer follow-up

question_answer_followup

tauri-wdio on GitHub Actions 1000009372 (github-hosted)

Hosted assertions - Deterministic webview proof

Status: passed
Commit: 33f37e696bb5d5f1217b458e722d756ffb3d1c64
Duration: 24s
Replay: bun run test:wdio:question-answer

What this checks

Submits a prompt that asks the user a question, answers inline, and verifies the evolve flow continues to a generated Nix diff.

Coverage
  • Agent ask_user tool renders an inline question prompt
  • User answer is accepted from the UI and sent to the backend
  • The follow-up model response continues the same evolve flow
  • Generated diff includes the requested Darwin fonts module change
Known gaps / not covered
  • Uses mocked model responses; it does not exercise arbitrary question wording, choices, or live provider timing.
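A sketch of the question-then-continue flow, assuming scripted (mocked) model turns in which `ask_user` is the only tool that pauses for input. `runScriptedEvolve`, the `edit_file` tool name, and the file path are hypothetical; only `ask_user` comes from the report. Because the turns are scripted, the answer is recorded but does not change later turns, which matches a mocked-response test rather than a live agent.

```typescript
// Hypothetical tool-call shapes; "ask_user" is named in the report, "edit_file" is illustrative.
type ToolCall =
  | { name: "ask_user"; question: string }
  | { name: "edit_file"; path: string };

interface EvolveTrace {
  questions: string[];
  answers: string[];
  editedFiles: string[];
}

// Replays scripted model turns. An ask_user call is answered through the callback, which stands in
// for the inline UI prompt; the flow then continues with the next turn instead of ending at the question.
function runScriptedEvolve(turns: ToolCall[][], answerInline: (question: string) => string): EvolveTrace {
  const trace: EvolveTrace = { questions: [], answers: [], editedFiles: [] };
  for (const turn of turns) {
    for (const call of turn) {
      if (call.name === "ask_user") {
        trace.questions.push(call.question);
        trace.answers.push(answerInline(call.question));
      } else {
        trace.editedFiles.push(call.path);
      }
    }
  }
  return trace;
}
```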
Proof: full-app screenshot for "answers an inline agent question and continues the evolve flow".

Adjacent real-Mac evidence

This recording exercises the same product surface on a real Mac. It does not re-run this scenario's deterministic hosted assertions.

Visual timeline

4 meaningful frames from wdio source frames. Deterministic checks only; scripted assertions remain the gate.

Selected at 0.0s, label "start"; mostly blank or single-color frame; low-detail frame.
0.0s start
Change: 0.000, Contrast: 0.000, Detail: 0.000
  • mostly blank or single-color frame
  • low-detail frame

Selected at 1.2s, label "step"; stable visual state.
1.2s step
Change: 0.094, Contrast: 0.066, Detail: 0.033
  • stable visual state

Selected at 4.1s, label "before-click-Diff"; stable visual state.
4.1s before-click-Diff
Change: 0.046, Contrast: 0.049, Detail: 0.021
  • stable visual state

Selected at 4.9s, label "final-proof"; low-contrast frame; late-flow frame.
4.9s final-proof
Change: 0.000, Contrast: 0.042, Detail: 0.020
  • low-contrast frame
  • late-flow frame

Phase | Status | Duration
answers an inline agent question and continues the evolve flow | passed | 4s

Settings and provider tabs

settings_provider_change

tauri-wdio on GitHub Actions 1000009366 (github-hosted)

Hosted assertions - Deterministic webview proof

Status: passed
Commit: 33f37e696bb5d5f1217b458e722d756ffb3d1c64
Duration: 36s
Replay: bun run test:wdio:smoke

What this checks

Opens Settings and verifies the General, AI Models, API Keys, and Preferences tabs plus key controls render.

Coverage
  • Settings button opens the dialog
  • All four settings tabs are navigable
  • Each tab renders its expected heading
  • Critical provider, API key, diagnostics, and confirmation controls are visible
Known gaps / not covered
  • Checks visibility of key settings controls; it does not mutate every field, verify real API keys, or call live providers.
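The tab check can be expressed as "each tab's captured text contains its expected heading". A sketch assuming the expected text for each tab is simply its own name; `tabsMissingHeading` is hypothetical. A check of this shape reports the same kind of failure as the real-Mac smoke's "Settings tab did not render expected text: AI Models".

```typescript
// Tab names come from the report; using each tab's own name as its expected heading is an assumption.
const SETTINGS_TABS = ["General", "AI Models", "API Keys", "Preferences"] as const;
type SettingsTab = (typeof SETTINGS_TABS)[number];

// Given the visible text captured after clicking each tab, returns the tabs whose heading never appeared.
function tabsMissingHeading(renderedText: Partial<Record<SettingsTab, string>>): SettingsTab[] {
  return SETTINGS_TABS.filter((tab) => !(renderedText[tab] ?? "").includes(tab));
}
```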
Proof: full-app screenshot for "opens and has at least one window".

Adjacent real-Mac evidence

This recording exercises the same product surface on a real Mac. It does not re-run this scenario's deterministic hosted assertions.

Product surface real-Mac smoke

product_surface_full_mac_smoke

Status: failed

Failure: Settings tab did not render expected text: AI Models

What happened: Settings tab did not render expected text: AI Models
Next action: Open the full report and workflow logs for the failing phase, then rerun the replay command after fixing the cause.
Failure screenshot: failure-1777489004-1777489004.png

Settings state real-Mac journey

settings_state_full_mac_journey

Status: failed

Failure: Preferences controls did not render

What happened: Preferences controls did not render
Next action: Open the full report and workflow logs for the failing phase, then rerun the replay command after fixing the cause.
Failure screenshot: failure-1777487217-1777487217.png

Visual timeline

2 meaningful frames from wdio source frames. Deterministic checks only; scripted assertions remain the gate.

Selected at 0.0s, label "start"; stable visual state.
0.0s start
Change: 0.000, Contrast: 0.049, Detail: 0.024
  • stable visual state

Selected at 1.0s, label "final-proof"; late-flow frame.
1.0s final-proof
Change: 0.000, Contrast: 0.049, Detail: 0.024
  • late-flow frame

Visual timeline

5 meaningful frames from wdio source frames. Deterministic checks only; scripted assertions remain the gate.

Selected at 0.0s, label "start"; stable visual state.
0.0s start
Change: 0.000, Contrast: 0.049, Detail: 0.024
  • stable visual state

Selected at 0.8s, label "after-click-Settings"; stable visual state.
0.8s after-click-Settings
Change: 0.069, Contrast: 0.081, Detail: 0.036
  • stable visual state

Selected at 2.4s, label "after-click-AI Models"; stable visual state.
2.4s after-click-AI Models
Change: 0.060, Contrast: 0.099, Detail: 0.031
  • stable visual state

Selected at 4.4s, label "after-click-Preferences"; stable visual state.
4.4s after-click-Preferences
Change: 0.041, Contrast: 0.063, Detail: 0.020
  • stable visual state

Selected at 4.7s, label "final-proof"; late-flow frame.
4.7s final-proof
Change: 0.000, Contrast: 0.063, Detail: 0.020
  • late-flow frame

Phase | Status | Duration
opens and has at least one window | passed | 0s
opens and navigates all tabs | passed | 5s

Settings controls persistence

settings_controls_persistence

tauri-wdio on GitHub Actions 1000009376 (github-hosted)

Hosted assertions - Deterministic webview proof

Status: passed
Commit: 33f37e696bb5d5f1217b458e722d756ffb3d1c64
Duration: 38s
Replay: bun run test:wdio:settings-controls

What this checks

Mutates representative settings controls and verifies the persisted settings file reflects the UI changes.

Coverage
  • Preferences confirmation switches can be toggled
  • API URL and vLLM key fields accept changes
  • vLLM key visibility toggle works
  • AI model iteration/build-attempt limits persist to settings.json
Known gaps / not covered
  • Uses local persisted settings; it does not validate live provider credentials or call external AI services.
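The persistence assertion compares what the UI set against what landed in settings.json. A sketch under the assumption that settings.json is a flat JSON object; `unpersistedKeys` is hypothetical, and the real schema and key names are not shown in the report.

```typescript
type SettingValue = string | number | boolean;

// Compares the values set through the UI against the persisted settings.json text
// and returns the keys whose persisted value does not match (missing keys count as mismatches).
function unpersistedKeys(expected: Record<string, SettingValue>, settingsJson: string): string[] {
  const persisted = JSON.parse(settingsJson) as Record<string, unknown>;
  return Object.keys(expected).filter((key) => persisted[key] !== expected[key]);
}
```

For example, with a hypothetical key, `unpersistedKeys({ maxIterations: 12 }, '{"maxIterations":10}')` returns `["maxIterations"]`.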
Proof: full-app screenshot for "mutates representative settings controls and verifies settings.json".

Adjacent real-Mac evidence

This recording exercises the same product surface on a real Mac. It does not re-run this scenario's deterministic hosted assertions.

Settings state real-Mac journey

settings_state_full_mac_journey

Status: failed

Failure: Preferences controls did not render

What happened: Preferences controls did not render
Next action: Open the full report and workflow logs for the failing phase, then rerun the replay command after fixing the cause.
Failure screenshot: failure-1777487217-1777487217.png

Visual timeline

6 meaningful frames from wdio source frames. Deterministic checks only; scripted assertions remain the gate.

Selected at 0.0s, label "start"; stable visual state.
0.0s start
Change: 0.000, Contrast: 0.181, Detail: 0.040
  • stable visual state

Selected at 1.3s, label "step"; stable visual state.
1.3s step
Change: 0.134, Contrast: 0.066, Detail: 0.035
  • stable visual state

Selected at 2.0s, label "after-click-Settings"; stable visual state.
2.0s after-click-Settings
Change: 0.085, Contrast: 0.080, Detail: 0.037
  • stable visual state

Selected at 2.6s, label "after-click-Preferences"; stable visual state.
2.6s after-click-Preferences
Change: 0.044, Contrast: 0.063, Detail: 0.020
  • stable visual state

Selected at 5.1s, label "after-click-API Keys"; stable visual state.
5.1s after-click-API Keys
Change: 0.043, Contrast: 0.103, Detail: 0.035
  • stable visual state

Selected at 17.7s, label "final-proof"; late-flow frame.
17.7s final-proof
Change: 0.000, Contrast: 0.097, Detail: 0.027
  • late-flow frame

Phase | Status | Duration
mutates representative settings controls and verifies settings.json | passed | 16s

Provider validation blocks prompt

provider_validation_blocks_prompt

tauri-wdio on GitHub Actions 1000009373 (github-hosted)

Hosted assertions - Deterministic webview proof

Status: passed
Commit: 33f37e696bb5d5f1217b458e722d756ffb3d1c64
Duration: 31s
Replay: bun run test:wdio:provider-validation

What this checks

Starts with vLLM selected and no base URL, then verifies the prompt stays blocked with settings guidance instead of submitting into an avoidable provider failure.

Coverage
  • Invalid AI provider setup is detected at the prompt
  • Prompt suggestion can still fill the input
  • Send remains disabled while required provider configuration is missing
  • The user sees an AI Models settings recovery action
Known gaps / not covered
  • Uses local vLLM validation and does not call an external provider.
Capture limitations
  • Webview frame timeline was suppressed because captured frames were not visually informative
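The blocking rule reduces to a local check that runs before any request is made. A sketch under assumed names (`sendState`, `ProviderSettings`, the provider ids, and the recovery label are all hypothetical): a vLLM provider with no base URL blocks Send and offers a settings recovery action even when the prompt, for example one filled from a suggestion, is non-empty.

```typescript
// Hypothetical gate mirroring the scenario; provider ids and field names are illustrative.
type ProviderId = "openrouter" | "vllm";

interface ProviderSettings {
  provider: ProviderId;
  baseUrl?: string;
}

interface SendState {
  canSend: boolean;
  // Shown instead of submitting when provider configuration is incomplete.
  recoveryAction?: string;
}

function sendState(settings: ProviderSettings, promptText: string): SendState {
  if (settings.provider === "vllm" && !(settings.baseUrl ?? "").trim()) {
    // Block locally rather than submitting into a provider error that is already predictable.
    return { canSend: false, recoveryAction: "Open AI Models settings" };
  }
  // With a usable provider, Send still requires non-blank prompt text.
  return { canSend: promptText.trim().length > 0 };
}
```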
Proof: full-app screenshot for "keeps send disabled when vLLM is selected without a base URL".

Adjacent real-Mac evidence

This recording exercises the same product surface on a real Mac. It does not re-run this scenario's deterministic hosted assertions.

Phase | Status | Duration
keeps send disabled when vLLM is selected without a base URL | passed | 1s

Provider failure recovery

provider_failure_recovery

tauri-wdio on GitHub Actions 1000009371 (github-hosted)

Hosted assertions - Deterministic webview proof

Status: passed
Commit: 33f37e696bb5d5f1217b458e722d756ffb3d1c64
Duration: 23s
Replay: bun run test:wdio:provider-failure

What this checks

Submits against a mock provider billing failure and verifies the provider failure is visible in the app instead of being swallowed.

Coverage
  • Prompt submission starts from a configured app state
  • Mock provider failure reaches the evolve error path
  • A visible, user-actionable error is rendered in the widget
Known gaps / not covered
  • Uses a deterministic mock provider failure; it does not exercise every provider's live failure shape.
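Surfacing rather than swallowing a provider failure means every failure maps to some visible message. A sketch, assuming HTTP-style status codes reach the UI; the status-to-message mapping and the names `toVisibleError`, `ProviderFailure`, and `VisibleError` are assumptions, not nixmac's actual error handling.

```typescript
// Hypothetical mapping; status codes and wording are assumptions.
interface ProviderFailure {
  status: number;
  message: string;
}

interface VisibleError {
  title: string;
  detail: string;
  action: string;
}

function toVisibleError(failure: ProviderFailure): VisibleError {
  // Never render an empty detail line, even if the provider sent no message.
  const detail = failure.message.trim() || `Provider returned HTTP ${failure.status}`;
  if (failure.status === 401 || failure.status === 403) {
    return { title: "Provider rejected the API key", detail, action: "Check API Keys in Settings" };
  }
  if (failure.status === 402) {
    return { title: "Provider billing problem", detail, action: "Check the provider account's credit or billing" };
  }
  if (failure.status === 429) {
    return { title: "Provider rate limit reached", detail, action: "Wait, then retry" };
  }
  // Every other failure still produces a visible, actionable message rather than being swallowed.
  return { title: "AI provider request failed", detail, action: "Retry, or choose another provider in AI Models settings" };
}
```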
Proof: full-app screenshot for "surfaces a visible app error when the AI provider fails".

Adjacent real-Mac evidence

This recording exercises the same product surface on a real Mac. It does not re-run this scenario's deterministic hosted assertions.

Visual timeline

3 meaningful frames from wdio source frames. Deterministic checks only; scripted assertions remain the gate.

Selected at 0.0s, label "start"; mostly blank or single-color frame; low-detail frame.
0.0s start
Change: 0.000, Contrast: 0.000, Detail: 0.000
  • mostly blank or single-color frame
  • low-detail frame

Selected at 2.8s, label "before-type-Prompt text"; stable visual state.
2.8s before-type-Prompt text
Change: 0.098, Contrast: 0.070, Detail: 0.037
  • stable visual state

Selected at 5.2s, label "final-proof"; late-flow frame.
5.2s final-proof
Change: 0.000, Contrast: 0.064, Detail: 0.033
  • late-flow frame

Phase | Status | Duration
surfaces a visible app error when the AI provider fails | passed | 3s

Live OpenRouter evolve smoke

live_openrouter_evolve_smoke

tauri-wdio on GitHub Actions 1000009368 (github-hosted)

Hosted assertions - Deterministic webview proof

Status: passed
Commit: 33f37e696bb5d5f1217b458e722d756ffb3d1c64
Duration: 30s
Replay: bun run test:wdio:live-openrouter

What this checks

Uses a real OpenRouter key to submit one evolve prompt and verifies nixmac reaches review with a generated Nix diff.

Coverage
  • Real OpenRouter credential is loaded into app settings
  • OpenAI/OpenRouter provider path submits an evolve prompt
  • Live model/tool-call loop produces a non-empty Nix diff
  • The flow stops at review without applying or rebuilding the machine
Known gaps / not covered
  • Calls a live model and can fail for provider outages, rate limits, account credit, or prompt nondeterminism.
  • Stops at evolve review; it does not apply, rebuild, or verify the final macOS system state.
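Because this lane calls a live model, a failed run needs triage before it is treated as a product bug. A sketch of that triage following the causes listed above; `classifyLiveRun` is hypothetical, and the status conventions (402 for exhausted credit, 429 for rate limiting, 5xx for outages) are assumptions about the provider. Under this sketch, and assuming the request itself succeeded, the real-Mac journey's "Live OpenRouter diff did not contain pkgs.jq" would classify as model nondeterminism (or an over-specific expectation).

```typescript
// Hypothetical triage for live-provider runs.
type LiveOutcome = "ok" | "out-of-credit" | "rate-limited" | "provider-outage" | "model-nondeterminism" | "other";

function classifyLiveRun(httpStatus: number | undefined, diff: string, expectedToken: string): LiveOutcome {
  if (httpStatus !== undefined && httpStatus >= 400) {
    if (httpStatus === 402) return "out-of-credit";
    if (httpStatus === 429) return "rate-limited";
    if (httpStatus >= 500) return "provider-outage";
    return "other";
  }
  // The request succeeded, so a missing expected edit points at the model output or the expectation,
  // not at the provider.
  return diff.includes(expectedToken) ? "ok" : "model-nondeterminism";
}
```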
Proof: full-app screenshot for "uses a real OpenRouter key to reach evolve review with a generated diff".

Adjacent real-Mac evidence

This recording exercises the same product surface on a real Mac. It does not re-run this scenario's deterministic hosted assertions.

Live OpenRouter real-Mac journey

live_openrouter_full_mac_journey

Status: failed

Failure: Live OpenRouter diff did not contain pkgs.jq

What happened: Live OpenRouter diff did not contain pkgs.jq
Next action: Open the full report and workflow logs for the failing phase, then rerun the replay command after fixing the cause.
Failure screenshot: failure-1777488543-1777488543.png

Visual timeline

5 meaningful frames from wdio source frames. Deterministic checks only; scripted assertions remain the gate.

Selected at 0.0s, label "start"; stable visual state.
0.0s start
Change: 0.000, Contrast: 0.080, Detail: 0.019
  • stable visual state

Selected at 1.0s, label "step"; stable visual state.
1.0s step
Change: 0.070, Contrast: 0.066, Detail: 0.033
  • stable visual state

Selected at 2.1s, label "after-click-Send prompt"; stable visual state.
2.1s after-click-Send prompt
Change: 0.041, Contrast: 0.055, Detail: 0.028
  • stable visual state

Selected at 10.2s, label "step"; late-flow frame.
10.2s step
Change: 0.049, Contrast: 0.048, Detail: 0.021
  • late-flow frame

Selected at 11.2s, label "final-proof"; low-contrast frame; late-flow frame.
11.2s final-proof
Change: 0.000, Contrast: 0.042, Detail: 0.018
  • low-contrast frame
  • late-flow frame

Phase | Status | Duration
uses a real OpenRouter key to reach evolve review with a generated diff | passed | 10s

Prompt keyboard and suggestions

prompt_keyboard_and_suggestions

tauri-wdio on GitHub Actions 1000009378 (github-hosted)

Hosted assertions - Deterministic webview proof

Status: passed
Commit: 33f37e696bb5d5f1217b458e722d756ffb3d1c64
Duration: 26s
Replay: bun run test:wdio:prompt-keyboard

What this checks

Exercises visible prompt suggestions, send-button state, keyboard action proof, prompt history, and evolve review.

Coverage
  • Prompt send is disabled before text exists
  • Static prompt suggestion fills the prompt input
  • Proof recording includes a keyboard action annotation before submit
  • Prompt history records the submitted prompt
  • Mocked evolve response reaches review with a generated diff
Known gaps / not covered
  • Keeps model responses mocked. After proving keyboard navigation on the prompt surface, it submits through the existing, reliable submit path.
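The report only confirms that submitted prompts are recorded in history. A sketch of one possible history structure, assuming arrow-key style recall; `PromptHistory`, its methods, and the choice to skip consecutive duplicates are hypothetical.

```typescript
// Hypothetical prompt history with previous/next recall.
class PromptHistory {
  private readonly entries: string[] = [];
  private cursor = 0; // equals entries.length when not browsing history

  record(prompt: string): void {
    const text = prompt.trim();
    if (text === "") return; // nothing to record for an empty submit
    if (this.entries[this.entries.length - 1] !== text) this.entries.push(text); // skip consecutive duplicates
    this.cursor = this.entries.length;
  }

  // Step back to an older prompt (e.g. on ArrowUp); stays on the oldest entry.
  previous(): string | undefined {
    if (this.entries.length === 0) return undefined;
    this.cursor = Math.max(0, this.cursor - 1);
    return this.entries[this.cursor];
  }

  // Step forward (e.g. on ArrowDown); returns "" once past the newest entry.
  next(): string {
    this.cursor = Math.min(this.entries.length, this.cursor + 1);
    return this.entries[this.cursor] ?? "";
  }

  all(): readonly string[] {
    return this.entries;
  }
}
```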
Proof: full-app screenshot for "uses a prompt suggestion, records keyboard action proof, and reaches evolve review".

Adjacent real-Mac evidence

This recording exercises the same product surface on a real Mac. It does not re-run this scenario's deterministic hosted assertions.

Visual timeline

4 meaningful frames from wdio source frames. Deterministic checks only; scripted assertions remain the gate.

Selected at 0.0s, label "start"; stable visual state.
0.0s start
Change: 0.000, Contrast: 0.080, Detail: 0.019
  • stable visual state

Selected at 1.0s, label "step"; stable visual state.
1.0s step
Change: 0.070, Contrast: 0.066, Detail: 0.033
  • stable visual state

Selected at 4.0s, label "step"; stable visual state.
4.0s step
Change: 0.054, Contrast: 0.048, Detail: 0.018
  • stable visual state

Selected at 6.8s, label "final-proof"; low-contrast frame; late-flow frame.
6.8s final-proof
Change: 0.006, Contrast: 0.043, Detail: 0.021
  • low-contrast frame
  • late-flow frame

Phase | Status | Duration
uses a prompt suggestion, records keyboard action proof, and reaches evolve review | passed | 6s

Feedback and issue reporting

feedback_report_issue

tauri-wdio on GitHub Actions 1000009367 (github-hosted)

Hosted assertions - Deterministic webview proof

Status: passed
Commit: 33f37e696bb5d5f1217b458e722d756ffb3d1c64
Duration: 46s
Replay: bun run test:wdio:feedback-report

What this checks

Exercises header feedback mode and footer issue-report mode without sending external feedback.

Coverage
  • Header feedback opens the general feedback dialog
  • Suggestion, Bug, and General type choices render in feedback mode
  • Bug feedback reveals expected-behavior fields and share options
  • Footer Report Issue opens issue mode with report-specific copy
  • Cancel controls render in both feedback and issue-report modes
Known gaps / not covered
  • Does not submit feedback to the backend or external services; it validates the user-facing collection flows.
  • Uses DOM-click fallback for close/cancel controls because native Tauri WebDriver clicks are flaky on these dialog buttons.
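The DOM-click fallback can be sketched as "try the native WebDriver click, and on failure click through the DOM". Both clicks are passed in as async functions so the control flow runs without a browser; in a WebdriverIO spec they might wrap `element.click()` and `browser.execute((el) => el.click(), element)`, but that wiring and the name `clickWithFallback` are assumptions, not the spec's actual code.

```typescript
// Hypothetical helper: native click first, DOM click only if the native click throws.
async function clickWithFallback(
  label: string,
  nativeClick: () => Promise<void>,
  domClick: () => Promise<void>,
  log: (message: string) => void = () => {},
): Promise<"native" | "dom"> {
  try {
    await nativeClick();
    return "native";
  } catch (error) {
    log(`native click on ${label} failed (${String(error)}); using DOM click`);
    await domClick();
    return "dom";
  }
}
```

Returning which path was taken lets a proof recording note when the fallback was actually needed.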
Proof: full-app screenshot for "covers header feedback mode and footer issue-report mode".

Adjacent real-Mac evidence

This recording exercises the same product surface on a real Mac. It does not re-run this scenario's deterministic hosted assertions.

Product surface real-Mac smoke

product_surface_full_mac_smoke

Status: failed

Failure: Settings tab did not render expected text: AI Models

What happened: Settings tab did not render expected text: AI Models
Next action: Open the full report and workflow logs for the failing phase, then rerun the replay command after fixing the cause.
Failure screenshot: failure-1777489004-1777489004.png

Visual timeline

8 meaningful frames from wdio source frames. Deterministic checks only; scripted assertions remain the gate.

Selected at 0.0s, label "start"; stable visual state.
0.0s start
Change: 0.000, Contrast: 0.066, Detail: 0.035
  • stable visual state

Selected at 1.6s, label "after-click-Give feedback"; stable visual state.
1.6s after-click-Give feedback
Change: 0.057, Contrast: 0.055, Detail: 0.032
  • stable visual state

Selected at 11.7s, label "after-click-Cancel feedback"; stable visual state.
11.7s after-click-Cancel feedback
Change: 0.049, Contrast: 0.059, Detail: 0.023
  • stable visual state

Selected at 12.2s, label "after-click-Reopen feedback"; stable visual state.
12.2s after-click-Reopen feedback
Change: 0.049, Contrast: 0.057, Detail: 0.035
  • stable visual state

Selected at 13.8s, label "after-click-Cancel clean feedback"; late-flow frame.
13.8s after-click-Cancel clean feedback
Change: 0.047, Contrast: 0.060, Detail: 0.023
  • late-flow frame

Selected at 14.4s, label "after-click-Report Issue"; late-flow frame.
14.4s after-click-Report Issue
Change: 0.058, Contrast: 0.068, Detail: 0.033
  • late-flow frame

Selected at 15.2s, label "after-click-Cancel issue report"; late-flow frame.
15.2s after-click-Cancel issue report
Change: 0.058, Contrast: 0.059, Detail: 0.022
  • late-flow frame

Selected at 15.4s, label "final-proof"; late-flow frame.
15.4s final-proof
Change: 0.000, Contrast: 0.059, Detail: 0.022
  • late-flow frame

Phase | Status | Duration
covers header feedback mode and footer issue-report mode | passed | 14s

History and settings navigation

history_navigation

tauri-wdio on GitHub Actions 1000009377 (github-hosted)

Hosted assertions - Deterministic webview proof

Status: passed
Commit: 33f37e696bb5d5f1217b458e722d756ffb3d1c64
Duration: 24s
Replay: bun run test:wdio:history-navigation

What this checks

Opens the history and settings surfaces and verifies the visible navigation controls return to the main prompt.

Coverage
  • History toolbar button opens the History view
  • History header renders with a count badge
  • History toolbar button closes History back to the main prompt
  • Settings opens from the header and closes without disturbing the main prompt
Known gaps / not covered
  • Navigation coverage only; it does not restore historical commits or exercise every history item state.
  • Uses DOM-click fallback for close/toggle controls because native Tauri WebDriver clicks are flaky on these small header buttons.
Proof full app screenshot for opens and closes history and settings controls

Adjacent real-Mac evidence

This recording exercises the same product surface on a real Mac. It does not re-run this scenario's deterministic hosted assertions.

Product surface real-Mac smoke

product_surface_full_mac_smoke

failed

Failure: Settings tab did not render expected text: AI Models

What happened
Settings tab did not render expected text: AI Models
Next action
Open the full report and workflow logs for the failing phase, then rerun the replay command after fixing the cause.
failure-1777489004-1777489004.png

Visual timeline

8 meaningful frames from wdio source frames. Deterministic checks only; scripted assertions remain the gate.

Selected at 0.0s, label "start"; mostly blank or single-color frame; low-detail frame.
0.0s start
Change 0.000 | Contrast 0.000 | Detail 0.000
  • mostly blank or single-color frame
  • low-detail frame

Selected at 1.1s, label "step"; stable visual state.
1.1s step
Change 0.060 | Contrast 0.181 | Detail 0.040
  • stable visual state

Selected at 1.8s, label "step"; stable visual state.
1.8s step
Change 0.131 | Contrast 0.070 | Detail 0.035
  • stable visual state

Selected at 2.5s, label "after-click-Open history"; mostly blank or single-color frame; low-detail frame.
2.5s after-click-Open history
Change 0.053 | Contrast 0.024 | Detail 0.007
  • mostly blank or single-color frame
  • low-detail frame

Selected at 4.2s, label "after-click-Close history"; stable visual state.
4.2s after-click-Close history
Change 0.048 | Contrast 0.069 | Detail 0.036
  • stable visual state

Selected at 4.7s, label "after-click-Settings"; stable visual state.
4.7s after-click-Settings
Change 0.085 | Contrast 0.080 | Detail 0.036
  • stable visual state

Selected at 5.4s, label "after-click-Close settings"; late-flow frame.
5.4s after-click-Close settings
Change 0.085 | Contrast 0.069 | Detail 0.038
  • late-flow frame

Selected at 5.5s, label "final-proof"; late-flow frame.
5.5s final-proof
Change 0.000 | Contrast 0.069 | Detail 0.038
  • late-flow frame

Phase | Status | Duration | Summary
opens and closes history and settings controls | passed | 5s

Onboarding existing repo

onboarding_existing_repo

tauri-wdio on GitHub Actions 1000009369 (github-hosted)

Hosted assertions - Deterministic webview proof

passed
Commit
33f37e696bb5d5f1217b458e722d756ffb3d1c64
Duration
25s
Replay
bun run test:wdio:onboarding

Hosted webview lane: deterministic app-flow assertions with screenshot and frame-timeline proof. Action-cursor markers in screenshots are synthetic WDIO proof overlays, not real cursor captures. This lane does not claim to be a full desktop screen recording.

What this checks

Connects an existing nix-darwin repo and verifies the app reaches the prompt screen.

Coverage
  • Setup screen renders
  • Configuration directory can be selected
  • Host selection is populated and accepted
  • Onboarding completes to the main prompt screen
Known gaps / not covered
  • Uses a prepared fixture repo; it does not cover every arbitrary user flake shape.
Proof full app screenshot for connects an existing nix-darwin repo and reaches the prompt screen

Adjacent real-Mac evidence

This recording exercises the same product surface on a real Mac. It does not re-run this scenario's deterministic hosted assertions.

Visual timeline

6 meaningful frames from wdio source frames. Deterministic checks only; scripted assertions remain the gate.

Selected at 0.0s, label "start"; mostly blank or single-color frame; low-detail frame.
0.0s start
Change 0.000 | Contrast 0.000 | Detail 0.000
  • mostly blank or single-color frame
  • low-detail frame

Selected at 3.3s, label "step"; stable visual state.
3.3s step
Change 0.072 | Contrast 0.080 | Detail 0.019
  • stable visual state

Selected at 4.0s, label "step"; stable visual state.
4.0s step
Change 0.072 | Contrast 0.065 | Detail 0.040
  • stable visual state

Selected at 4.4s, label "before-click-host-select"; stable visual state.
4.4s before-click-host-select
Change 0.051 | Contrast 0.050 | Detail 0.021
  • stable visual state

Selected at 5.8s, label "after-click-//*[@role="option" and normalize-space(.)="sjc20-cw712-718c95a7-e78c-4665-b745-9"; stable visual state.
5.8s after-click-//*[@role="option" and normalize-space(.)="sjc20-cw712-718c95a7-e78c-4665-b745-9
Change 0.059 | Contrast 0.055 | Detail 0.027
  • stable visual state

Selected at 6.3s, label "final-proof"; late-flow frame.
6.3s final-proof
Change 0.003 | Contrast 0.057 | Detail 0.029
  • late-flow frame

Phase | Status | Duration | Summary
connects an existing nix-darwin repo and reaches the prompt screen | passed | 3s

Settings state real-Mac journey

settings_state_full_mac_journey

full-mac on macos-e2e (full-mac)

Real-Mac companion - Attached to Settings and provider tabs, Settings controls persistence

failed
Commit
33f37e696bb5d5f1217b458e722d756ffb3d1c64
Duration
368s
Replay
tests/e2e/run.sh settings_state_full_mac_journey

Full-Mac lane: real macOS desktop automation with full-screen recording evidence.

What this checks

Runs the shipped app on a real Mac, opens Settings, records Preferences/API Keys/AI Models, and verifies representative Preferences mutations persist to the real app-support settings.json.

Coverage
  • Configured prompt screen launches from persisted macOS app-support settings
  • Preferences tab renders and representative confirmation toggles persist via UI interaction
  • API Keys tab renders OpenRouter, OpenAI, Ollama, and vLLM settings from the seeded settings.json
  • AI Models tab renders provider and limit controls from the seeded settings.json
  • Uses polling against the real settings.json to avoid persistence races
  • Publishes screenshots and a full-screen recording as adjacent real-desktop proof
Known gaps / not covered
  • Real-Mac companion proof; WDIO remains the deterministic authority for exact select/input values and full settings-form mutation coverage.
  • Only Preferences toggles are mutated through UI in this full-Mac journey; API Keys and AI Models are hydrated/recorded from seeded settings.
  • Does not call live providers or validate external API credentials.
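The coverage above notes that this journey polls the real settings.json instead of reading it once, because a UI toggle and the corresponding file write may not land at the same instant. A minimal TypeScript sketch of that polling pattern follows, assuming a Node-compatible runtime. `waitForPersistedSettings`, its parameters, and the `confirmApply` key used in the usage note are hypothetical and are not taken from the nixmac suite.

```typescript
import { readFile } from "node:fs/promises";

type Settings = Record<string, unknown>;

// Hypothetical helper: re-read settings.json until `predicate` accepts the parsed
// contents, or throw once `timeoutMs` has elapsed. `read` is injectable so the
// helper can be exercised without touching the real app-support directory.
async function waitForPersistedSettings(
  path: string,
  predicate: (settings: Settings) => boolean,
  timeoutMs = 10000,
  intervalMs = 250,
  read: (path: string) => Promise<string> = (p) => readFile(p, "utf8"),
): Promise<Settings> {
  const deadline = Date.now() + timeoutMs;
  let lastError: unknown = undefined;
  for (;;) {
    try {
      const settings = JSON.parse(await read(path)) as Settings;
      if (predicate(settings)) return settings;
      lastError = undefined;
    } catch (err) {
      // Assumption: the file may be missing or half-written right after a UI
      // change, so read and parse errors are retried instead of failing fast.
      lastError = err;
    }
    if (Date.now() >= deadline) {
      const detail = lastError === undefined ? "" : `: ${String(lastError)}`;
      throw new Error(`${path} did not reach the expected state within ${timeoutMs} ms${detail}`);
    }
    // Sleep between attempts rather than busy-looping.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Usage, after toggling a Preferences control: `await waitForPersistedSettings(settingsPath, (s) => s.confirmApply === true)`, where `settingsPath` points at the app-support settings.json (its exact path is not shown in this report). A deadline-bounded poll tolerates a slow write without a fixed sleep, and still fails with a clear message when the value never persists.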

Failure: Preferences controls did not render

What happened
Preferences controls did not render
Next action
Open the full report and workflow logs for the failing phase, then rerun the replay command after fixing the cause.
failure-1777487217-1777487217.png
Phase | Status | Duration | Summary
Nix installed (already present) | passed | 0s
Configured state seeded | passed | 0s
Settings dialog launched | passed | 0s
Preferences controls did not render | failed | 0s | Preferences controls did not render

Evolve review real-Mac journey

evolve_review_full_mac_journey

full-mac on macos-e2e (full-mac)

Real-Mac companion - Attached to Auto-evolve basic package, Discard and restore state, Manual evolve existing changes +2 more

passed
Commit
33f37e696bb5d5f1217b458e722d756ffb3d1c64
Duration
184s
Replay
tests/e2e/run.sh evolve_review_full_mac_journey

Full-Mac lane: real macOS desktop automation with full-screen recording evidence.

What this checks

Runs the shipped app on a real Mac against a local mock vLLM server, records prompt submission, review, follow-up, question-answer, and discard behavior, and verifies generated git diffs.

Coverage
  • Prompt suggestions and Send are usable in the released app
  • Mock vLLM response reaches evolve review with a JetBrains Mono diff
  • Follow-up prompt preserves the first diff and adds Fira Code
  • Discard cancel keeps the review state visible
  • Discard confirm returns to the prompt state
  • Inline question-answer continues to evolve review
  • Reads git diff from a temporary nix-darwin repo on the Mac runner
  • Publishes screenshots and a full-screen recording as adjacent real-desktop proof
Known gaps / not covered
  • Uses a local mock provider; live provider behavior remains covered by the separate live OpenRouter scenario.
  • Does not apply or rebuild the generated nix-darwin configuration.
  • Release-app localhost mock reachability is verified only when the full-Mac job actually runs on the Mac runner.

Full-screen recording

Phase | Status | Duration | Summary
Nix installed (already present) | passed | 0s
Provider-guarded state seeded | passed | 0s
Prompt/provider guardrail verified | passed | 0s
Secondary prompt surfaces verified | passed | 0s

Provider resilience real-Mac journey

provider_resilience_full_mac_journey

full-mac on macos-e2e (full-mac)

Real-Mac companion - Attached to Provider validation blocks prompt, Provider failure recovery

passed
Commit
33f37e696bb5d5f1217b458e722d756ffb3d1c64
Duration
257s
Replay
tests/e2e/run.sh provider_resilience_full_mac_journey

Full-Mac lane: real macOS desktop automation with full-screen recording evidence.

What this checks

Runs the shipped app on a real Mac through invalid vLLM configuration and mock provider failure states, recording visible recovery/error behavior.

Coverage
  • vLLM selected without a base URL renders provider validation before submission
  • Prompt suggestion can fill the input while provider validation remains visible
  • "Open AI Models settings" recovery copy is visible
  • Mock provider billing/credits failure surfaces visibly in the widget
  • Reads settings.json to prove which provider configuration each phase used
  • Publishes screenshots and a full-screen recording as adjacent real-desktop proof
Known gaps / not covered
  • Uses deterministic local provider states; it does not enumerate every live provider failure shape.
  • Does not call external providers.
02-provider-failure-visible-1777487798.png
Phase | Status | Duration | Summary
Nix installed (already present) | passed | 0s
Provider validation block verified | passed | 0s
Provider failure recovery verified | passed | 0s

Onboarding settings-contract real-Mac journey

onboarding_settings_contract_full_mac_journey

full-mac on macos-e2e (full-mac)

Real-Mac companion - Attached to Onboarding existing repo

passed
Commit
33f37e696bb5d5f1217b458e722d756ffb3d1c64
Duration
178s
Replay
tests/e2e/run.sh onboarding_settings_contract_full_mac_journey

Full-Mac lane: real macOS desktop automation with full-screen recording evidence.

What this checks

Runs the shipped app on a real Mac from fresh state, records setup/onboarding, then seeds the same repo/host settings contract and verifies the app reaches the prompt screen.

Coverage
  • Fresh app-support state routes to setup/onboarding
  • Temporary nix-darwin config repo is a real git repo with a host attr
  • Persisted configDir and hostAttr route the released app to the prompt screen
  • settings.json contains the selected configDir and hostAttr
  • Publishes screenshots and a full-screen recording as adjacent real-desktop proof
Known gaps / not covered
  • Intentionally settings-contract proof only: it does not drive the native file picker or claim full onboarding picker coverage.
  • WDIO remains the deterministic authority for directory picker and host selection interaction.

Full-screen recording

Phase | Status | Duration | Summary
Nix installed (already present) | passed | 0s
Fresh onboarding screen verified | passed | 0s
Repo/host settings contract verified | passed | 0s

Live OpenRouter real-Mac journey

live_openrouter_full_mac_journey

full-mac on macos-e2e (full-mac)

Real-Mac companion - Attached to Live OpenRouter evolve smoke

failed
Commit
33f37e696bb5d5f1217b458e722d756ffb3d1c64
Duration
411s
Replay
tests/e2e/run.sh live_openrouter_full_mac_journey

Full-Mac lane: real macOS desktop automation with full-screen recording evidence.

What this checks

Runs the shipped app on a real Mac with a real OpenRouter key, submits a constrained prompt, and verifies a live-model diff reaches review without applying.

Coverage
  • OpenRouter key/model are seeded into the released app settings on the Mac runner
  • Prompt submission calls a live OpenRouter-backed provider path
  • Live model/tool-call loop reaches evolve review
  • Temporary config repo diff includes pkgs.jq in flake.nix
  • Flow stops at review without applying or rebuilding
  • Publishes screenshots and a full-screen recording as adjacent real-desktop proof
Known gaps / not covered
  • Calls a live model and can fail for provider outages, rate limits, account credit, or prompt nondeterminism.
  • Stops at evolve review; it does not apply, rebuild, or verify final macOS state.
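The coverage above expects the temporary config repo's diff to add `pkgs.jq` to flake.nix. The report does not show how that check is implemented; the TypeScript sketch below is one hypothetical way to scan unified `git diff` output for an added flake.nix line that mentions `pkgs.jq`, assuming git's default `a/` and `b/` path prefixes. `diffAddsMatchingLine` and `PKGS_JQ` are illustrative names, not the suite's API.

```typescript
// Hypothetical check: does a unified diff add a line matching `pattern` to `file`?
// Assumes git's default "b/" prefix on the "+++" header (not --no-prefix output).
function diffAddsMatchingLine(diff: string, file: string, pattern: RegExp): boolean {
  let currentFile: string | null = null;
  let inHunk = false;
  for (const line of diff.split("\n")) {
    if (line.startsWith("diff --git ")) {
      // A new file section starts: forget the previous file until its headers are read.
      currentFile = null;
      inHunk = false;
    } else if (!inHunk && line.startsWith("+++ ")) {
      // "+++ b/flake.nix" names the post-change path; "+++ /dev/null" means a deletion.
      const target = line.slice(4).trim();
      currentFile = target.startsWith("b/") ? target.slice(2) : null;
    } else if (line.startsWith("@@")) {
      inHunk = true;
    } else if (inHunk && currentFile === file && line.startsWith("+") && pattern.test(line.slice(1))) {
      // Only added lines count; context (" ") and removed ("-") lines are ignored.
      return true;
    }
  }
  return false;
}

// Word boundaries keep "pkgs.jqp" from matching. No "g" flag, so test() is stateless.
const PKGS_JQ = /\bpkgs\.jq\b/;
```

A literal match like this is deliberately strict: an equivalent edit such as `with pkgs; [ jq ]` would not match, which is one way prompt nondeterminism can surface as a failed diff check even when the model produced a working change.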

Failure: Live OpenRouter diff did not contain pkgs.jq

What happened
Live OpenRouter diff did not contain pkgs.jq
Next action
Open the full report and workflow logs for the failing phase, then rerun the replay command after fixing the cause.
failure-1777488543-1777488543.png
Phase | Status | Duration | Summary
Nix installed (already present) | passed | 0s
Live provider settings seeded | passed | 0s
Live OpenRouter diff did not contain pkgs.jq | failed | 0s | Live OpenRouter diff did not contain pkgs.jq

Product surface real-Mac smoke

product_surface_full_mac_smoke

full-mac on macos-e2e (full-mac)

Real-Mac companion - Attached to Settings and provider tabs, Feedback and issue reporting, History and settings navigation

failed
Commit
33f37e696bb5d5f1217b458e722d756ffb3d1c64
Duration
383s
Replay
tests/e2e/run.sh product_surface_full_mac_smoke

Full-Mac lane: real macOS desktop automation with full-screen recording evidence.

What this checks

Runs the shipped app on a real Mac with Nix installed and records adjacent desktop proof for Settings, feedback, Report Issue, and History surfaces.

Coverage
  • App launches from /Applications on the Mac runner
  • Settings opens and the General, AI Models, API Keys, and Preferences surfaces render
  • Header feedback opens the feedback dialog
  • Footer Report Issue opens issue-report mode
  • History opens and returns to the main surface
  • Publishes a full-screen recording as adjacent real-desktop proof
Known gaps / not covered
  • Adjacent real-Mac smoke only; it does not replay each hosted WDIO assertion or verify settings persistence on disk.
  • Uses the nix-installed fixture so the app is past the Nix installation prerequisite, but it does not apply or rebuild a nix-darwin configuration.

Failure: Settings tab did not render expected text: AI Models

What happened
Settings tab did not render expected text: AI Models
Next action
Open the full report and workflow logs for the failing phase, then rerun the replay command after fixing the cause.
failure-1777489004-1777489004.png
Phase | Status | Duration | Summary
Nix installed (already present) | passed | 0s
Product surface launched | passed | 0s
Settings tab did not render expected text: AI Models | failed | 0s | Settings tab did not render expected text: AI Models

Release DMG launch smoke

release_dmg_app_translocation_smoke

full-mac on macos-e2e (full-mac)

Full-Mac recording - Standalone real desktop evidence

passed
Commit
33f37e696bb5d5f1217b458e722d756ffb3d1c64
Duration
119s
Replay
tests/e2e/run.sh release_dmg_app_translocation_smoke

Full-Mac lane: real macOS desktop automation with full-screen recording evidence.

What this checks

Launches the app installed in /Applications on a real Mac and verifies that it reaches a usable first screen.

Coverage
  • Build artifact can be installed on the Mac runner
  • App launches from /Applications
  • First screen renders enough nixmac text to rule out startup/App Translocation crashes
  • Publishes a full-screen recording as proof
Known gaps / not covered
  • Launch smoke only; it does not exercise Nix installation, settings, or evolve/apply flows.

Full-screen recording

Phase | Status | Duration | Summary
Clean machine ready | passed | 0s
App installed at /Applications/nixmac.app | passed | 0s
App launched | passed | 0s
First screen rendered | passed | 0s

macOS descriptor prompt smoke

macos_descriptor_prompt_smoke

full-mac on macos-e2e (full-mac)

Full-Mac recording - Standalone real desktop evidence

failed
Commit
33f37e696bb5d5f1217b458e722d756ffb3d1c64
Duration
490s
Replay
tests/e2e/run.sh macos_descriptor_prompt_smoke

Full-Mac lane: real macOS desktop automation with full-screen recording evidence.

What this checks

Launches the real macOS app, types one descriptor into the main prompt, and verifies the expected local provider-validation block with 30 fps video proof.

Coverage
  • Exact-SHA app artifact launches on the real Mac runner
  • Prompt input is reachable through stable accessibility metadata
  • Descriptor text can be typed and observed in the real UI
  • Submit affordance is reachable through stable accessibility metadata
  • Local provider validation blocks submit without requiring a fragile mock provider
  • Publishes a 30 fps full-screen recording as primary proof
Known gaps / not covered
  • Intentionally stops at local provider validation; it does not call a live or mock AI provider yet.
  • Requires the self-hosted Mac runner to already have Nix and darwin-rebuild available.

Failure: Typed descriptor was not visible in the prompt input

What happened
Typed descriptor was not visible in the prompt input
Next action
Open the full report and workflow logs for the failing phase, then rerun the replay command after fixing the cause.
failure-1777489813-1777489813.png
Phase | Status | Duration | Summary
Prepared config repo, mock system prerequisites, and local provider-validation settings | passed | 0s
App launched | passed | 0s
Descriptor prompt input reached | passed | 0s
Typed descriptor was not visible in the prompt input | failed | 0s | Typed descriptor was not visible in the prompt input

macOS provider evolve full smoke

macos_provider_evolve_full_smoke

full-mac on macos-e2e (full-mac)

Full-Mac recording - Standalone real desktop evidence

failed
Commit
33f37e696bb5d5f1217b458e722d756ffb3d1c64
Duration
213s
Replay
tests/e2e/run.sh macos_provider_evolve_full_smoke

Full-Mac lane: real macOS desktop automation with full-screen recording evidence.

What this checks

Launches the installed macOS app, submits a descriptor, calls an OpenAI-compatible HTTP provider, applies the provider's Nix edit, runs the mocked host rebuild/activation step, generates the Save-step commit message through the provider, and commits the result.

Coverage
  • Exact-SHA app artifact launches on the real Mac runner
  • Prompt input is reached and submitted through accessibility metadata
  • Evolution provider receives a real HTTP chat completion request with tool schemas
  • Provider tool calls edit flake.nix and run build_check through the backend
  • Summary provider receives JSON completion requests for the generated diff
  • Build & Test advances to Save using the explicit E2E mock-system activation path
  • Commit-message provider receives a conventional-commit request and populates the Save step
  • Save step commits the provider-generated message and returns to Describe
  • Publishes a 30 fps full-screen recording as primary proof
Known gaps / not covered
  • Uses a deterministic local OpenAI-compatible provider so the test is stable; it does not depend on external provider billing, latency, or model nondeterminism.
  • Mocks only the host system rebuild/activation under NIXMAC_E2E_MOCK_SYSTEM=1, so it does not mutate the self-hosted runner's real macOS configuration.

Failure: Typed descriptor was not visible in the prompt input

What happened
Typed descriptor was not visible in the prompt input
Next action
Open the full report and workflow logs for the failing phase, then rerun the replay command after fixing the cause.
failure-1777490099-1777490099.png
Phase | Status | Duration | Summary
Prepared config repo, deterministic HTTP provider, completion logging, and mock rebuild flag | passed | 0s
App launched | passed | 0s
Typed descriptor was not visible in the prompt input | failed | 0s | Typed descriptor was not visible in the prompt input

macOS live provider real system evolve

macos_live_provider_evolve_real_system

full-mac on macos-e2e (full-mac)

Full-Mac recording - Standalone real desktop evidence

failed
Commit
33f37e696bb5d5f1217b458e722d756ffb3d1c64
Duration
3358s
Replay
tests/e2e/run.sh macos_live_provider_evolve_real_system

Full-Mac lane: real macOS desktop automation with full-screen recording evidence.

What this checks

Starts from a clean Mac fixture, installs Nix through the shipped app, submits a descriptor, calls the real OpenRouter provider, applies the provider's Nix edit, runs real nix-darwin build and activation, generates the Save-step commit message through the provider, commits the result, and restores or uninstalls the test system state.

Coverage
  • Clean-machine fixture is used so the lane can bootstrap Nix instead of assuming the runner already has it
  • Install Nix flow runs through the shipped app before provider evolution begins
  • Exact-SHA app artifact launches on the real Mac runner
  • Prompt input is reached and submitted through accessibility metadata
  • Real OpenRouter evolve provider receives the descriptor and returns tool calls
  • Provider tool calls edit flake.nix and run real build_check through nix
  • Build & Test runs real darwin-rebuild build and activation with macOS admin authentication
  • System profile changes after activation, proving the mock-system path was not used
  • Summary/commit provider completions are recorded from the real provider
  • Save step commits the provider-generated message and returns to Describe
  • Previous system profile is restored after the proof run when one existed, otherwise the test Nix install is removed
  • Publishes a 30 fps full-screen recording as primary proof
Known gaps / not covered
  • Calls a live model and can fail for provider outages, rate limits, account credit, or prompt nondeterminism.
  • Runs on the configured full-Mac runner and mutates then restores or uninstalls that runner's real nix-darwin system state; it is intentionally not parallel-safe on one Mac.
  • Uses live nix-darwin/nixpkgs inputs during the temporary fixture lock step, so upstream flakes can still affect runtime stability.

Failure: Real Build & Test did not advance to Save/commit step

What happened
Real Build & Test did not advance to Save/commit step
Next action
Open the full report and workflow logs for the failing phase, then rerun the replay command after fixing the cause.
failure-1777493442-1777493442.png
Phase | Status | Duration | Summary
Clean machine ready | passed | 0s
Nix installed and detected through the shipped app | passed | 0s
Prepared real OpenRouter settings, real nix-darwin flake, and preserved current system profile | passed | 0s
App launched | passed | 0s
Descriptor submitted | passed | 0s
Live OpenRouter evolve provider edited flake.nix and reached Review | passed | 0s
Real Build & Test did not advance to Save/commit step | failed | 0s | Real Build & Test did not advance to Save/commit step

Install Nix on clean machine

install_nix_clean_machine

full-mac on macos-e2e (full-mac)

Full-Mac recording - Standalone real desktop evidence

passed
Commit
33f37e696bb5d5f1217b458e722d756ffb3d1c64
Duration
231s
Replay
tests/e2e/run.sh install_nix_clean_machine

Full-Mac lane: real macOS desktop automation with full-screen recording evidence.

What this checks

Runs the real Install Nix flow on a clean Mac and verifies Nix works afterward.

Coverage
  • App launches into the Nix install flow
  • Install Nix button can be clicked
  • Determinate Nix package downloads and installs
  • App detects Nix and prefetches darwin-rebuild
  • Final Nix binary verification passes
  • Publishes a full-screen recording as proof
Known gaps / not covered
  • Runs on one configured Mac runner and macOS version; it does not cover every hardware or OS variant.

Full-screen recording

Phase | Status | Duration | Summary
Clean machine ready | passed | 0s
App launched | passed | 0s
Install button clicked | passed | 0s
Download complete | passed | 0s
Nix installed | passed | 0s
App detected Nix | passed | 0s
Prefetch complete | passed | 0s
All verifications passed | passed | 0s