nixmac E2E Report
Evidence lanes
Deterministic hosted app-flow assertions. Screenshot cursor markers are synthetic proof overlays.
Real macOS desktop automation with full-screen recording evidence.
Real-Mac proof for the same product surface, not a replay of hosted assertions.
Auto-evolve basic package
auto_evolve_basic_package
tauri-wdio on GitHub Actions 1000009375 (github-hosted)
Hosted assertions - Deterministic webview proof
- Commit: 33f37e696bb5d5f1217b458e722d756ffb3d1c64
- Duration: 34s
- Replay: bun run test:wdio:basic-prompts
Hosted webview lane: deterministic app-flow assertions with screenshot and frame-timeline proof. Action-cursor markers in screenshots are synthetic WDIO proof overlays, not real cursor captures. This lane does not claim to be a full desktop screen recording.
What this checks
Submits a mocked AI prompt and verifies nixmac reaches review with a generated Nix diff.
- Main prompt input and send path
- Mock provider response plumbing
- Evolve review screen with non-empty diff
- Generated diff includes the Darwin fonts module
- Uses mocked model responses; it does not call live AI providers or apply/rebuild the generated config.
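The diff checks above can be illustrated with a small pure helper. This is a minimal sketch, assuming the review screen exposes the generated change as unified-diff text; `addedLines`, `diffAdds`, and the sample diff are hypothetical and are not taken from the nixmac test suite.

```typescript
// Hypothetical helper: collect the content lines a unified diff adds.
// File headers ("+++ b/...") are skipped so only added content counts.
function addedLines(diff: string): string[] {
  return diff
    .split("\n")
    .filter((line) => line.startsWith("+") && !line.startsWith("+++"))
    .map((line) => line.slice(1));
}

// True when the diff adds at least one line containing `needle`.
function diffAdds(diff: string, needle: string): boolean {
  return addedLines(diff).some((line) => line.includes(needle));
}

// Illustrative diff text only; the real mocked response may differ.
const sampleDiff = [
  "--- a/modules/darwin/fonts.nix",
  "+++ b/modules/darwin/fonts.nix",
  "@@ -1,2 +1,3 @@",
  " { pkgs, ... }: {",
  "+  fonts.packages = [ pkgs.jetbrains-mono ];",
  " }",
].join("\n");
```

A scenario could then assert both that `addedLines(sampleDiff)` is non-empty and that `diffAdds(sampleDiff, "jetbrains-mono")` holds, which mirrors the "non-empty diff" and "includes the fonts module" checks.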
Adjacent real-Mac evidence
This recording exercises the same product surface on a real Mac. It does not re-run this scenario's deterministic hosted assertions.
evolve_review_full_mac_journey
Visual timeline
4 meaningful frames from wdio source frames. Deterministic checks only; scripted assertions remain the gate.
- start (selected at 0.0s): Change 0.000, Contrast 0.066, Detail 0.035; stable visual state.
- step (selected at 3.3s): Change 0.040, Contrast 0.048, Detail 0.019; stable visual state.
- after-click-Diff (selected at 4.2s): Change 0.037, Contrast 0.046, Detail 0.022; stable visual state.
- step (selected at 5.5s): Change 0.000, Contrast 0.042, Detail 0.020; low-contrast frame; late-flow frame.
| Phase | Status | Duration | Summary |
|---|---|---|---|
| submits a basic prompt and reaches evolve review with diff | passed | 4s | |
Discard and restore state
discard_and_restore_state
tauri-wdio on GitHub Actions 1000009365 (github-hosted)
Hosted assertions - Deterministic webview proof
- Commit: 33f37e696bb5d5f1217b458e722d756ffb3d1c64
- Duration: 48s
- Replay: bun run test:wdio:discard
Hosted webview lane: deterministic app-flow assertions with screenshot and frame-timeline proof. Action-cursor markers in screenshots are synthetic WDIO proof overlays, not real cursor captures. This lane does not claim to be a full desktop screen recording.
What this checks
Exercises discard confirm/cancel behavior from the evolve review screen.
- Prompt submission reaches evolve review
- Discard confirmation returns to the initial prompt screen
- Discard cancellation leaves the review state intact
- Does not verify every rollback/history path or rebuild behavior after discard.
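The confirm/cancel behavior listed above amounts to a three-state flow. The sketch below models it as a pure transition function; the state and action names are hypothetical and do not come from nixmac's own code.

```typescript
// Hypothetical screen model; names are illustrative only.
type AppScreen = "prompt" | "review" | "confirmDiscard";
type UserAction = "submitPrompt" | "clickDiscard" | "confirm" | "cancel";

// Returns the screen that follows `action`; unrelated actions are ignored.
function nextScreen(screen: AppScreen, action: UserAction): AppScreen {
  switch (screen) {
    case "prompt":
      return action === "submitPrompt" ? "review" : screen;
    case "review":
      return action === "clickDiscard" ? "confirmDiscard" : screen;
    case "confirmDiscard":
      if (action === "confirm") return "prompt"; // changes discarded
      if (action === "cancel") return "review"; // review state left intact
      return screen;
  }
}

// Replays a list of actions starting from the initial prompt screen.
function replay(actions: UserAction[]): AppScreen {
  return actions.reduce<AppScreen>((s, a) => nextScreen(s, a), "prompt");
}
```

Under this model, the two phases of the scenario correspond to `replay(["submitPrompt", "clickDiscard", "confirm"])` ending on the prompt screen and `replay(["submitPrompt", "clickDiscard", "cancel"])` ending on review.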
Adjacent real-Mac evidence
This recording exercises the same product surface on a real Mac. It does not re-run this scenario's deterministic hosted assertions.
evolve_review_full_mac_journey
Visual timeline
5 meaningful frames from wdio source frames. Deterministic checks only; scripted assertions remain the gate.
- start (selected at 0.0s): Change 0.000, Contrast 0.066, Detail 0.035; stable visual state.
- step (selected at 3.9s): Change 0.041, Contrast 0.048, Detail 0.018; stable visual state.
- after-click-Diff (selected at 5.1s): Change 0.037, Contrast 0.046, Detail 0.022; stable visual state.
- after-click-confirm-dialog-confirm (selected at 7.0s): Change 0.035, Contrast 0.042, Detail 0.021; low-contrast frame; late-flow frame.
- final-proof (selected at 7.6s): Change 0.057, Contrast 0.069, Detail 0.038; late-flow frame.
Visual timeline
4 meaningful frames from wdio source frames. Deterministic checks only; scripted assertions remain the gate.
- start (selected at 0.0s): Change 0.000, Contrast 0.068, Detail 0.039; stable visual state.
- step (selected at 2.1s): Change 0.041, Contrast 0.050, Detail 0.021; stable visual state.
- after-click-confirm-dialog-cancel (selected at 5.1s): Change 0.035, Contrast 0.042, Detail 0.021; low-contrast frame.
- step (selected at 7.6s): Change 0.000, Contrast 0.042, Detail 0.020; low-contrast frame; late-flow frame.
| Phase | Status | Duration | Summary |
|---|---|---|---|
| submits a prompt, reaches evolve review, then discards and returns to initial state | passed | 7s | |
| submits a prompt, reaches evolve review, then cancels discard and stays on review | passed | 7s | |
Manual evolve existing changes
manual_evolve_existing_changes
tauri-wdio on GitHub Actions 1000009374 (github-hosted)
Hosted assertions - Deterministic webview proof
- Commit: 33f37e696bb5d5f1217b458e722d756ffb3d1c64
- Duration: 38s
- Replay: bun run test:wdio:modify
Hosted webview lane: deterministic app-flow assertions with screenshot and frame-timeline proof. Action-cursor markers in screenshots are synthetic WDIO proof overlays, not real cursor captures. This lane does not claim to be a full desktop screen recording.
What this checks
Runs sequential prompts and verifies the second change preserves the first generated diff.
- Follow-up prompt from evolve review
- Existing uncommitted change preservation
- Generated diff contains both first and second package edits
- Uses mocked model responses and does not run the final apply/rebuild step.
Adjacent real-Mac evidence
This recording exercises the same product surface on a real Mac. It does not re-run this scenario's deterministic hosted assertions.
evolve_review_full_mac_journey
Visual timeline
3 meaningful frames from wdio source frames. Deterministic checks only; scripted assertions remain the gate.
- start (selected at 0.0s): Change 0.000, Contrast 0.066, Detail 0.035; stable visual state.
- step (selected at 3.0s): Change 0.041, Contrast 0.050, Detail 0.021; stable visual state.
- final-proof (selected at 7.6s): Change 0.000, Contrast 0.043, Detail 0.020; low-contrast frame; late-flow frame.
| Phase | Status | Duration | Summary |
|---|---|---|---|
| submits sequential prompts on the evolve review screen | passed | 7s | |
Question answer follow-up
question_answer_followup
tauri-wdio on GitHub Actions 1000009372 (github-hosted)
Hosted assertions - Deterministic webview proof
- Commit: 33f37e696bb5d5f1217b458e722d756ffb3d1c64
- Duration: 24s
- Replay: bun run test:wdio:question-answer
Hosted webview lane: deterministic app-flow assertions with screenshot and frame-timeline proof. Action-cursor markers in screenshots are synthetic WDIO proof overlays, not real cursor captures. This lane does not claim to be a full desktop screen recording.
What this checks
Submits a prompt that asks the user a question, answers inline, and verifies the evolve flow continues to a generated Nix diff.
- Agent ask_user tool renders an inline question prompt
- User answer is accepted from the UI and sent to the backend
- The follow-up model response continues the same evolve flow
- Generated diff includes the requested Darwin fonts module change
- Uses mocked model responses; it does not exercise arbitrary question wording, choices, or live provider timing.
Adjacent real-Mac evidence
This recording exercises the same product surface on a real Mac. It does not re-run this scenario's deterministic hosted assertions.
evolve_review_full_mac_journey
Visual timeline
4 meaningful frames from wdio source frames. Deterministic checks only; scripted assertions remain the gate.
- start (selected at 0.0s): Change 0.000, Contrast 0.000, Detail 0.000; mostly blank or single-color frame; low-detail frame.
- step (selected at 1.2s): Change 0.094, Contrast 0.066, Detail 0.033; stable visual state.
- before-click-Diff (selected at 4.1s): Change 0.046, Contrast 0.049, Detail 0.021; stable visual state.
- final-proof (selected at 4.9s): Change 0.000, Contrast 0.042, Detail 0.020; low-contrast frame; late-flow frame.
| Phase | Status | Duration | Summary |
|---|---|---|---|
| answers an inline agent question and continues the evolve flow | passed | 4s | |
Settings and provider tabs
settings_provider_change
tauri-wdio on GitHub Actions 1000009366 (github-hosted)
Hosted assertions - Deterministic webview proof
- Commit: 33f37e696bb5d5f1217b458e722d756ffb3d1c64
- Duration: 36s
- Replay: bun run test:wdio:smoke
Hosted webview lane: deterministic app-flow assertions with screenshot and frame-timeline proof. Action-cursor markers in screenshots are synthetic WDIO proof overlays, not real cursor captures. This lane does not claim to be a full desktop screen recording.
What this checks
Opens Settings and verifies the General, AI Models, API Keys, and Preferences tabs plus key controls render.
- Settings button opens the dialog
- All four settings tabs are navigable
- Each tab renders its expected heading
- Critical provider, API key, diagnostics, and confirmation controls are visible
- Checks visibility of key settings controls; it does not mutate every field, verify real API keys, or call live providers.
Adjacent real-Mac evidence
This recording exercises the same product surface on a real Mac. It does not re-run this scenario's deterministic hosted assertions.
product_surface_full_mac_smoke
settings_state_full_mac_journey
Visual timeline
2 meaningful frames from wdio source frames. Deterministic checks only; scripted assertions remain the gate.
Visual timeline
5 meaningful frames from wdio source frames. Deterministic checks only; scripted assertions remain the gate.
- start (selected at 0.0s): Change 0.000, Contrast 0.049, Detail 0.024; stable visual state.
- after-click-Settings (selected at 0.8s): Change 0.069, Contrast 0.081, Detail 0.036; stable visual state.
- after-click-AI Models (selected at 2.4s): Change 0.060, Contrast 0.099, Detail 0.031; stable visual state.
- after-click-Preferences (selected at 4.4s): Change 0.041, Contrast 0.063, Detail 0.020; stable visual state.
- final-proof (selected at 4.7s): Change 0.000, Contrast 0.063, Detail 0.020; late-flow frame.
| Phase | Status | Duration | Summary |
|---|---|---|---|
| opens and has at least one window | passed | 0s | |
| opens and navigates all tabs | passed | 5s | |
Settings controls persistence
settings_controls_persistence
tauri-wdio on GitHub Actions 1000009376 (github-hosted)
Hosted assertions - Deterministic webview proof
- Commit: 33f37e696bb5d5f1217b458e722d756ffb3d1c64
- Duration: 38s
- Replay: bun run test:wdio:settings-controls
Hosted webview lane: deterministic app-flow assertions with screenshot and frame-timeline proof. Action-cursor markers in screenshots are synthetic WDIO proof overlays, not real cursor captures. This lane does not claim to be a full desktop screen recording.
What this checks
Mutates representative settings controls and verifies the persisted settings file reflects the UI changes.
- Preferences confirmation switches can be toggled
- API URL and vLLM key fields accept changes
- vLLM key visibility toggle works
- AI model iteration/build-attempt limits persist to settings.json
- Uses local persisted settings; it does not validate live provider credentials or call external AI services.
Adjacent real-Mac evidence
This recording exercises the same product surface on a real Mac. It does not re-run this scenario's deterministic hosted assertions.
settings_state_full_mac_journey
Visual timeline
6 meaningful frames from wdio source frames. Deterministic checks only; scripted assertions remain the gate.
- start (selected at 0.0s): Change 0.000, Contrast 0.181, Detail 0.040; stable visual state.
- step (selected at 1.3s): Change 0.134, Contrast 0.066, Detail 0.035; stable visual state.
- after-click-Settings (selected at 2.0s): Change 0.085, Contrast 0.080, Detail 0.037; stable visual state.
- after-click-Preferences (selected at 2.6s): Change 0.044, Contrast 0.063, Detail 0.020; stable visual state.
- after-click-API Keys (selected at 5.1s): Change 0.043, Contrast 0.103, Detail 0.035; stable visual state.
- final-proof (selected at 17.7s): Change 0.000, Contrast 0.097, Detail 0.027; late-flow frame.
| Phase | Status | Duration | Summary |
|---|---|---|---|
| mutates representative settings controls and verifies settings.json | passed | 16s | |
Provider validation blocks prompt
provider_validation_blocks_prompt
tauri-wdio on GitHub Actions 1000009373 (github-hosted)
Hosted assertions - Deterministic webview proof
- Commit: 33f37e696bb5d5f1217b458e722d756ffb3d1c64
- Duration: 31s
- Replay: bun run test:wdio:provider-validation
Hosted webview lane: deterministic app-flow assertions with screenshot and frame-timeline proof. Action-cursor markers in screenshots are synthetic WDIO proof overlays, not real cursor captures. This lane does not claim to be a full desktop screen recording.
What this checks
Starts with vLLM selected and no base URL, then verifies the prompt stays blocked with settings guidance instead of submitting into an avoidable provider failure.
- Invalid AI provider setup is detected at the prompt
- Prompt suggestion can still fill the input
- Send remains disabled while required provider configuration is missing
- The user sees an AI Models settings recovery action
- Uses local vLLM validation and does not call an external provider.
- Webview frame timeline was suppressed because captured frames were not visually informative
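The guard this scenario exercises can be pictured as a pure function from settings and prompt text to a send-button state. The sketch below is an assumption about the shape of that logic, not nixmac's implementation; `ProviderSettings`, `vllmBaseUrl`, and `promptGate` are hypothetical names.

```typescript
// Hypothetical settings shape; field names are assumptions.
interface ProviderSettings {
  provider: "openrouter" | "openai" | "ollama" | "vllm";
  vllmBaseUrl?: string;
}

interface PromptGate {
  sendEnabled: boolean;
  guidance?: string; // recovery hint shown next to the prompt
}

// Blocks send when vLLM is selected without a base URL, even if a
// prompt suggestion has already filled the input.
function promptGate(settings: ProviderSettings, promptText: string): PromptGate {
  const baseUrl = (settings.vllmBaseUrl ?? "").trim();
  if (settings.provider === "vllm" && baseUrl.length === 0) {
    return { sendEnabled: false, guidance: "Open AI Models settings" };
  }
  // Otherwise send only needs non-empty prompt text.
  return { sendEnabled: promptText.trim().length > 0 };
}
```

Keeping the check local and synchronous is what lets the scenario verify the blocked state without calling an external provider.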
Adjacent real-Mac evidence
This recording exercises the same product surface on a real Mac. It does not re-run this scenario's deterministic hosted assertions.
| Phase | Status | Duration | Summary |
|---|---|---|---|
| keeps send disabled when vLLM is selected without a base URL | passed | 1s | |
Provider failure recovery
provider_failure_recovery
tauri-wdio on GitHub Actions 1000009371 (github-hosted)
Hosted assertions - Deterministic webview proof
- Commit: 33f37e696bb5d5f1217b458e722d756ffb3d1c64
- Duration: 23s
- Replay: bun run test:wdio:provider-failure
Hosted webview lane: deterministic app-flow assertions with screenshot and frame-timeline proof. Action-cursor markers in screenshots are synthetic WDIO proof overlays, not real cursor captures. This lane does not claim to be a full desktop screen recording.
What this checks
Submits against a mock provider billing failure and verifies the provider failure is visible in the app instead of being swallowed.
- Prompt submission starts from a configured app state
- Mock provider failure reaches the evolve error path
- A visible, user-actionable error is rendered in the widget
- Uses a deterministic mock provider failure; it does not exercise every provider's live failure shape.
Adjacent real-Mac evidence
This recording exercises the same product surface on a real Mac. It does not re-run this scenario's deterministic hosted assertions.
Visual timeline
3 meaningful frames from wdio source frames. Deterministic checks only; scripted assertions remain the gate.
- start (selected at 0.0s): Change 0.000, Contrast 0.000, Detail 0.000; mostly blank or single-color frame; low-detail frame.
- before-type-Prompt text (selected at 2.8s): Change 0.098, Contrast 0.070, Detail 0.037; stable visual state.
- final-proof (selected at 5.2s): Change 0.000, Contrast 0.064, Detail 0.033; late-flow frame.
| Phase | Status | Duration | Summary |
|---|---|---|---|
| surfaces a visible app error when the AI provider fails | passed | 3s | |
Live OpenRouter evolve smoke
live_openrouter_evolve_smoke
tauri-wdio on GitHub Actions 1000009368 (github-hosted)
Hosted assertions - Deterministic webview proof
- Commit: 33f37e696bb5d5f1217b458e722d756ffb3d1c64
- Duration: 30s
- Replay: bun run test:wdio:live-openrouter
Hosted webview lane: deterministic app-flow assertions with screenshot and frame-timeline proof. Action-cursor markers in screenshots are synthetic WDIO proof overlays, not real cursor captures. This lane does not claim to be a full desktop screen recording.
What this checks
Uses a real OpenRouter key to submit one evolve prompt and verifies nixmac reaches review with a generated Nix diff.
- Real OpenRouter credential is loaded into app settings
- OpenAI/OpenRouter provider path submits an evolve prompt
- Live model/tool-call loop produces a non-empty Nix diff
- The flow stops at review without applying or rebuilding the machine
- Calls a live model and can fail for provider outages, rate limits, account credit, or prompt nondeterminism.
- Stops at evolve review; it does not apply, rebuild, or verify the final macOS system state.
Adjacent real-Mac evidence
This recording exercises the same product surface on a real Mac. It does not re-run this scenario's deterministic hosted assertions.
live_openrouter_full_mac_journey
Visual timeline
5 meaningful frames from wdio source frames. Deterministic checks only; scripted assertions remain the gate.
- start (selected at 0.0s): Change 0.000, Contrast 0.080, Detail 0.019; stable visual state.
- step (selected at 1.0s): Change 0.070, Contrast 0.066, Detail 0.033; stable visual state.
- after-click-Send prompt (selected at 2.1s): Change 0.041, Contrast 0.055, Detail 0.028; stable visual state.
- step (selected at 10.2s): Change 0.049, Contrast 0.048, Detail 0.021; late-flow frame.
- final-proof (selected at 11.2s): Change 0.000, Contrast 0.042, Detail 0.018; low-contrast frame; late-flow frame.
| Phase | Status | Duration | Summary |
|---|---|---|---|
| uses a real OpenRouter key to reach evolve review with a generated diff | passed | 10s | |
Prompt keyboard and suggestions
prompt_keyboard_and_suggestions
tauri-wdio on GitHub Actions 1000009378 (github-hosted)
Hosted assertions - Deterministic webview proof
- Commit: 33f37e696bb5d5f1217b458e722d756ffb3d1c64
- Duration: 26s
- Replay: bun run test:wdio:prompt-keyboard
Hosted webview lane: deterministic app-flow assertions with screenshot and frame-timeline proof. Action-cursor markers in screenshots are synthetic WDIO proof overlays, not real cursor captures. This lane does not claim to be a full desktop screen recording.
What this checks
Exercises visible prompt suggestions, send-button state, keyboard action proof, prompt history, and evolve review.
- Prompt send is disabled before text exists
- Static prompt suggestion fills the prompt input
- Proof recording includes a keyboard action annotation before submit
- Prompt history records the submitted prompt
- Mocked evolve response reaches review with a generated diff
- Keeps model responses mocked and uses the existing reliable submit path after proving keyboard navigation in the prompt surface.
Adjacent real-Mac evidence
This recording exercises the same product surface on a real Mac. It does not re-run this scenario's deterministic hosted assertions.
evolve_review_full_mac_journey
Visual timeline
4 meaningful frames from wdio source frames. Deterministic checks only; scripted assertions remain the gate.
- start (selected at 0.0s): Change 0.000, Contrast 0.080, Detail 0.019; stable visual state.
- step (selected at 1.0s): Change 0.070, Contrast 0.066, Detail 0.033; stable visual state.
- step (selected at 4.0s): Change 0.054, Contrast 0.048, Detail 0.018; stable visual state.
- final-proof (selected at 6.8s): Change 0.006, Contrast 0.043, Detail 0.021; low-contrast frame; late-flow frame.
| Phase | Status | Duration | Summary |
|---|---|---|---|
| uses a prompt suggestion, records keyboard action proof, and reaches evolve review | passed | 6s | |
Feedback and issue reporting
feedback_report_issue
tauri-wdio on GitHub Actions 1000009367 (github-hosted)
Hosted assertions - Deterministic webview proof
- Commit: 33f37e696bb5d5f1217b458e722d756ffb3d1c64
- Duration: 46s
- Replay: bun run test:wdio:feedback-report
Hosted webview lane: deterministic app-flow assertions with screenshot and frame-timeline proof. Action-cursor markers in screenshots are synthetic WDIO proof overlays, not real cursor captures. This lane does not claim to be a full desktop screen recording.
What this checks
Exercises header feedback mode and footer issue-report mode without sending external feedback.
- Header feedback opens the general feedback dialog
- Suggestion, Bug, and General type choices render in feedback mode
- Bug feedback reveals expected-behavior fields and share options
- Footer Report Issue opens issue mode with report-specific copy
- Cancel controls render in both feedback and issue-report modes
- Does not submit feedback to the backend or external services; it validates the user-facing collection flows.
- Uses DOM-click fallback for close/cancel controls because native Tauri WebDriver clicks are flaky on these dialog buttons.
Adjacent real-Mac evidence
This recording exercises the same product surface on a real Mac. It does not re-run this scenario's deterministic hosted assertions.
product_surface_full_mac_smoke
Visual timeline
8 meaningful frames from wdio source frames. Deterministic checks only; scripted assertions remain the gate.
- start (selected at 0.0s): Change 0.000, Contrast 0.066, Detail 0.035; stable visual state.
- after-click-Give feedback (selected at 1.6s): Change 0.057, Contrast 0.055, Detail 0.032; stable visual state.
- after-click-Cancel feedback (selected at 11.7s): Change 0.049, Contrast 0.059, Detail 0.023; stable visual state.
- after-click-Reopen feedback (selected at 12.2s): Change 0.049, Contrast 0.057, Detail 0.035; stable visual state.
- after-click-Cancel clean feedback (selected at 13.8s): Change 0.047, Contrast 0.060, Detail 0.023; late-flow frame.
- after-click-Report Issue (selected at 14.4s): Change 0.058, Contrast 0.068, Detail 0.033; late-flow frame.
- after-click-Cancel issue report (selected at 15.2s): Change 0.058, Contrast 0.059, Detail 0.022; late-flow frame.
- final-proof (selected at 15.4s): Change 0.000, Contrast 0.059, Detail 0.022; late-flow frame.
| Phase | Status | Duration | Summary |
|---|---|---|---|
| covers header feedback mode and footer issue-report mode | passed | 14s | |
Onboarding existing repo
onboarding_existing_repo
tauri-wdio on GitHub Actions 1000009369 (github-hosted)
Hosted assertions - Deterministic webview proof
- Commit: 33f37e696bb5d5f1217b458e722d756ffb3d1c64
- Duration: 25s
- Replay: bun run test:wdio:onboarding
Hosted webview lane: deterministic app-flow assertions with screenshot and frame-timeline proof. Action-cursor markers in screenshots are synthetic WDIO proof overlays, not real cursor captures. This lane does not claim to be a full desktop screen recording.
What this checks
Connects an existing nix-darwin repo and verifies the app reaches the prompt screen.
- Setup screen renders
- Configuration directory can be selected
- Host selection is populated and accepted
- Onboarding completes to the main prompt screen
- Uses a prepared fixture repo; it does not cover every arbitrary user flake shape.
Adjacent real-Mac evidence
This recording exercises the same product surface on a real Mac. It does not re-run this scenario's deterministic hosted assertions.
onboarding_settings_contract_full_mac_journey
Visual timeline
6 meaningful frames from wdio source frames. Deterministic checks only; scripted assertions remain the gate.
- start (selected at 0.0s): Change 0.000, Contrast 0.000, Detail 0.000; mostly blank or single-color frame; low-detail frame.
- step (selected at 3.3s): Change 0.072, Contrast 0.080, Detail 0.019; stable visual state.
- step (selected at 4.0s): Change 0.072, Contrast 0.065, Detail 0.040; stable visual state.
- before-click-host-select (selected at 4.4s): Change 0.051, Contrast 0.050, Detail 0.021; stable visual state.
- after-click-//*[@role="option" and normalize-space(.)="sjc20-cw712-718c95a7-e78c-4665-b745-9 (selected at 5.8s): Change 0.059, Contrast 0.055, Detail 0.027; stable visual state.
- final-proof (selected at 6.3s): Change 0.003, Contrast 0.057, Detail 0.029; late-flow frame.
| Phase | Status | Duration | Summary |
|---|---|---|---|
| connects an existing nix-darwin repo and reaches the prompt screen | passed | 3s | |
Settings state real-Mac journey
settings_state_full_mac_journey
full-mac on macos-e2e (full-mac)
Real-Mac companion - Attached to Settings and provider tabs, Settings controls persistence
- Commit: 33f37e696bb5d5f1217b458e722d756ffb3d1c64
- Duration: 368s
- Replay: tests/e2e/run.sh settings_state_full_mac_journey
Full-Mac lane: real macOS desktop automation with full-screen recording evidence.
What this checks
Runs the shipped app on a real Mac, opens Settings, records Preferences/API Keys/AI Models, and verifies representative Preferences mutations persist to the real app-support settings.json.
- Configured prompt screen launches from persisted macOS app-support settings
- Preferences tab renders and representative confirmation toggles persist via UI interaction
- API Keys tab renders OpenRouter, OpenAI, Ollama, and vLLM settings from the seeded settings.json
- AI Models tab renders provider and limit controls from the seeded settings.json
- Uses polling against the real settings.json to avoid persistence races
- Publishes screenshots and a full-screen recording as adjacent real-desktop proof
- Real-Mac companion proof; WDIO remains the deterministic authority for exact select/input values and full settings-form mutation coverage.
- Only Preferences toggles are mutated through UI in this full-Mac journey; API Keys and AI Models are hydrated/recorded from seeded settings.
- Does not call live providers or validate external API credentials.
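Polling the settings file, as mentioned in the list above, avoids asserting before the app has flushed its write. The sketch below shows one way such a poller could look, assuming a bounded number of attempts; `pollUntil`, `SettingsSnapshot`, and the `confirmBuild` key are hypothetical, and the file read is simulated so the example stays self-contained.

```typescript
// Poll `read` until `accept` passes or the attempts run out.
// `sleep` is injected so the caller decides how to wait.
function pollUntil<T>(
  read: () => T | undefined,
  accept: (value: T) => boolean,
  opts: { attempts: number; intervalMs: number; sleep: (ms: number) => void },
): T {
  let last: T | undefined;
  for (let i = 0; i < opts.attempts; i++) {
    last = read();
    if (last !== undefined && accept(last)) return last;
    if (i < opts.attempts - 1) opts.sleep(opts.intervalMs);
  }
  throw new Error(
    `condition not met after ${opts.attempts} attempts; last=${JSON.stringify(last)}`,
  );
}

// Hypothetical settings shape with one illustrative toggle.
type SettingsSnapshot = { confirmBuild: boolean };

// Simulated reads: missing file, stale value, then the persisted value.
const reads: Array<SettingsSnapshot | undefined> = [
  undefined,
  { confirmBuild: true },
  { confirmBuild: false },
];
let calls = 0;

const settled = pollUntil<SettingsSnapshot>(
  () => reads[Math.min(calls++, reads.length - 1)],
  (s) => s.confirmBuild === false,
  { attempts: 5, intervalMs: 100, sleep: () => {} },
);
```

In a real run, `read` would parse settings.json from disk and return `undefined` on a missing or half-written file, and `sleep` would actually wait; throwing with the last observed value keeps a timeout diagnosable from the logs.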
Failure: Preferences controls did not render
- What happened: Preferences controls did not render
- Next action: Open the full report and workflow logs for the failing phase, then rerun the replay command after fixing the cause.
| Phase | Status | Duration | Summary |
|---|---|---|---|
| Nix installed (already present) | passed | 0s | |
| Configured state seeded | passed | 0s | |
| Settings dialog launched | passed | 0s | |
| Preferences controls did not render | failed | 0s | Preferences controls did not render |
Evolve review real-Mac journey
evolve_review_full_mac_journey
full-mac on macos-e2e (full-mac)
Real-Mac companion - Attached to Auto-evolve basic package, Discard and restore state, Manual evolve existing changes +2 more
- Commit: 33f37e696bb5d5f1217b458e722d756ffb3d1c64
- Duration: 184s
- Replay: tests/e2e/run.sh evolve_review_full_mac_journey
Full-Mac lane: real macOS desktop automation with full-screen recording evidence.
What this checks
Runs the shipped app on a real Mac against a local mock vLLM server, records prompt submission, review, follow-up, question-answer, and discard behavior, and verifies generated git diffs.
- Prompt suggestions and Send are usable in the released app
- Mock vLLM response reaches evolve review with a JetBrains Mono diff
- Follow-up prompt preserves the first diff and adds Fira Code
- Discard cancel keeps the review state visible
- Discard confirm returns to the prompt state
- Inline question-answer continues to evolve review
- Reads git diff from a temporary nix-darwin repo on the Mac runner
- Publishes screenshots and a full-screen recording as adjacent real-desktop proof
- Uses a local mock provider; live provider behavior remains covered by the separate live OpenRouter scenario.
- Does not apply or rebuild the generated nix-darwin configuration.
- Release-app localhost mock reachability is verified only when the full-Mac job actually runs on the Mac runner.
| Phase | Status | Duration | Summary |
|---|---|---|---|
| Nix installed (already present) | passed | 0s | |
| Provider-guarded state seeded | passed | 0s | |
| Prompt/provider guardrail verified | passed | 0s | |
| Secondary prompt surfaces verified | passed | 0s | |
Provider resilience real-Mac journey
provider_resilience_full_mac_journey
full-mac on macos-e2e (full-mac)
Real-Mac companion - Attached to Provider validation blocks prompt, Provider failure recovery
- Commit: 33f37e696bb5d5f1217b458e722d756ffb3d1c64
- Duration: 257s
- Replay: tests/e2e/run.sh provider_resilience_full_mac_journey
Full-Mac lane: real macOS desktop automation with full-screen recording evidence.
What this checks
Runs the shipped app on a real Mac through invalid vLLM configuration and mock provider failure states, recording visible recovery/error behavior.
- vLLM selected without a base URL renders provider validation before submission
- Prompt suggestion can fill the input while provider validation remains visible
- Open AI Models settings recovery copy is visible
- Mock provider billing/credits failure surfaces visibly in the widget
- Reads settings.json to prove which provider configuration each phase used
- Publishes screenshots and a full-screen recording as adjacent real-desktop proof
- Uses deterministic local provider states; it does not enumerate every live provider failure shape.
- Does not call external providers.
| Phase | Status | Duration | Summary |
|---|---|---|---|
| Nix installed (already present) | passed | 0s | |
| Provider validation block verified | passed | 0s | |
| Provider failure recovery verified | passed | 0s | |
Onboarding settings-contract real-Mac journey
onboarding_settings_contract_full_mac_journey
full-mac on macos-e2e (full-mac)
Real-Mac companion - Attached to Onboarding existing repo
- Commit
- 33f37e696bb5d5f1217b458e722d756ffb3d1c64
- Duration
- 178s
- Replay
- tests/e2e/run.sh onboarding_settings_contract_full_mac_journey
Full-Mac lane: real macOS desktop automation with full-screen recording evidence.
What this checks
Runs the shipped app on a real Mac from fresh state, records setup/onboarding, then seeds the same repo/host settings contract and verifies the app reaches the prompt screen.
- Fresh app-support state routes to setup/onboarding
- Temporary nix-darwin config repo is a real git repo with a host attr
- Persisted configDir and hostAttr route the released app to the prompt screen
- settings.json contains the selected configDir and hostAttr
- Publishes screenshots and a full-screen recording as adjacent real-desktop proof
- Intentionally a settings-contract proof only; it does not drive the native file picker or claim full onboarding picker coverage.
- WDIO remains the deterministic authority for directory picker and host selection interaction.
| Phase | Status | Duration | Summary |
|---|---|---|---|
| Nix installed (already present) | passed | 0s | |
| Fresh onboarding screen verified | passed | 0s | |
| Repo/host settings contract verified | passed | 0s | |
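The settings-contract check described above can be sketched roughly as follows. This is a minimal illustration, not the lane's actual implementation: the key names `configDir` and `hostAttr` come from the report, but their top-level placement in settings.json and the helper name `check_settings_contract` are assumptions.

```python
import json
import tempfile
from pathlib import Path


def check_settings_contract(settings_path, expected_config_dir, expected_host_attr):
    """Return a list of human-readable mismatches; an empty list means the contract holds."""
    settings = json.loads(Path(settings_path).read_text(encoding="utf-8"))
    problems = []
    # Assumption: configDir and hostAttr are top-level keys in settings.json.
    expectations = (("configDir", expected_config_dir), ("hostAttr", expected_host_attr))
    for key, expected in expectations:
        actual = settings.get(key)
        if actual != expected:
            problems.append(f"{key}: expected {expected!r}, found {actual!r}")
    return problems


if __name__ == "__main__":
    # Hypothetical fixture values, written to a throwaway settings.json.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "settings.json"
        path.write_text(json.dumps({"configDir": "/tmp/nix-config", "hostAttr": "test-host"}))
        print(check_settings_contract(path, "/tmp/nix-config", "test-host"))
```

Returning a list of mismatches rather than a boolean lets a phase summary name exactly which key drifted.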
Live OpenRouter real-Mac journey
live_openrouter_full_mac_journey
full-mac on macos-e2e (full-mac)
Real-Mac companion - Attached to Live OpenRouter evolve smoke
- Commit
- 33f37e696bb5d5f1217b458e722d756ffb3d1c64
- Duration
- 411s
- Replay
- tests/e2e/run.sh live_openrouter_full_mac_journey
Full-Mac lane: real macOS desktop automation with full-screen recording evidence.
What this checks
Runs the shipped app on a real Mac with a real OpenRouter key, submits a constrained prompt, and verifies a live-model diff reaches review without applying.
- OpenRouter key/model are seeded into the released app settings on the Mac runner
- Prompt submission calls a live OpenRouter-backed provider path
- Live model/tool-call loop reaches evolve review
- Temporary config repo diff includes pkgs.jq in flake.nix
- Flow stops at review without applying or rebuilding
- Publishes screenshots and a full-screen recording as adjacent real-desktop proof
- Calls a live model and can fail for provider outages, rate limits, account credit, or prompt nondeterminism.
- Stops at evolve review; it does not apply, rebuild, or verify final macOS state.
Failure: Live OpenRouter diff did not contain pkgs.jq
- What happened
- Live OpenRouter diff did not contain pkgs.jq
- Next action
- Open the full report and workflow logs for the failing phase, then rerun the replay command after fixing the cause.
| Phase | Status | Duration | Summary |
|---|---|---|---|
| Nix installed (already present) | passed | 0s | |
| Live provider settings seeded | passed | 0s | |
| Live OpenRouter diff did not contain pkgs.jq | failed | 0s | Live OpenRouter diff did not contain pkgs.jq |
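For readers triaging this failure, the "diff includes pkgs.jq in flake.nix" assertion can be approximated as below. This is a sketch under stated assumptions: it expects `git diff` style unified output, and the function names `added_lines` and `diff_adds_token` are hypothetical, not taken from the test suite.

```python
def added_lines(diff_text, filename):
    """Yield added lines (without the leading '+') from one file's section of a git diff."""
    in_target = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            # A new file section starts; track whether it is the file we care about.
            in_target = line.endswith(" b/" + filename)
        elif in_target and line.startswith("+") and not line.startswith("+++"):
            # Skips the '+++ b/<file>' header. Simplification: an added line whose
            # own content begins with '++' would be skipped as well.
            yield line[1:]


def diff_adds_token(diff_text, filename, token):
    """True when any added line in the given file contains the token."""
    return any(token in line for line in added_lines(diff_text, filename))


# Hypothetical diff, shaped like the one the scenario expects the live model to produce.
SAMPLE_DIFF = """diff --git a/flake.nix b/flake.nix
--- a/flake.nix
+++ b/flake.nix
@@ -10,6 +10,7 @@
         environment.systemPackages = [
           pkgs.git
+          pkgs.jq
         ];
"""

if __name__ == "__main__":
    print(diff_adds_token(SAMPLE_DIFF, "flake.nix", "pkgs.jq"))
```

Matching only added lines avoids a false pass when `pkgs.jq` already appears in unchanged context or in a removed line.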
Product surface real-Mac smoke
product_surface_full_mac_smoke
full-mac on macos-e2e (full-mac)
Real-Mac companion - Attached to Settings and provider tabs, Feedback and issue reporting, History and settings navigation
- Commit
- 33f37e696bb5d5f1217b458e722d756ffb3d1c64
- Duration
- 383s
- Replay
- tests/e2e/run.sh product_surface_full_mac_smoke
Full-Mac lane: real macOS desktop automation with full-screen recording evidence.
What this checks
Runs the shipped app on a real Mac with Nix installed and records adjacent desktop proof for Settings, feedback, Report Issue, and History surfaces.
- App launches from /Applications on the Mac runner
- Settings opens and the General, AI Models, API Keys, and Preferences surfaces render
- Header feedback opens the feedback dialog
- Footer Report Issue opens issue-report mode
- History opens and returns to the main surface
- Publishes a full-screen recording as adjacent real-desktop proof
- Adjacent real-Mac smoke only; it does not replay each hosted WDIO assertion or verify settings persistence on disk.
- Uses the nix-installed fixture so the app is past the Nix installation prerequisite, but it does not apply or rebuild a nix-darwin configuration.
Failure: Settings tab did not render expected text: AI Models
- What happened
- Settings tab did not render expected text: AI Models
- Next action
- Open the full report and workflow logs for the failing phase, then rerun the replay command after fixing the cause.
| Phase | Status | Duration | Summary |
|---|---|---|---|
| Nix installed (already present) | passed | 0s | |
| Product surface launched | passed | 0s | |
| Settings tab did not render expected text: AI Models | failed | 0s | Settings tab did not render expected text: AI Models |
Release DMG launch smoke
release_dmg_app_translocation_smoke
full-mac on macos-e2e (full-mac)
Full-Mac recording - Standalone real desktop evidence
- Commit
- 33f37e696bb5d5f1217b458e722d756ffb3d1c64
- Duration
- 119s
- Replay
- tests/e2e/run.sh release_dmg_app_translocation_smoke
Full-Mac lane: real macOS desktop automation with full-screen recording evidence.
What this checks
Launches the installed /Applications app on a real Mac and verifies a usable first screen.
- Build artifact can be installed on the Mac runner
- App launches from /Applications
- First screen renders enough nixmac text to rule out startup/App Translocation crashes
- Publishes a full-screen recording as proof
- Launch smoke only; it does not exercise Nix installation, settings, or evolve/apply flows.
| Phase | Status | Duration | Summary |
|---|---|---|---|
| Clean machine ready | passed | 0s | |
| App installed at /Applications/nixmac.app | passed | 0s | |
| App launched | passed | 0s | |
| First screen rendered | passed | 0s | |
macOS descriptor prompt smoke
macos_descriptor_prompt_smoke
full-mac on macos-e2e (full-mac)
Full-Mac recording - Standalone real desktop evidence
- Commit
- 33f37e696bb5d5f1217b458e722d756ffb3d1c64
- Duration
- 490s
- Replay
- tests/e2e/run.sh macos_descriptor_prompt_smoke
Full-Mac lane: real macOS desktop automation with full-screen recording evidence.
What this checks
Launches the real macOS app, types one descriptor into the main prompt, and verifies the expected local provider-validation block with 30 fps video proof.
- Exact-SHA app artifact launches on the real Mac runner
- Prompt input is reachable through stable accessibility metadata
- Descriptor text can be typed and observed in the real UI
- Submit affordance is reachable through stable accessibility metadata
- Local provider validation blocks submit without requiring a fragile mock provider
- Publishes a 30 fps full-screen recording as primary proof
- Intentionally stops at local provider validation; it does not call a live or mock AI provider yet.
- Requires the self-hosted Mac runner to already have Nix and darwin-rebuild available.
Failure: Typed descriptor was not visible in the prompt input
- What happened
- Typed descriptor was not visible in the prompt input
- Next action
- Open the full report and workflow logs for the failing phase, then rerun the replay command after fixing the cause.
| Phase | Status | Duration | Summary |
|---|---|---|---|
| Prepared config repo, mock system prerequisites, and local provider-validation settings | passed | 0s | |
| App launched | passed | 0s | |
| Descriptor prompt input reached | passed | 0s | |
| Typed descriptor was not visible in the prompt input | failed | 0s | Typed descriptor was not visible in the prompt input |
macOS provider evolve full smoke
macos_provider_evolve_full_smoke
full-mac on macos-e2e (full-mac)
Full-Mac recording - Standalone real desktop evidence
- Commit
- 33f37e696bb5d5f1217b458e722d756ffb3d1c64
- Duration
- 213s
- Replay
- tests/e2e/run.sh macos_provider_evolve_full_smoke
Full-Mac lane: real macOS desktop automation with full-screen recording evidence.
What this checks
Launches the installed macOS app, submits a descriptor, calls an OpenAI-compatible HTTP provider, applies the provider's Nix edit, runs the mocked host rebuild/activation step, generates the Save-step commit message through the provider, and commits the result.
- Exact-SHA app artifact launches on the real Mac runner
- Prompt input is reached and submitted through accessibility metadata
- Evolution provider receives a real HTTP chat completion request with tool schemas
- Provider tool calls edit flake.nix and run build_check through the backend
- Summary provider receives JSON completion requests for the generated diff
- Build & Test advances to Save using the explicit E2E mock-system activation path
- Commit-message provider receives a conventional-commit request and populates the Save step
- Save step commits the provider-generated message and returns to Describe
- Publishes a 30 fps full-screen recording as primary proof
- Uses a deterministic local OpenAI-compatible provider so the test is stable; it does not depend on external provider billing, latency, or model nondeterminism.
- Mocks only the host system rebuild/activation under NIXMAC_E2E_MOCK_SYSTEM=1, so it does not mutate the self-hosted runner's real macOS configuration.
Failure: Typed descriptor was not visible in the prompt input
- What happened
- Typed descriptor was not visible in the prompt input
- Next action
- Open the full report and workflow logs for the failing phase, then rerun the replay command after fixing the cause.
| Phase | Status | Duration | Summary |
|---|---|---|---|
| Prepared config repo, deterministic HTTP provider, completion logging, and mock rebuild flag | passed | 0s | |
| App launched | passed | 0s | |
| Typed descriptor was not visible in the prompt input | failed | 0s | Typed descriptor was not visible in the prompt input |
macOS live provider real system evolve
macos_live_provider_evolve_real_system
full-mac on macos-e2e (full-mac)
Full-Mac recording - Standalone real desktop evidence
- Commit
- 33f37e696bb5d5f1217b458e722d756ffb3d1c64
- Duration
- 3358s
- Replay
- tests/e2e/run.sh macos_live_provider_evolve_real_system
Full-Mac lane: real macOS desktop automation with full-screen recording evidence.
What this checks
Starts from a clean Mac fixture, installs Nix through the shipped app, submits a descriptor, calls the real OpenRouter provider, applies the provider's Nix edit, runs real nix-darwin build and activation, generates the Save-step commit message through the provider, commits the result, and restores or uninstalls the test system state.
- Clean-machine fixture is used so the lane can bootstrap Nix instead of assuming the runner already has it
- Install Nix flow runs through the shipped app before provider evolution begins
- Exact-SHA app artifact launches on the real Mac runner
- Prompt input is reached and submitted through accessibility metadata
- Real OpenRouter evolve provider receives the descriptor and returns tool calls
- Provider tool calls edit flake.nix and run real build_check through nix
- Build & Test runs real darwin-rebuild build and activation with macOS admin authentication
- System profile changes after activation, proving the mock-system path was not used
- Summary/commit provider completions are recorded from the real provider
- Save step commits the provider-generated message and returns to Describe
- Previous system profile is restored after the proof run when one existed, otherwise the test Nix install is removed
- Publishes a 30 fps full-screen recording as primary proof
- Calls a live model and can fail for provider outages, rate limits, account credit, or prompt nondeterminism.
- Runs on the configured full-Mac runner and mutates then restores or uninstalls that runner's real nix-darwin system state; it is intentionally not parallel-safe on one Mac.
- Uses live nix-darwin/nixpkgs inputs during the temporary fixture lock step, so upstream flakes can still affect runtime stability.
Failure: Real Build & Test did not advance to Save/commit step
- What happened
- Real Build & Test did not advance to Save/commit step
- Next action
- Open the full report and workflow logs for the failing phase, then rerun the replay command after fixing the cause.
| Phase | Status | Duration | Summary |
|---|---|---|---|
| Clean machine ready | passed | 0s | |
| Nix installed and detected through the shipped app | passed | 0s | |
| Prepared real OpenRouter settings, real nix-darwin flake, and preserved current system profile | passed | 0s | |
| App launched | passed | 0s | |
| Descriptor submitted | passed | 0s | |
| Live OpenRouter evolve provider edited flake.nix and reached Review | passed | 0s | |
| Real Build & Test did not advance to Save/commit step | failed | 0s | Real Build & Test did not advance to Save/commit step |
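The "system profile changes after activation" check listed above amounts to comparing where a profile symlink points before and after the rebuild. The sketch below illustrates that idea only. The report does not say which path the lane reads (on nix-darwin hosts `/run/current-system` is a common candidate, which is an assumption here), so the demo uses throwaway symlinks, and the helper names are hypothetical.

```python
import os
import tempfile


def resolve_profile(profile_link):
    """Return the real path that a profile symlink currently points at."""
    return os.path.realpath(profile_link)


def profile_changed(before_target, profile_link):
    """True when the symlink now resolves somewhere other than the recorded target."""
    return resolve_profile(profile_link) != before_target


if __name__ == "__main__":
    # Simulate two system generations and a profile link that moves between them.
    with tempfile.TemporaryDirectory() as tmp:
        gen1 = os.path.join(tmp, "system-1")
        gen2 = os.path.join(tmp, "system-2")
        os.mkdir(gen1)
        os.mkdir(gen2)
        link = os.path.join(tmp, "current-system")
        os.symlink(gen1, link)

        before = resolve_profile(link)          # recorded before activation
        os.remove(link)
        os.symlink(gen2, link)                  # stand-in for a real activation
        print(profile_changed(before, link))
```

An unchanged target after Build & Test would suggest the mock-system path ran, or that activation never completed.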
Install Nix on clean machine
install_nix_clean_machine
full-mac on macos-e2e (full-mac)
Full-Mac recording - Standalone real desktop evidence
- Commit
- 33f37e696bb5d5f1217b458e722d756ffb3d1c64
- Duration
- 231s
- Replay
- tests/e2e/run.sh install_nix_clean_machine
Full-Mac lane: real macOS desktop automation with full-screen recording evidence.
What this checks
Runs the real Install Nix flow on a clean Mac and verifies Nix works afterward.
- App launches into the Nix install flow
- Install Nix button can be clicked
- Determinate Nix package downloads and installs
- App detects Nix and prefetches darwin-rebuild
- Final Nix binary verification passes
- Publishes a full-screen recording as proof
- Runs on one configured Mac runner and macOS version; it does not cover every hardware or OS variant.
| Phase | Status | Duration | Summary |
|---|---|---|---|
| Clean machine ready | passed | 0s | |
| App launched | passed | 0s | |
| Install button clicked | passed | 0s | |
| Download complete | passed | 0s | |
| Nix installed | passed | 0s | |
| App detected Nix | passed | 0s | |
| Prefetch complete | passed | 0s | |
| All verifications passed | passed | 0s | |







