nixmac E2E Report - macos_live_provider_evolve_real_system
0 passed
1 assertion failed
0 infra/not-run
1/1 selected scenarios produced reports
macOS live provider real system evolve
macos_live_provider_evolve_real_system
full-mac on macos-e2e (full-mac)
Full-Mac recording - Standalone real desktop evidence
- Commit: 33f37e696bb5d5f1217b458e722d756ffb3d1c64
- Duration: 3358s
- Replay: tests/e2e/run.sh macos_live_provider_evolve_real_system
Full-Mac lane: real macOS desktop automation with full-screen recording evidence.
What this checks
Starts from a clean Mac fixture, installs Nix through the shipped app, submits a descriptor, calls the real OpenRouter provider, applies the provider's Nix edit, runs real nix-darwin build and activation, generates the Save-step commit message through the provider, commits the result, and restores or uninstalls the test system state.
Coverage
- Clean-machine fixture is used so the lane can bootstrap Nix instead of assuming the runner already has it
- Install Nix flow runs through the shipped app before provider evolution begins
- Exact-SHA app artifact launches on the real Mac runner
- Prompt input is reached and submitted through accessibility metadata
- Real OpenRouter evolve provider receives the descriptor and returns tool calls
- Provider tool calls edit flake.nix and run real build_check through nix
- Build & Test runs real darwin-rebuild build and activation with macOS admin authentication
- System profile changes after activation, proving the mock-system path was not used
- Summary/commit provider completions are recorded from the real provider
- Save step commits the provider-generated message and returns to Describe
- Previous system profile is restored after the proof run when one existed; otherwise the test Nix install is removed
- Publishes a 30 fps full-screen recording as primary proof
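The profile-change check in the coverage list can be sketched as below. This is a minimal illustration, not the lane's actual implementation. It assumes the active system is exposed as a symlink, and the default path `/run/current-system` is an assumption, since the report does not name the path the lane inspects. The helper names `resolve_profile` and `profile_changed` are hypothetical.

```python
import os

# Assumed location of the active system profile symlink; the report does not
# state which path the lane actually compares.
DEFAULT_PROFILE_LINK = "/run/current-system"


def resolve_profile(link_path=DEFAULT_PROFILE_LINK):
    """Return the resolved target of the profile symlink, or None if it is absent."""
    if not os.path.lexists(link_path):
        return None
    return os.path.realpath(link_path)


def profile_changed(before, after):
    """True only when a profile exists after activation and differs from the snapshot."""
    return after is not None and after != before
```

A harness following this sketch would call `resolve_profile()` once before Build & Test and once after activation, then fail the phase if `profile_changed` returns False. A `before` value of None corresponds to the clean-machine case with no previous profile.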
Known gaps / not covered
- Calls a live model and can fail for provider outages, rate limits, account credit, or prompt nondeterminism.
- Runs on the configured full-Mac runner and mutates then restores or uninstalls that runner's real nix-darwin system state; it is intentionally not parallel-safe on one Mac.
- Uses live nix-darwin/nixpkgs inputs during the temporary fixture lock step, so upstream flakes can still affect runtime stability.
Failure: Real Build & Test did not advance to Save/commit step
- What happened: Real Build & Test did not advance to Save/commit step
- Next action: Open the full report and workflow logs for the failing phase, then rerun the replay command after fixing the cause.
| Phase | Status | Duration | Summary |
|---|---|---|---|
| Clean machine ready | passed | 0s | |
| Nix installed and detected through the shipped app | passed | 0s | |
| Prepared real OpenRouter settings, real nix-darwin flake, and preserved current system profile | passed | 0s | |
| App launched | passed | 0s | |
| Descriptor submitted | passed | 0s | |
| Live OpenRouter evolve provider edited flake.nix and reached Review | passed | 0s | |
| Real Build & Test did not advance to Save/commit step | failed | 0s | Real Build & Test did not advance to Save/commit step |
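When triaging a run like this one, the failing phase can be located by scanning the phase table above for the first row whose status is not `passed`. The sketch below is a hypothetical helper, not part of the nixmac test harness. It assumes the table is available as pipe-delimited text with the same `Phase`, `Status`, `Duration`, and `Summary` columns shown in this report.

```python
def parse_phase_table(text):
    """Parse a pipe-delimited phase table into a list of row dicts keyed by header."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # ignore anything that is not a table row
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = [c.lower() for c in cells]
            continue
        if all(set(c) <= set("-: ") for c in cells):
            continue  # skip the |---|---| separator row
        rows.append(dict(zip(header, cells)))
    return rows


def first_failed_phase(rows):
    """Return the first row whose status is not 'passed', or None if all passed."""
    for row in rows:
        if row.get("status") != "passed":
            return row
    return None
```

Applied to the table in this report, the first non-passing row is the Build & Test phase, which is where the workflow logs should be opened first.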