nixmac E2E Report - macos_live_provider_evolve_real_system


0 passed, 1 assertion failed, 0 infra/not-run; 1/1 selected scenarios produced reports

macOS live provider real system evolve

macos_live_provider_evolve_real_system

full-mac on macos-e2e (full-mac)

Full-Mac recording - Standalone real desktop evidence

Status: failed
Commit: 33f37e696bb5d5f1217b458e722d756ffb3d1c64
Duration: 3358s
Replay: tests/e2e/run.sh macos_live_provider_evolve_real_system

Full-Mac lane: real macOS desktop automation with full-screen recording evidence.

What this checks

Starts from a clean Mac fixture and installs Nix through the shipped app. It then submits a descriptor, calls the real OpenRouter provider, applies the provider's Nix edit, and runs a real nix-darwin build and activation. Finally, it generates the Save-step commit message through the provider, commits the result, and either restores the previous system profile or uninstalls the test Nix install.

Coverage
  • Clean-machine fixture is used so the lane can bootstrap Nix instead of assuming the runner already has it
  • Install Nix flow runs through the shipped app before provider evolution begins
  • Exact-SHA app artifact launches on the real Mac runner
  • Prompt input is reached and submitted through accessibility metadata
  • Real OpenRouter evolve provider receives the descriptor and returns tool calls
  • Provider tool calls edit flake.nix and run real build_check through nix
  • Build & Test runs real darwin-rebuild build and activation with macOS admin authentication
  • System profile changes after activation, proving the mock-system path was not used
  • Summary/commit provider completions are recorded from the real provider
  • Save step commits the provider-generated message and returns to Describe
  • Previous system profile is restored after the proof run when one existed, otherwise the test Nix install is removed
  • Publishes a 30 fps full-screen recording as primary proof
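
The profile-change proof and the restore-or-uninstall cleanup listed above can be sketched as follows. This is a minimal illustration, not the lane's actual harness code: the function names, the `Cleanup` enum, and the choice of `/run/current-system` as the profile symlink are assumptions made for the example.

```python
import os
from enum import Enum
from typing import Optional


class Cleanup(Enum):
    RESTORE_PREVIOUS_PROFILE = "restore"
    UNINSTALL_TEST_NIX = "uninstall"


# Assumption: the active system profile is exposed as a symlink.
# nix-darwin conventionally uses /run/current-system; the real lane may read another path.
DEFAULT_PROFILE_LINK = "/run/current-system"


def read_profile(link: str = DEFAULT_PROFILE_LINK) -> Optional[str]:
    """Return the resolved target of the profile symlink, or None if the link is absent."""
    if not os.path.lexists(link):
        return None
    return os.path.realpath(link)


def profile_changed(before: Optional[str], after: Optional[str]) -> bool:
    """Activation counts as real only if a profile exists afterwards and differs from the snapshot."""
    return after is not None and after != before


def choose_cleanup(before: Optional[str]) -> Cleanup:
    """Restore the earlier profile when one existed; otherwise remove the test Nix install."""
    if before is not None:
        return Cleanup.RESTORE_PREVIOUS_PROFILE
    return Cleanup.UNINSTALL_TEST_NIX
```

A harness following this shape would call `read_profile()` once before Build & Test and once after activation, assert `profile_changed(before, after)`, and pass the earlier snapshot to `choose_cleanup` during teardown.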

Known gaps / not covered
  • Calls a live model and can fail for provider outages, rate limits, account credit, or prompt nondeterminism.
  • Runs on the configured full-Mac runner and mutates then restores or uninstalls that runner's real nix-darwin system state; it is intentionally not parallel-safe on one Mac.
  • Uses live nix-darwin/nixpkgs inputs during the temporary fixture lock step, so upstream flakes can still affect runtime stability.
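
When a run fails for one of the live-provider reasons listed above, a first triage step could bucket the provider's HTTP response by status code. The sketch below is hypothetical: the function name and category labels are invented for illustration, and the mapping of 402 to insufficient credit and 429 to rate limiting is an assumption that should be checked against the provider's current error documentation.

```python
from typing import Optional


def classify_provider_failure(status: Optional[int]) -> str:
    """Map an HTTP status from the provider call to one of the known-gap categories.

    `status` is None when no HTTP response was received (timeout, connection error).
    """
    if status is None:
        return "provider_outage"
    if status == 429:
        return "rate_limit"
    if status == 402:
        # Assumption: the provider reports exhausted account credit as 402.
        return "account_credit"
    if status >= 500:
        return "provider_outage"
    if 200 <= status < 300:
        # The call succeeded, so a failed scenario points at the model's output instead.
        return "prompt_nondeterminism"
    return "other"
```

A successful status with a failed scenario does not prove nondeterminism; it only rules out the transport-level causes, so the workflow logs still need to be read.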

Failure: Real Build & Test did not advance to Save/commit step

What happened: Real Build & Test did not advance to Save/commit step
Next action: Open the full report and workflow logs for the failing phase, then rerun the replay command after fixing the cause.
Screenshot: failure-1777493442-1777493442.png
Phase | Status | Duration | Summary
Clean machine ready | passed | 0s |
Nix installed and detected through the shipped app | passed | 0s |
Prepared real OpenRouter settings, real nix-darwin flake, and preserved current system profile | passed | 0s |
App launched | passed | 0s |
Descriptor submitted | passed | 0s |
Live OpenRouter evolve provider edited flake.nix and reached Review | passed | 0s |
Real Build & Test did not advance to Save/commit step | failed | 0s | Real Build & Test did not advance to Save/commit step
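
To locate the failing phase programmatically, the phase rows above can be treated as ordered records and scanned for the first one that did not pass. This is a minimal sketch: the `Phase` record and `first_failure` helper are hypothetical names, and the rows are copied from this report rather than read from any real report API.

```python
from typing import NamedTuple, Optional, Sequence


class Phase(NamedTuple):
    name: str
    status: str
    duration_s: int
    summary: str = ""


def first_failure(phases: Sequence[Phase]) -> Optional[Phase]:
    """Return the earliest phase whose status is not 'passed', or None if all passed."""
    for phase in phases:
        if phase.status != "passed":
            return phase
    return None


# Rows transcribed from the phase table in this report.
PHASES = [
    Phase("Clean machine ready", "passed", 0),
    Phase("Nix installed and detected through the shipped app", "passed", 0),
    Phase(
        "Prepared real OpenRouter settings, real nix-darwin flake, "
        "and preserved current system profile",
        "passed",
        0,
    ),
    Phase("App launched", "passed", 0),
    Phase("Descriptor submitted", "passed", 0),
    Phase("Live OpenRouter evolve provider edited flake.nix and reached Review", "passed", 0),
    Phase(
        "Real Build & Test did not advance to Save/commit step",
        "failed",
        0,
        "Real Build & Test did not advance to Save/commit step",
    ),
]

failed = first_failure(PHASES)
```

Here `failed` is the seventh row, the Build & Test phase, which is where the workflow logs should be opened first.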