nixmac E2E Report - macos_provider_evolve_full_smoke

View all scenarios

0 passed 1 assertion failed 0 infra/not-run 1/1 selected scenarios produced reports

macOS provider evolve full smoke

macos_provider_evolve_full_smoke

full-mac on macos-e2e (full-mac)

Full-Mac recording - Standalone real desktop evidence

failed
Commit
33f37e696bb5d5f1217b458e722d756ffb3d1c64
Duration
213s
Replay
tests/e2e/run.sh macos_provider_evolve_full_smoke

Full-Mac lane: real macOS desktop automation with full-screen recording evidence.

What this checks

Launches the installed macOS app, submits a descriptor, calls an OpenAI-compatible HTTP provider, applies the provider's Nix edit, runs the mocked host rebuild/activation step, generates the Save-step commit message through the provider, and commits the result.

Coverage
  • Exact-SHA app artifact launches on the real Mac runner
  • Prompt input is reached and submitted through accessibility metadata
  • Evolution provider receives a real HTTP chat completion request with tool schemas
  • Provider tool calls edit flake.nix and run build_check through the backend
  • Summary provider receives JSON completion requests for the generated diff
  • Build & Test advances to Save using the explicit E2E mock-system activation path
  • Commit-message provider receives a conventional-commit request and populates the Save step
  • Save step commits the provider-generated message and returns to Describe
  • Publishes a 30 fps full-screen recording as primary proof
Known gaps / not covered
  • Uses a deterministic local OpenAI-compatible provider so the test is stable; it does not depend on external provider billing, latency, or model nondeterminism.
  • Mocks only the host system rebuild/activation under NIXMAC_E2E_MOCK_SYSTEM=1, so it does not mutate the self-hosted runner's real macOS configuration.

Failure: Typed descriptor was not visible in the prompt input

What happened
Typed descriptor was not visible in the prompt input
Next action
Open the full report and workflow logs for the failing phase, then rerun the replay command after fixing the cause.
failure-1777490099-1777490099.png
PhaseStatusDurationSummary
Prepared config repo, deterministic HTTP provider, completion logging, and mock rebuild flag passed 0s
App launched passed 0s
Typed descriptor was not visible in the prompt input failed 0s Typed descriptor was not visible in the prompt input