nixmac E2E Report - macos_provider_evolve_full_smoke
0 passed
1 assertion failed
0 infra/not-run
1/1 selected scenarios produced reports
macOS provider evolve full smoke
macos_provider_evolve_full_smoke
full-mac on macos-e2e (full-mac)
Full-Mac recording - Standalone real desktop evidence
- Commit
33f37e696bb5d5f1217b458e722d756ffb3d1c64- Duration
- 213s
- Replay
tests/e2e/run.sh macos_provider_evolve_full_smoke
Full-Mac lane: real macOS desktop automation with full-screen recording evidence.
What this checks
Launches the installed macOS app, submits a descriptor, calls an OpenAI-compatible HTTP provider, applies the provider's Nix edit, runs the mocked host rebuild/activation step, generates the Save-step commit message through the provider, and commits the result.
Coverage
- Exact-SHA app artifact launches on the real Mac runner
- Prompt input is reached and submitted through accessibility metadata
- Evolution provider receives a real HTTP chat completion request with tool schemas
- Provider tool calls edit flake.nix and run build_check through the backend
- Summary provider receives JSON completion requests for the generated diff
- Build & Test advances to Save using the explicit E2E mock-system activation path
- Commit-message provider receives a conventional-commit request and populates the Save step
- Save step commits the provider-generated message and returns to Describe
- Publishes a 30 fps full-screen recording as primary proof
Known gaps / not covered
- Uses a deterministic local OpenAI-compatible provider so the test is stable; it does not depend on external provider billing, latency, or model nondeterminism.
- Mocks only the host system rebuild/activation under NIXMAC_E2E_MOCK_SYSTEM=1, so it does not mutate the self-hosted runner's real macOS configuration.
Failure: Typed descriptor was not visible in the prompt input
- What happened
- Typed descriptor was not visible in the prompt input
- Next action
- Open the full report and workflow logs for the failing phase, then rerun the replay command after fixing the cause.
| Phase | Status | Duration | Summary |
|---|---|---|---|
| Prepared config repo, deterministic HTTP provider, completion logging, and mock rebuild flag | passed | 0s | |
| App launched | passed | 0s | |
| Typed descriptor was not visible in the prompt input | failed | 0s | Typed descriptor was not visible in the prompt input |