rdp-proxy/CODEX_CONTEXT.md

# CODEX CONTEXT

## Project identity

This project is a production-grade distributed secure access platform.

It started as a custom RDP proxy with persistent server-side sessions, but the final target architecture is broader:

- distributed secure access fabric
- multi-tenant platform
- session broker for GUI and future non-GUI protocols
- cluster mesh of nodes
- connector/VPN layer
- customer-managed and platform-managed nodes
- node-agent based self-update / rollback / health supervision

## Product architecture rule: VPN and Remote Workspace are separate products/layers

Do not merge VPN/IP tunnel work with Remote Workspace / remote desktop work.

- VPN is a universal network-layer IP tunnel. It carries any traffic generated
  by a phone, Windows PC, Linux host, or other client device: HTTP, DNS, ping,
  RDP clients, SSH clients, SMB, business apps, and future protocols. VPN must
  stay protocol-agnostic and must not contain remote-desktop-specific logic.
- Remote Workspace is an application/session-layer service. The client talks to
  RAP using RAP's own client protocol. RAP workers/connectors then talk to the
  target server using protocol adapters such as RDP, SSH, VNC, or future
  adapters, convert screen/input/clipboard/files/audio/control into RAP's
  format, and render it in the RAP client.
- VPN optimization work must focus on generic data-plane transport,
  full-tunnel/split-tunnel routing, DNS, MTU/MSS, QoS, NAT traversal, direct
  UDP/QUIC transport, fallback relay, diagnostics, and stability for arbitrary
  traffic.
- Remote Workspace optimization work must focus on server catalog, session
  broker, workers/connectors, protocol adapters, RAP client protocol, separate
  connection windows, rendering/input/clipboard/file/audio behavior, and
  user-facing remote-workspace UX.
- Both VPN and Remote Workspace must consume the shared Fabric Service Channel
  runtime. Control/API traffic may use backend/admin ingress, but working
  service data must use the fabric channel whenever available. Backend relay is
  a compatibility/degraded fallback, not the production steady-state.
- The accepted service-channel direction is documented in
  `docs/architecture/FABRIC_SERVICE_CHANNEL_RUNTIME.md`: a service requests a
  channel with entry pool, exit pool, roles, service class, channel classes,
  QoS and failover policy; the fabric selects the fastest healthy route and
  rebuilds it on failure. Protocol-specific services must not reimplement this
  transport.
- Current implementation: backend issues `rap.fabric_service_channel_lease.v1`
  leases and embeds them in VPN client profiles. Leases include
  cluster-authority-signed `rap.fabric_service_channel_lease_authority.v1`
  payloads that bind token hash, selected route, generation, fencing epoch, and
  expiry, plus a signed `data_plane` contract declaring that working data uses
  the Fabric Service Channel over fabric routes while backend relay is only an
  explicit degraded/disabled fallback policy. `rap-node-agent` serves the
  first VPN packet service-channel entry endpoint under
  `/api/v1/clusters/{cluster_id}/fabric/service-channels/{channel_id}/vpn-connections/{resource_id}/packets`
  plus `/packets/ws`. The endpoint validates the signed or introspected
  data-plane contract, applies the preferred fabric route, uses the existing
  production `vpn_packet` fabric route, reports contract adoption in heartbeat
  access telemetry, and refuses backend relay when the contract disables it.
  Backend access telemetry and web-admin now show data-plane adoption,
  working/steady-state transport, backend relay policy, data-plane mode, and
  logical flow mode at cluster/node/channel levels. The next slice is explicit
  route/fallback violation incidents from that telemetry, plus client
  consumption of the lease endpoint template.

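A minimal Go sketch of how a client might expand the packet entry endpoint path quoted above. The helper names `PacketsPath` and `PacketsWSPath` are hypothetical and are not taken from the codebase; only the path layout comes from this document.

```go
package main

import (
	"fmt"
	"net/url"
)

// packetsPathTemplate mirrors the entry endpoint path documented above.
const packetsPathTemplate = "/api/v1/clusters/%s/fabric/service-channels/%s/vpn-connections/%s/packets"

// PacketsPath is a hypothetical helper. It fills the three path parameters
// and escapes each one so an ID cannot inject extra path segments.
func PacketsPath(clusterID, channelID, resourceID string) string {
	return fmt.Sprintf(packetsPathTemplate,
		url.PathEscape(clusterID),
		url.PathEscape(channelID),
		url.PathEscape(resourceID))
}

// PacketsWSPath returns the WebSocket variant, which appends /ws.
func PacketsWSPath(clusterID, channelID, resourceID string) string {
	return PacketsPath(clusterID, channelID, resourceID) + "/ws"
}

func main() {
	// Example IDs are placeholders, not real cluster or channel identifiers.
	fmt.Println(PacketsPath("cluster-1", "chan-7", "vpn-42"))
	fmt.Println(PacketsWSPath("cluster-1", "chan-7", "vpn-42"))
}
```

Escaping each parameter keeps the route shape fixed even if an identifier contains `/`.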
## Current proven foundation

The current codebase has already proven the riskiest low-level lifecycle assumptions for RDP:

- real FreeRDP connect works
- session state transition to `active` works
- terminate works
- detach works without killing the remote session
- reattach works without recreating the remote session
- takeover works without recreating the remote session
- per-resource certificate verification policy exists
- `certificate_verification_mode = strict | ignore`
- `strict` is default
- `ignore` works on a per-resource basis
- worker build is reproducible
- backend build is reproducible

This proven lifecycle must NOT be broken by future architecture work.

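The per-resource certificate policy can be sketched in Go as follows. This is an illustration only: the type, constant, and function names are assumptions, and the real policy lives in the backend/worker code. It shows the two documented values, `strict` as the default, and the mode attached to a resource rather than to the worker.

```go
package main

import "fmt"

// CertVerificationMode models certificate_verification_mode = strict | ignore.
type CertVerificationMode string

const (
	ModeStrict CertVerificationMode = "strict"
	ModeIgnore CertVerificationMode = "ignore"
)

// Resource is a hypothetical, reduced resource record. The mode is stored
// per resource, never as a global worker setting.
type Resource struct {
	Name     string
	CertMode CertVerificationMode
}

// ParseCertVerificationMode treats an empty value as strict (the default)
// and rejects anything other than the two documented values.
func ParseCertVerificationMode(raw string) (CertVerificationMode, error) {
	switch CertVerificationMode(raw) {
	case "", ModeStrict:
		return ModeStrict, nil
	case ModeIgnore:
		return ModeIgnore, nil
	default:
		return ModeStrict, fmt.Errorf("unknown certificate_verification_mode %q", raw)
	}
}

func main() {
	for _, raw := range []string{"", "strict", "ignore", "off"} {
		mode, err := ParseCertVerificationMode(raw)
		fmt.Printf("input=%q mode=%s err=%v\n", raw, mode, err)
	}
}
```

Falling back to `strict` on unknown input keeps a typo from silently disabling verification.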
## Current architecture baseline

Current audit and baseline snapshot:

- `docs/audits/PROJECT_AUDIT_2026-04-26.md`
- `docs/audits/CURRENT_BASELINE_MATRIX.md`

### Test environment
- Canonical test Docker host: `192.168.200.61`
- Canonical Docker context: `test-ubuntu`
- Canonical SSH alias: `docker-test`
- Current external control-plane endpoint for remote/offsite node enrollment:
  `http://94.141.118.222:19191` / `http://vpn.cin.su:19191`.
- Current port forward: `94.141.118.222:19191` -> `192.168.200.61:18080`.
- For offsite Windows/Linux nodes, install profiles should use:
  `http://vpn.cin.su:19191/api/v1` as control-plane endpoint and
  `http://vpn.cin.su:19191/downloads` as artifact endpoint unless the user
  explicitly chooses the raw IP endpoint.
- Backend API for local/client smoke runs: `http://192.168.200.61:8080/api/v1`
- WebSocket gateway for local/client smoke runs: `ws://192.168.200.61:8080/api/v1/gateway/ws`
- Stage C17 planning is completed.
- C17A synthetic mesh runtime skeleton is implemented and test-proven in
  `rap-node-agent` only. It is disabled by default and carries synthetic
  `fabric.probe` / `fabric.probe_ack` messages only.
- C17B route health and failover probes are implemented and test-proven in
  `rap-node-agent` only. They are disabled by default and carry synthetic
  `fabric.route_health` / `fabric.route_health_ack` messages only.
- C17C relay semantic hardening is implemented and test-proven in
  `rap-node-agent` only. It is disabled by default and models synthetic
  per-channel queues/QoS/backpressure only.
- C17D non-production test-service path is implemented and test-proven in
  `rap-node-agent` only. It is disabled by default and carries only bounded
  `synthetic.echo` test payloads.
- C17E/C17F/C17G are implemented and proven for live synthetic HTTP transport,
  scoped synthetic route config, and Control Plane scoped synthetic config
  consumption.
- C17H deployed multi-agent synthetic config smoke is runtime-proven on
  `docker-test`: five running `rap-node-agent` containers consume
  backend-issued node-scoped synthetic config, direct and single-relay
  synthetic route-health observations return to the Control Plane, and
  production forwarding remains disabled.
- C17I production forwarding gate foundation is implemented and test-proven:
  `rap-node-agent` has an explicit production-forwarding gate, while
  `/mesh/v1/forward` still refuses production payload forwarding until a later
  approved runtime stage.
- C17J production envelope contract is implemented and test-proven:
  `/mesh/v1/forward` validates route-bound production envelopes for
  `fabric_control` / `fabric.control` only when the gate is enabled, rejects
  service channels, and still refuses production forwarding.
- C17K production envelope observation is implemented and test-proven:
  valid accepted envelopes can be observed locally as metadata-only records
  after validation; rejected envelopes are not observed, observation failure
  fails closed, and production forwarding remains unavailable.
- C17L bounded production observation sink is implemented and test-proven:
  accepted metadata-only observations can be retained locally with fixed
  capacity, oldest-entry drop behavior, and no payload body storage.
- C17M production observation sink wiring is implemented and test-proven:
  node-agent can wire the bounded local metadata-only sink when
  `RAP_MESH_PRODUCTION_OBSERVATION_SINK_CAPACITY` is explicitly greater than
  zero; the wiring is disabled by default and exposes no read API.
- C17N production observation sink metrics are implemented and test-proven:
  local sink metrics expose only capacity, current depth, accepted total, and
  dropped-oldest total; they expose no observation records or payload metadata.
- C17O production observation sink local metrics logging is implemented and
  test-proven: node-agent logs aggregate sink metrics locally when the sink is
  explicitly enabled; no read API or Control Plane reporting is added.
- C17P production observation sink change-driven metrics logging is implemented
  and test-proven: node-agent suppresses repeated identical local sink metrics
  logs; no read API or Control Plane reporting is added.
- C17Q production forwarding gate/runtime log boundary is implemented and
  test-proven: node-agent logs production forwarding gate state separately from
  production forwarding runtime state. Runtime state remained false until
  C17Z introduced gate-controlled `fabric.control` direct forwarding.
- C17R production observation sink capacity guard is implemented and
  test-proven: `RAP_MESH_PRODUCTION_OBSERVATION_SINK_CAPACITY` is rejected
  above `10000`.
- C17S production observation panic fail-closed hardening is implemented and
  test-proven: observer errors and observer panics both fail closed as
  observation failure.
- C17T production envelope payload boundary is implemented and test-proven:
  validated production `fabric.control` envelope payloads are bounded to
  `4096` bytes and oversized envelopes are rejected before observation.
- C17U production envelope created-at skew boundary is implemented and
  test-proven: validated production `fabric.control` envelopes whose
  `created_at` is more than one minute in the future are rejected before
  observation.
- C17V peer endpoint candidate model is implemented and test-proven:
  node-scoped synthetic mesh config now carries route-scoped endpoint
  candidates with transport, address, reachability, NAT type, connectivity
  mode, priority, policy tags, verification time, and metadata. This is a
  model/config boundary only; no production route scoring, NAT traversal,
  shortcut routing, or forwarding runtime is implemented.
- C17W peer endpoint candidate scoring model is implemented and test-proven:
  `rap-node-agent` can rank already-scoped endpoint candidates using soft
  inputs such as transport, reachability, connectivity mode, NAT type,
  priority, region, policy tags, channel class, and verification age. This is
  a scoring helper only; it does not open connections, choose production
  routes, or forward payloads.
- C17X health-aware endpoint candidate scoring overlay is implemented and
  test-proven: endpoint candidate scoring can optionally use local health
  observations keyed by `endpoint_id`, including latency, success/failure
  history, recent failure reason, reliability score, and observation freshness.
  This remains advisory scoring only and is not wired into production route
  execution.
- C17Y Platform Owner synthetic mesh visibility is implemented and
  build/test-proven: `web-admin` reads node-scoped synthetic mesh config and
  shows config enabled state, route counts, peer endpoints, endpoint
  candidates, C17X advisory scoring boundary, and `production_forwarding`.
  This remains platform-owner visibility only and does not enable production
  forwarding.
- C17Z production fabric-control direct forwarding boundary is implemented and
  test-proven: when `RAP_MESH_PRODUCTION_FORWARDING_ENABLED=true`,
  `/mesh/v1/forward` can deliver valid route-bound `fabric.control` envelopes
  at the local destination or forward them to a direct next hop from explicit
  peer endpoint config. Service channels, arbitrary relay forwarding,
  multi-hop production route execution, and RDP/VPN/file/video/service payloads
  remain unavailable.
- C17Z1 production fabric-control multi-hop route-path boundary is implemented
  and test-proven: production `fabric.control` envelopes can carry
  `route_path` and `visited_node_ids`; relay nodes validate path position,
  forward only to the next path node, update TTL/hop/visited metadata, and
  reject loops. Service payloads remain unavailable.
- C17Z2 production fabric-control forwarding observability boundary is
  implemented and test-proven: node-agent emits local
  `mesh_production_forward_event` logs for accepted, forwarded, delivered, and
  rejected production `fabric.control` envelopes. Logs are metadata-only and
  include no payload bodies or read API.
- C17Z3 production fabric-control route-config boundary is implemented and
  test-proven: when scoped/control-plane mesh routes are available locally,
  production `fabric.control` envelopes must match configured route_id/path/
  next-hop/channel/expiry/TTL/hop limits before forwarding.
- C17Z4 scoped peer directory and recovery seeds boundary is implemented and
  test/build-proven: node-scoped mesh config carries scoped `peer_directory`
  and explicit bounded `recovery_seeds`; node-agent parses/validates them and
  web-admin shows counts.
- C17Z5 node-agent peer cache runtime boundary is implemented and test-proven:
  node-agent builds a local `PeerCache`, selects bounded warm peers, probes warm
  peers with `/mesh/v1/health`, and reports metadata-only mesh-link
  observations when synthetic mesh testing is enabled.
- C17Z6 dynamic endpoint reporting boundary is implemented and test-proven:
  node-agent reports explicit advertised mesh endpoint metadata in heartbeat,
  and Control Plane projects latest reported endpoints/candidates into
  node-scoped synthetic mesh config.
- C17Z7 private/corporate endpoint candidate boundary is implemented and
  test-proven: node-agent reports multiple advertised endpoint candidates,
  scoring rewards private/corporate same-site candidates, and peer cache can
  use the best candidate address for warm health.
- C17Z8 peer connection state machine boundary is implemented and test-proven:
  node-agent tracks warm-peer states `disconnected`, `connecting`, `ready`,
  `degraded`, and `backoff`, with bounded backoff after repeated health probe
  failures.
- C17Z9 peer recovery planner boundary is implemented and test-proven:
  node-agent targets a bounded stable ready-peer set, enters recovery when
  ready peers fall below target, and selects bounded recovery probes from warm
  peers, recovery seeds, and other connectable scoped peers.
- C17Z10 peer connection intent planner boundary is implemented and
  test-proven: node-agent classifies bounded peer work as maintain/probe/
  recover and classifies transport readiness as direct/private_lan/
  corporate_lan/outbound_only/relay_required, with rendezvous-required
  metadata only.
- C17Z11 peer connection manager runtime boundary is implemented and
  test-proven: node-agent uses a reusable HTTP keep-alive client for real
  control-plane health probes of direct/private/corporate peers and records
  `waiting_rendezvous` for outbound-only/relay-required peers.
- C17Z12 rendezvous/relay control-plane contract is implemented and
  docker-test-runtime-proven: backend issues node-scoped `rendezvous_leases`,
  node-agent resolves matching `waiting_rendezvous` intents into
  `relay_control`, probes relay `/mesh/v1/health`, records and maintains
  `relay_ready`, and keeps service payload forwarding disabled.
- C17Z13 rendezvous lease telemetry is implemented and
  docker-test-runtime-proven: node-agent reports
  `mesh_rendezvous_lease_report` with relay admission, peer admission,
  TTL/renewal posture, `relay_ready`, and explicit no-payload boundary flags;
  web-admin shows `rv leases` in recent heartbeat tables.
- C17Z14 rendezvous lease refresh contract is implemented and
  docker-test-runtime-proven: node-agent refreshes renewal-needed/stale
  rendezvous leases through node-scoped synthetic config reload, updates the
  running peer cache/route/lease state, and reports refresh plus stale relay
  withdrawal/reselection telemetry. Service payload forwarding remains
  unavailable.
- C17Z15 backend relay replacement policy is implemented and
  docker-test-runtime-proven: backend consumes recent stale-relay heartbeat
  feedback, withdraws stale explicit rendezvous leases, scores alternate relay
  candidates from route adjacency, endpoint priority, policy tags, and recent
  mesh-link health, and returns replacement leases plus
  `rendezvous_relay_policy` decisions in node-scoped synthetic config.
  Node-agent reports `c17z15.mesh_rendezvous_lease_report.v1` and keeps stale
  state scoped to the exact lease/relay, so replacement leases for the same
  peer are not marked stale by association. Service payload forwarding remains
  unavailable.
- C17Z16 route/path decision artifact is implemented and
  docker-test-runtime-proven: backend `c17z16.synthetic.v1` config includes
  `route_path_decisions` with original hops, effective hops, local previous/
  next hop, selected replacement relay, generation, score reasons, and
  no-payload boundary flags. Node-agent stores the control-plane route
  generation and reports `c17z16.mesh_route_path_decision_report.v1` plus
  `c17z16.mesh_rendezvous_lease_report.v1`. Service payload forwarding remains
  unavailable.
- C17Z17 node-side route generation tracker is implemented and
  docker-test-runtime-proven: backend `c17z17.synthetic.v1` config and
  node-agent `mesh_route_generation_report` track active/applied/unchanged/
  withdrawn route decisions, generation changes, total counters, and
  `withdrawn_by_replacement` records for stale relay paths when replacement is
  first observed. Service payload forwarding remains unavailable.
- C17Z18 synthetic route-health effective path runtime is implemented and
  docker-test-runtime-proven: backend `c17z18.synthetic.v1` config and
  node-agent `mesh_route_health_config_report` apply Control Plane
  `route_path_decisions` to synthetic route-health route config only. The
  synthetic runtime probes selected effective paths through replacement relays,
  reports expected/observed hops and drift state, and backend latest mesh links
  preserve route-health observations separately from connection-manager
  observations. Service payload forwarding remains unavailable.
- C17Z19 synthetic route-health feedback scoring is implemented and
  docker-test-runtime-proven: backend consumes recent `synthetic_route_health`
  observations in relay scoring, uses drift/unreachable/failure metadata to
  mark the exact selected relay stale, boosts healthy low-latency relay
  candidates, and returns replacement leases/route decisions through the
  existing synthetic config contract. Migration `000022` adds the `synthetic`
  mesh service class. Service payload forwarding remains unavailable.
- C17Z20 node-side route-health feedback refresh is implemented and
  docker-test-runtime-proven: after reporting synthetic route-health
  drift/unreachable/failure, node-agent performs a bounded node-scoped
  synthetic-config refresh, applies returned replacement route decisions to
  route-health config immediately, and reports
  `c17z20.mesh_route_health_feedback_refresh_report.v1`. Service payload
  forwarding remains unavailable.
- C17Z21 offsite control-plane bootstrap relay and Windows updater foundation
  are implemented and docker-test/runtime-proven: backend exposes
  `/mesh/v1/health` through the admin/nginx control-plane origin and issues
  control-plane-only bootstrap rendezvous leases for outbound-only nodes using
  their reported public control-plane URL. Remote Windows node
  `ifcm-rufms-s-mo1cr` resolved 3/3 peers to `relay_ready` through
  `http://94.141.118.222:19191`, while service/RDP/VPN payload forwarding
  remains disabled. Release `0.1.3` is published for Docker and Windows
  `windows_service` artifacts, and `install-windows` now installs a
  per-node Scheduled Task updater for future Windows node-agent updates.
- C17Z22 updater observability and Windows host-agent self-update staging are
  implemented and test-proven: `rap-host-agent` reports `phase=plan`,
  `status=noop` for already-current/no-op plans, update state is scoped per
  product so `rap-node-agent` and `rap-host-agent` do not overwrite each
  other's current version, and the Windows updater wrapper runs short
  one-shot cycles that can apply staged `rap-host-agent.exe.next` before the
  next update check. Release `rap-host-agent 0.1.3` is published for
  `linux_binary` and `windows_binary`; Docker updater containers on
  `test-1/2/3` report no-op plans.
- Installation Authority foundation is implemented: production requires strict
  Product Root public key config, first-owner bootstrap uses signed Ed25519
  activation manifests, `installation_authority` and signed
  `platform_role_grants` are persisted, and strict platform-admin checks ignore
  direct `users.platform_role` database edits without a valid signed grant.
  Web-admin exposes installation status/first-owner bootstrap, and
  `scripts/installation/product-root-tool.go` generates keys/manifests for
  offline product-root operations.
- Cluster Authority and node enrollment bootstrap are docker-test lifecycle
  smoke-proven in run `dev-bootstrap-20260428-201430`: a fresh dev install
  bootstrapped the first owner, created a cluster, issued a signed join token,
  accepted real `rap-node-agent` enrollment, owner-approved the join request,
  agent-polled signed bootstrap, persisted cluster authority pin, heartbeated,
  and verified signed `c17z18.synthetic.v1` Control Plane config. Production
  service payload forwarding remains unavailable.
- Migration `000021_cluster_authority_keys` drops/recreates
  `cluster_admin_summaries` because fresh replay proved PostgreSQL cannot
  change that view layout via `CREATE OR REPLACE VIEW`.
- `rap-node-agent` desired-workload polling/status reporting is gated by
  `RAP_WORKLOAD_SUPERVISION_ENABLED=false` by default while service runtime
  supervision remains a stub.
- C18 VPN/IP tunnel service target design is completed as documentation only.
- C18A VPN/IP tunnel control-plane data model foundation is implemented and
  backend-test-proven.
- C18B VPN/IP tunnel lease/fencing hardening is implemented and
  backend-test-proven.
- C18C VPN/IP tunnel node-agent desired-state consumption/reporting is
  implemented and backend-test-proven.
- No next platform-core implementation step is automatically authorized after
  C17Z20. The next mesh layer should stay limited to route-health feedback
  refresh dampening/no-change cooldown unless the user explicitly chooses
  another staged task.
- Latest RDP performance reference image:
  `rap-rdp-worker:rdp-perf6-dirty-region`
- Stage 5.2 file-download runtime artifacts remain preserved for when RDP work
  resumes, but they are not the active next task.
- Do not use `docker.cin.su` for this project unless explicitly requested for a separate one-off check.
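
The C17T and C17U envelope boundaries listed above can be sketched as a pre-observation check. This is a hedged illustration: the `Envelope` struct and `CheckBoundaries` function are hypothetical and reduced, and only the `4096` byte limit and the one-minute future skew come from this document.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

const (
	maxPayloadBytes  = 4096        // C17T payload boundary
	maxCreatedAtSkew = time.Minute // C17U future-skew boundary
)

var (
	errPayloadTooLarge   = errors.New("payload exceeds 4096 bytes")
	errCreatedAtInFuture = errors.New("created_at is more than one minute in the future")
)

// Envelope is a hypothetical, reduced view of a production envelope.
type Envelope struct {
	MessageType string
	CreatedAt   time.Time
	Payload     []byte
}

// CheckBoundaries is meant to run before observation, so that a rejected
// envelope is never observed.
func CheckBoundaries(env Envelope, now time.Time) error {
	if len(env.Payload) > maxPayloadBytes {
		return errPayloadTooLarge
	}
	if env.CreatedAt.After(now.Add(maxCreatedAtSkew)) {
		return errCreatedAtInFuture
	}
	return nil
}

func main() {
	now := time.Now()
	ok := Envelope{MessageType: "fabric.control", CreatedAt: now, Payload: make([]byte, 4096)}
	future := Envelope{MessageType: "fabric.control", CreatedAt: now.Add(2 * time.Minute)}
	fmt.Println("in bounds:", CheckBoundaries(ok, now))
	fmt.Println("future:", CheckBoundaries(future, now))
}
```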

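The bounded observation sink behavior from C17L, C17N, and C17R can be sketched as a fixed-capacity ring buffer. All type and function names here are assumptions for illustration. The sketch stores metadata only, drops the oldest entry when full, exposes the four aggregate metrics, and rejects capacities above `10000`. Treating a non-positive capacity as an error is a choice made for this sketch; the document only says the sink is disabled by default.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

const maxSinkCapacity = 10000 // C17R capacity guard

// Observation is metadata-only; it carries no payload body.
type Observation struct {
	RouteID     string
	MessageType string
}

// Metrics exposes aggregates only, never observation records.
type Metrics struct {
	Capacity           int
	Depth              int
	AcceptedTotal      uint64
	DroppedOldestTotal uint64
}

// Sink is a hypothetical fixed-capacity ring buffer.
type Sink struct {
	mu       sync.Mutex
	buf      []Observation
	head     int // index of the oldest retained entry
	depth    int
	accepted uint64
	dropped  uint64
}

func NewSink(capacity int) (*Sink, error) {
	if capacity <= 0 {
		return nil, errors.New("sink capacity must be greater than zero")
	}
	if capacity > maxSinkCapacity {
		return nil, fmt.Errorf("sink capacity %d exceeds %d", capacity, maxSinkCapacity)
	}
	return &Sink{buf: make([]Observation, capacity)}, nil
}

// Observe retains o, overwriting the oldest entry when the sink is full.
func (s *Sink) Observe(o Observation) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.depth == len(s.buf) {
		s.buf[s.head] = o
		s.head = (s.head + 1) % len(s.buf)
		s.dropped++
	} else {
		s.buf[(s.head+s.depth)%len(s.buf)] = o
		s.depth++
	}
	s.accepted++
}

func (s *Sink) Metrics() Metrics {
	s.mu.Lock()
	defer s.mu.Unlock()
	return Metrics{Capacity: len(s.buf), Depth: s.depth, AcceptedTotal: s.accepted, DroppedOldestTotal: s.dropped}
}

func main() {
	sink, err := NewSink(2)
	if err != nil {
		panic(err)
	}
	for i := 0; i < 3; i++ {
		sink.Observe(Observation{RouteID: fmt.Sprintf("route-%d", i), MessageType: "fabric.control"})
	}
	fmt.Printf("%+v\n", sink.Metrics())
}
```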
### Backend
- Go
- PostgreSQL = source of truth
- Redis = live coordination / routing only
- REST for control plane
- WebSocket for live session channel

### Worker
- C++ worker
- FreeRDP integration
- worker runtime hides FreeRDP details from backend
- The C++ worker remains the primary RDP runtime.
- Target RDP performance direction: `docs/architecture/RDP_SERVICE_CPP_PERFORMANCE_TARGET.md`.
- The RDP performance rewrite scope is limited to C++ RDP service adapter
  internals. It must not redesign backend control plane, cluster transport,
  organizations, leases, or session lifecycle.
- The C# RDP service skeleton is inactive research scaffolding and is not the
  current runtime direction.
- Current RDP Adapter baseline: RDP-Perf-6 dirty-region direct binary rendering
  is completed and smoke-proven on `docker-test`. RDP work is paused by product
  decision; next active work is Fabric Core / cluster foundation.
- P3/P3.1 security-readiness foundation exists: production mode rejects
  plaintext credential-like resource metadata, requires `secret_ref` for
  RDP/VNC/SSH resources, and has an encrypted PostgreSQL-backed resource secret
  storage/resolver MVP. P3.2 direct-worker TLS/PKI guard exists.
- P3.3 production-like test-stand smoke is complete on `docker-test`: backend
  runs in `APP_ENV=production` with a test-only secret key file, a secret-backed
  RDP resource starts real sessions through the resolver path, metadata/audit do
  not contain plaintext credentials, and backend gateway fallback remains
  available when direct worker WSS trust is `smoke_insecure`.
- P3.4 production direct-worker WSS trust model is documented in
  `docs/architecture/PRODUCTION_DIRECT_WORKER_WSS_TRUST.md`; it defines
  platform CA/public CA behavior, worker certificate SAN/identity requirements,
  app-local Windows trust direction, rotation/revocation, and the future
  `platform_ca` smoke plan. No RDP runtime behavior changed in P3.4.
- P3.5 app-local platform CA trust is implemented and runtime-proven on
  `docker-test`: Windows client validates direct worker WSS with an app-local
  platform CA bundle, keeps hostname/SAN validation enabled, selects
  `direct_worker_wss` without insecure TLS bypass, and falls back to backend
  gateway for unknown CA / smoke-only production cases.
- P3.6 stale Redis worker/live event idempotency is implemented and
  runtime-proven: stale worker events for terminal PostgreSQL sessions are
  ignored, backend restart survives stale Redis events, and terminal sessions
  are not reopened.
- Stage 5.2 server-to-client file download core data path is runtime-proven:
  direct worker WSS and backend gateway fallback both download text/binary
  files from `RAP_Transfers\ToClient` with matching size/hash, and direct
  policy blocking is proven for `disabled` and `client_to_server`. Lifecycle
  blocking is also runtime-proven for detach, old-client takeover, and worker
  failure. Runtime report:
  `artifacts/stage5-2-file-download-runtime-report.md`.
- Stage 5.2 is not fully accepted yet. Remaining proof: Windows desktop UI
  download path and regression matrix for rendering/input/clipboard/upload/
  reconnect/takeover.

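The P3.6 rule above (stale worker events must not reopen terminal sessions) can be sketched as a guard that consults the persisted session state before applying an event. The state names `terminated` and `failed`, the `SessionStore` map, and `ShouldApply` are assumptions for illustration; the map stands in for a PostgreSQL lookup, and ignoring events for unknown sessions is a choice made for this sketch.

```go
package main

import "fmt"

type SessionState string

const (
	StateActive     SessionState = "active"
	StateDetached   SessionState = "detached"
	StateTerminated SessionState = "terminated" // assumed terminal state name
	StateFailed     SessionState = "failed"     // assumed terminal state name
)

// IsTerminal reports whether a session can no longer change state.
func IsTerminal(s SessionState) bool {
	return s == StateTerminated || s == StateFailed
}

// WorkerEvent is a hypothetical live event delivered through Redis.
type WorkerEvent struct {
	SessionID string
	Kind      string
}

// SessionStore stands in for the PostgreSQL source of truth.
type SessionStore map[string]SessionState

// ShouldApply returns false for events that target terminal or unknown
// sessions, so replayed or stale events are dropped without side effects.
func (st SessionStore) ShouldApply(ev WorkerEvent) bool {
	state, known := st[ev.SessionID]
	if !known {
		return false
	}
	return !IsTerminal(state)
}

func main() {
	store := SessionStore{"s1": StateActive, "s2": StateTerminated}
	for _, ev := range []WorkerEvent{
		{SessionID: "s1", Kind: "frame_ready"},
		{SessionID: "s2", Kind: "frame_ready"},
		{SessionID: "s3", Kind: "frame_ready"},
	} {
		fmt.Printf("session=%s apply=%v\n", ev.SessionID, store.ShouldApply(ev))
	}
}
```

Because the check reads persisted state rather than Redis, applying the same stale event twice has the same result as applying it once.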
### Clients
- future native clients:
  - Windows: native desktop client first
  - Linux: native desktop client later
- web UI is admin/control plane, not the primary power-user client

## Final architecture direction

The long-term target architecture is documented in:

- `docs/architecture/SECURE_ACCESS_FABRIC_TARGET.md`
- `docs/architecture/CLUSTER_NODE_ADMIN_FOUNDATION.md`
- `docs/architecture/WEB_INGRESS_AND_ADMIN_UI_MODEL.md`

`SECURE_ACCESS_FABRIC_TARGET.md` defines the target Secure Access Fabric architecture only. It is not the current implementation scope and must not be used as permission to start mesh, VPN, multi-cluster, updater, or realtime data-plane migration work without an explicit staged prompt.

`CLUSTER_NODE_ADMIN_FOUNDATION.md` defines the next platform-core planning
baseline for clusters, node enrollment, native node-agent identity, platform
admin console, multi-cluster administration, and future organization admin
visibility. It is a staged foundation document, not permission to implement
mesh packet routing or VPN runtime.

`WEB_INGRESS_AND_ADMIN_UI_MODEL.md` defines WEB as HTTP/HTTPS ingress and
Admin UI presentation only. Cluster configuration remains Control Plane
ownership through scoped APIs, PostgreSQL source-of-truth mutations, and audit.
Dynamic pages must be safe schema-driven projections and must not embed
internal topology, peer caches, route caches, secrets, raw credentials, or
arbitrary executable code.

Admin endpoint placement is explicit. Fabric Storage / Config Storage nodes do
not automatically host or move the cluster panel. Platform Owner Console
remains global platform-owner scope. Cluster Admin Endpoint requires explicit
admin/web ingress role assignment, cluster health/trust readiness, and Control
Plane authorization. Organization Admin Panel remains a tenant-safe projection.

The final platform must support:

1. Multi-tenancy / Organizations
- platform has many organizations
- each organization has isolated users, groups, resources, policies, audit, connectors
- users may belong to multiple organizations
- organization admins only see their organization
- platform admins see platform scope

2. Identity federation
- local users
- LDAP / Active Directory
- OIDC
- future extensibility for more identity sources
- access mappings based on external groups / claims

3. Cluster of nodes
- no mandatory single central node
- many nodes across many sites
- nodes can be platform-managed or customer-managed
- customer-managed nodes are sandboxed cluster participants, not full cluster owners

4. Node agent
- small stable always-running agent on every node
- supervises services
- downloads updates
- verifies signed artifacts
- can rollback to previous version
- can restart crashed services
- can work on thin or thick nodes

5. Service-based node model
Nodes are not monolithic.
Each node has:
- capabilities: what it can do physically/technically
- enabled services: what it is allowed/assigned to do

Possible services include:
- ingress-gateway
- mesh-router
- relay
- connector-host
- vpn-adapter
- session-worker
- media-relay
- file-relay
- update-cache
- config-replica
- audit-sink
- metrics-exporter

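The capability/enabled-service split above can be sketched in Go. The `Node` struct and its methods are hypothetical; the sketch only illustrates that a service may run when the node is both capable of it and assigned to it.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is a hypothetical, reduced node record.
type Node struct {
	ID           string
	Capabilities map[string]bool // what the node can do physically/technically
	Enabled      map[string]bool // what the node is allowed/assigned to do
}

// CanRun requires both the capability and the assignment.
func (n Node) CanRun(service string) bool {
	return n.Capabilities[service] && n.Enabled[service]
}

// RunnableServices lists enabled services the node is also capable of,
// sorted for stable output.
func (n Node) RunnableServices() []string {
	var out []string
	for svc := range n.Enabled {
		if n.CanRun(svc) {
			out = append(out, svc)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	n := Node{
		ID:           "node-a",
		Capabilities: map[string]bool{"relay": true, "mesh-router": true},
		Enabled:      map[string]bool{"relay": true, "vpn-adapter": true},
	}
	fmt.Println(n.RunnableServices())
}
```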
6. Cluster mesh and routing
- encrypted inter-node communication
- dynamic topology
- no need for full mesh
- multi-hop routing allowed
- route failover
- client failover between ingress nodes
- connector failover between nodes

7. Split-brain prevention
- quorum-based cluster behavior
- minority partition must not become a second authoritative cluster
- degraded / recovery / isolated modes
- manual recovery / promote decision by platform recovery admin

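This document does not specify the quorum algorithm. As an assumption, the sketch below uses a strict-majority rule to illustrate why a minority partition cannot become a second authoritative cluster; `HasQuorum` and the voter counts are hypothetical.

```go
package main

import "fmt"

// HasQuorum reports whether a partition that can reach reachableVoters out
// of totalVoters holds a strict majority. With an even split neither side
// has quorum, so at most one partition can ever be authoritative.
func HasQuorum(reachableVoters, totalVoters int) bool {
	if totalVoters <= 0 || reachableVoters < 0 || reachableVoters > totalVoters {
		return false
	}
	return reachableVoters > totalVoters/2
}

func main() {
	// A 5-voter cluster split 3/2: only the 3-voter side keeps quorum.
	fmt.Println("majority side:", HasQuorum(3, 5))
	fmt.Println("minority side:", HasQuorum(2, 5))
	// A 4-voter cluster split 2/2: neither side keeps quorum.
	fmt.Println("even split:", HasQuorum(2, 4))
}
```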
8. Connector / VPN layer
- connectors are reusable network access methods
- one connector may be used by multiple resources
- connector placement and failover are controlled by policy
- nodes may be allowed or disallowed to host connectors
- direct access, VPN, relay and future egress modes must fit this model

9. Future exit mode
- split tunnel
- full tunnel
- internet access through cluster
- not first implementation priority

## Non-negotiable design rules

- Do not rewrite proven session lifecycle carelessly.
- Do not turn Redis into a source of truth.
- Do not make certificate-ignore a global worker setting.
- Do not make customer-managed nodes platform-wide trusted by default.
- Do not create a separate cluster per organization.
- Do not assume a single permanently reachable central node.
- Do not rely on “secret protocol with no docs” as security.
- Security must come from crypto, auth, isolation, policy and observability.
- Prefer incremental evolution from current proven system.
- Do not collapse platform control plane and data plane into one vague layer.

## Implementation strategy

The codebase must evolve in phases.

Current implementation focus remains:
- RDP work is paused by product decision
- preserve the accepted RDP Adapter baseline and Stage 5.x file-transfer work
- do not delete or rewrite the current RDP MVP while platform-core work starts
- C1-C9 platform-core foundations are implemented and verified: clusters,
  node enrollment, node-agent scaffold, platform admin console, workload
  supervision contract, mesh control-plane prep, mesh skeleton, multi-cluster
  hardening, and organization admin foundation
- C10 Fabric Core configuration distribution design is completed
- C11 signed scoped cluster snapshot model is completed
- C12 node local state store is completed
- C13 Fabric Storage / Config Storage service foundation is completed
- C14 peer directory and cache model is completed
- C15 Fabric Routing Engine skeleton is completed
- C16 secure node-to-node channel lifecycle is completed
- C17 mesh routing runtime implementation plan is completed
- C17A synthetic mesh runtime skeleton is implemented and test-proven with
  synthetic fabric messages only, no RDP/VPN/production service traffic
- C17B route health and failover probes are implemented and test-proven with
  synthetic traffic only, no RDP/VPN/production service traffic
- C17C relay semantic hardening is implemented and test-proven with synthetic
  channel classes only, no RDP/VPN/production service traffic
- C17D non-production test-service path is implemented and test-proven with
  bounded `synthetic.echo` traffic only, no RDP/VPN/production service traffic
- C17E live node-to-node synthetic HTTP transport is implemented and
  smoke-proven with synthetic traffic only
- C17F scoped synthetic route config loading and route-health reporting is
  implemented and smoke-proven with synthetic traffic only
- C17G Control Plane scoped synthetic config read/consume is implemented and
  test-proven with synthetic traffic only
- C17H deployed multi-agent synthetic config smoke is implemented and
  runtime-proven on `docker-test` with synthetic traffic only
- C17I production forwarding gate foundation is implemented and test-proven;
  production forwarding remains unavailable
- C17J production envelope contract validation is implemented and test-proven;
  production forwarding remains unavailable
- C17K production envelope observation is implemented and test-proven;
  production forwarding remains unavailable
- C17L bounded production observation sink is implemented and test-proven;
  production forwarding remains unavailable
- C17M production observation sink wiring is implemented and test-proven;
  production forwarding remains unavailable
- C17N production observation sink metrics are implemented and test-proven;
  production forwarding remains unavailable
- C17O production observation sink local metrics logging is implemented and
  test-proven; production forwarding remains unavailable
- C17P production observation sink change-driven metrics logging is implemented
  and test-proven; production forwarding remains unavailable
- C17Q production forwarding gate/runtime log boundary is implemented and
  test-proven; production forwarding remains unavailable
- C17R production observation sink capacity guard is implemented and
  test-proven; production forwarding remains unavailable
- C17S production observation panic fail-closed hardening is implemented and
  test-proven; production forwarding remains unavailable
- C17T production envelope payload boundary is implemented and test-proven;
  production forwarding remains unavailable
- C17U production envelope created-at skew boundary is implemented and
  test-proven; production forwarding remains unavailable
- C17V peer endpoint candidate model and NAT/connectivity hints are
  implemented and test-proven; production forwarding remains unavailable
- C17W peer endpoint candidate scoring model is implemented and test-proven;
  production forwarding remains unavailable
- C17X health-aware endpoint candidate scoring overlay is implemented and
  test-proven; production forwarding remains unavailable
- C17Y Platform Owner synthetic mesh visibility is implemented and
  build/test-proven; production forwarding remains unavailable
- C17Z production fabric-control direct forwarding is implemented and
  test-proven; production service traffic remains unavailable
- C17Z1 production fabric-control multi-hop route-path forwarding is
  implemented and test-proven; production service traffic remains unavailable
- C17Z2 production fabric-control forwarding observability is implemented and
  test-proven; production service traffic remains unavailable
- C17Z3 production fabric-control route-config boundary is implemented and
  test-proven; production service traffic remains unavailable
- C17Z4 scoped peer directory/recovery seed boundary is implemented and
  test/build-proven; production service traffic remains unavailable
- C17Z5 node-agent peer cache runtime boundary is implemented and test-proven;
  production service traffic remains unavailable
- C17Z6 dynamic endpoint reporting boundary is implemented and test-proven;
  production service traffic remains unavailable
- C17Z7 private/corporate endpoint candidate boundary is implemented and
  test-proven; production service traffic remains unavailable
- C17Z8 peer connection state machine boundary is implemented and test-proven;
  production service traffic remains unavailable
- C17Z9 peer recovery planner boundary is implemented and test-proven;
  production service traffic remains unavailable
- C17Z10 peer connection intent planner boundary is implemented and
  test-proven; production service traffic remains unavailable
- C17Z11 peer connection manager runtime boundary is implemented and
  test-proven; production service traffic remains unavailable
- C17Z12 rendezvous/relay control-plane contract is implemented and
  docker-test-runtime-proven; production service traffic remains unavailable
- C17Z13 rendezvous lease telemetry is implemented and
  docker-test-runtime-proven; production service traffic remains unavailable
- C17Z14 rendezvous lease refresh contract is implemented and
  docker-test-runtime-proven; production service traffic remains unavailable
- C17Z15 backend relay replacement policy is implemented and
  docker-test-runtime-proven; production service traffic remains unavailable
- C17Z16 route/path decision artifact is implemented and
  docker-test-runtime-proven; production service traffic remains unavailable
- C17Z17 node-side route generation tracker is implemented and
  docker-test-runtime-proven; production service traffic remains unavailable
- C17Z18 synthetic route-health effective path runtime is implemented and
  docker-test-runtime-proven; production service traffic remains unavailable
- C17Z19 synthetic route-health feedback scoring is implemented and
  docker-test-runtime-proven; production service traffic remains unavailable
- C17Z20 node-side route-health feedback refresh is implemented and
  docker-test-runtime-proven; production service traffic remains unavailable
- C17Z21 node installation/update control-plane is implemented and
  docker-test-runtime-proven for Docker nodes; production service traffic
  remains unavailable
- C17Z22 Windows host-agent install/update supervision is implemented and
  runtime-proven on the remote Windows node; production service traffic remains
  unavailable
- C17Z23 update observability is implemented in backend/admin UI: per-node
  updater status history is exposed and deployed on docker-test, so node-agent
  and host-agent update activity can be audited from node details
- C17Z24 combined updater reporting is implemented and docker-test-proven:
  Linux/Docker `rap-host-agent update-loop` now also polls/reports
  `rap-host-agent` status, release `0.1.4` is published for node-agent and
  host-agent artifacts, and docker-test nodes `test-1/2/3` auto-updated to
  node-agent `0.1.4` while reporting host-agent `0.1.4` no-op status.
- C17Z25 Windows updater repair visibility is implemented in admin UI: node
  details / Updates now shows a ready CMD repair command for existing Windows
  nodes using `http://vpn.cin.su:19191/api/v1`, `--replace`, and
  `--auto-update-current-version 0.0.0` so a stale updater wrapper can be
  recreated without a new join token.
- C17Z26 updater fleet visibility is implemented in admin UI: the node list now
  shows per-node updater status based on latest `rap-node-agent` and
  `rap-host-agent` reports, explicitly flagging missing host-agent reports,
  stale update reports, or update errors before opening node details.
- C17Z27 backend version-state projection is implemented and deployed on
  docker-test: node list responses now derive `version_state` from active
  `rap-node-agent` desired policy plus latest update report. Docker/Linux nodes
  on `0.1.4` show `current`; the remote Windows node still on `0.1.3` shows
  `outdated` while remaining heartbeat-healthy.
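
The C17Z27 projection can be sketched roughly as follows. This is a minimal Python illustration, assuming dotted numeric versions and a simple "reported is at least desired" comparison. The states `current` and `outdated` come from the text above; the `unknown` state and the function names are assumptions made for this example.

```python
from typing import Optional


def parse_version(text: str) -> tuple[int, ...]:
    """Parse a dotted numeric version such as '0.1.4' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def version_state(desired: Optional[str], reported: Optional[str]) -> str:
    """Project a node's version state from desired policy and latest report.

    'current' and 'outdated' are named in the text; 'unknown' is a
    hypothetical placeholder for nodes without a policy or a report.
    """
    if desired is None or reported is None:
        return "unknown"
    if parse_version(reported) >= parse_version(desired):
        return "current"
    return "outdated"
```

Comparing parsed tuples rather than raw strings matters once versions reach two-digit components, for example `0.1.10` versus `0.1.9`.
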
- C17Z28 Windows updater loop hardening is implemented and partially
  docker-test-proven via release `0.1.5`: Windows host-agent updater scripts now
  run combined `update-loop --max-runs 1`, and Windows `update-loop` also
  polls/applies `rap-host-agent` updates. Release `0.1.5` artifacts are
  published for Docker/Linux and Windows; docker-test nodes `test-1/2/3`
  updated to `rap-node-agent 0.1.5`. Existing remote Windows nodes with a stale
  pre-0.1.5 updater wrapper still require one repair command from the admin UI
  to replace their local wrapper, after which automatic polling should continue.
- Admin UI now marks missing host-agent updater reports as `repair updater` in
  the node list and explains in node details / Updates when to run the Windows
  repair command. The command uses the external control-plane endpoint and does
  not require a join token for already enrolled Windows nodes.
- Admin UI node details / Updates also provides a ready downloadable
  `rap-repair-updater-<node>.cmd` plus copy-command action for Windows repair,
  reducing operator copy/paste mistakes on remote Windows hosts.
- Windows repair command generation was hardened after the first remote repair:
  the generated command now runs foreground `update-loop` with an explicit
  `--node-id`, copies any staged `rap-host-agent.exe.next` over the main
  host-agent binary after the one-shot loop exits, deletes the staged file, and
  runs the updater scheduled task. The node list now distinguishes
  `host-agent staged` from generic stale/error.
- C17Z29 Windows persistent updater repair is implemented in `rap-host-agent`
  release `0.1.6`: `install-windows` accepts `--node-id` and writes that node
  id into the persistent Windows updater wrapper so Scheduled Task polling no
  longer depends on finding `identity.json` in the expected state directory.
  Docker-test nodes `test-1/2/3` updated to `0.1.6`; existing Windows and
  off-host Docker nodes still need their local updater wrappers to pick up the
  0.1.6 host-agent repair path.
- C17Z30 operator-configured public mesh endpoints are implemented and
  docker-test-deployed: desired `mesh-listener.advertise_endpoint` is now
  projected into peer endpoint candidates for other nodes and preferred over
  auto-discovered private heartbeat endpoints. `home-1`
  (`8ad04829-cd30-4290-913d-1ce5c7ef7bb3`) is configured with
  `listen_addr=0.0.0.0:19131`, `advertise_endpoint=http://94.141.118.222:19199`,
  `connectivity_mode=direct`, `nat_type=port_restricted`, `region=home`.
  `test-1` synthetic config now receives `home-1` peer endpoint
  `http://94.141.118.222:19199`; internal `192.168.200.85:19131` responds with
  HTTP 405 on GET, while external `94.141.118.222:19199` currently refuses TCP,
  so router/firewall forwarding still needs correction outside the platform.
- C17Z31 offsite bootstrap peer selection is implemented and
  docker-test-deployed: operator-configured public/direct desired mesh-listener
  endpoints are kept in core-mesh bootstrap even after the default warm-peer
  target is reached. This fixes the case where remote Windows node
  `ifcm-rufms-s-mo1cr` received only `test-*` warm peers and no `home-1`.
  Its synthetic config now includes `home-1` endpoint
  `http://94.141.118.222:19199` and candidates ordered as operator public,
  heartbeat advertised public, then private LAN converted to relay-required for
  offsite. External TCP to `94.141.118.222:19199` still failed from Codex and
  docker-test checks while internal `192.168.200.85:19131` succeeds, so a real
  offsite `Test-NetConnection 94.141.118.222 -Port 19199` is the next network
  validation.
- C17Z32 native Ubuntu/Linux service install is implemented and
  docker-test-deployed: backend exposes `/node-agents/linux-install-profile`,
  host-agent supports `install-linux`, installs `rap-node-agent` under
  `/opt/rap/<node>`, state under `/var/lib/rap/nodes/<node>`, config under
  `/etc/rap/<node>`, creates `rap-node-agent-<node>.service`, and creates a
  persistent `rap-host-agent-updater-<node>.service` for automatic node-agent
  and host-agent updates. Release `0.1.7` is published for `rap-node-agent`
  (`linux_binary`, `windows_service`) and `rap-host-agent`
  (`linux_binary`, `windows_binary`). Admin UI now has an `Ubuntu service`
  install profile and generates profile-based `install-linux` commands.
  A one-use token for `vps-ubuntu-1` is active until 2026-05-02T08:41:41Z:
  `rap_join_a23Xhz63YstshWUBAPGPz5fzQ8YpHDP05RXaaYa4DoA`; scope roles are
  `core-mesh` and `relay-node`, control-plane endpoint is
  `http://vpn.cin.su:19191/api/v1`, artifact endpoint is
  `http://vpn.cin.su:19191/downloads`.
- Admin UI and docs now cover the full Windows updater operational workflow:
  node details shows an `Updater health` summary, generated repair CMD prints
  scheduled-task and binary diagnostics before/after repair, applies staged
  host-agent binaries, restarts the updater task, and README documents first
  install, repair without join-token, system-task/user-task behavior, staged
  host-agent recovery, and reboot/autostart verification.
- Cluster Authority plus node enrollment bootstrap polling are docker-test
  lifecycle-smoke-proven; fresh install migration replay is fixed for
  `cluster_admin_summaries`
- C18 VPN/IP tunnel service target design is completed as documentation only
- C18A VPN/IP tunnel control-plane data model foundation is implemented and
  backend-test-proven
- C18B VPN/IP tunnel lease/fencing hardening is implemented and
  backend-test-proven
- C18C VPN/IP tunnel node-agent desired-state consumption/reporting is
  implemented and backend-test-proven
- Version Storage / Update Repository is documented as a future Fabric Core
  service for signed release manifests, OS/arch artifacts,
  stable/current/candidate channels, update-cache mirroring, node-agent
  update supervision, rollback, and explicit data-structure migration bundles.
  Runtime updater behavior is partially implemented for the current Docker and
  Windows node-agent/host-agent paths; broader staged rollout policy and
  service payload forwarding remain separate work.
- no next platform-core implementation step is automatically authorized after
  C17Z20; choose the next narrow staged prompt explicitly before continuing
- preserve the proven RDP lifecycle behavior
- keep the current backend gateway available as the active/fallback implementation path
- accepted VPN data-plane target: the phone/client connects only to an
  available entry node; the entry node uses the existing mesh/fabric route to a
  selected exit node/pool, and the exit node handles LAN/internet egress. Nodes
  behind NAT may participate when they can maintain outbound mesh/control
  sessions. Backend packet relay must remain a compatibility/fallback path, not
  the desired steady-state path.
- C18D VPN-over-fabric foundation is implemented and docker-test-started:
  VPN client profiles include `vpn_fabric_route` with entry pool, exit pool,
  selected entry/exit, preferred `fabric_mesh` data-plane, and
  `backend_relay` fallback. Node-agent `0.2.39` adds a dedicated production
  `vpn_packet` channel (`vpn.packet_batch`, 256 KiB batch limit), destination
  delivery hook, `vpnruntime.FabricPacketTransport`, and
  `vpn_fabric_packet_transport` heartbeat capability. `home-1` auto-updated to
  `0.2.39`; other nodes have automatic desired policy `0.2.39` and should move
  as their updater loops pick it up. Live Android VPN traffic still uses backend
  relay until entry-node client ingress is wired to the fabric transport.
- C18E VPN-over-fabric route contract is backend-deployed on docker-test as
  `rap-backend:test-vpn-fabric-route-0.2.41`: when a VPN client profile selects
  different entry and exit nodes, backend now ensures two active
  `mesh_route_intents` with service_class `vpn_packets` and allowed channel
  `vpn_packet`. The live HOME profile currently selects `usa-los-1` as entry
  and `home-1` as exit when `entry_node_id=b829ffde-...` is requested, and the
  synthetic config for both nodes includes the two `vpn_packet` routes. Existing
  fallback remains `backend_relay`; production forwarding gate is still disabled
  on old/live remote nodes until their runtime is explicitly updated/enabled.
- External/offsite updater gap found and fixed for version `0.2.40`: native
  `rap-node-agent` binaries for `linux_binary`, `linux_service`, and
  `windows_service` plus matching `rap-host-agent` binaries are copied under
  `/downloads` and registered in channel `dev-external`. Update plans for
  `usa-los-1` (`linux_binary`) and `ifcm-rufms-s-mo1cr` (`windows_service`) now
  return `action=update`, `target_version=0.2.40` instead of
  `no_matching_artifact`.
- C18F production-forwarding gate work is partially live: backend
  `rap-backend:test-vpn-fabric-route-0.2.42` signs node synthetic configs with
  `production_forwarding=true` / `control_plane_only=false` when the node's
  desired `mesh-listener` workload has `production_forwarding_enabled=true`.
  `home-1` and `usa-los-1` desired mesh-listener configs have this flag enabled.
  Node-agent `0.2.44` accepts signed production-forwarding mesh configs and
  host-agent `0.2.44` fixes Docker updater behavior so synthetic mesh runtime is
  not disabled on Docker updates. Runtime status: `usa-los-1` reports
  `mesh_production_forwarding=true`; `home-1` reports `0.2.44` and synthetic
  runtime enabled, but its listener report is still `disabled/listen_addr_empty`,
  so `home-1` is not yet a usable production fabric endpoint. Next action is to
  find and fix why `home-1` does not apply the signed mesh-listener config
  (`listen_addr=0.0.0.0:19131`) after a Docker updater restart.
- C18G VPN-over-fabric runtime path is live-tested on docker-test. Backend is
  deployed as `rap-backend:test-vpn-fabric-route-0.2.43`; VPN route intents now
  allow both `vpn_packet` data and `fabric_control` health probes. Node-agent
  `0.2.47` fixes initial production VPN packet envelope hop addressing and
  reports the matching version. `home-1` and `usa-los-1` both report
  `0.2.47`, healthy, listener `0.0.0.0:19131`, and
  `mesh_production_forwarding=true`. Live route health is reachable in both
  directions (`usa-los-1 -> home-1` around 200 ms, `home-1 -> usa-los-1`
  around 200-415 ms). A direct live POST to
  `http://195.123.240.88:19131/api/v1/clusters/.../vpn-connections/.../tunnel/client/packets`
  returns `202 Accepted`, proving entry-node VPN packet ingress can forward
  over fabric to the home exit. The HOME VPN placement policy now has entry
  pool `[usa-los-1, home-1]` and exit `home-1`; client profile with preferred
  `usa-los-1` selects `usa-los-1 -> home-1`.
- C18H live VPN triage on 2026-05-04: `home-1` and `usa-los-1` report
  node-agent `0.2.48`, healthy heartbeats, active HOME VPN assignment on
  `home-1`, and `packet_forwarding=true` / `runtime_available=true`. Manual
  packet tests through the USA entry proved that the path
  Android-style packet -> `usa-los-1` -> fabric -> `home-1` -> LAN/DNS ->
  fabric -> `usa-los-1` -> client can return ICMP and DNS replies. The remaining
  live symptom was the phone not sending fresh packets to the current entry
  after the backend relay queue was cleared. Android VPN app `0.2.59` was built
  and published to `/downloads/rap-android-rdp-vpn-latest-debug.apk`; it
  normalizes old saved backend URLs (`vpn.cin.su:19191`,
  `94.141.118.222:19191`, `192.168.200.61:18080`, etc.) to the current USA
  entry backend `http://195.123.240.88:19131/api/v1` and shows app version,
  device id, and connection id in the header for live log correlation.
- C18I fabric service-channel foundation is live on 2026-05-07. Backend,
  node-agent, and Android VPN release `0.2.159` are published. VPN profiles now
  include a signed `rap.fabric_service_channel_lease.v1` with
  `entry_direct_http_v1` packet and WebSocket templates. Android consumes this
  lease and sends service-channel headers. The `usa-los-1` entry endpoint
  validates the cluster-authority signed lease payload and token hash; a live
  smoke through `http://195.123.240.88:19131/.../fabric/service-channels/...`
  succeeded with a valid lease and rejected a bad token with `403`. Current HOME
  profile selects `usa-los-1` as entry and `home-1` as exit; both nodes report
  `0.2.159`. Docker-test nodes `test-1`, `test-2`, and `test-3` also report
  `0.2.159`. `ifcm-rufms-s-mo1cr` is still on `0.2.119`; it has staged the
  host-agent `0.2.159` update and should finish on the next Windows updater
  loop/restart.
- C18J fabric service-channel runtime route-manager slice is live on
  2026-05-07 as node/host-agent `0.2.162`. The entry-node
  `FabricClientPacketIngress` now preserves its runtime object across synthetic
  config refreshes, so heartbeat telemetry reports the same ingress object that
  serves HTTP/WebSocket service-channel traffic. It tracks send/receive batches,
  route attempts/failures, selected route/next hop, local-gateway fallback, and
  inbox queue depths. `SendClientPacketBatch` now retries all valid
  `vpn_packet` route candidates with sticky preference before backend relay is
  allowed as degraded compatibility fallback. Release `0.2.161` was superseded
  because its Docker tar was rebuilt after registration; `0.2.162` is the
  clean published release with matching artifact hashes. Docker-test
  `test-1/2/3`, `usa-los-1`, and `ifcm-rufms-s-mo1cr` report `0.2.162`;
  `home-1` is healthy and still on `0.2.161` awaiting its next updater loop.
  Live smoke through `http://195.123.240.88:19131/.../fabric/service-channels`
  returned `202` and `usa-los-1` telemetry then showed route attempts,
  one route failure, and selected next hop `home-1`, proving live ingress
  telemetry and alternate-route retry are active.
- C18K service-neutral flow/channel scheduler is live on 2026-05-07 as
  node/host-agent `0.2.163`. The VPN proving service still carries universal
  IP packets and does not route by application protocol, but the entry runtime
  now hashes packets by IP 5-tuple, or packet hash for non-IP/invalid packets,
  into 32 logical `flow-*` channels. Each channel has bounded queue accounting,
  high-watermark/backpressure/dropped telemetry, and batches are fanned out per
  logical channel before being sent through the same fabric route-manager. Live
  smoke against `usa-los-1` posted two different IP flows through the signed
  service-channel endpoint and heartbeat reported `send_packets=2`,
  `send_flow_batches=2`, `flow_scheduler.channel_count=2`, `enqueued=2`,
  `dequeued=2`, `dropped=0`, with queue depths for `flow-12` and `flow-14`.
  All six current cluster nodes (`home-1`, `usa-los-1`, `ifcm-rufms-s-mo1cr`,
  `test-1`, `test-2`, `test-3`) report node-agent `0.2.163` and healthy.
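
The C18K flow-to-channel mapping can be sketched as below. This is a minimal Python illustration, not the node-agent code: it parses IPv4 only, uses CRC32 as a stand-in hash, and treats anything it cannot parse as a "non-IP/invalid" packet that is hashed by its raw bytes. The function names and the `ipv4_udp` demo helper are assumptions for this example.

```python
import socket
import struct
import zlib
from typing import Optional

CHANNEL_COUNT = 32  # the text describes 32 logical flow-* channels


def five_tuple(packet: bytes) -> Optional[tuple]:
    """Return (proto, src, dst, sport, dport) for an IPv4 packet, else None."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        return None
    header_len = (packet[0] & 0x0F) * 4
    if header_len < 20 or len(packet) < header_len:
        return None
    proto = packet[9]
    src = socket.inet_ntoa(packet[12:16])
    dst = socket.inet_ntoa(packet[16:20])
    sport = dport = 0
    # Only TCP (6) and UDP (17) carry ports in the first four payload bytes.
    if proto in (6, 17) and len(packet) >= header_len + 4:
        sport, dport = struct.unpack_from("!HH", packet, header_len)
    return proto, src, dst, sport, dport


def flow_channel(packet: bytes) -> str:
    """Map a packet to a logical channel name such as 'flow-12'."""
    key = five_tuple(packet)
    # Fall back to hashing the raw bytes for non-IPv4 or malformed packets.
    data = repr(key).encode() if key is not None else packet
    return f"flow-{zlib.crc32(data) % CHANNEL_COUNT}"


def ipv4_udp(src: str, dst: str, sport: int, dport: int, payload: bytes = b"") -> bytes:
    """Build a minimal IPv4/UDP packet (checksums left as zero) for the demo."""
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28 + len(payload), 0, 0, 64, 17, 0,
                     socket.inet_aton(src), socket.inet_aton(dst))
    udp = struct.pack("!HHHH", sport, dport, 8 + len(payload), 0)
    return ip + udp + payload
```

Packets of the same flow always land in the same channel regardless of payload, which preserves per-flow ordering. The mapping never inspects the application protocol, so it stays service-neutral.
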
- C18L active flow scheduling telemetry is live on 2026-05-07 as
  node/host-agent `0.2.164`. Each `flow-*` channel now keeps route memory,
  served count, last served time, last route/next hop, failed-route marker,
  consecutive failures, stall count, last send duration, and explicit
  `route_rebuild_recommended` / `degraded_fallback_recommended` signals. The
  scheduler drains non-stalled channels first, prefers less-served/older
  channels, avoids a channel's last failed route on the next send, and only
  marks degraded fallback after repeated failures. Live smoke against
  `usa-los-1` posted two IP flows through the signed service-channel endpoint:
  heartbeat reported schema `c18l.fabric_service_channel_runtime_report.v1`,
  `send_packets=2`, `send_flow_batches=2`, `flow_scheduler.channel_count=2`,
  `dropped=0`, `backpressure=false`, `last_next_hop=home-1`, and per-flow
  `served=1`. One stale candidate route failed and was bypassed before the
  successful route to `home-1`. All six current cluster nodes (`home-1`,
  `usa-los-1`, `ifcm-rufms-s-mo1cr`, `test-1`, `test-2`, `test-3`) report
  node-agent `0.2.164` and healthy.
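
The C18L scheduling rules can be illustrated with a small Python sketch. The field names loosely mirror the telemetry listed above, but the `FlowChannel` class, the failure threshold, and the choice to retry the failed route when it is the only candidate are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional

DEGRADED_AFTER_FAILURES = 3  # hypothetical threshold for "repeated failures"


@dataclass
class FlowChannel:
    name: str
    queued: int = 0
    stalled: bool = False
    served: int = 0
    last_served_at: float = 0.0
    failed_route: Optional[str] = None
    consecutive_failures: int = 0

    @property
    def degraded_fallback_recommended(self) -> bool:
        """Signal degraded fallback only after repeated failures."""
        return self.consecutive_failures >= DEGRADED_AFTER_FAILURES


def drain_order(channels: list[FlowChannel]) -> list[FlowChannel]:
    """Order non-empty channels: non-stalled first, then less served, then older."""
    ready = [c for c in channels if c.queued > 0]
    return sorted(ready, key=lambda c: (c.stalled, c.served, c.last_served_at))


def pick_route(channel: FlowChannel, candidates: list[str]) -> Optional[str]:
    """Prefer any candidate other than the channel's last failed route."""
    for route in candidates:
        if route != channel.failed_route:
            return route
    # Assumption: if only the failed route remains, retry it instead of dropping.
    return candidates[0] if candidates else None
```

Sorting on the tuple `(stalled, served, last_served_at)` applies the three preferences in order, so a stalled channel is drained only after every healthy channel has had its turn.
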
- C18M Control Plane service-channel feedback is live on 2026-05-07. Backend
  image `rap-backend:fabric-service-channel-0.2.165` is deployed on
  docker-test, and node/host-agent `0.2.165` artifacts are published. When
  issuing `rap.fabric_service_channel_lease.v1`, backend now reads fresh
  entry-node heartbeat metadata
  `fabric_service_channel_runtime_report.ingress.flow_scheduler.channel_stats`,
  builds per-route service-channel feedback, boosts recently successful routes,
  penalizes recent failures, and fences routes that report
  `route_rebuild_recommended`, `degraded_fallback_recommended`, or repeated
  consecutive failures. Fenced routes are not selected as primary or alternate;
  if all selected entry/exit routes are fenced, the lease uses explicit
  degraded backend fallback with reason
  `fabric_routes_fenced_by_service_channel_feedback`. Live smoke created two
  short-lived `test-1 -> test-2` route intents, injected a fresh
  service-channel flow feedback heartbeat marking the higher-priority route as
  rebuild-required, and the next lease selected the lower-priority healthy
  route with score reason `service_channel_recent_success`; the bad route was
  not offered as an alternate. Current node rollout: `home-1`, `usa-los-1`,
  `test-1`, `test-2`, and `test-3` report `0.2.165`; Windows `ifcm-rufms-s-mo1cr`
  remains healthy on `0.2.164` and should move on its next updater cycle.
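
The C18M selection logic (boost, penalize, fence, explicit fallback) can be sketched in Python as follows. The fallback reason string and the two `*_recommended` signals are taken from the text. The `RouteFeedback` class, the score weights, and the failure threshold are hypothetical values chosen for this example.

```python
from dataclasses import dataclass

FENCE_AFTER_FAILURES = 3  # hypothetical threshold for "repeated" failures
SUCCESS_BOOST = 100       # hypothetical score adjustments
FAILURE_PENALTY = 50
FALLBACK_REASON = "fabric_routes_fenced_by_service_channel_feedback"


@dataclass
class RouteFeedback:
    route_id: str
    priority: int = 0
    recent_success: bool = False
    consecutive_failures: int = 0
    route_rebuild_recommended: bool = False
    degraded_fallback_recommended: bool = False

    def fenced(self) -> bool:
        """A route is fenced by either signal or by repeated failures."""
        return (self.route_rebuild_recommended
                or self.degraded_fallback_recommended
                or self.consecutive_failures >= FENCE_AFTER_FAILURES)

    def score(self) -> int:
        """Boost recent success and penalize each recent failure."""
        score = self.priority
        if self.recent_success:
            score += SUCCESS_BOOST
        return score - FAILURE_PENALTY * self.consecutive_failures


def select_routes(feedback: list[RouteFeedback]) -> dict:
    """Pick primary and alternates from unfenced routes, else degrade explicitly."""
    usable = sorted((f for f in feedback if not f.fenced()),
                    key=lambda f: f.score(), reverse=True)
    if not usable:
        return {"primary": None, "alternates": [],
                "fallback": "backend_relay", "reason": FALLBACK_REASON}
    return {"primary": usable[0].route_id,
            "alternates": [f.route_id for f in usable[1:]],
            "fallback": None, "reason": None}
```

As in the live smoke described above, a higher-priority route marked rebuild-required is neither primary nor alternate, and the lower-priority healthy route is selected instead.
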
- C18N durable service-channel route feedback is live on 2026-05-07. Backend
  image `rap-backend:fabric-service-channel-0.2.166` is deployed on
  docker-test with migration `000025_fabric_service_channel_route_feedback`.
  Heartbeats now persist service-neutral route observations into
  `fabric_service_channel_route_feedback_observations` and maintain an
  expiring latest view in `fabric_service_channel_route_feedback_latest`.
  Lease selection reads this durable latest feedback before falling back to
  in-memory heartbeat parsing, so route fencing survives backend restarts and
  stale heartbeat replacement. Node/host-agent `0.2.166` artifacts and Docker
  image are published, update policies target `0.2.166`, and `test-1/2/3`,
  `usa-los-1`, and `ifcm-rufms-s-mo1cr` report `0.2.166`; `home-1` is healthy
  but still on `0.2.165` until its next updater cycle. Live smoke created two
  short-lived `test-1 -> test-2` routes, persisted a fenced observation for the
  higher-priority bad route and a healthy observation for the lower-priority
  route, restarted backend, and the next lease selected the healthy route with
  `service_channel_recent_success`.
- C18O service-channel feedback diagnostics and synthetic route avoidance are
  live on 2026-05-07. Backend image
  `rap-backend:fabric-service-channel-0.2.167` is deployed on docker-test and
  web-admin is rebuilt/published. Admin/API now expose fresh durable feedback
  through `GET /clusters/{clusterID}/fabric/service-channels/route-feedback`,
  and each node synthetic config includes
  `service_channel_route_feedback` with healthy/degraded/fenced counts and
  observations. Synthetic config generation skips routes fenced by the local
  node's durable service-channel feedback, so nodes stop receiving known-bad
  route configs while the feedback is active. Live smoke created fresh
  `test-1 -> test-2` routes, persisted `fenced` feedback for the higher-priority
  route and `healthy` feedback for the lower-priority route, confirmed the API
  returned both observations, and confirmed `test-1` synthetic config excluded
  the bad route while keeping the healthy route.
- C18P proactive service-channel replacement decisions are live on 2026-05-07.
  Backend image `rap-backend:fabric-service-channel-0.2.168` is deployed on
  docker-test and web-admin is rebuilt/published. When synthetic config
  generation withholds a route fenced by local service-channel feedback, it now
  records a `route_path_decisions` item with
  `decision_source=service_channel_feedback_replacement`,
  `replacement_route_id`, effective replacement hops, and score reasons. If no
  alternate exists, the decision source becomes
  `service_channel_feedback_no_alternate` with visible score reason
  `no_unfenced_alternate_route`. Live smoke created fresh `test-1 -> test-2`
  bad/good routes, fenced the bad route, disabled older smoke routes, and
  confirmed `test-1` synthetic config excluded the bad route, kept the good
  route, and reported replacement from bad route to good route.
- C18Q service-channel replacement dampening is live on 2026-05-07. Backend
  image `rap-backend:fabric-service-channel-0.2.169`, node/host-agent
  `0.2.169` artifacts, Docker image, update policies, and web-admin are
  published on docker-test. Replacement selection now gives a large stable
  preference to routes with active healthy durable feedback, adding
  `active_healthy_feedback_dampening_window` to score reasons, so a recently
  successful replacement wins over a higher-priority but unproven route until
  the feedback window expires or a newer fenced/healthy observation changes the
  state. `RoutePathDecisionReport` now includes `degraded_decision_count` for
  `service_channel_feedback_no_alternate`, and node-agent heartbeat reports
  include `replacement_route_id` and degraded counts after upgrade. Live smoke
  fenced a high-priority bad `test-1 -> test-2` route, supplied healthy feedback
  for a low-priority route, also created a higher-priority unproven route, and
  confirmed replacement selected the healthy route because of the dampening
  window.
- C18Q hotfix `0.2.171` is published on 2026-05-07. Node-agent now includes
  `service_channel_route_feedback` in the signed synthetic config model before
  recalculating the authority payload hash. Without this, upgraded backend
  configs were signed correctly but `0.2.169` agents rejected them with
  `control-plane synthetic mesh config authority payload hash mismatch`.
  Regression coverage verifies a signed config containing durable
  service-channel feedback. Artifacts, Docker image, latest download aliases,
  and update policies were moved to `0.2.171`; `test-1/2/3` are running
  `0.2.171` and loading `source=control_plane` again. The release includes
  `linux_service`, Docker, Windows service, and binary artifacts so service
  installs can auto-update. Old C18 smoke/expired route intents were disabled
  after validation.
- C18R fleet diagnostics/operator action slice is live on 2026-05-07. Backend
  image `rap-backend:fabric-service-channel-0.2.172` adds route feedback
  filters (`route_id`, `feedback_status`, `include_expired`) and
  `POST /clusters/{clusterID}/fabric/service-channels/route-feedback/expire`.
  The expire action is cluster-mutable/admin-gated and marks latest feedback
  expired without deleting historical observations. Web-admin / Fabric Links
  now shows a cluster-level service-channel feedback panel with fenced,
  degraded, healthy and no-alternate counts, replacement/no-alternate decisions,
  and an operator `expire` action for stale non-healthy feedback.
- C18S service-channel feedback churn guardrails are implemented on
  2026-05-07. Operator expire now records
  `fabric.service_channel_route_feedback.expired` audit events, returns and
  persists a short `operator_retry_cooldown_until`, and route generation adds
  `service_channel_route_retry_after_operator_expire` when a manually expired
  route is being retried. During that cooldown, repeated non-healthy feedback
  from the same reporter/route/service is suppressed as
  `operator_retry_cooldown` instead of immediately fencing the route again.
  Web-admin shows the retry/cooldown state in Fabric Links.
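
The C18S cooldown rule can be sketched as a small classification step. This is a minimal Python illustration: the `operator_retry_cooldown` label comes from the text, while the function name, the status strings passed in, and the timestamp handling are assumptions for this example.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def classify_feedback(status: str, now: datetime,
                      cooldown_until: Optional[datetime]) -> str:
    """Return how a feedback report is treated around an operator expire.

    Non-healthy feedback that arrives before `cooldown_until` is suppressed
    instead of fencing the route again; everything else passes through.
    """
    in_cooldown = cooldown_until is not None and now < cooldown_until
    if status != "healthy" and in_cooldown:
        return "operator_retry_cooldown"
    return status


# Example: an operator expired the feedback and a 60-second cooldown is active.
now = datetime(2026, 5, 7, 12, 0, tzinfo=timezone.utc)
cooldown_until = now + timedelta(seconds=60)
```

Healthy feedback is never suppressed, so a route that recovers during the cooldown is recognized immediately.
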
- C18T automatic rebuild decision contract is implemented on 2026-05-07.
  `RoutePathDecision` now carries `rebuild_request_id`, `rebuild_status`,
  `rebuild_reason`, and `rebuild_attempt`. When fenced service-channel feedback
  keeps failing outside manual retry cooldown, Control Plane records a bounded
  rebuild request. If an unfenced alternate exists, the decision is marked
  `rebuild_status=applied`; if not, it is
  `pending_degraded_fallback` and leases expose backend relay with reason
  `fabric_route_rebuild_pending_backend_relay`. Web-admin shows rebuild counts,
  status, and attempts in Fabric Links. A live smoke on docker-test created
  short-lived `test-1 -> test-2` bad/good routes, reported fenced feedback for
  the bad route and healthy feedback for the good route, and confirmed scoped
  synthetic config returned `service_channel_feedback_replacement` with
  `rebuild_status=applied` and `rebuild_attempt=3`. Node/host-agent `0.2.175`
  is published so agents preserve the new signed rebuild fields.
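
A minimal sketch of the C18T applied/pending split, assuming the decision only needs the ordered candidate routes and the set of fenced routes. The `RebuildDecision` shape and `decide_rebuild` name are hypothetical; the status and reason strings are the ones listed above. The bounded attempt counter (`rebuild_attempt`) is not modeled here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class RebuildDecision:
    rebuild_status: str
    replacement_route_id: Optional[str]
    lease_reason: Optional[str]


def decide_rebuild(
    candidate_route_ids: Iterable[str],
    fenced_route_ids: Set[str],
) -> RebuildDecision:
    """Pick the first unfenced alternate, else fall back to backend relay."""
    for route_id in candidate_route_ids:
        if route_id not in fenced_route_ids:
            # An unfenced alternate exists: the rebuild is applied.
            return RebuildDecision("applied", route_id, None)
    # No alternate: leases expose backend relay with the pending reason.
    return RebuildDecision(
        "pending_degraded_fallback",
        None,
        "fabric_route_rebuild_pending_backend_relay",
    )
```
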
- C18U node-agent route-manager rebuild consumption is live on 2026-05-07.
  Node-agent `0.2.176` now converts backend rebuild decisions into a
  service-channel route-manager snapshot, counts rebuild requests/applies,
  marks applied/pending-degraded routes as withdrawn, clears a withdrawn cached
  selected route, and excludes withdrawn routes from new service-channel route
  candidates. This keeps new flows from retrying a route that Control Plane has
  already rebuilt away from. Unit coverage verifies a bad route is skipped in
  favor of its replacement. Node/host-agent `0.2.176` artifacts, Docker image,
  latest download aliases, release manifests, and node policies are published.
  `test-1/2/3`, `usa-los-1`, and `ifcm-rufms-s-mo1cr` report `0.2.176`.
  Backend `rap-backend:fabric-service-channel-0.2.176` is deployed with a
  panel consistency fix: if a node reports the target version, stale failed
  update status no longer overrides `version_state=current`.
- C18V route-manager churn telemetry is live on 2026-05-07. Node-agent
  `0.2.177` adds `route_manager_transition` to the service-channel runtime
  report with previous/current generation, transition status, decision counts,
  withdrawn/restored route counts, pending-degraded fallback count, rebuild
  applied count, and any cleared cached route. Tests cover applied rebuild
  replacement, pending degraded fallback with no alternate, and restoration by
  a fresh config so withdrawn routes do not become sticky local state. Artifacts,
  Docker image, latest download aliases, release manifests, and node policies
  are published. `test-1/2/3` run `0.2.177`; their heartbeat metadata exposes
  `rap.fabric_service_channel_route_manager_transition.v1`.
- C18W live Control Plane/runtime verification is implemented and smoke-passed
  on 2026-05-07. Script
  `scripts/fabric/c18w-service-channel-route-manager-smoke.ps1` drives the
  whole loop against docker-test API: creates temporary service-channel route
  intents for `test-1 -> test-2`, injects fenced/healthy route feedback through
  heartbeat, verifies scoped config emits `rebuild_status=applied`, waits for
  node-agent heartbeat `route_manager_transition.status=applied_rebuild`,
  expires the feedback, verifies the restored config has no rebuild decision,
  and waits for `restored_by_new_config`. Result artifact:
  `artifacts/c18w-service-channel-route-manager-smoke-result.json` with run
  `c18w-20260507-173226`. During the smoke, operator expire exposed live pgx
  parameter issues; backend `rap-backend:fabric-service-channel-0.2.179` is
  deployed with safer UUID/text timestamp handling for feedback expire.
- C18X logical-channel isolation and bounded backpressure coverage is
  implemented and smoke-passed on 2026-05-07. Node-agent/host-agent `0.2.180`
  artifacts, Docker image, latest download aliases, release manifests, and
  node policies are published. The key runtime fix is in
  `FabricClientPacketIngress.routeCandidatesForChannel`: a channel with a local
  failed-route avoid state no longer falls back to the global last selected
  route, so one degraded logical flow cannot drag unrelated flows back onto the
  failed path. Coverage proves independent logical-channel failover, bounded
  same-channel backpressure/drop telemetry, and packet-flow hashing. Script
  `scripts/fabric/c18x-service-channel-logical-channel-smoke.ps1` passes with
  result artifact `artifacts/c18x-service-channel-logical-channel-smoke-result.json`
  run `c18x-20260507-180647`. Test docker nodes `test-1/2/3` are running
  `rap-node-agent:0.2.180`; backend remains
  `rap-backend:fabric-service-channel-0.2.179`.
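
The C18X isolation rule can be illustrated like this. It is a sketch under stated assumptions, not the body of `FabricClientPacketIngress.routeCandidatesForChannel`: the parameter names and the "move preferred route to the front" ordering are hypothetical.

```python
from typing import List, Optional, Set


def route_candidates_for_channel(
    config_order: List[str],
    channel_avoided: Set[str],
    channel_selected: Optional[str],
    global_last_selected: Optional[str],
) -> List[str]:
    """Order route candidates for one logical channel."""
    # Routes this channel has locally marked as failed are never candidates.
    candidates = [r for r in config_order if r not in channel_avoided]

    preferred = channel_selected
    if preferred is None and not channel_avoided:
        # Only a channel with no local avoid state may inherit the global
        # last selected route; a degraded channel must not pull it back in.
        preferred = global_last_selected

    if preferred in candidates:
        candidates.remove(preferred)
        candidates.insert(0, preferred)
    return candidates
```

With `config_order=["a", "b"]`, a channel avoiding `a` gets only `["b"]` even when the global last selected route is `a`, while an unaffected channel can still follow the global choice.
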
- C18Y route-intent lifecycle cleanup is implemented and smoke-passed on
  2026-05-07. Backend `rap-backend:fabric-service-channel-0.2.181` is deployed
  on docker-test, and web-admin Fabric Links now shows route-intent lifecycle
  counts/table with operator `expire` and `disable` actions. Route intents are
  enriched with `lifecycle_status`, `is_expired`, and `policy_expires_at`.
  Node-scoped synthetic mesh config now filters out expired policy routes, so
  stale smoke routes no longer get emitted to agents for route-health probing.
  API actions are available at
  `POST /clusters/{clusterID}/mesh/route-intents/{routeIntentID}/expire` and
  `/disable`. Script `scripts/fabric/c18y-route-intent-lifecycle-smoke.ps1`
  passed against docker-test API, result
  `artifacts/c18y-route-intent-lifecycle-smoke-result.json` run
  `c18y-20260507-192702`. During deploy, docker-test root disk was full from
  build cache/images; `docker builder prune -af` and `docker image prune -f`
  freed space before redeploy.
- C18Z bounded service-channel load coverage is implemented, published, and
  smoke-passed on 2026-05-07. Node-agent/host-agent `0.2.181` artifacts,
  Docker image `rap-node-agent:0.2.181`, latest download aliases, release
  manifests, and update policies are published. `test-1/2/3` are restarted on
  `rap-node-agent:0.2.181`; `usa-los-1` also reports `0.2.181`. The key runtime
  fix is in `FabricFlowScheduler.Snapshot`: backpressure remains visible when
  bounded drops occurred, even after the queue drains. Coverage proves
  multi-channel rebuild away from a withdrawn primary route and per-channel
  bounded drop/high-water telemetry. Script
  `scripts/fabric/c18z-service-channel-load-smoke.ps1` passed against
  docker-test API, result
  `artifacts/c18z-service-channel-load-smoke-result.json` run
  `c18z-20260507-194616`. Release artifacts were corrected after initial
  publication to use backend-relative `/downloads/...` primary URLs plus
  internal/external mirror URLs, so offsite nodes resolve downloads through
  their own control-plane origin such as `http://vpn.cin.su:19191`. Current
  caveat: `ifcm-rufms-s-mo1cr` and `home-1` remained `version_state=failed`
  at the last check; their next update plan now points to reachable `0.2.181`
  artifacts, but the local updater loop still needs to retry/report success.
- C18Z1 live service-channel ingress is implemented, published, and
  smoke-passed on 2026-05-07. Node-agent/host-agent `0.2.182` artifacts,
  Docker image `rap-node-agent:0.2.182`, release manifests, and update
  policies are published. Backend `rap-backend:fabric-service-channel-0.2.182`
  is deployed on docker-test. The runtime fix is a dynamic mesh listener
  handler: synthetic config refreshes now update `/mesh/v1/forward`,
  service-channel ingress, production routes, delivery inbox, and forward
  transport without requiring a port/listener restart. Backend route-feedback
  latest policy now prevents a fresh healthy heartbeat from immediately
  overwriting active degraded/fenced feedback before TTL expiry, so rebuild
  decisions survive long enough for nodes to apply them. Script
  `scripts/fabric/c18z1-live-service-channel-ingress-smoke.ps1` posts signed
  generic packet batches to the running `test-1` service-channel HTTP endpoint,
  waits for both entry and exit runtime configs, verifies exit inbox delivery,
  injects route feedback, observes Control Plane rebuild, waits for node
  `applied_rebuild`, sends a second batch over the replacement route, and
  expires both temporary route intents. Result:
  `artifacts/c18z1-live-service-channel-ingress-smoke-result.json` run
  `c18z1-20260507-203628`. All current nodes report `0.2.182/current` at the
  last check.
- C18Z2 live service-channel sustained soak/failure smoke is implemented and
  passed on 2026-05-07 without a new runtime release. Script
  `scripts/fabric/c18z2-live-service-channel-soak-smoke.ps1` drives signed
  generic packet batches through the running `test-1` service-channel HTTP
  endpoint, keeps temporary primary/alternate `test-1 -> test-2` route intents
  visible, restarts the exit-node container `rap_test_node_test_2`, waits for
  the exit runtime to reload synthetic config, and verifies recovery batches
  reach the exit fabric inbox after the restart. Result:
  `artifacts/c18z2-live-service-channel-soak-smoke-result.json` run
  `c18z2-20260507-205112`: warm batches `6/6`, during-restart batches `3/3`,
  recovery batches `8/8`, exit inbox depth grew from post-restart baseline
  `0` to `88`, drops `0`, and both temporary route intents expired.
- C18Z3 live service-channel entry/WebSocket/degraded-fallback smoke is
  implemented, published, and passed on 2026-05-07. Node-agent/host-agent
  `0.2.183` artifacts and Docker image `rap-node-agent:0.2.183` are published
  to docker-test downloads; update policies for `test-1/2/3` are set to
  `rolling` target `0.2.183`, and the test containers run that image. The
  runtime fix makes the entry node honor the signed service-channel lease
  authority: leases with `status=degraded_fallback` or
  `primary_route.status=missing_route_intent` now force backend fallback instead
  of reusing stale generic route candidates. The same fallback rule is applied
  to HTTP and WebSocket packet ingress. Script
  `scripts/fabric/c18z3-live-service-channel-entry-ws-fallback-smoke.ps1`
  verifies signed HTTP warm batches, WebSocket ingress parity, entry-node
  container restart while the lease exists, recovery batches over the same
  lease, explicit degraded fallback for a no-route exit, and route-intent
  expiry. Result:
  `artifacts/c18z3-live-service-channel-entry-ws-fallback-smoke-result.json`
  run `c18z3-20260507-211402`: warm `4/4`, WebSocket packets `8`, recovery
  `4/4`, backend fallback queue `0 -> 8`, route failures `0`, and all checks
  passed. During publication the first `0.2.183` Docker tar had a malformed
  entrypoint and stale size/hash metadata; it was rebuilt, the latest tar alias
  was replaced, and the release artifact row was corrected to sha256
  `231286cf5860b22cf8ca6550f67f61b0ca4b5011ab9b09995bcabbafe883fee1`, size
  `7261696`.
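
The C18Z3 fallback rule, sketched as a predicate over a decoded lease. The dict layout is an assumption made for illustration; only the field names `status` and `primary_route.status` and their two values come from the notes above.

```python
from typing import Any, Mapping


def lease_forces_backend_fallback(lease: Mapping[str, Any]) -> bool:
    """Return True when the signed lease requires backend fallback.

    Assumption: the lease is already signature-verified and decoded into
    a mapping; verification itself is out of scope for this sketch.
    """
    if lease.get("status") == "degraded_fallback":
        return True
    primary_route = lease.get("primary_route") or {}
    return primary_route.get("status") == "missing_route_intent"
```

Because the same predicate would be consulted by both HTTP and WebSocket packet ingress, the two entry paths cannot disagree about when stale generic route candidates may be reused.
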
- C18Z4 live service-channel long-session pressure smoke is implemented and
  passed on 2026-05-07 without a new runtime release beyond `0.2.183`. Script
  `scripts/fabric/c18z4-live-service-channel-session-pressure-smoke.ps1` opens
  one signed long-lived service-channel WebSocket from `test-1` to `test-2`,
  sends 48 packet batches / 384 packets, expires the primary route intent while
  the WebSocket session is still active, waits for dynamic synthetic-config
  refresh, and verifies the remaining packets use the alternate route. Result:
  `artifacts/c18z4-live-service-channel-session-pressure-smoke-result.json`
  run `c18z4-20260507-212748`: exit inbox depth `0 -> 384`, route failure delta
  `0`, flow drop delta `0`, backend fallback queue `0 -> 0`, primary route
  removed from entry/exit configs, alternate route selected after the switch,
  and both route intents expired. This proves the shared Fabric Service Channel
  can keep a service session alive while Control Plane changes the live route
  set, without falling back to backend relay.
- C18Z5 live service-channel exit-restart smoke is implemented and passed on
  2026-05-07 without a new runtime release beyond `0.2.183`. Script
  `scripts/fabric/c18z5-live-service-channel-exit-restart-smoke.ps1` keeps one
  signed WebSocket service-channel session open from `test-1` to `test-2`,
  sends pre-outage traffic, stops `test-2` for a bounded outage while traffic
  continues, starts it again, waits for runtime readiness, then sends recovery
  traffic over the same WebSocket. Result:
  `artifacts/c18z5-live-service-channel-exit-restart-smoke-result.json` run
  `c18z5-20260507-213745`: pre/outage/recovery batches `12/24/24`, total
  packets `480`, route failure delta `48`, backend fallback queue `0 -> 192`,
  flow drop delta `0`, and recovery exit inbox `0 -> 192`. This proves real
  exit-node failure is visible as fallback/failure telemetry while the
  long-lived service channel remains usable and fabric delivery resumes after
  the exit runtime returns. After the test, `test-2` and all active cluster
  nodes were healthy/current on `0.2.183`.
- C18Z6 live service-channel active rebuild smoke is implemented and passed on
  2026-05-07 without a new runtime release beyond `0.2.183`. Script
  `scripts/fabric/c18z6-live-service-channel-active-rebuild-smoke.ps1` keeps a
  signed WebSocket service-channel session open from `test-1` to `test-2`,
  sends pre-rebuild traffic, injects route-health feedback that marks the
  primary route stale and names the alternate route as replacement, waits for
  Control Plane `rebuild_status=applied`, waits for node-agent
  `route_manager_transition.status=applied_rebuild`, then continues sending
  over the same WebSocket. Result:
  `artifacts/c18z6-live-service-channel-active-rebuild-smoke-result.json` run
  `c18z6-20260507-214900`: pre/post batches `16/32`, total packets `384`,
  exit inbox depth `0 -> 384`, Control Plane replacement route
  `b2f3c510-46d2-4dce-8389-3952a99d0311`, route failure delta `0`, flow drop
  delta `0`, backend fallback queue `0 -> 0`, all checks passed, and all
  active nodes remained healthy/current on `0.2.183`. This proves a live
  service channel can apply a route-manager rebuild decision without rebuilding
  the service WebSocket.
- C18Z7 live service-channel concurrent isolation smoke is implemented and
  passed on 2026-05-07 without a new runtime release beyond `0.2.183`. Script
  `scripts/fabric/c18z7-live-service-channel-concurrent-isolation-smoke.ps1`
  opens three signed WebSocket service-channel sessions over the same
  `test-1 -> test-2` entry/exit pair, interleaves packet batches across all
  sessions, injects primary-route stale feedback, waits for Control Plane
  `rebuild_status=applied` and node-agent `applied_rebuild`, then continues all
  sessions over the same sockets. Result:
  `artifacts/c18z7-live-service-channel-concurrent-isolation-smoke-result.json`
  run `c18z7-20260507-215727`: 3 sessions, 36 rounds, 288 packets per session,
  864 packets total, each session exit inbox depth `288`, total exit depth
  `864`, backend fallback delta `0`, route failure delta `0`, flow drop delta
  `0`, and all active nodes healthy/current on `0.2.183`. This proves rebuild
  and route-manager state are shared correctly without one active service
  session starving or poisoning the other concurrent sessions.
- C18Z8 live service-channel backpressure isolation smoke is implemented and
  passed on 2026-05-07 without a new runtime release beyond `0.2.183`. Script
  `scripts/fabric/c18z8-live-service-channel-backpressure-isolation-smoke.ps1`
  opens two interactive signed WebSocket sessions plus one abusive session over
  the same `test-1 -> test-2` entry/exit pair. The abusive session sends 1300
  packets on one stable 5-tuple to force a single flow shard to hit bounded
  queue pressure while the interactive sessions continue sending small batches.
  Result:
  `artifacts/c18z8-live-service-channel-backpressure-isolation-smoke-result.json`
  run `c18z8-20260507-221347`: both interactive sessions delivered 192 packets
  each, the abusive flow reached scheduler high watermark `1024`, scheduled
  `1030` packets on the hottest channel, dropped `282` packets on that channel,
  produced backend fallback delta `0`, route failure delta `0`, and all active
  nodes stayed healthy/current on `0.2.183`. This proves bounded backpressure is
  visible and isolated to the overloaded logical flow without starving other
  active service sessions.
- C18Z9 route-pool runtime selection is implemented, released as
  node/host-agent `0.2.184`, published to docker-test downloads, and passed on
  2026-05-07. Runtime fix: when Control Plane marks a service-channel route
  `rebuild_status=applied` and provides `replacement_route_id`, node-agent now
  treats that replacement as the preferred route for sticky flow/channel
  selection instead of merely withdrawing the bad route and falling back to
  config order. Unit coverage:
  `TestFabricClientPacketIngressPrefersControlPlaneReplacementOverConfigOrder`.
  Live script
  `scripts/fabric/c18z9-live-service-channel-route-pool-smoke.ps1` creates a
  route pool with slow relay primary `test-1 -> test-3 -> test-2` and fast
  direct replacement `test-1 -> test-2`, keeps one signed WebSocket active,
  injects stale-route feedback, waits for Control Plane and node-agent
  `applied_rebuild`, then verifies the same service session continues over the
  direct replacement. Result:
  `artifacts/c18z9-live-service-channel-route-pool-smoke-result.json` run
  `c18z9-20260507-224901`: 54 batches / 432 packets sent and delivered to exit,
  backend fallback delta `0`, route failure delta `0`, flow drop delta `0`, and
  temporary route intents expired. Test containers `test-1/2/3` run
  `rap-node-agent:0.2.184`; `usa-los-1`, `home-1`, and
  `ifcm-rufms-s-mo1cr` remain healthy on `0.2.183` until their rollout policy is
  advanced.
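
A sketch of the C18Z9 ordering change: withdrawn routes are dropped and the Control Plane replacement moves ahead of config order. The function name and the decision dict layout (`route_id` key in particular) are assumptions; `rebuild_status` and `replacement_route_id` are the fields named above.

```python
from typing import Any, Iterable, List, Mapping


def order_route_candidates(
    config_order: List[str],
    decisions: Iterable[Mapping[str, Any]],
) -> List[str]:
    """Order candidates so applied replacements win over config order."""
    withdrawn = set()
    preferred: List[str] = []
    for decision in decisions:
        if decision.get("rebuild_status") != "applied":
            continue
        # The route that was rebuilt away from is no longer a candidate.
        withdrawn.add(decision["route_id"])
        replacement = decision.get("replacement_route_id")
        if replacement and replacement not in preferred:
            preferred.append(replacement)

    usable = [r for r in config_order if r not in withdrawn]
    # A replacement only counts if it is still usable in the current config.
    front = [r for r in preferred if r in usable]
    rest = [r for r in usable if r not in front]
    return front + rest
```

For the smoke topology above, config order `relay, other, direct` with an applied decision `relay -> direct` yields `direct, other`, rather than `other, direct` as plain withdrawal would.
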
- C18Z10 service-channel exit-pool failover is implemented, released as
  node/host-agent `0.2.185`, published to docker-test downloads, registered in
  the stable update channel, and passed on 2026-05-07. Backend service-channel
  leases now bind signed entry/exit pools, selected exit follows the selected
  primary route, and Control Plane replacement can cross to another authorized
  exit when route intents share an exit-pool/resource metadata key. Node-agent
  now honors the signed lease primary route as the initial service-channel
  preference before normal config-order selection. Unit coverage:
  `TestIssueFabricServiceChannelLeaseSelectsHealthyAlternateExitFromPool`,
  `TestGetNodeSyntheticMeshConfigReplacesFencedServiceChannelRouteAcrossExitPool`,
  and `TestFabricClientPacketIngressUsesLeasePreferredRouteBeforeConfigOrder`.
  Live script
  `scripts/fabric/c18z10-live-service-channel-exit-pool-smoke.ps1` creates a
  primary exit route `test-1 -> test-2` and an alternate exit route
  `test-1 -> test-3` in the same exit pool, keeps one signed WebSocket active,
  verifies pre-rebuild traffic reaches the primary exit, injects stale-route
  feedback, waits for Control Plane/node-agent `applied_rebuild`, then verifies
  post-rebuild traffic reaches the alternate exit. Result:
  `artifacts/c18z10-live-service-channel-exit-pool-smoke-result.json` run
  `c18z10-20260507-232645`: 54 batches / 432 packets sent, primary exit queue
  `144`, alternate exit queue `288`, backend fallback `0`, route failure delta
  `0`, flow drop delta `0`, decision source
  `service_channel_feedback_exit_pool_replacement`, and temporary route intents
  expired. Backend and `test-1/2/3` are running `0.2.185`; update plans now
  return download URLs on `192.168.200.61:18080` when the API is reached
  directly on `18121`.
- C18Z11 service-channel entry-pool failover contract is implemented and
  backend-deployed as `rap-backend:fabric-service-channel-0.2.186`; node-agent
  remains `0.2.185` because no node runtime binary change was required.
  Backend lease selection now keeps `selected_entry_node_id` aligned with the
  selected primary route when the healthy route starts at another authorized
  entry node. Route replacement scope also understands entry-pool metadata
  keys (`entry_pool_id`, `service_entry_pool_id`, `fabric_entry_pool_id`) in
  addition to exit-pool/resource keys, and route decision reports count
  entry-pool replacement decisions. Unit coverage:
  `TestIssueFabricServiceChannelLeaseSelectsHealthyAlternateEntryFromPool` and
  `TestGetNodeSyntheticMeshConfigReplacesFencedServiceChannelRouteAcrossEntryPool`.
  Live script
  `scripts/fabric/c18z11-live-service-channel-entry-pool-smoke.ps1` creates
  primary entry route `test-1 -> test-2` and alternate entry route
  `test-3 -> test-2`, verifies the initial lease uses `test-1`, sends 144
  packets, injects service-channel feedback fencing the primary entry route,
  verifies a refreshed lease selects `test-3`, then sends 288 more packets
  through the alternate entry to the same exit. Result:
  `artifacts/c18z11-live-service-channel-entry-pool-smoke-result.json` run
  `c18z11-20260507-235341`: exit queue `432`, backend fallback `0`, route
  failure deltas `0/0`, flow drop deltas `0/0`, and temporary route intents
  expired. This is a lease refresh/reconnect contract for entry replacement;
  preserving a broken client-to-entry socket across an entry node outage is not
  expected.
- C18Z12 service-channel route quality scoring is implemented and
  backend-deployed as `rap-backend:fabric-service-channel-0.2.187`; node-agent
  remains `0.2.185`. Backend now uses service-neutral runtime quality feedback
  from `fabric_service_channel_runtime_report.ingress.flow_scheduler` when
  scoring lease routes: `last_send_duration_ms` adds deterministic latency
  boosts/penalties, and recent failures/stalls apply bounded penalties. This is
  protocol-agnostic and applies to the shared fabric channel, not HTTP/RDP/DNS
  special cases. Unit coverage:
  `TestIssueFabricServiceChannelLeasePrefersFastHealthyRouteFeedback`. Live
  script `scripts/fabric/c18z12-service-channel-route-quality-smoke.ps1`
  creates a high-priority slow relay route `test-1 -> test-3 -> test-2` and a
  lower-priority fast direct route `test-1 -> test-2`; the initial lease
  selects the slow route by policy priority, then quality telemetry reports
  fast route `8ms` and slow route `900ms`, and the refreshed lease selects the
  fast route with score reason `service_channel_quality_latency_le_10ms`.
  Result: `artifacts/c18z12-service-channel-route-quality-smoke-result.json`
  run `c18z12-20260508-000209`; all checks passed and temporary route intents
  expired.
- C18Z13 live service-channel route quality self-learning is implemented,
  released as node-agent `0.2.188`, published to docker-test downloads,
  registered in the stable update channel, and deployed to docker-test
  containers `test-1/2/3`. Runtime fix: positive sub-millisecond
  service-channel send durations are rounded to `1ms`, preventing fast local
  routes from looking like "no quality sample". Unit coverage:
  `TestFabricFlowSchedulerRoundsSubMillisecondSendDuration`. Live script
  `scripts/fabric/c18z13-live-service-channel-route-quality-smoke.ps1` proves
  the self-learning path without heartbeat injection: initial lease picks a
  higher-priority relay route, real service-channel traffic sends 24 batches /
  192 packets over the fast direct route, backend persists healthy route
  feedback from the node-agent heartbeat (`last_send_duration_ms=1`,
  `score_adjustment=90`), and a refreshed lease prefers that fast route over a
  newly introduced higher-priority relay candidate. Result:
  `artifacts/c18z13-live-service-channel-route-quality-smoke-result.json` run
  `c18z13-20260508-001610`; backend fallback `0`, flow drops `0`, temporary
  route intents expired. Published release id:
  `64effc62-18b6-4eeb-a1c9-f5fb8e251491`.
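
The C18Z13 rounding fix amounts to the following. This is a sketch, assuming the duration is available in integer microseconds and that `0` is what downstream scoring reads as "no quality sample"; the function name is hypothetical.

```python
def send_duration_ms(duration_us: int) -> int:
    """Convert a send duration to whole milliseconds.

    Positive sub-millisecond durations round up to 1 so a fast local
    route is not mistaken for a route without any quality sample.
    """
    if duration_us <= 0:
        return 0  # no sample
    return max(1, duration_us // 1000)
```
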
- C18Z14 active-session route-quality preference is implemented. Backend
  `rap-backend:fabric-service-channel-0.2.190` and node-agent `0.2.189` are
  deployed to docker-test `test-1/2/3`; node-agent `0.2.189` is published to
  docker-test downloads and registered in the stable update channel as release
  `9bda9bac-71f3-4e8f-ae70-2abccb1cb866`. Backend now decays older healthy
  service-channel feedback before lease scoring so stale success loses weight
  before expiry. Node-agent consumes healthy route-quality observations from
  signed synthetic config and can override sticky per-flow/config-order route
  choice when a learned route is significantly better. Unit coverage:
  `TestFabricClientPacketIngressQualityPreferenceOverridesStickyRoute` and
  `TestIssueFabricServiceChannelLeaseDecaysOlderHealthyRouteFeedback`. Live
  script
  `scripts/fabric/c18z14-live-service-channel-active-quality-shift-smoke.ps1`
  keeps one signed WebSocket open while route policy changes: it starts on a
  higher-priority relay route, expires that route, sends real traffic through
  the fast direct route to teach feedback, introduces a new higher-priority
  relay candidate, and verifies the same active session stays on the learned
  fast route. Result:
  `artifacts/c18z14-live-service-channel-active-quality-shift-smoke-result.json`
  run `c18z14-20260508-071644`; 60 batches / 480 packets delivered, backend
  fallback `0`, flow drops `0`, temporary route intents expired.
- C18Z15 effective route-quality score telemetry is implemented. Backend
  `rap-backend:fabric-service-channel-0.2.191` is deployed on docker-test, and
  node-agent `0.2.190` is built, published to docker-test downloads, registered
  in the stable update channel, and deployed to `test-1/2/3`. Published release
  id: `2e4cd0c8-2480-4637-b845-6dcb115dbebd`. Backend feedback reports now
  include decayed `effective_score_adjustment` alongside raw
  `score_adjustment`; node-agent consumes the effective score for active
  route-quality preference and exposes sorted `route_quality_preferences` in
  runtime telemetry with raw/effective score and decay reasons. Unit coverage:
  `TestFabricClientPacketIngressQualityPreferenceUsesEffectiveScore` and
  `TestServiceChannelRouteFeedbackReportIncludesEffectiveDecayedScore`. Live
  script
  `scripts/fabric/c18z15-live-service-channel-effective-quality-smoke.ps1`
  verifies route-quality preference telemetry, effective score visibility, and
  decayed effective score visibility after the active-session quality-shift
  scenario. Result:
  `artifacts/c18z15-live-service-channel-effective-quality-smoke-result.json`
  run `c18z14-20260508-073538`; 60 batches / 480 packets delivered, backend
  fallback `0`, flow drops `0`, temporary route intents expired.
- C18Z16 per-channel route-quality fairness telemetry is implemented. Node-agent
  `0.2.191` is built, published to docker-test downloads, registered in the
  stable update channel, and deployed to `test-1/2/3`; backend remains
  `rap-backend:fabric-service-channel-0.2.191`. Published release id:
  `f072759c-5c3b-4ba0-936a-f59b6d3d7632`. Flow-scheduler channel stats now
  expose the applied `quality_preference_route_id`, effective/raw preference
  score, and preference reasons, so operators can see which logical channels
  actually used learned route quality. Unit coverage:
  `TestFabricClientPacketIngressQualityPreferencePreservesMultiChannelFairness`.
  Live script
  `scripts/fabric/c18z16-live-service-channel-quality-fairness-smoke.ps1`
  validates multi-channel quality-preference fairness after the active-session
  route-quality shift. Result:
  `artifacts/c18z16-live-service-channel-quality-fairness-smoke-result.json`
  run `c18z14-20260508-074943`; 60 batches / 480 packets delivered, 32 served
  logical channels, 32 channels with quality preference applied, backend
  fallback `0`, flow drops `0`, temporary route intents expired.
- C18Z17 stale route-quality marker cleanup is implemented. Node-agent
  `0.2.192` is built, published to docker-test downloads, registered in the
  stable update channel, and deployed to `test-1/2/3`; backend remains
  `rap-backend:fabric-service-channel-0.2.191`. Published release id:
  `846881bd-e7e0-4212-b8c9-4a6012c6eff7`. Flow-scheduler channel stats now
  clear quality preference markers when the preference is no longer in the
  effective preference set or when the route manager withdraws that route. Unit
  coverage:
  `TestFabricClientPacketIngressClearsStaleQualityPreferenceMarkers` and
  `TestFabricClientPacketIngressClearsWithdrawnQualityPreferenceMarkers`.
  Live script
  `scripts/fabric/c18z17-live-service-channel-quality-cleanup-smoke.ps1`
  verifies cleanup after the active-session quality/fairness scenario. Result:
  `artifacts/c18z17-live-service-channel-quality-cleanup-smoke-result.json`
  run `c18z14-20260508-075750`; 60 batches / 480 packets delivered, active
  quality markers `32`, stale quality markers `0`, visible preferences `3`,
  backend fallback `0`, flow drops `0`, temporary route intents expired.
- C18Z18 service-session-scoped flow scheduler memory is implemented.
  Node-agent `0.2.193` is built, published to docker-test downloads,
  registered in the stable update channel, and deployed to `test-1/2/3`;
  backend remains `rap-backend:fabric-service-channel-0.2.191`. Published
  release id: `05a3d29e-8a62-4bc8-84a3-1d00b794b9c9`. Runtime-sent flow
  scheduler channel keys now include the VPN/service session:
  `vpn:{vpnConnectionID}:flow-NN`. This keeps route memory, failed-route
  avoidance, served/drop counters, and route-quality markers isolated when
  several service-channel sessions share one entry/exit and hash to the same
  logical flow shard. Unit coverage:
  `TestFabricClientPacketIngressIsolatesRouteMemoryPerVPNConnection` and
  `TestFabricClientPacketIngressQualityPreferencePreservesMultiChannelFairness`.
  Live script
  `scripts/fabric/c18z18-service-channel-session-scoped-fairness-smoke.ps1`
  wraps the live C18Z17 quality path and verifies served live channels are
  session-scoped, unscoped served `flow-NN` channels are absent, quality
  markers are session-scoped, backend fallback is `0`, and flow drops are `0`.
  Result:
  `artifacts/c18z18-service-channel-session-scoped-fairness-smoke-result.json`
  run `c18z14-20260508-082520`; 60 batches / 480 packets delivered, served
  channels `32`, session-scoped served channels `32`, session-scoped quality
  channels `32`, unscoped served channels `0`, backend fallback `0`, flow drops
  `0`, temporary route intents expired.
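
A sketch of the C18Z18 key format `vpn:{vpnConnectionID}:flow-NN`. The shard count of 32, the CRC32 hash, the 5-tuple encoding, and the two-digit padding are all assumptions for illustration; the node-agent's actual packet-flow hashing is not shown in this document.

```python
import zlib
from typing import Tuple

FLOW_SHARDS = 32  # assumption: number of logical flow shards

FiveTuple = Tuple[str, int, str, int, str]  # src ip, src port, dst ip, dst port, proto


def flow_shard(five_tuple: FiveTuple) -> int:
    """Map a 5-tuple to a stable shard index (hypothetical hash choice)."""
    raw = "|".join(str(part) for part in five_tuple).encode("utf-8")
    return zlib.crc32(raw) % FLOW_SHARDS


def channel_key(vpn_connection_id: str, five_tuple: FiveTuple) -> str:
    """Build a session-scoped scheduler channel key."""
    return f"vpn:{vpn_connection_id}:flow-{flow_shard(five_tuple):02d}"
```

Two sessions carrying the same 5-tuple land on the same shard number but get different channel keys, so their route memory and counters stay separate.
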
- C18Z19 bounded parallel logical-flow send window is implemented. Node-agent
  `0.2.194` is built, published to docker-test downloads, registered in the
  stable update channel, and deployed to `test-1/2/3`; backend remains
  `rap-backend:fabric-service-channel-0.2.191`. Published release id:
  `926e5b84-4b0b-4f47-b1fe-798d8105679f`. The live node-agent runtime enables
  `MaxParallelFlowSends=4`, so independent scheduled logical channels can send
  concurrently instead of one slow channel blocking all following channels.
  This remains service-neutral and does not inspect HTTP/RDP/DNS/application
  traffic. Telemetry now exposes `max_parallel_flow_sends` and
  `send_flow_parallel_batches`. Unit coverage:
  `TestFabricClientPacketIngressParallelFlowWindowDoesNotBlockIndependentChannel`.
  Live script
  `scripts/fabric/c18z19-service-channel-parallel-flow-window-smoke.ps1` wraps
  the C18Z18 live route-quality/session-scoped path and verifies the parallel
  window is enabled and observed while backend fallback and flow drops stay at
  zero. Result:
  `artifacts/c18z19-service-channel-parallel-flow-window-smoke-result.json`
  run `c18z14-20260508-084133`; 60 batches / 480 packets delivered,
  `max_parallel_flow_sends=4`, `send_flow_parallel_batches=60`, served
  channels `32`, session-scoped quality channels `32`, backend fallback `0`,
  flow drops `0`, temporary route intents expired.
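
The bounded parallel window from C18Z19 can be sketched with a fixed-size worker pool. This illustrates the technique only: `send_scheduled_channels` and the `send` callback signature are hypothetical, and the real runtime does not necessarily use this structure.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

MAX_PARALLEL_FLOW_SENDS = 4  # value enabled in the live runtime per the notes above


def send_scheduled_channels(
    batches: Dict[str, List[bytes]],
    send: Callable[[str, List[bytes]], int],
    max_parallel: int = MAX_PARALLEL_FLOW_SENDS,
) -> Dict[str, int]:
    """Send one batch per logical channel with at most max_parallel in flight.

    Each channel's batch is sent by a single call, so per-channel ordering
    is preserved; only independent channels overlap.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {
            key: pool.submit(send, key, packets)
            for key, packets in batches.items()
        }
        # Collect results per channel; a slow channel delays only its own
        # result, not the start of the other channels' sends.
        return {key: future.result() for key, future in futures.items()}
```

The pool size is the window: a fifth channel waits for a free slot instead of growing concurrency without bound, which keeps the behavior service-neutral and predictable.
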
- C18Z20 per-channel latency/retry/in-flight telemetry and adaptive recommended
  send-window telemetry are implemented. Node-agent `0.2.195` is built,
  published to docker-test downloads, registered in the stable update channel,
  and deployed to `test-1/2/3`; backend remains
  `rap-backend:fabric-service-channel-0.2.191`. Published release id:
  `b9e198e0-e012-4600-ad14-856820aff41c`. Scheduler telemetry now includes
  global `in_flight`, `max_in_flight`, slow/failing channel counts, and
  per-channel `send_attempts`, `send_successes`, `send_failures`,
  `in_flight`, `max_in_flight`, and latency buckets. Ingress telemetry now
  includes `recommended_parallel_flow_sends`; the recommendation shrinks under
  bounded drops, degraded fallback recommendations, repeated failures, or
  slow/stalled channels. Unit coverage:
  `TestFabricFlowSchedulerRecommendsSmallerWindowUnderPressure` and
  `TestFabricClientPacketIngressParallelFlowWindowDoesNotBlockIndependentChannel`.
  Live script
  `scripts/fabric/c18z20-service-channel-adaptive-window-telemetry-smoke.ps1`
  wraps the C18Z19 live path and verifies the new telemetry on real docker-test
  nodes. Result:
  `artifacts/c18z20-service-channel-adaptive-window-telemetry-smoke-result.json`
  run `c18z14-20260508-085635`; 60 batches / 480 packets delivered,
  `max_parallel_flow_sends=4`, `recommended_parallel_flow_sends=4`,
  `scheduler_max_in_flight=4`, attempts/success/latency visible on 32 channels,
  backend fallback `0`, flow drops `0`, temporary route intents expired.
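
  A minimal sketch of how a recommended send window could shrink under the
  pressure signals listed above. The `pressure` type, the function name, and the
  halve-once-per-signal rule are assumptions made for illustration; the notes do
  not describe the scheduler's real formula.

  ```go
  package main

  import "fmt"

  // pressure is a hypothetical snapshot of the signals named in the notes.
  type pressure struct {
  	Drops            int  // bounded drops seen recently
  	DegradedFallback bool // a degraded fallback recommendation is active
  	RepeatedFailures int  // consecutive send failures
  	SlowChannels     int  // slow or stalled channels
  }

  // recommendParallelFlowSends halves the configured window once per active
  // pressure signal and never returns less than 1 (assumed policy).
  func recommendParallelFlowSends(maxWindow int, p pressure) int {
  	if maxWindow < 1 {
  		return 1
  	}
  	signals := 0
  	if p.Drops > 0 {
  		signals++
  	}
  	if p.DegradedFallback {
  		signals++
  	}
  	if p.RepeatedFailures > 1 { // "repeated" is read as more than one
  		signals++
  	}
  	if p.SlowChannels > 0 {
  		signals++
  	}
  	w := maxWindow
  	for i := 0; i < signals && w > 1; i++ {
  		w /= 2
  	}
  	return w
  }

  func main() {
  	fmt.Println(recommendParallelFlowSends(4, pressure{}))
  	fmt.Println(recommendParallelFlowSends(4, pressure{SlowChannels: 2}))
  }
  ```

  With no pressure the sketch returns the configured window unchanged, which
  matches the clean smoke result where both values were `4`.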
- C18Z21 rolling per-channel/session quality windows are implemented.
  Node-agent `0.2.196` is built, published to docker-test downloads,
  registered in the stable update channel, and deployed to `test-1/2/3`;
  backend remains `rap-backend:fabric-service-channel-0.2.191`. Published
  release id: `813b2050-4d4e-444c-9bde-72b1d1f7dd35`. Scheduler decisions now
  use a bounded fresh quality window instead of lifetime-only drop/failure
  counters, so old pressure ages out of the window as newer successful samples
  arrive. Telemetry
  now exposes scheduler-level `quality_window_sample_count`,
  `quality_window_failure_count`, `quality_window_slow_count`,
  `quality_window_drop_count`, and per-channel success/failure/slow/drop sample
  counts, average latency, and last update time. Unit coverage:
  `TestFabricFlowSchedulerRollingQualityWindowForgetsOldPressure`,
  `TestFabricFlowSchedulerRecommendsSmallerWindowUnderPressure`, and
  `TestFabricClientPacketIngressParallelFlowWindowDoesNotBlockIndependentChannel`.
  Live script
  `scripts/fabric/c18z21-service-channel-rolling-quality-window-smoke.ps1`
  wraps the C18Z20 live path and verifies the rolling-window telemetry on real
  docker-test nodes. Result:
  `artifacts/c18z21-service-channel-rolling-quality-window-smoke-result.json`
  run `c18z14-20260508-091952`; 60 batches / 480 packets delivered,
  scheduler quality-window samples `480`, failures `0`, drops `0`, window
  samples/success/latency visible on 32 channels, `recommended_parallel_flow_sends=4`,
  backend fallback `0`, flow drops `0`, temporary route intents expired.
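
  The "bounded fresh quality window" idea can be sketched as a fixed-size ring
  buffer of samples. The type and method names below are hypothetical and the
  real window may also be time-bounded; this only illustrates how old failures
  stop counting once enough newer samples have arrived.

  ```go
  package main

  import "fmt"

  // sample is one send outcome (hypothetical shape).
  type sample struct{ Failed, Slow, Dropped bool }

  // qualityWindow keeps only the most recent len(buf) samples.
  type qualityWindow struct {
  	buf   []sample
  	next  int // index that the next Record call overwrites
  	count int // number of valid samples, at most len(buf)
  }

  func newQualityWindow(capacity int) *qualityWindow {
  	if capacity < 1 {
  		capacity = 1
  	}
  	return &qualityWindow{buf: make([]sample, capacity)}
  }

  // Record stores a sample, overwriting the oldest one when the window is full.
  func (w *qualityWindow) Record(s sample) {
  	w.buf[w.next] = s
  	w.next = (w.next + 1) % len(w.buf)
  	if w.count < len(w.buf) {
  		w.count++
  	}
  }

  // Counts returns sample, failure, slow, and drop counts for the window.
  func (w *qualityWindow) Counts() (samples, failures, slow, drops int) {
  	for _, s := range w.buf[:w.count] {
  		if s.Failed {
  			failures++
  		}
  		if s.Slow {
  			slow++
  		}
  		if s.Dropped {
  			drops++
  		}
  	}
  	return w.count, failures, slow, drops
  }

  func main() {
  	w := newQualityWindow(4)
  	w.Record(sample{Failed: true})
  	w.Record(sample{Failed: true})
  	for i := 0; i < 4; i++ {
  		w.Record(sample{}) // clean samples push the failures out
  	}
  	n, f, s, d := w.Counts()
  	fmt.Println(n, f, s, d)
  }
  ```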
- C18Z22 backend durable route feedback now consumes the rolling quality
  window from node-agent heartbeat metadata. Backend
  `rap-backend:fabric-service-channel-0.2.197` is built and deployed on
  docker-test; node-agent remains `0.2.196` on `test-1/2/3`. For agents that
  expose `quality_window_*`, backend uses fresh rolling failure/drop/slow
  counts and rolling average latency when creating `fabric_service_channel`
  route feedback; old `last_failed_route_id`, `consecutive_failures`, and
  `stall_count` remain fallback inputs for older agents only. This prevents old
  route failures from dominating durable scoring after the channel has recovered
  with a clean rolling window. Unit coverage:
  `TestRecordHeartbeatUsesRollingQualityWindowForRouteFeedback` and
  `TestRecordHeartbeatPersistsServiceChannelRouteFeedbackForLaterLease`.
  Live script
  `scripts/fabric/c18z22-service-channel-rolling-feedback-smoke.ps1` wraps the
  C18Z21 live path and verifies persisted route feedback contains
  `service_channel_rolling_quality_window` plus payload `quality_window_*`
  fields. Result:
  `artifacts/c18z22-service-channel-rolling-feedback-smoke-result.json` run
  `c18z14-20260508-093100`; 60 batches / 480 packets delivered, route feedback
  count `1`, rolling feedback count `1`, healthy rolling feedback count `1`,
  rolling payload count `1`, backend fallback `0`, flow drops `0`.
- C18Z23 recovery hysteresis is implemented for recovered service-channel
  routes. Backend `rap-backend:fabric-service-channel-0.2.198` is built and
  deployed on docker-test; node-agent remains `0.2.196` on `test-1/2/3`.
  When a route has an operator-expire/manual retry cooldown from prior fenced
  feedback but now also has healthy rolling-window feedback, backend re-admits
  the route as `authorized` while applying a bounded recovery hysteresis score
  penalty (`150`) and `service_channel_recovery_hysteresis` reason. This keeps
  recovered routes available as alternates without immediately displacing a
  steady route, which reduces route-selection flapping. Unit coverage:
  `TestIssueFabricServiceChannelLeaseDampensRecoveredRouteDuringRetryCooldown`
  and `TestRecordHeartbeatUsesRollingQualityWindowForRouteFeedback`. Live
  script
  `scripts/fabric/c18z23-service-channel-recovery-hysteresis-smoke.ps1` wraps
  the C18Z22 live path and verifies backend `0.2.198`, rolling feedback, and
  clean live forwarding. Result:
  `artifacts/c18z23-service-channel-recovery-hysteresis-smoke-result.json` run
  `c18z14-20260508-094111`; 60 batches / 480 packets delivered, backend
  fallback `0`, flow drops `0`, recovery hysteresis penalty `150`.
- C18Z24 recovery visibility is implemented for service-channel route
  diagnostics. Backend `rap-backend:fabric-service-channel-0.2.199` is built
  and deployed on docker-test; node-agent remains `0.2.196` on `test-1/2/3`.
  Route feedback API responses and node-scoped service-channel feedback reports
  now expose `recovery_state`, `recovery_hysteresis_active`, and
  `recovery_hysteresis_penalty`, while route path decision reports count
  `recovery_hysteresis_count`. Admin diagnostics now show recovered/hysteresis
  chips and a recovery column beside route feedback status. Unit coverage:
  `TestIssueFabricServiceChannelLeaseDampensRecoveredRouteDuringRetryCooldown`,
  `TestServiceChannelRouteFeedbackReportExposesRecoveryState`, and
  `TestRoutePathDecisionReportCountsRecoveryHysteresis`. Smoke result:
  `artifacts/c18z24-service-channel-recovery-visibility-smoke-result.json`;
  route feedback API exposed recovery shape for 109 observations, backend
  image `0.2.199` was live, and the web-admin build was published to
  `rap_web_admin`.
- C18Z25 recovery promotion policy is implemented. Backend
  `rap-backend:fabric-service-channel-0.2.200` is built and deployed on
  docker-test; node-agent remains `0.2.196`. A route under manual retry
  cooldown remains `recovered` with hysteresis penalty until it reports at
  least 64 clean rolling-window samples (`success >= 64`, failures/slow/drops
  zero). After that it is promoted back to steady `healthy`, gets
  `recovery_promoted=true` and the `service_channel_recovery_promoted` reason,
  and carries no hysteresis penalty. Admin/API now expose promoted counts/flags
  alongside
  recovered/hysteresis state. Smoke result:
  `artifacts/c18z25-service-channel-recovery-promotion-smoke-result.json`;
  backend image `0.2.200` was live and route-feedback API exposed recovery
  state for 109 observations.
- C18Z26 recovery demotion policy is implemented. Backend
  `rap-backend:fabric-service-channel-0.2.201` is built and deployed on
  docker-test; node-agent remains `0.2.196`. If a previously recovered or
  promoted route under retry cooldown reports fresh rolling failures, drops,
  slow samples, degraded fallback, rebuild recommendation, or fenced feedback,
  backend now exposes `recovery_demoted=true` with a concrete
  `recovery_reason` such as `service_channel_recovery_demoted_failure`,
  `..._slow`, `..._rebuild`, or `..._fenced`. Route score reasons include
  `service_channel_recovery_demoted` and the specific demotion reason, and
  route path decision reports count `recovery_demoted_count`. Admin diagnostics
  now show demoted feedback/path chips and the demotion reason. Smoke result:
  `artifacts/c18z26-service-channel-recovery-demotion-smoke-result.json`;
  backend image `0.2.201` was live and route-feedback API exposed recovery
  state for 109 observations.
- C18Z27 recovery policy tuning is implemented. Backend
  `rap-backend:fabric-service-channel-0.2.202` is built and deployed on
  docker-test; node-agent remains `0.2.196`. Effective service-channel
  recovery policy now has a strict default contract and optional cluster
  metadata override at `fabric_service_channel_recovery_policy`. API endpoints
  `GET/PUT /clusters/{clusterID}/fabric/service-channels/recovery-policy`
  expose and update hysteresis penalty, promotion minimum samples, demotion
  thresholds for failures/drops/slow samples, and rebuild/fenced demotion
  toggles. Lease route selection, route feedback reports, and node-scoped
  synthetic config feedback consume the effective policy. Web-admin shows and
  edits the policy in the service-channel route feedback card. Smoke result:
  `artifacts/c18z27-service-channel-recovery-policy-smoke-result.json`; live
  API updated policy values, then restored strict defaults
  (`penalty=150`, `promotion_min_samples=64`, demotion thresholds `1`).
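
  The C18Z23 to C18Z27 recovery rules can be summarized in one classification
  function. This is a sketch only: the struct fields, the short cause labels,
  the order of checks, and the assumption that both demotion toggles default to
  enabled are illustrative, and scoring of demoted routes is not modelled. Only
  the numeric defaults (`150`, `64`, thresholds `1`) come from the notes.

  ```go
  package main

  import "fmt"

  // recoveryPolicy mirrors the tunables listed for C18Z27 (hypothetical names).
  type recoveryPolicy struct {
  	HysteresisPenalty   int
  	PromotionMinSamples int
  	DemoteFailures      int
  	DemoteDrops         int
  	DemoteSlow          int
  	DemoteOnRebuild     bool
  	DemoteOnFenced      bool
  }

  func strictDefaults() recoveryPolicy {
  	return recoveryPolicy{
  		HysteresisPenalty: 150, PromotionMinSamples: 64,
  		DemoteFailures: 1, DemoteDrops: 1, DemoteSlow: 1,
  		DemoteOnRebuild: true, DemoteOnFenced: true, // assumed toggle defaults
  	}
  }

  // rollingWindow is the fresh feedback for a route under retry cooldown.
  type rollingWindow struct {
  	Successes, Failures, Drops, Slow int
  	RebuildRecommended, Fenced       bool
  }

  type decision struct {
  	State   string // "recovered", "promoted", or "demoted"
  	Cause   string // short illustrative label, set only when demoted
  	Penalty int    // hysteresis penalty, set only while recovered
  }

  func classify(p recoveryPolicy, w rollingWindow) decision {
  	switch {
  	case p.DemoteOnFenced && w.Fenced:
  		return decision{State: "demoted", Cause: "fenced"}
  	case p.DemoteOnRebuild && w.RebuildRecommended:
  		return decision{State: "demoted", Cause: "rebuild"}
  	case p.DemoteFailures > 0 && w.Failures >= p.DemoteFailures:
  		return decision{State: "demoted", Cause: "failure"}
  	case p.DemoteDrops > 0 && w.Drops >= p.DemoteDrops:
  		return decision{State: "demoted", Cause: "drop"}
  	case p.DemoteSlow > 0 && w.Slow >= p.DemoteSlow:
  		return decision{State: "demoted", Cause: "slow"}
  	case w.Successes >= p.PromotionMinSamples &&
  		w.Failures == 0 && w.Drops == 0 && w.Slow == 0:
  		return decision{State: "promoted"}
  	default:
  		return decision{State: "recovered", Penalty: p.HysteresisPenalty}
  	}
  }

  func main() {
  	p := strictDefaults()
  	fmt.Println(classify(p, rollingWindow{Successes: 10}))
  	fmt.Println(classify(p, rollingWindow{Successes: 64}))
  	fmt.Println(classify(p, rollingWindow{Successes: 64, Slow: 1}))
  }
  ```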
- C18Z28 recovery policy provenance is implemented. Backend
  `rap-backend:fabric-service-channel-0.2.203` is built and deployed on
  docker-test; node-agent remains `0.2.196`. `FabricServiceChannelRoute`,
  `FabricServiceChannelLease`, signed lease authority payloads,
  service-channel route feedback reports, and route path decision reports now
  carry the effective recovery policy used for scoring and recovery decisions.
  This makes every primary/alternate/fallback choice auditable against the
  policy source and thresholds that produced it. Web-admin node diagnostics
  show the service-channel feedback policy and route decision policy source.
  Smoke result:
  `artifacts/c18z28-service-channel-recovery-policy-provenance-smoke-result.json`;
  live synthetic config and live lease issuance both exposed recovery policy
  provenance on docker-test.
- C18Z29 feedback provenance guardrails are implemented. Backend
  `rap-backend:fabric-service-channel-0.2.204` is built and deployed on
  docker-test; node-agent remains `0.2.196`. Recovery policy now has a stable
  fingerprint. Backend recognizes optional runtime feedback provenance fields
  (`recovery_policy_fingerprint`, `route_generation`, `route_policy_version`,
  `policy_version`), exposes observed/effective fingerprints/generations on
  route feedback observations, and reports missing/stale counters. Explicit
  stale policy/generation feedback is scored conservatively, cannot fence a
  current route, and cannot request rebuild/demotion; missing provenance stays
  compatible for currently deployed older agents but is visible in
  diagnostics. Web-admin
  shows provenance warnings in service-channel feedback. Smoke result:
  `artifacts/c18z29-service-channel-feedback-provenance-guard-smoke-result.json`.
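
  A hedged sketch of the provenance guard: a stable fingerprint over the policy
  values, and a comparison that separates missing from stale feedback. The
  canonical string layout, the SHA-256 choice, and all names are assumptions;
  the notes only state that the fingerprint is stable and that stale feedback
  cannot fence a route or request rebuild/demotion.

  ```go
  package main

  import (
  	"crypto/sha256"
  	"encoding/hex"
  	"fmt"
  )

  // recoveryPolicy holds the numeric tunables (hypothetical field names).
  type recoveryPolicy struct {
  	HysteresisPenalty   int
  	PromotionMinSamples int
  	DemoteFailures      int
  	DemoteDrops         int
  	DemoteSlow          int
  }

  // fingerprint hashes a fixed-order rendering of the policy, so equal
  // policies always yield the same hex string.
  func fingerprint(p recoveryPolicy) string {
  	canonical := fmt.Sprintf(
  		"penalty=%d;promotion_min_samples=%d;failures=%d;drops=%d;slow=%d",
  		p.HysteresisPenalty, p.PromotionMinSamples,
  		p.DemoteFailures, p.DemoteDrops, p.DemoteSlow)
  	sum := sha256.Sum256([]byte(canonical))
  	return hex.EncodeToString(sum[:])
  }

  // classifyProvenance compares the fingerprint reported with the feedback
  // against the fingerprint of the currently effective policy.
  func classifyProvenance(observed, effective string) string {
  	switch {
  	case observed == "":
  		return "missing" // older agents: compatible, but shown in diagnostics
  	case observed != effective:
  		return "stale"
  	default:
  		return "current"
  	}
  }

  // mayFenceOrRebuild reports whether feedback in this state may fence a
  // current route or request rebuild/demotion.
  func mayFenceOrRebuild(state string) bool {
  	return state != "stale"
  }

  func main() {
  	effective := fingerprint(recoveryPolicy{150, 64, 1, 1, 1})
  	old := fingerprint(recoveryPolicy{100, 64, 1, 1, 1})
  	state := classifyProvenance(old, effective)
  	fmt.Println(state, mayFenceOrRebuild(state))
  }
  ```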
- C18Z30 node-agent feedback provenance is implemented. Backend
  `rap-backend:fabric-service-channel-0.2.209` and node-agent `0.2.208` are
  built and deployed on docker-test (`test-1/2/3`). Node-agent now preserves the
  signed synthetic config contract for recovery feedback/route decision fields
  and records per-flow `recovery_policy_fingerprint`, `route_policy_version`,
  and `route_generation` at send time, so feedback remains auditable even after
  route churn/expiry. Backend heartbeat parsing now preserves those fields into
  durable service-channel feedback payloads. Live smoke passed with 28/28
  runtime channel stats carrying provenance, 3/3 feedback observations carrying
  provenance, and no missing/stale provenance counters. Artifacts:
  `artifacts/c18z30-node-telemetry-provenance-live-smoke-base-result.json` and
  `artifacts/c18z30-node-agent-feedback-provenance-smoke-result.json`.
- C18Z31 service-channel rebuild ledger is implemented. Backend
  `rap-backend:fabric-service-channel-0.2.211` is built and deployed on
  docker-test; node-agent remains `0.2.208` on `test-1/2/3`. Backend now keeps
  durable route rebuild attempt history in
  `fabric_service_channel_route_rebuild_attempts`, upserted from synthetic
  config route decisions when service-channel feedback requests rebuild. The
  ledger stores trigger/rebuild status, old route, selected replacement,
  policy fingerprint, generation, feedback status/reasons, latency/failure
  counters, outcome, and compact decision payload. API endpoint
  `GET /clusters/{clusterID}/fabric/service-channels/rebuild-attempts` exposes
  the history; web-admin loads it into Service-channel route feedback
  diagnostics as a rebuild ledger table. Migration `000026` is applied on
  docker-test. Live smoke passed:
  `artifacts/c18z31-base-active-rebuild-smoke-result.json` and
  `artifacts/c18z31-service-channel-rebuild-ledger-smoke-result.json`.
- C18Z32 service-channel rebuild timeline is implemented. Backend
  `rap-backend:fabric-service-channel-0.2.213` is built and deployed on
  docker-test; node-agent remains `0.2.208` on `test-1/2/3`. The rebuild
  attempts API now enriches durable ledger rows with node-agent heartbeat
  correlation: matching `route_manager_transition`, route-generation apply or
  withdrawn decision, post-rebuild selected route, flow packet/drop/failure
  counters, and a compact chronological `timeline` with
  `backend_decision`, `node_route_generation_apply`,
  `node_route_manager_transition`, and `post_rebuild_traffic` stages. Matching
  is generation-strict when the backend attempt has a generation, preventing
  stale transition/status matches. Web-admin rebuild ledger shows backend,
  agent, route-generation, and traffic columns. Live smoke passed:
  `artifacts/c18z32-base-rebuild-ledger-smoke-result.json` and
  `artifacts/c18z32-service-channel-rebuild-timeline-smoke-result.json`.
- C18Z33 service-channel rebuild guardrails are implemented. Backend
  `rap-backend:fabric-service-channel-0.2.214` is built and deployed on
  docker-test; node-agent remains `0.2.208`. Rebuild attempts API now adds
  computed guard fields: `guard_status`, `guard_severity`, `guard_reason`,
  age, and transition/traffic deadlines. Successful correlated rebuilds report
  `guard_status=ok`, `guard_severity=good`; a missing node transition, missing
  route-generation correlation, missing post-rebuild traffic, an unexpected
  selected route, or post-rebuild drops/failures surface as warn/bad states.
  Web-admin
  shows guard chips and counts in the service-channel rebuild ledger. Live
  smoke passed: `artifacts/c18z33-base-rebuild-ledger-smoke-result.json` and
  `artifacts/c18z33-service-channel-rebuild-guard-smoke-result.json`.
- C18Z34 service-channel rebuild health summary is implemented. Backend
  `rap-backend:fabric-service-channel-0.2.215` is built and deployed on
  docker-test; node-agent remains `0.2.208`. New endpoint
  `GET /clusters/{clusterID}/fabric/service-channels/rebuild-health` returns a
  cluster-level operational summary over the durable rebuild ledger/timeline:
  counts by guard status/severity, applied/pending counts, affected reporter
  nodes/routes, most recent bad attempts, and recommended operator action.
  Web-admin shows the summary as a Rebuild health subpanel above the rebuild
  ledger. Live smoke passed:
  `artifacts/c18z34-base-rebuild-guard-smoke-result.json` and
  `artifacts/c18z34-service-channel-rebuild-health-smoke-result.json`.
- C18Z35 service-channel rebuild alert silence lifecycle is implemented.
  Backend `rap-backend:fabric-service-channel-0.2.216` is built and deployed on
  docker-test; node-agent remains `0.2.208`. Migration `000027` creates
  `fabric_service_channel_rebuild_alert_silences`, applied on docker-test. New
  API `POST /clusters/{clusterID}/fabric/service-channels/rebuild-health/silences`
  records bounded operator silence for an exact alert fingerprint:
  reporter node, route, guard status, and generation. Rebuild health now
  separates total bad/warn from active bad/warn and silenced counts; silenced
  alerts are omitted from affected nodes/routes and active bad attempt lists.
  An alert for a new generation, route, or reporter remains active by design.
  Web-admin
  exposes `silence 6h` on active bad rebuild-health rows. Live smoke passed:
  `artifacts/c18z35-base-rebuild-health-smoke-result.json` and
  `artifacts/c18z35-service-channel-rebuild-alert-silence-smoke-result.json`.
- C18Z36 service-channel rebuild alert resurfacing is implemented. Backend
  `rap-backend:fabric-service-channel-0.2.217` is built and deployed on
  docker-test; node-agent remains `0.2.208`. Rebuild health marks active
  bad/warn attempts as `alert_resurfaced` when an active silence exists for the
  same reporter node, route, and guard status but a different generation. The
  summary exposes `resurfaced_count` and `resurfaced_attempts`, including the
  previous silenced generation and silence expiry. Web-admin shows a resurfaced
  chip/table and allows silencing the new generation separately. Live smoke
  passed: `artifacts/c18z36-base-rebuild-health-smoke-result.json` and
  `artifacts/c18z36-service-channel-rebuild-alert-resurface-smoke-result.json`.
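
  The silence and resurfacing rules from C18Z35 and C18Z36 can be sketched as a
  match on the alert fingerprint. Type names, the integer generation, and the
  Unix-seconds timestamps are assumptions for illustration; the matching rules
  themselves follow the two entries above.

  ```go
  package main

  import "fmt"

  // alertKey is the exact fingerprint a silence applies to.
  type alertKey struct {
  	ReporterNode string
  	RouteID      string
  	GuardStatus  string
  	Generation   int64 // assumed to be numeric
  }

  type silence struct {
  	Key           alertKey
  	ExpiresAtUnix int64
  }

  // classifyAlert returns "silenced" on an exact active match, "resurfaced"
  // when an active silence matches everything except the generation, and
  // "active" otherwise.
  func classifyAlert(a alertKey, silences []silence, nowUnix int64) string {
  	resurfaced := false
  	for _, s := range silences {
  		if s.ExpiresAtUnix <= nowUnix {
  			continue // expired silences are ignored
  		}
  		if s.Key == a {
  			return "silenced"
  		}
  		if s.Key.ReporterNode == a.ReporterNode &&
  			s.Key.RouteID == a.RouteID &&
  			s.Key.GuardStatus == a.GuardStatus {
  			resurfaced = true
  		}
  	}
  	if resurfaced {
  		return "resurfaced"
  	}
  	return "active"
  }

  func main() {
  	silenced := alertKey{"test-1", "route-a", "bad", 7}
  	silences := []silence{{Key: silenced, ExpiresAtUnix: 2000}}
  	next := silenced
  	next.Generation = 8
  	fmt.Println(classifyAlert(silenced, silences, 1000))
  	fmt.Println(classifyAlert(next, silences, 1000))
  }
  ```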
- C18Z37 service-channel readiness gate is implemented. Backend
  `rap-backend:fabric-service-channel-0.2.218` is built and deployed on
  docker-test; node-agent remains `0.2.208`. New endpoint
  `GET /clusters/{clusterID}/fabric/service-channels/readiness` returns a fast
  recent-window verdict: `clean`, `degraded`, or `blocked`, with active
  bad/warn counts, resurfaced/silenced counts, counters for missing
  transition, missing route-generation, missing post-rebuild traffic,
  unexpected route, and post-rebuild degradation, plus blocking/degraded
  reasons and a recommended operator action. Web-admin shows this as a
  top-level readiness panel in
  Service-channel route feedback. Readiness and default admin health queries
  are intentionally capped to a small recent window so the operator view stays
  responsive after many rebuild attempts; deep ledger diagnostics remain a
  separate next layer. Live smoke passed:
  `artifacts/c18z37-base-rebuild-health-smoke-result.json` and
  `artifacts/c18z37-service-channel-readiness-smoke-result.json`.
- C18Z38 service-channel rebuild ledger enrichment split is implemented.
  Backend `rap-backend:fabric-service-channel-0.2.219` is built and deployed
  on docker-test; node-agent remains `0.2.208`. The rebuild attempts API now
  defaults to `enrichment=summary`, returning durable ledger rows without the
  expensive heartbeat/timeline guard correlation. Operators can request
  `enrichment=deep` explicitly for per-route investigation. Web-admin defaults
  to the fast ledger, shows timeline/guard fields as deep-only in summary mode,
  and provides a manual deep ledger toggle. C18Z32/C18Z33 smokes now request
  deep enrichment. Live smoke passed:
  `artifacts/c18z38-service-channel-rebuild-ledger-enrichment-smoke-result.json`.
- C18Z39 service-channel rebuild ledger drilldown is implemented. Backend
  `rap-backend:fabric-service-channel-0.2.220` is built and deployed on
  docker-test; node-agent remains `0.2.208`. The rebuild attempts API now
  accepts `generation` and `offset`, allowing narrow deep investigations by
  reporter node, route, service class, and route generation with bounded
  pagination. Web-admin adds rebuild ledger filters for reporter/route/
  generation/service plus prev/next paging in deep mode. Live smoke passed:
  `artifacts/c18z39-service-channel-rebuild-ledger-drilldown-smoke-result.json`.
- C18Z40 service-channel rebuild incident grouping is implemented. Backend
  `rap-backend:fabric-service-channel-0.2.222` is built and deployed on
  docker-test; node-agent remains `0.2.208`. New endpoint
  `GET /clusters/{clusterID}/fabric/service-channels/rebuild-incidents`
  groups the bounded recent rebuild window by reporter node, route, service
  class, generation, and guard status, exposing first/last seen, attempt count,
  latest guard/replacement/outcome, silence/resurface flags, and recommended
  action. The incident window is capped to 5 to keep default admin refresh
  bounded; broader investigation still uses filtered deep ledger. Web-admin
  shows a Rebuild incidents list and `open deep` loads the exact filtered deep
  ledger slice for that incident. Live smoke passed:
  `artifacts/c18z40-service-channel-rebuild-incidents-smoke-result.json`.
- C18Z41 service-channel rebuild incident actions are implemented. Backend
  `rap-backend:fabric-service-channel-0.2.223` is built and deployed on
  docker-test; node-agent remains `0.2.208`. New API
  `POST /clusters/{clusterID}/fabric/service-channels/rebuild-incidents/investigations`
  records an audit event when an operator opens a deep rebuild investigation.
  Web-admin incident rows now expose `open deep` with audit and `silence 6h`
  using the incident fingerprint fields; after silence the panel refreshes only
  rebuild health/readiness/incidents instead of the whole cluster scope. Live
  smoke passed:
  `artifacts/c18z41-service-channel-rebuild-incident-actions-smoke-result.json`.
- C18Z42 service-channel rebuild correlation snapshots are implemented.
  Backend `rap-backend:fabric-service-channel-0.2.224` is built and deployed
  on docker-test; node-agent remains `0.2.208`. Migration `000028` adds
  durable correlation/guard snapshot columns to
  `fabric_service_channel_route_rebuild_attempts`, including node transition,
  route-generation, post-rebuild traffic, guard status/severity/reason,
  compact timeline, and `correlation_snapshot_at`. Deep enrichment now writes
  the snapshot once; later deep/readiness/health/incidents reuse it and only
  recompute age-sensitive guard state without scanning heartbeat history.
  External summary ledger still strips guard/timeline fields to preserve the
  fast C18Z38 contract. On docker-test, applying `000028` manually was required
  before smoke because this manual backend redeploy path does not auto-apply
  migrations. Live smoke passed twice; with warm snapshots, timings were
  roughly summary 92 ms, deep 2 ms, incidents 2 ms:
  `artifacts/c18z42-service-channel-rebuild-correlation-snapshot-smoke-result.json`.
- C18Z43 service-channel schema preflight is implemented. Backend
  `rap-backend:fabric-service-channel-0.2.225` is built and deployed on
  docker-test; web-admin is redeployed. New endpoint
  `GET /clusters/{clusterID}/fabric/service-channels/schema-status` checks the
  DB relation/columns required by migration `000028` before operators rely on
  rebuild health/readiness/incidents. Web-admin shows a Fabric schema preflight
  panel beside service-channel readiness, with required/missing check counts and
  operator action. Live smoke passed:
  `artifacts/c18z43-service-channel-schema-preflight-smoke-result.json`.
- C18Z44 service-channel rebuild snapshot warmup is implemented. Backend
  `rap-backend:fabric-service-channel-0.2.226` is built and deployed on
  docker-test; web-admin is redeployed. New endpoint
  `POST /clusters/{clusterID}/fabric/service-channels/rebuild-snapshots/warmup`
  performs a bounded proactive pass over recent rebuild attempts. It fills
  missing correlation snapshots, counts stale snapshots, and defers heavy stale
  rescans because age-sensitive guard state is already recomputed from cached
  snapshots on read. Web-admin adds a `warm snapshots` action and displays
  warmed/fresh/missing/stale/deferred/error counts. Live smoke passed:
  `artifacts/c18z44-service-channel-rebuild-snapshot-warmup-smoke-result.json`.
- C18Z45 service-channel rebuild snapshot auto-warmup is implemented. Backend
  `rap-backend:fabric-service-channel-0.2.227` is built and deployed on
  docker-test; node-agent remains `0.2.208`. Heartbeat processing now performs a
  bounded missing-snapshot maintenance pass for the reporting node's recent
  rebuild attempts. It only persists a snapshot when the heartbeat contains
  runtime evidence such as post-rebuild traffic or matched route-manager/
  route-generation state, preventing backend-only timelines from becoming stale
  cache entries. Auto-warmup writes an audit event
  `fabric.service_channel_rebuild_snapshot.auto_warmup` with trigger, heartbeat,
  warmed route IDs, generations, rebuild IDs, counts, and errors. Live smoke
  passed:
  `artifacts/c18z45-service-channel-rebuild-snapshot-auto-warmup-smoke-result.json`.
- C18Z46 service-channel rebuild snapshot maintenance health is implemented.
  Backend `rap-backend:fabric-service-channel-0.2.228` is built and deployed
  on docker-test; web-admin is redeployed. New endpoint
  `GET /clusters/{clusterID}/fabric/service-channels/rebuild-snapshots/health`
  exposes bounded snapshot-cache maintenance status: recent attempt count,
  valid/missing/overdue runtime-evidence snapshots, heartbeat threshold, latest
  auto-warmup audit summary, and per-node warmed/error/missing counts. Web-admin
  adds a `Snapshot maintenance` panel beside schema/readiness. Live smoke
  passed:
  `artifacts/c18z46-service-channel-rebuild-snapshot-health-smoke-result.json`.
- C18Z47 service-channel signed lease enforcement is implemented. Node-agent
  release `0.2.230` is built, published under `/downloads`, registered as the
  active `rap-node-agent` dev release, and deployed on docker-test
  `test-1/2/3`; all three report `0.2.230`, healthy, and current after policy
  update. When a cluster authority public key is pinned, the node-agent now
  rejects unsigned `rap_fsc_*` service-channel requests and requires the
  signed `rap.fabric_service_channel_lease_authority.v1` payload/signature
  headers. Legacy unsigned tokens remain accepted only in unpinned test mode.
  Live smoke proved unsigned POST is rejected with 403 while signed lease POST
  is accepted with 202:
  `artifacts/c18z47-service-channel-signed-lease-enforcement-smoke-result.json`.
- C18Z48 service-channel backend introspection compatibility is implemented.
  Backend `rap-backend:fabric-service-channel-0.2.231` is built/deployed on
  docker-test. Node-agent/host-agent artifacts `0.2.232` are published under
  `/downloads`; `rap-node-agent` release `0.2.232` is registered and deployed
  on `test-1/2/3`, and all three report healthy/current. When signed
  service-channel authority headers are absent but cluster authority is pinned,
  node-agent now calls backend lease introspection before accepting an unsigned
  token. Bad tokens are still rejected. Live smoke passed:
  `artifacts/c18z48-service-channel-introspection-smoke-result.json`.
- C18Z49 service-channel acceptance telemetry is implemented in node-agent
  `0.2.232`. Each accepted Fabric Service Channel ingress records
  `accepted_by=signed|introspection|legacy_unsigned`, route preference, and
  backend-fallback state in structured node logs. HTTP packet ingress also
  returns `X-RAP-Service-Channel-Accepted-By` for smoke/diagnostics.
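
  Taken together, C18Z47 to C18Z49 describe a small acceptance decision. The
  sketch below is one possible reading: the request struct, the callback, and
  the choice to reject an invalid signature even in unpinned mode are
  assumptions. The three `accepted_by` values are the ones named above.

  ```go
  package main

  import "fmt"

  // ingressRequest is a hypothetical view of the authority state of a request.
  type ingressRequest struct {
  	HasSignedAuthority bool // signed payload/signature headers are present
  	SignatureValid     bool // signature verified against the pinned key
  }

  // acceptIngress returns the accepted_by label and whether to accept.
  // introspect stands in for the backend lease introspection call.
  func acceptIngress(pinned bool, r ingressRequest, introspect func() bool) (string, bool) {
  	if r.HasSignedAuthority {
  		if r.SignatureValid {
  			return "signed", true
  		}
  		return "", false // assumed: a bad signature is never downgraded
  	}
  	if !pinned {
  		return "legacy_unsigned", true // unpinned test mode only
  	}
  	if introspect != nil && introspect() {
  		return "introspection", true
  	}
  	return "", false // bad or unknown token
  }

  func main() {
  	known := func() bool { return true }
  	by, ok := acceptIngress(true, ingressRequest{}, known)
  	fmt.Println(by, ok)
  }
  ```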
- C18Z50 durable service-channel lease introspection is implemented. Migration
  `000029_fabric_service_channel_leases` adds a durable lease table keyed by
  cluster/channel and stores only `token_hash` plus a scrubbed lease payload
  with the raw bearer token removed. Backend
  `rap-backend:fabric-service-channel-0.2.233` is built/deployed on
  docker-test after applying the migration. Introspection now reads memory
  first, then durable storage, so compatibility clients survive backend
  restart. Live smoke restarted `rap_test_backend`, accepted the unsigned token
  through introspection, rejected a bad token, and verified the durable lease
  omits the raw token:
  `artifacts/c18z50-service-channel-durable-introspection-smoke-result.json`.
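
  A sketch of the memory-first, durable-second lookup with a hashed token. The
  in-memory maps stand in for the real cache and the `000029` table, and the
  type names, SHA-256 hashing, and the `rap_fsc_demo` token value are
  assumptions. What it illustrates is that the durable copy is keyed by a token
  hash and never carries the raw bearer token.

  ```go
  package main

  import (
  	"crypto/sha256"
  	"encoding/hex"
  	"fmt"
  )

  // lease is a reduced, hypothetical lease payload.
  type lease struct {
  	ChannelID string
  	Token     string // raw bearer token; must not reach durable storage
  }

  type leaseStore struct {
  	memory  map[string]lease // lost on restart
  	durable map[string]lease // survives restart, scrubbed payloads only
  }

  func newLeaseStore() *leaseStore {
  	return &leaseStore{memory: map[string]lease{}, durable: map[string]lease{}}
  }

  func hashToken(token string) string {
  	sum := sha256.Sum256([]byte(token))
  	return hex.EncodeToString(sum[:])
  }

  // Issue caches the lease in memory and persists a scrubbed copy.
  func (s *leaseStore) Issue(l lease) {
  	h := hashToken(l.Token)
  	s.memory[h] = l
  	scrubbed := l
  	scrubbed.Token = ""
  	s.durable[h] = scrubbed
  }

  // Restart simulates a backend restart by dropping the in-memory cache.
  func (s *leaseStore) Restart() { s.memory = map[string]lease{} }

  // Introspect checks memory first, then durable storage.
  func (s *leaseStore) Introspect(token string) (lease, bool) {
  	h := hashToken(token)
  	if l, ok := s.memory[h]; ok {
  		return l, true
  	}
  	l, ok := s.durable[h]
  	return l, ok
  }

  func main() {
  	s := newLeaseStore()
  	s.Issue(lease{ChannelID: "ch-1", Token: "rap_fsc_demo"})
  	s.Restart()
  	l, ok := s.Introspect("rap_fsc_demo")
  	fmt.Println(ok, l.ChannelID, l.Token == "")
  }
  ```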
- C18Z51 service-channel lease maintenance is implemented. Backend
  `rap-backend:fabric-service-channel-0.2.234` is built/deployed on
  docker-test. New endpoints list durable service-channel lease maintenance
  state and run bounded expired-lease cleanup:
  `GET /clusters/{clusterID}/fabric/service-channels/leases` and
  `POST /clusters/{clusterID}/fabric/service-channels/leases/cleanup`.
  Web-admin adds a `Service-channel leases` panel with active/expired counts,
  recent lease rows, and cleanup action. Live smoke issued a 1-second lease,
  observed it as expired, cleaned it up, and verified it disappeared:
  `artifacts/c18z51-service-channel-lease-maintenance-smoke-result.json`.
- C18Z52 service-channel access telemetry visibility is implemented. Backend
  `rap-backend:fabric-service-channel-0.2.235` is built/deployed on
  docker-test; node-agent/host-agent `0.2.235` artifacts are published under
  `/downloads`, registered as active dev releases, and deployed on
  `test-1/2/3`. Node-agent now reports accepted service-channel ingress
  counters by `signed`, `introspection`, and `legacy_unsigned`, including
  backend-fallback count and last accepted timestamp. Backend exposes
  `GET /clusters/{clusterID}/fabric/service-channels/access-telemetry`,
  reading telemetry observations with heartbeat metadata fallback. Web-admin
  adds a `Service-channel access` panel with cluster totals and per-node rows.
  Live smoke sent packets through test-1, observed
  `X-RAP-Service-Channel-Accepted-By: introspection`, and verified backend
  aggregate visibility:
  `artifacts/c18z52-service-channel-access-telemetry-smoke-result.json`.
- C18Z53 service-channel access/session correlation is implemented. Backend
  `rap-backend:fabric-service-channel-0.2.236` is built/deployed on
  docker-test; node-agent remains `0.2.235`. The access telemetry endpoint now
  correlates accepted ingress counters with active durable service-channel
  leases, selected entry/exit nodes, primary route status, explicit backend
  fallback, and latest route-quality feedback when a route exists. Web-admin's
  `Service-channel access` panel now shows active channel rows before per-node
  counters, so operators can see whether a live service channel is using normal
  route quality feedback or degraded backend fallback. Live smoke created an
  active lease, sent ingress traffic through test-1, and verified active
  channel correlation plus fallback visibility:
  `artifacts/c18z53-service-channel-access-correlation-smoke-result.json`.
- C18Z54 normal-route access correlation is smoke-proven on the existing
  C18Z53 backend/admin surface. New smoke creates a temporary direct
  `vpn_packets` route intent, injects healthy route-quality heartbeat
  telemetry, issues a service-channel lease that selects the normal primary
  route, sends ingress traffic, and verifies the access telemetry active
  channel row is `ready`, not backend fallback, with `route_feedback_status`
  `healthy`, rolling quality counters, and last send duration:
  `artifacts/c18z54-service-channel-normal-route-access-smoke-result.json`.
- C18Z55 degraded normal-route access correlation is smoke-proven on the same
  backend/admin surface. The smoke first issues a lease on a normal primary
  `vpn_packets` route, then injects degraded/fenced route-quality heartbeat
  feedback for that already-selected route. Access telemetry correctly reports
  the active channel as `ready` and `force_backend_fallback=false`, while route
  feedback is `fenced`, rolling failure/drop/slow counters are visible, and the
  aggregate access status becomes `degraded` because `degraded_route_count > 0`:
  `artifacts/c18z55-service-channel-degraded-route-access-smoke-result.json`.
- C18Z56 active-channel remediation diagnostics are implemented. Backend
  `rap-backend:fabric-service-channel-0.2.237` is built/deployed on
  docker-test; node-agent remains `0.2.235`. Active access telemetry channel
  rows now include `remediation_action`, `remediation_reason`,
  `remediation_route_id`, `remediation_route_status`, and an operator hint.
  Decisions distinguish explicit backend fallback, degraded/fenced normal
  route with an authorized alternate (`prefer_alternate_route`), degraded/fenced
  route needing rebuild (`rebuild_route`), and healthy route (`none`).
  Web-admin shows the remediation action in the `Service-channel access`
  active-channel table. C18Z55 smoke now verifies
  `remediation_action=rebuild_route`; backend unit coverage verifies the
  alternate-route remediation branch.
- C18Z56 alternate-route remediation is also live-smoke-proven. New smoke
  creates primary and authorized alternate `vpn_packets` routes, issues a lease
  while primary is still healthy/selected, then injects fenced feedback for the
  selected primary. Access telemetry keeps the active channel on the normal
  route with `force_backend_fallback=false`, reports
  `route_feedback_status=fenced`, and recommends
  `remediation_action=prefer_alternate_route` with the
  alternate route id/status; `degraded_fallback_channel_count` stays zero:
  `artifacts/c18z56-service-channel-alternate-remediation-smoke-result.json`.
- C18Z57 bounded remediation command contract is implemented. Backend
  `rap-backend:fabric-service-channel-0.2.238` is built/deployed on
  docker-test; node-agent remains `0.2.235`. Active access telemetry channel
  rows now include `remediation_command` for non-noop remediation actions, with
  schema version, deterministic command id, action, channel/resource/service,
  entry/exit, primary route, replacement route when present, reason/operator
  hint, issued time, and a bounded TTL capped to the lease lifetime. Web-admin
  marks remediation rows with `cmd` when this machine-readable command is
  present. Live smoke proves a fenced selected primary route with an authorized
  alternate emits a `prefer_alternate_route` command pointing at the alternate:
  `artifacts/c18z57-service-channel-remediation-command-smoke-result.json`.
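
  A minimal sketch of the two C18Z57 properties that are easiest to get wrong,
  the deterministic command id and the TTL capped to the lease lifetime. The
  struct, the hashing scheme, and the use of Unix seconds are assumptions made
  for illustration; the real command carries more fields (schema version,
  resource/service, entry/exit, reason/operator hint).

  ```go
  package main

  import (
  	"crypto/sha256"
  	"encoding/hex"
  	"fmt"
  	"strings"
  )

  // RemediationCommand is a hypothetical subset of the command fields.
  type RemediationCommand struct {
  	CommandID          string
  	Action             string
  	ChannelID          string
  	PrimaryRouteID     string
  	ReplacementRouteID string
  	IssuedAtUnix       int64
  	TTLSeconds         int64
  }

  // BuildCommand returns ok=false for noop actions or an already expired lease.
  func BuildCommand(action, channelID, primary, replacement string,
  	issuedAtUnix, leaseExpiryUnix, maxTTLSeconds int64) (RemediationCommand, bool) {
  	if action == "none" {
  		return RemediationCommand{}, false
  	}
  	ttl := maxTTLSeconds
  	if remaining := leaseExpiryUnix - issuedAtUnix; remaining < ttl {
  		ttl = remaining // the command must never outlive the lease
  	}
  	if ttl <= 0 {
  		return RemediationCommand{}, false
  	}
  	// Assumed id scheme: hash the stable identity fields so the same
  	// decision always yields the same command id.
  	key := strings.Join([]string{action, channelID, primary, replacement}, "\x00")
  	sum := sha256.Sum256([]byte(key))
  	return RemediationCommand{
  		CommandID:          hex.EncodeToString(sum[:8]),
  		Action:             action,
  		ChannelID:          channelID,
  		PrimaryRouteID:     primary,
  		ReplacementRouteID: replacement,
  		IssuedAtUnix:       issuedAtUnix,
  		TTLSeconds:         ttl,
  	}, true
  }

  func main() {
  	cmd, ok := BuildCommand("prefer_alternate_route", "chan-1", "route-a", "route-b", 1000, 1030, 60)
  	fmt.Println(ok, cmd.CommandID, cmd.TTLSeconds)
  }
  ```

  Leaving the issue time out of the hashed key keeps the id stable across
  repeated telemetry reads, which is one plausible reading of "deterministic".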
- C18Z58 service-channel remediation command consumption is implemented.
  Backend `rap-backend:fabric-service-channel-0.2.239` and node-agent
  `rap-node-agent:0.2.237` are built/deployed on docker-test (`test-1/2/3`).
  Backend now projects active `remediation_command` items into node-scoped
  synthetic mesh config as `service_channel_remediation_commands`. Node-agent
  parses those commands and turns `prefer_alternate_route` into an explicit
  route-manager `applied` decision with source
  `service_channel_remediation_command`, so an active channel that still
  presents the old primary route can be routed through the replacement route.
  Web-admin node details show remediation-command count/table in the Mesh tab.
  Live smoke proves access telemetry, synthetic config projection, and
  node-agent route-manager consumption:
  `artifacts/c18z58-service-channel-remediation-apply-smoke-result.json`.
- C18Z59 active remediation traffic proof is smoke-proven on the same
  backend/node-agent images with production forwarding enabled on docker-test
  `test-1/2/3`. The smoke sends service-channel traffic before/after the
  remediation command is consumed, then verifies runtime heartbeat evidence:
  `last_selected_route_id` and flow-scheduler `last_route_id` move to the
  replacement route, `send_successes=1`, `send_failures=0`,
  `send_fallback_local=0`, and no degraded backend fallback is recommended.
  Result:
  `artifacts/c18z59-service-channel-remediation-traffic-smoke-result.json`.
- C18Z60 multi-flow remediation traffic proof is smoke-proven. The smoke sends
  a batch of twelve IPv4/TCP-like packets that classify into multiple
  independent VPN flow channels after the remediation command is consumed.
  Runtime heartbeat evidence shows the replacement route selected, at least two
  flow-scheduler channels on that route, no local/backend fallback, no flow
  drops, and no route send failures. Result:
  `artifacts/c18z60-service-channel-remediation-multiflow-smoke-result.json`.
- C18Z61 pressure remediation traffic proof is smoke-proven. The smoke sends a
  batch of 128 IPv4/TCP-like packets after remediation; runtime evidence shows
  32 replacement-route flow stats, scheduler high-watermark 5,
  max-in-flight 4, `send_fallback_local=0`, route failures 0, and flow/scheduler
  drops 0. Result:
  `artifacts/c18z61-service-channel-remediation-pressure-smoke-result.json`.
- C18Z62 service-channel QoS class wiring is implemented in node-agent and
  live-smoke-proven on docker-test image `rap-node-agent:0.2.238-c18z62`.
  Service-channel HTTP ingress accepts neutral `X-RAP-Traffic-Class`
  (`control`, `interactive`, `reliable`, `bulk`, `droppable`) and the flow
  scheduler keeps distinct traffic-class channel ids/stats while preserving the
  old default bulk channel ids. Unit tests prove priority ordering
  `control > interactive > reliable > bulk > droppable`; live smoke proves a
  bulk 128-packet pressure batch plus an interactive packet both move through
  the remediation replacement route with no local/backend fallback, drops, or
  route failures. Result:
  `artifacts/c18z62-service-channel-remediation-qos-smoke-result.json`.
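
  The class set and priority ordering above can be expressed as a lookup table.
  This is a sketch only: `NormalizeClass` and `SortByPriority` are hypothetical
  helper names, and the case-insensitive parsing and the fallback of unknown
  header values to `bulk` are assumptions, chosen to match the "old default
  bulk channel ids" behavior.

  ```go
  package main

  import (
  	"fmt"
  	"sort"
  	"strings"
  )

  // classPriority: a lower value is scheduled first.
  var classPriority = map[string]int{
  	"control": 0, "interactive": 1, "reliable": 2, "bulk": 3, "droppable": 4,
  }

  // NormalizeClass maps an X-RAP-Traffic-Class header value to a known class.
  // Assumption: empty or unknown values fall back to "bulk".
  func NormalizeClass(header string) string {
  	class := strings.ToLower(strings.TrimSpace(header))
  	if _, ok := classPriority[class]; ok {
  		return class
  	}
  	return "bulk"
  }

  // SortByPriority returns a copy ordered
  // control > interactive > reliable > bulk > droppable.
  func SortByPriority(classes []string) []string {
  	out := append([]string(nil), classes...)
  	sort.SliceStable(out, func(i, j int) bool {
  		return classPriority[NormalizeClass(out[i])] < classPriority[NormalizeClass(out[j])]
  	})
  	return out
  }

  func main() {
  	fmt.Println(SortByPriority([]string{"bulk", "droppable", "control", "reliable", "interactive"}))
  }
  ```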
- C18Z63 concurrent QoS isolation is implemented and unit-proven. A controlled
  runtime test holds a bulk traffic-class send in-flight with a blocking
  production transport, then sends an independent interactive traffic-class
  packet through the same ingress; the interactive send completes before the
  bulk release, with `MaxInFlight >= 2`, traffic-class-specific stats, no drops,
  and no failures. This proves the shared Fabric Service Channel runtime does
  not globally serialize interactive/control-style traffic behind bulk work.
  Artifact:
  `artifacts/c18z63-service-channel-concurrent-qos-go-test.jsonl`.
- C18Z64 traffic-class telemetry aggregation is implemented and live-proven on
  docker-test image `rap-node-agent:0.2.239-c18z64`. `rap.fabric_flow_scheduler.v1`
  snapshots now include `traffic_class_counts`, giving backend/admin/diagnostics
  a compact count of active flow channels per traffic class without scanning
  every channel stat. Unit coverage proves the counts for explicit
  control/interactive/bulk classes and for the concurrent bulk+interactive
  isolation case. Live smoke re-ran the QoS path on `test-1/2/3`; latest
  heartbeat snapshot showed `traffic_class_counts` of `bulk=32` and
  `interactive=12`, with drops 0. Artifacts:
  `artifacts/c18z64-service-channel-traffic-class-telemetry-go-test.jsonl`,
  `artifacts/c18z64-service-channel-traffic-class-telemetry-live-smoke-result.json`,
  and
  `artifacts/c18z64-service-channel-traffic-class-telemetry-live-snapshot.json`.
- C18Z65/C18Z66 backend/admin QoS diagnostics are implemented and live-proven.
  Backend `rap-backend:fabric-service-channel-0.2.241-c18z66` is deployed on
  docker-test and projects runtime `traffic_class_counts`, flow channel count,
  max in-flight, dropped, and high-watermark from node heartbeats into
  `GET /fabric/service-channels/access-telemetry` at node, active-channel, and
  cluster aggregate levels. Web-admin Service-channel access shows flow QoS
  chips/rows for cluster totals, active channels, and nodes. Live API aggregate
  result showed `bulk=32`, `interactive=12`, `flow_channel_count=44`,
  `flow_max_in_flight=4`. Artifacts:
  `artifacts/c18z65-service-channel-access-qos-telemetry-api-result.json`,
  `artifacts/c18z65-service-channel-access-qos-telemetry-smoke-result.json`,
  and
  `artifacts/c18z66-service-channel-access-qos-aggregate-api-result.json`.
- C18Z67 live concurrent QoS proof is implemented and smoke-proven against
  docker-test backend `rap-backend:fabric-service-channel-0.2.241-c18z66` and
  node-agent image `rap-node-agent:0.2.239-c18z64`. The smoke pushes six
  parallel bulk service-channel HTTP packet requests while an interactive
  traffic-class request is injected through the same entry path after
  remediation. Run `c18z67-20260508-213452` accepted all 6 bulk requests,
  forwarded 3072 post-remediation packets, completed the interactive request in
  132 ms, observed 32 bulk and 12 interactive replacement-route flow stats, and
  kept local/backend fallback, route failures, flow drops, and scheduler drops
  at 0. Artifact:
  `artifacts/c18z67-service-channel-concurrent-qos-live-smoke-result.json`.
- C18Z68 service-channel flow-health guard is implemented and deployed on
  docker-test as `rap-backend:fabric-service-channel-0.2.242-c18z68`, with
  web-admin rebuilt/deployed. Access telemetry now projects
  `flow_health_status` and `flow_health_reason` at cluster, node, and
  active-channel levels from traffic-class counts, queue pressure, flow drops,
  backend fallback, route-quality failures/drops/slow samples, and route send
  latency. Web-admin shows explicit flow-health chips beside flow QoS so
  sustained bulk pressure, degraded latency, fallback, and drops are visible
  before adding user services. Verification passed:
  `go test ./internal/modules/cluster`, web-admin `npm run build`, updated
  C18Z67 live smoke against backend `0.2.242-c18z68`, and live API artifact
  `artifacts/c18z68-service-channel-flow-health-api-result.json`.
- C18Z69 node-side adaptive backpressure is implemented and deployed on
  docker-test image `rap-node-agent:0.2.243-c18z69` for `test-1/2/3`.
  `FabricFlowScheduler` now calculates per-traffic-class
  `recommended_parallel_windows` and reports `adaptive_backpressure_active` /
  `adaptive_backpressure_reason` in runtime heartbeat snapshots. Bulk and
  droppable classes are reduced first under pressure, reliable is reduced
  moderately, while control/interactive keep their full window unless their own
  class has drops/failures/slow samples. Live C18Z69 smoke wraps the C18Z67
  pressure path and verified recommended windows `bulk=1`, `droppable=1`,
  `reliable=3`, `interactive=4`, `control=4`; traffic-class counts `bulk=32`,
  `interactive=12`; high-watermark 72, max-in-flight 4, drops 0; and reason
  `bulk_window_reduced_to_protect_interactive`. Artifacts:
  `artifacts/c18z67-service-channel-concurrent-qos-live-smoke-result.json` and
  `artifacts/c18z69-service-channel-adaptive-backpressure-smoke-result.json`.
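
  One way to picture the per-class window reduction is the sketch below. It is
  not the `FabricFlowScheduler` implementation: the `Signals` type, the 3/4 cut
  for `reliable`, and the halving for a troubled control/interactive class are
  assumptions, picked so that a max window of 4 and a bulk pressure window of 1
  reproduce the smoke values quoted above.

  ```go
  package main

  import "fmt"

  // Signals holds hypothetical per-class problem counters.
  type Signals struct{ Drops, Failures, SlowSamples int }

  func (s Signals) bad() bool { return s.Drops+s.Failures+s.SlowSamples > 0 }

  // atLeastOne keeps every window usable.
  func atLeastOne(v int) int {
  	if v < 1 {
  		return 1
  	}
  	return v
  }

  // RecommendedWindows sketches per-traffic-class parallel windows.
  func RecommendedWindows(maxWindow, bulkPressureWindow int, underPressure bool,
  	own map[string]Signals) map[string]int {
  	w := map[string]int{}
  	for _, c := range []string{"control", "interactive", "reliable", "bulk", "droppable"} {
  		w[c] = maxWindow
  	}
  	if underPressure {
  		// Bulk and droppable are reduced first, reliable moderately.
  		w["bulk"] = atLeastOne(bulkPressureWindow)
  		w["droppable"] = atLeastOne(bulkPressureWindow)
  		w["reliable"] = atLeastOne(maxWindow * 3 / 4) // assumed "moderate" cut
  	}
  	for _, c := range []string{"control", "interactive"} {
  		// These classes keep the full window unless their own class is in trouble.
  		if own[c].bad() {
  			w[c] = atLeastOne(maxWindow / 2) // assumed cut
  		}
  	}
  	return w
  }

  func main() {
  	fmt.Println(RecommendedWindows(4, 1, true, nil))
  }
  ```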
- C18Z70 backend/admin adaptive backpressure visibility is implemented and
  deployed on docker-test as
  `rap-backend:fabric-service-channel-0.2.244-c18z70`; web-admin is rebuilt and
  deployed. Access telemetry now projects node-agent
  `recommended_parallel_windows`, `adaptive_backpressure_active`, and
  `adaptive_backpressure_reason` at cluster, node, and active-channel levels.
  Cluster aggregation uses the minimum non-zero recommended window per class,
  so the operator sees the most conservative active runtime limit. Web-admin
  shows adaptive windows next to flow health and flow QoS. Live API returned
  `adaptive=true`, reason `bulk_window_reduced_to_protect_interactive`, and
  windows `bulk=1`, `droppable=1`, `reliable=3`, `interactive=4`,
  `control=4`. Verification passed: `go test ./internal/modules/cluster`,
  web-admin `npm run build`, C18Z69 live smoke, and
  `artifacts/c18z70-service-channel-adaptive-telemetry-api-result.json`.
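
  The "minimum non-zero recommended window per class" rule can be sketched as
  follows. `AggregateWindows` and the map-per-node input shape are assumptions
  for illustration; only the aggregation rule itself comes from the text above.

  ```go
  package main

  import "fmt"

  // AggregateWindows returns, for each traffic class, the smallest non-zero
  // window reported by any node. Zero or negative values are treated as
  // "not reported" and ignored.
  func AggregateWindows(nodes []map[string]int) map[string]int {
  	out := map[string]int{}
  	for _, node := range nodes {
  		for class, window := range node {
  			if window <= 0 {
  				continue
  			}
  			if cur, ok := out[class]; !ok || window < cur {
  				out[class] = window
  			}
  		}
  	}
  	return out
  }

  func main() {
  	cluster := AggregateWindows([]map[string]int{
  		{"bulk": 1, "interactive": 4, "control": 4},
  		{"bulk": 4, "interactive": 4, "control": 0},
  	})
  	fmt.Println(cluster)
  }
  ```

  Ignoring zero values matters here: a node that has not reported a class yet
  should not drag the cluster view down to zero.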
- C18Z71 adaptive policy contract is implemented and deployed on docker-test as
  `rap-backend:fabric-service-channel-0.2.245-c18z71` with node-agent image
  `rap-node-agent:0.2.245-c18z71` on `test-1/2/3`. Backend exposes audited
  `GET/PUT /clusters/{clusterID}/fabric/service-channels/adaptive-policy` for
  max parallel window, queue/bulk pressure thresholds, and per-class windows.
  The effective policy is embedded in signed node synthetic config and
  node-agent runtime heartbeat snapshots now report
  `adaptive_policy_fingerprint`. The scheduler consumes the policy at runtime:
  default policy preserves the C18Z69 behavior, while the C18Z71 live smoke
  proved an operator policy can raise max window to 6 and bulk pressure window
  to 2 while keeping interactive/control at 6. During smoke, a signed synthetic
  config hash mismatch was found and fixed by preserving adaptive policy
  provenance fields in the node-agent client model. Verification passed:
  `go test ./internal/modules/cluster`,
  `go test ./cmd/rap-node-agent ./internal/mesh ./internal/vpnruntime ./internal/client ./internal/config`,
  web-admin `npm run build`, C18Z71 live smoke, and C18Z69 regression smoke.
  Artifacts:
  `artifacts/c18z71-service-channel-adaptive-policy-smoke-result.json` and
  `artifacts/c18z69-service-channel-adaptive-backpressure-smoke-result.json`.
- C18Z72 service-channel pool/failover policy contract is implemented and
  deployed on docker-test as
  `rap-backend:fabric-service-channel-0.2.246-c18z72`; node-agent remains
  `rap-node-agent:0.2.245-c18z71` on `test-1/2/3`. Backend exposes audited
  `GET/PUT /clusters/{clusterID}/fabric/service-channels/pool-policy` for
  entry/exit pool constraints, preferred entry/exit, selection strategy,
  route/entry/exit failover modes, backend fallback allowance, and sticky
  session mode. Lease issuance now applies the effective policy before route
  selection, constrains `entry_pool`/`exit_pool`, chooses policy preferred
  nodes when present, embeds `pool_policy` provenance in the lease, and signs
  it into `rap.fabric_service_channel_lease_authority.v1`. Web-admin API/types
  know the new policy contract. Verification passed:
  `go test ./internal/modules/cluster`, web-admin `npm run build`,
  C18Z72 live smoke, and C18Z71 regression smoke. Artifact:
  `artifacts/c18z72-service-channel-pool-policy-smoke-result.json`.
- C18Z73 pool-policy remediation guard and telemetry are implemented and
  deployed on docker-test as
  `rap-backend:fabric-service-channel-0.2.247-c18z73` with node-agent image
  `rap-node-agent:0.2.247-c18z73` on `test-1/2/3`; web-admin is rebuilt and
  deployed. Active access telemetry now projects the signed
  `pool_policy_fingerprint`, remediation guard status/reason, and guarded
  remediation commands. Backend remediation rejects an alternate route outside
  the signed entry/exit lease pools and emits `rebuild_route` instead of
  `prefer_alternate_route`; node-agent defensively ignores guarded rejected
  remediation commands before route-manager application. Web-admin shows guard
  chips in access telemetry and node synthetic-config remediation rows.
  Verification passed:
  `go test ./internal/modules/cluster ./internal/platform/runtime ./internal/modules/nodeagent`,
  `go test ./cmd/rap-node-agent ./internal/mesh ./internal/vpnruntime ./internal/config`,
  web-admin `npm run build`, C18Z73 live smoke, C18Z72 regression smoke, and
  C18Z71/C18Z67 live regression smoke. Artifacts:
  `artifacts/c18z73-service-channel-pool-policy-remediation-guard-smoke-result.json`,
  `artifacts/c18z72-service-channel-pool-policy-smoke-result.json`,
  `artifacts/c18z71-service-channel-adaptive-policy-smoke-result.json`, and
  `artifacts/c18z67-service-channel-concurrent-qos-live-smoke-result.json`.
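
  The guard rule can be sketched as a pool-membership check. The `Route` and
  `LeasePools` types and the `allowed` / `rejected` guard status strings are
  assumptions for illustration; the text above only fixes the behavior, that an
  alternate outside the signed entry/exit pools yields `rebuild_route` rather
  than `prefer_alternate_route`.

  ```go
  package main

  import "fmt"

  // Route is a hypothetical alternate-route candidate.
  type Route struct{ ID, EntryNode, ExitNode string }

  // LeasePools is a hypothetical view of the pools signed into the lease.
  type LeasePools struct{ EntryPool, ExitPool []string }

  func contains(pool []string, node string) bool {
  	for _, n := range pool {
  		if n == node {
  			return true
  		}
  	}
  	return false
  }

  // GuardAlternate allows the alternate only when both of its endpoints are
  // members of the signed lease pools; otherwise it falls back to a rebuild.
  func GuardAlternate(pools LeasePools, alt Route) (action, guardStatus string) {
  	if contains(pools.EntryPool, alt.EntryNode) && contains(pools.ExitPool, alt.ExitNode) {
  		return "prefer_alternate_route", "allowed"
  	}
  	return "rebuild_route", "rejected"
  }

  func main() {
  	pools := LeasePools{EntryPool: []string{"test-1"}, ExitPool: []string{"test-2"}}
  	fmt.Println(GuardAlternate(pools, Route{ID: "alt", EntryNode: "test-1", ExitNode: "test-3"}))
  }
  ```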
- C18Z74 service-channel remediation execution visibility is implemented and
  deployed on docker-test as
  `rap-backend:fabric-service-channel-0.2.248-c18z74` with node-agent image
  `rap-node-agent:0.2.248-c18z74` on `test-1/2/3`; web-admin is rebuilt and
  deployed. Active access telemetry now computes
  `remediation_execution_status`, reason, generation, and observed timestamp
  by correlating active remediation commands with the entry node's latest
  route-manager heartbeat. `prefer_alternate_route` commands show
  `waiting_node_apply` until the node reports a matching route-manager decision
  and then `applied`; guarded commands show `rejected_by_policy_guard`; bounded
  `rebuild_route` commands show `pending_rebuild_request`. The execution state
  is copied into the machine-readable remediation command and displayed in
  web-admin access telemetry / node synthetic remediation rows. Verification
  passed:
  `go test ./internal/modules/cluster ./internal/platform/runtime ./internal/modules/nodeagent`,
  `go test ./cmd/rap-node-agent ./internal/mesh ./internal/vpnruntime ./internal/config`,
  web-admin `npm run build`, C18Z74 live smoke, C18Z73 regression smoke, and
  C18Z72 regression smoke. Artifacts:
  `artifacts/c18z74-service-channel-remediation-execution-smoke-result.json`,
  `artifacts/c18z67-service-channel-concurrent-qos-live-smoke-result.json`,
  `artifacts/c18z73-service-channel-pool-policy-remediation-guard-smoke-result.json`,
  and `artifacts/c18z72-service-channel-pool-policy-smoke-result.json`.
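
  The execution-status correlation described for C18Z74 can be sketched like
  this. The `Command` type and the idea of matching by command id against a
  set built from the entry node's latest route-manager heartbeat are
  assumptions; the status strings are the ones named above.

  ```go
  package main

  import "fmt"

  // Command is a hypothetical subset of an active remediation command.
  type Command struct {
  	ID            string
  	Action        string
  	GuardRejected bool
  }

  // ExecutionStatus derives remediation_execution_status for one command.
  // appliedIDs is assumed to hold the command ids that the entry node's latest
  // route-manager heartbeat reports as applied decisions.
  func ExecutionStatus(cmd Command, appliedIDs map[string]bool) string {
  	switch {
  	case cmd.GuardRejected:
  		return "rejected_by_policy_guard"
  	case cmd.Action == "prefer_alternate_route":
  		if appliedIDs[cmd.ID] {
  			return "applied"
  		}
  		return "waiting_node_apply"
  	case cmd.Action == "rebuild_route":
  		return "pending_rebuild_request"
  	default:
  		return "" // no remediation command, nothing to report
  	}
  }

  func main() {
  	cmd := Command{ID: "cmd-1", Action: "prefer_alternate_route"}
  	fmt.Println(ExecutionStatus(cmd, nil))
  	fmt.Println(ExecutionStatus(cmd, map[string]bool{"cmd-1": true}))
  }
  ```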
- C18Z75 durable remediation rebuild intent foundation is implemented and
  deployed on docker-test as
  `rap-backend:fabric-service-channel-0.2.249-c18z75`; node-agent remains
  `rap-node-agent:0.2.248-c18z74` on `test-1/2/3`. When a node fetches
  synthetic config containing a `rebuild_route` remediation command, backend
  now records a durable row in the existing
  `fabric_service_channel_route_rebuild_attempts` ledger with
  `rebuild_status=requested` / `outcome=rebuild_requested`, or
  `rebuild_status=rejected` / `outcome=policy_guard_rejected` when the pool
  policy guard rejects it. Access telemetry correlates that ledger row back to
  the active channel and reports `rebuild_request_recorded` or
  `rebuild_request_rejected` in `remediation_execution_status`. The C18Z75
  smoke isolates a route pair, proves `rebuild_route`, fetches synthetic
  config to persist the intent, verifies the rebuild ledger row, and verifies
  access telemetry reports the recorded execution state. Verification passed:
  `go test ./internal/modules/cluster ./internal/platform/runtime ./internal/modules/nodeagent`,
  `go test ./cmd/rap-node-agent ./internal/mesh ./internal/vpnruntime ./internal/config`,
  web-admin `npm run build`, C18Z75 live smoke, C18Z73 regression smoke, and
  C18Z72 regression smoke. Artifacts:
  `artifacts/c18z75-service-channel-rebuild-intent-smoke-result.json`,
  `artifacts/c18z73-service-channel-pool-policy-remediation-guard-smoke-result.json`,
  and `artifacts/c18z72-service-channel-pool-policy-smoke-result.json`.
- C18Z76 service-channel rebuild-route node acknowledgement is implemented and
  deployed on docker-test as
  `rap-backend:fabric-service-channel-0.2.250-c18z76` with node-agent image
  `rap-node-agent:0.2.250-c18z76` on `test-1/2/3`. Node-agent now consumes
  allowed `rebuild_route` remediation commands as route-manager decisions with
  `rebuild_status=pending_degraded_fallback` and
  `decision_source=service_channel_remediation_command`; guarded commands are
  still ignored. Backend access telemetry correlates this route-manager
  acknowledgement with the durable ledger intent and reports
  `rebuild_request_recorded_node_pending`. Verification passed:
  `go test ./internal/modules/cluster ./internal/platform/runtime ./internal/modules/nodeagent`,
  `go test ./cmd/rap-node-agent ./internal/agent ./internal/mesh ./internal/vpnruntime ./internal/config`,
  C18Z76 live smoke, C18Z75 regression smoke, and C18Z74/C18Z67 regression
  smoke. Artifacts:
  `artifacts/c18z76-service-channel-rebuild-node-pending-smoke-result.json`,
  `artifacts/c18z75-service-channel-rebuild-intent-smoke-result.json`,
  `artifacts/c18z74-service-channel-remediation-execution-smoke-result.json`,
  and `artifacts/c18z67-service-channel-concurrent-qos-live-smoke-result.json`.
- C18Z77 service-channel rebuild planner resolution is implemented and
  deployed on docker-test as
  `rap-backend:fabric-service-channel-0.2.251-c18z77` with node-agent image
  `rap-node-agent:0.2.251-c18z77` on `test-1/2/3`. Backend now resolves
  durable `rebuild_route` remediation requests during node-scoped synthetic
  config generation: it keeps lease pool-policy guardrails, records
  `applied` / `replacement_selected` when a signed-pool-valid alternate route
  exists, records `no_alternate` when no safe alternate exists, records
  `deferred_by_policy` when the active lease cannot authorize the replacement,
  and records `expired` for stale commands. When a replacement is applied, the
  same command id is projected as a route-manager decision so node-agent can
  consume the resolved planner decision without duplicating the raw command.
  Access telemetry reports planner states such as `rebuild_request_applied`
  and `rebuild_request_no_alternate`. Verification passed:
  `go test ./internal/modules/cluster ./internal/platform/runtime ./internal/modules/nodeagent`,
  `go test ./cmd/rap-node-agent ./internal/agent ./internal/mesh ./internal/vpnruntime ./internal/config`,
  C18Z77 live smoke, C18Z75 regression smoke, and C18Z74/C18Z67 regression
  smoke. Artifacts:
  `artifacts/c18z77-service-channel-rebuild-planner-resolution-smoke-result.json`,
  `artifacts/c18z75-service-channel-rebuild-intent-smoke-result.json`,
  `artifacts/c18z74-service-channel-remediation-execution-smoke-result.json`,
  and `artifacts/c18z67-service-channel-concurrent-qos-live-smoke-result.json`.
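
  The planner outcomes listed for C18Z77 can be read as a decision ladder. The
  sketch below is an assumption-heavy illustration: the `RebuildRequest` type,
  the Unix-second expiry, and the order in which the checks run are not stated
  in the text. Only `applied` is paired with an outcome
  (`replacement_selected`) because that is the only pair the text names.

  ```go
  package main

  import "fmt"

  // RebuildRequest is a hypothetical view of one durable rebuild_route request.
  type RebuildRequest struct {
  	ExpiresAtUnix    int64
  	AlternateRouteID string // signed-pool-valid alternate, "" if none exists
  	LeaseAuthorizes  bool   // whether the active lease can authorize it
  }

  // Resolve returns the planner status and, where the document names one, the
  // outcome recorded in the rebuild ledger.
  func Resolve(req RebuildRequest, nowUnix int64) (status, outcome string) {
  	switch {
  	case nowUnix >= req.ExpiresAtUnix:
  		return "expired", ""
  	case req.AlternateRouteID == "":
  		return "no_alternate", ""
  	case !req.LeaseAuthorizes:
  		return "deferred_by_policy", ""
  	default:
  		return "applied", "replacement_selected"
  	}
  }

  func main() {
  	req := RebuildRequest{ExpiresAtUnix: 2000, AlternateRouteID: "route-b", LeaseAuthorizes: true}
  	fmt.Println(Resolve(req, 1000))
  }
  ```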
- C18Z78 service-channel rebuild planner applied-branch visibility is
  implemented and deployed on docker-test as
  `rap-backend:fabric-service-channel-0.2.252-c18z78` with node-agent image
  `rap-node-agent:0.2.252-c18z78` on `test-1/2/3`; web-admin is rebuilt and
  deployed to `rap_web_admin`. The admin access-telemetry execution column and
  node synthetic remediation rows now render planner outcomes with explicit
  labels and tones: `rebuild_request_applied` is good,
  `rebuild_request_recorded(_node_pending)`, `rebuild_request_no_alternate`,
  and `rebuild_request_deferred_by_policy` are warning states, while rejected
  or expired requests are bad states. The C18Z78 live smoke proves the applied
  planner branch: a primary route is leased first, the primary route is then
  degraded, an alternate route is added after the lease, synthetic config
  fetch resolves the existing `rebuild_route` command to `applied` /
  `replacement_selected`, and access telemetry reports
  `rebuild_request_applied`. Verification passed:
  `go test ./internal/modules/cluster ./internal/platform/runtime ./internal/modules/nodeagent`,
  `go test ./cmd/rap-node-agent ./internal/agent ./internal/mesh ./internal/vpnruntime ./internal/config`,
  web-admin `npm run build`, C18Z78 live smoke, C18Z77 regression smoke, and
  C18Z74/C18Z67 regression smoke. Artifacts:
  `artifacts/c18z78-service-channel-rebuild-planner-applied-smoke-result.json`,
  `artifacts/c18z77-service-channel-rebuild-planner-resolution-smoke-result.json`,
  `artifacts/c18z74-service-channel-remediation-execution-smoke-result.json`,
  and `artifacts/c18z67-service-channel-concurrent-qos-live-smoke-result.json`.
- C18Z79 service-channel planner-to-runtime loop proof is implemented and
  deployed on docker-test as
  `rap-backend:fabric-service-channel-0.2.253-c18z79` with node-agent image
  `rap-node-agent:0.2.253-c18z79` on `test-1/2/3`. The new live smoke extends
  the C18Z78 applied branch: after planner resolves the existing
  `rebuild_route` command to `applied` / `replacement_selected`, the entry node
  reports a route-manager decision for the same `rebuild_request_id`, reports
  transition `applied_rebuild`, and live service-channel packet ingress selects
  the replacement route with no local/backend fallback, route failures, or flow
  drops. Verification passed:
  `go test ./internal/modules/cluster ./internal/platform/runtime ./internal/modules/nodeagent`,
  `go test ./cmd/rap-node-agent ./internal/agent ./internal/mesh ./internal/vpnruntime ./internal/config`,
  C18Z79 live smoke, C18Z78 and C18Z77 sequential regressions, and C18Z67
  concurrent QoS regression. Artifact:
  `artifacts/c18z79-service-channel-planner-runtime-loop-smoke-result.json`.
- C18Z80 service-channel sustained post-rebuild pressure proof is implemented
  and deployed on docker-test as
  `rap-backend:fabric-service-channel-0.2.254-c18z80` with node-agent image
  `rap-node-agent:0.2.254-c18z80` on `test-1/2/3`. The new live smoke keeps the
  C18Z79 planner-applied loop, then sends five post-rebuild bursts of mixed
  `interactive`, `bulk`, and `reliable` VPN packet batches. It proves every
  burst is accepted by the service-channel runtime, every burst reports the
  replacement route, the stale primary is not reselected, and fallback,
  route-failure, flow-drop, and scheduler-drop deltas stay zero from the
  pre-pressure baseline. Smoke route hygiene was tightened: C18Z67 now disables
  pre-existing active `vpn_packets` intents for its entry/exit pair, and
  C18Z79/C18Z80 expire their temporary primary/alternate intents after a
  successful run. Verification passed:
  `go test ./internal/modules/cluster ./internal/platform/runtime ./internal/modules/nodeagent`,
  `go test ./cmd/rap-node-agent ./internal/agent ./internal/mesh ./internal/vpnruntime ./internal/config`,
  C18Z80 live smoke, C18Z79 regression smoke, and C18Z67 concurrent QoS
  regression. Artifact:
  `artifacts/c18z80-service-channel-post-rebuild-pressure-smoke-result.json`.
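
  The "deltas stay zero from the pre-pressure baseline" check can be sketched
  as a snapshot diff. `Counters`, `Deltas`, and `AllZero` are hypothetical
  names, and apart from `send_fallback_local` and `send_failures` the counter
  keys are illustrative placeholders rather than the runtime's real field
  names.

  ```go
  package main

  import "fmt"

  // Counters is a hypothetical snapshot of cumulative runtime counters.
  type Counters map[string]int64

  // Deltas returns after minus before for every counter present in after.
  func Deltas(before, after Counters) Counters {
  	d := Counters{}
  	for name, value := range after {
  		d[name] = value - before[name]
  	}
  	return d
  }

  // AllZero reports whether every named counter delta is zero.
  func AllZero(d Counters, names ...string) bool {
  	for _, name := range names {
  		if d[name] != 0 {
  			return false
  		}
  	}
  	return true
  }

  func main() {
  	before := Counters{"send_successes": 10, "send_failures": 2, "send_fallback_local": 0}
  	after := Counters{"send_successes": 25, "send_failures": 2, "send_fallback_local": 0}
  	d := Deltas(before, after)
  	fmt.Println(d["send_successes"], AllZero(d, "send_failures", "send_fallback_local"))
  }
  ```

  Comparing deltas rather than cumulative values lets a smoke pass on a
  long-running node whose counters already carry failures from earlier runs.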
- C18Z81 service-channel replacement-degradation recovery proof is implemented
  and deployed on docker-test as
  `rap-backend:fabric-service-channel-0.2.255-c18z81` with node-agent image
  `rap-node-agent:0.2.255-c18z81` on `test-1/2/3`. The new live smoke proves
  the negative branch after C18Z80: once the initial replacement is applied and
  used, a generation-valid fenced feedback report for that replacement causes
  the Control Plane to select a new safe recovery route. Live traffic then
  moves to the recovery route, the degraded replacement is not reselected, and
  fallback, route-failure, flow-drop, and scheduler-drop deltas stay zero for
  the recovery send. The smoke also documents an important guardrail: stale
  route-generation feedback must not trigger recovery. C18Z67/C18Z79 were
  tightened to check per-run counter deltas rather than cumulative runtime
  counters. Verification passed:
  `go test ./internal/modules/cluster ./internal/platform/runtime ./internal/modules/nodeagent`,
  `go test ./cmd/rap-node-agent ./internal/agent ./internal/mesh ./internal/vpnruntime ./internal/config`,
  C18Z81 live smoke, C18Z80 regression smoke, C18Z79 regression smoke, and
  C18Z67 concurrent QoS regression. Artifact:
  `artifacts/c18z81-service-channel-replacement-degradation-recovery-smoke-result.json`.
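
  The stale-feedback guardrail and the recovery-route choice can be sketched
  together. The types and function names are assumptions for illustration, and
  the "first non-degraded candidate" selection is a stand-in for whatever
  scoring the Control Plane really applies.

  ```go
  package main

  import "fmt"

  // Feedback is a hypothetical route-quality feedback report.
  type Feedback struct {
  	RouteID    string
  	Generation int64
  	Status     string
  }

  // ActiveRoute is the route and generation currently applied to the channel.
  type ActiveRoute struct {
  	RouteID    string
  	Generation int64
  }

  // TriggersRecovery accepts only fenced feedback for the active route at its
  // current generation; stale-generation feedback is ignored.
  func TriggersRecovery(active ActiveRoute, fb Feedback) bool {
  	return fb.Status == "fenced" &&
  		fb.RouteID == active.RouteID &&
  		fb.Generation == active.Generation
  }

  // PickRecovery returns the first candidate that is not known to be degraded,
  // or ok=false when no safe recovery route exists.
  func PickRecovery(candidates []string, degraded map[string]bool) (routeID string, ok bool) {
  	for _, id := range candidates {
  		if !degraded[id] {
  			return id, true
  		}
  	}
  	return "", false
  }

  func main() {
  	active := ActiveRoute{RouteID: "replacement", Generation: 7}
  	fmt.Println(TriggersRecovery(active, Feedback{RouteID: "replacement", Generation: 6, Status: "fenced"}))
  	fmt.Println(PickRecovery([]string{"primary", "replacement", "recovery"},
  		map[string]bool{"primary": true, "replacement": true}))
  }
  ```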
- C18Z82 service-channel no-safe-recovery proof is implemented and deployed on
  docker-test as `rap-backend:fabric-service-channel-0.2.256-c18z82` with
  node-agent image `rap-node-agent:0.2.256-c18z82` on `test-1/2/3`. The new
  live smoke proves the branch where the original primary is degraded, the
  replacement is applied and used, then that replacement reports
  generation-valid fenced feedback while no new safe recovery route exists.
  Node-scoped synthetic config reports
  `service_channel_feedback_no_alternate` with
  `pending_degraded_fallback`; score reasons include
  `no_unfenced_alternate_route` and
  `backend_relay_degraded_fallback_until_rebuild`, so the Control Plane exposes
  an explicit degraded/no-alternate state instead of silently sticking to a bad
  replacement. Verification passed:
  `go test ./internal/modules/cluster ./internal/platform/runtime ./internal/modules/nodeagent`,
  `go test ./cmd/rap-node-agent ./internal/agent ./internal/mesh ./internal/vpnruntime ./internal/config`,
  C18Z82 live smoke, C18Z81 recovery regression, C18Z80 pressure regression,
  and C18Z67 concurrent QoS regression. Artifact:
  `artifacts/c18z82-service-channel-no-safe-recovery-smoke-result.json`.
- C18Z83 service-channel access-telemetry no-safe projection is implemented and
  deployed on docker-test as `rap-backend:fabric-service-channel-0.2.257-c18z83`;
  node-agent remains `rap-node-agent:0.2.256-c18z82` on `test-1/2/3`, and
  web-admin is rebuilt/deployed to `rap_web_admin`. Active access telemetry
  channels now expose route-decision source, route id, replacement route id,
  rebuild status/reason/generation, and score reasons. Web-admin shows a
  dedicated `decision` column in the active-channel table. The live smoke
  proves no-safe recovery is visible through access telemetry as
  `service_channel_feedback_no_alternate` /
  `pending_degraded_fallback`, while durable ledger state can still report
  `rebuild_request_no_alternate`. Verification passed:
  `go test ./internal/modules/cluster ./internal/platform/runtime ./internal/modules/nodeagent`,
  web-admin `npm run build`, and C18Z83 live smoke. Artifact:
  `artifacts/c18z83-service-channel-access-telemetry-no-safe-smoke-result.json`.
- C18Z84 service-channel access-decision aggregate proof is implemented and
  deployed on docker-test as `rap-backend:fabric-service-channel-0.2.258-c18z84`;
  node-agent remains `rap-node-agent:0.2.256-c18z82` on `test-1/2/3`, and
  web-admin is rebuilt/deployed to `rap_web_admin`. Access telemetry now
  exposes aggregate route-decision counters:
  `route_decision_channel_count`, `replacement_decision_count`,
  `applied_rebuild_decision_count`, `recovery_decision_count`, and
  `no_safe_recovery_decision_count`. Web-admin summary chips show these counts,
  and no-safe route decisions now prioritize the aggregate reason
  `active_channels_no_safe_recovery` over generic missing access-report noise.
  Verification passed:
  `go test ./internal/modules/cluster ./internal/platform/runtime ./internal/modules/nodeagent`,
  web-admin `npm run build`, C18Z84 live smoke, and C18Z83 regression smoke.
  Artifact:
  `artifacts/c18z84-service-channel-access-decision-aggregate-smoke-result.json`.
- C18Z85 service-channel access-decision incident projection is implemented and
  deployed on docker-test as `rap-backend:fabric-service-channel-0.2.259-c18z85`;
  node-agent remains `rap-node-agent:0.2.256-c18z82` on `test-1/2/3`, and
  web-admin is rebuilt/deployed to `rap_web_admin`. Rebuild health summary now
  carries access decision counts and prioritizes
  `inspect_access_no_safe_recovery_route_pool_and_signed_policy` when no-safe
  is active. Rebuild incidents now include `incident_source=access_decision`
  entries with channel id and operator-facing severity/action, including
  `access_no_safe_recovery` as a bad incident. Verification passed:
  `go test ./internal/modules/cluster ./internal/platform/runtime ./internal/modules/nodeagent`,
  web-admin `npm run build`, C18Z85 live smoke, and C18Z84 regression smoke.
  Artifact:
  `artifacts/c18z85-service-channel-access-decision-incident-smoke-result.json`.
- C18Z86 service-channel access-decision silence/acknowledgement is
  implemented and deployed on docker-test as
  `rap-backend:fabric-service-channel-0.2.261-c18z86`; node-agent remains
  `rap-node-agent:0.2.256-c18z82` on `test-1/2/3`, and web-admin is
  rebuilt/deployed to `rap_web_admin`. Rebuild alert silence requests now carry
  `incident_source` and `channel_id`; `incident_source=access_decision`
  no-safe incidents require `channel_id` and are stored with channel-scoped
  route keys. Rebuild health and incident lists apply those silences, so an
  acknowledged current-generation access no-safe incident is silenced and no
  longer contributes to active bad count. Generation-change resurfacing is
  covered in unit tests; live smoke proves the channel-scoped silence path.
  Verification passed:
  `go test ./internal/modules/cluster ./internal/platform/runtime ./internal/modules/nodeagent`,
  web-admin `npm run build`, C18Z86 live smoke, and C18Z85 regression smoke.
  Artifact:
  `artifacts/c18z86-service-channel-access-decision-silence-smoke-result.json`.
- C18Z87 service-channel access-decision silence management is implemented and
  deployed on docker-test as `rap-backend:fabric-service-channel-0.2.262-c18z87`;
  node-agent remains `rap-node-agent:0.2.256-c18z82` on `test-1/2/3`, and
  web-admin is rebuilt/deployed to `rap_web_admin`. Backend now exposes active
  rebuild alert silences, enriches access-decision silences with
  `incident_source`, `channel_id`, and `display_route_id`, and supports
  unsilence by id. Web-admin shows an `Active rebuild silences` table with an
  `unsilence` action. The live smoke proves the operator path:
  access no-safe incident -> silence -> active silence listed -> unsilence ->
  active bad incident restored. Verification passed:
  `go test ./internal/modules/cluster ./internal/platform/runtime ./internal/modules/nodeagent`,
  web-admin `npm run build`, C18Z87 live smoke, and C18Z86 regression smoke.
  Artifact:
  `artifacts/c18z87-service-channel-access-decision-unsilence-smoke-result.json`.
- C18Z88 service-channel access-decision resurface proof is implemented and
  deployed on docker-test as `rap-backend:fabric-service-channel-0.2.263-c18z88`;
  node-agent remains `rap-node-agent:0.2.256-c18z82` on `test-1/2/3`, and
  web-admin is rebuilt/deployed to `rap_web_admin`. Access-decision incidents
  now include resurface details (`alert_resurfaced_from_silence_id`,
  `alert_resurfaced_previous_generation`, and
  `alert_resurfaced_previous_until`) when a previously acknowledged
  access-decision incident changes generation/route/channel and becomes active
  again. Web-admin shows the previous generation/expiry beside resurfaced
  incidents. The live smoke proves access no-safe -> silence current generation
  -> route-decision generation changes -> incident resurfaces as active bad
  with previous-generation metadata preserved. Verification passed:
  `go test ./internal/modules/cluster ./internal/platform/runtime ./internal/modules/nodeagent`,
  web-admin `npm run build`, C18Z88 live smoke, and C18Z87 regression smoke.
  Artifact:
  `artifacts/c18z88-service-channel-access-decision-resurface-smoke-result.json`.
- C18Z89 service-channel access-decision resurface action loop is implemented
  and deployed on docker-test as `rap-backend:fabric-service-channel-0.2.264-c18z89`;
  node-agent remains `rap-node-agent:0.2.256-c18z82` on `test-1/2/3`, and
  web-admin is rebuilt/deployed to `rap_web_admin`. Resurfaced
  access-decision incidents now include `alert_resurfaced_cause`,
  `alert_resurfaced_previous_route_id`, and
  `alert_resurfaced_previous_channel_id`. Web-admin shows the cause beside the
  resurfaced action text. The live smoke proves the operator path:
  access no-safe -> silence current generation -> generation changes and
  resurfaces -> active-channel decision context matches the incident ->
  re-acknowledge current generation -> incident returns to silenced state.
  Verification passed:
  `go test ./internal/modules/cluster ./internal/platform/runtime ./internal/modules/nodeagent`,
  web-admin `npm run build`, C18Z89 live smoke, and C18Z88 regression smoke.
  Artifact:
  `artifacts/c18z89-service-channel-access-decision-resurface-action-smoke-result.json`.
- C18Z90 service-channel production data-plane contract is implemented and
  deployed on docker-test as `rap-backend:fabric-service-channel-0.2.265-c18z90`;
  node-agent remains `rap-node-agent:0.2.256-c18z82` on `test-1/2/3`, and
  web-admin is rebuilt/deployed to `rap_web_admin`. Service-channel leases now
  include a signed `data_plane` contract in the lease, authority payload,
  introspection response, and lease-maintenance/admin list. The contract
  declares backend API as control-plane transport, fabric service channel over
  fabric routes as working/steady-state data transport, backend relay as
  degraded fallback only, production forwarding as required, and logical-flow
  isolation as service-neutral and protocol-agnostic. Web-admin shows
  data-plane/fallback policy in service-channel leases. Verification passed:
  `go test ./internal/modules/cluster ./internal/platform/runtime ./internal/modules/nodeagent`,
  web-admin `npm run build`, C18Z90 live smoke, and C18Z89 regression smoke.
  Artifact:
  `artifacts/c18z90-service-channel-data-plane-contract-smoke-result.json`.
- C18Z91 node-agent data-plane contract consumption is implemented and
  deployed on docker-test as `rap-node-agent:0.2.266-c18z91` on `test-1/2/3`
  with backend still `rap-backend:fabric-service-channel-0.2.265-c18z90`.
  Service-channel VPN packet ingress now parses signed/introspected
  `data_plane`, validates the production contract, applies the preferred fabric
  route, logs data-plane mode/transports/backend-relay policy/logical-flow
  mode, and reports `data_plane_contract` plus last transport/policy fields in
  heartbeat access telemetry. Verification passed:
  `go test ./cmd/rap-node-agent ./internal/agent ./internal/mesh ./internal/vpnruntime ./internal/config`,
  backend cluster tests, web-admin build, C18Z91 live smoke, and C18Z90
  regression smoke. Artifact:
  `artifacts/c18z91-node-agent-data-plane-contract-enforcement-smoke-result.json`.
- C18Z92 node-agent backend-fallback policy enforcement is implemented and
  deployed on docker-test as `rap-node-agent:0.2.267-c18z92` on `test-1/2/3`.
  If a signed data-plane contract has `backend_relay_policy=disabled`, the
  service-channel runtime no longer proxies failed/missing fabric-route working
  data through backend relay; it returns a visible service-unavailable result.
  The live smoke temporarily disables backend fallback in pool policy, issues a
  no-route lease, verifies `backend_relay_policy=disabled`, posts to test-1,
  and proves the node rejects the request with 503 instead of falling back to
  backend relay. Verification passed: node-agent tests, C18Z92 live smoke, and
  C18Z91 regression smoke.
  Artifact:
  `artifacts/c18z92-node-agent-disabled-backend-fallback-smoke-result.json`.
- C18Z93 access-telemetry data-plane projection is implemented and deployed on
  docker-test as `rap-backend:fabric-service-channel-0.2.268-c18z93`;
  node-agent remains `rap-node-agent:0.2.267-c18z92` on `test-1/2/3`, and
  web-admin is rebuilt/deployed to `rap_web_admin`. Backend access telemetry
  now promotes node-reported `data_plane_contract` and last data-plane
  mode/working transport/steady-state transport/backend relay policy/logical
  flow mode to cluster, node, and active-channel diagnostics. Web-admin shows
  summary chips plus channel/node table columns for data-plane adoption and
  relay policy. Verification passed:
  `go test ./internal/modules/cluster ./internal/platform/runtime ./internal/modules/nodeagent`,
  web-admin `npm run build`, C18Z93 live smoke, C18Z92 regression smoke, and
  C18Z91 regression smoke. Artifact:
  `artifacts/c18z93-access-telemetry-data-plane-contract-smoke-result.json`.
- C18Z94 data-plane contract incident diagnostics are implemented and deployed
  on docker-test as `rap-backend:fabric-service-channel-0.2.269-c18z94`;
  node-agent remains `rap-node-agent:0.2.267-c18z92` on `test-1/2/3`, and
  web-admin is rebuilt/deployed to `rap_web_admin`. Access/rebuild incident
  diagnostics now include `incident_source=data_plane_contract` rows for
  missing data-plane contract reports after accepted traffic,
  working/steady-state transport mismatches, logical-flow mismatches, observed
  disabled backend relay, and degraded/backend-relay policy violations. The
  smoke now proves that disabled backend relay is emitted as a bad incident with
  action
  `restore_fabric_route_or_change_signed_backend_relay_policy_before_retry`.
  Verification passed:
  `go test ./internal/modules/cluster ./internal/platform/runtime ./internal/modules/nodeagent`,
  web-admin `npm run build`, C18Z94 live smoke, C18Z93 regression smoke, C18Z92
  regression smoke, and C18Z91 regression smoke. Artifact:
  `artifacts/c18z94-data-plane-contract-incident-smoke-result.json`.
- C18Z95 node-agent blocked-fallback telemetry is implemented and deployed on
  docker-test as backend `rap-backend:fabric-service-channel-0.2.270-c18z95`
  and node-agent `rap-node-agent:0.2.270-c18z95` on `test-1/2/3`; web-admin is
  rebuilt/deployed to `rap_web_admin`. Node-agent now reports
  `backend_fallback_blocked`, `fabric_route_send_failure`, and last data-plane
  violation status/reason in `fabric_service_channel_access_report`. Backend
  access telemetry projects those fields to cluster, node, and active-channel
  rows, and `data_plane_contract` incidents distinguish policy-blocked fallback
  from real backend relay usage. Verification passed: node-agent tests,
  backend tests, web-admin build, C18Z95 live smoke, and C18Z94/C18Z93/C18Z92
  regressions. Artifact:
  `artifacts/c18z95-node-agent-blocked-fallback-telemetry-smoke-result.json`.
- C18Z96 blocked-fallback rebuild feedback is implemented and deployed on
  docker-test as backend `rap-backend:fabric-service-channel-0.2.281-c18z109`;
  node-agent remains `rap-node-agent:0.2.270-c18z95` on `test-1/2/3`, and
  web-admin remains deployed. Backend now converts heartbeat access reports
  with `fabric_route_send_failed_backend_fallback_blocked` into durable fenced
  `fabric_service_channel_route_feedback` for the active channel primary route.
  The existing route rebuild planner then selects an authorized replacement
  route when one exists. Verification passed: backend tests, node-agent tests,
  web-admin build, C18Z96 live smoke, and C18Z95/C18Z93 regressions. Artifact:
  `artifacts/c18z96-blocked-fallback-rebuild-feedback-smoke-result.json`.
- C18Z97 blocked-fallback feedback dedup is implemented and deployed on
  docker-test as backend `rap-backend:fabric-service-channel-0.2.281-c18z109`.
  Backend now suppresses repeated access-report-derived route feedback while an
  active fenced/degraded observation from `fabric_service_channel_access_report`
  already exists for the same cluster, reporter node, route, and service class.
  This keeps repeated blocked-fallback send-failure heartbeats from refreshing
  the same feedback and churning rebuild attempts. Verification passed:
  backend tests, node-agent tests, C18Z97 live smoke, and C18Z96/C18Z95
  regressions. Artifact:
  `artifacts/c18z97-blocked-fallback-feedback-dedup-smoke-result.json`.
- C18Z98 blocked-fallback rebuild correlation is implemented and deployed on
  docker-test as backend `rap-backend:fabric-service-channel-0.2.281-c18z109`;
  web-admin is rebuilt/deployed to `rap_web_admin`. Backend now carries the
  originating access-report route-feedback identity into replacement decisions
  and rebuild-attempt ledger rows: `feedback_observation_id`,
  `feedback_source`, feedback observed/expiry times, channel/resource ids, and
  data-plane violation status/reason. Web-admin shows this correlation in
  Route decisions and Rebuild ledger. Verification passed: backend tests,
  node-agent tests, web-admin build, C18Z98 live smoke, and C18Z97/C18Z96/C18Z95
  regressions. Artifact:
  `artifacts/c18z98-blocked-fallback-rebuild-correlation-smoke-result.json`.
- C18Z99 rebuild correlation filters are implemented and deployed on
  docker-test as backend `rap-backend:fabric-service-channel-0.2.281-c18z109`;
  web-admin is rebuilt/deployed to `rap_web_admin`. The rebuild-attempt ledger
  API now accepts `feedback_source`, `feedback_channel_id`, and
  `feedback_violation_status` filters, and web-admin exposes them in the
  rebuild ledger filter form. Verification passed: backend tests, node-agent
  tests, web-admin build, C18Z99 live smoke, and C18Z98/C18Z97/C18Z96/C18Z95/
  C18Z93 regressions. Artifact:
  `artifacts/c18z99-rebuild-correlation-filter-smoke-result.json`.
- C18Z100 rebuild-health feedback breakdown is implemented and deployed on
  docker-test as backend `rap-backend:fabric-service-channel-0.2.281-c18z109`;
  web-admin is rebuilt/deployed to `rap_web_admin`. The rebuild-health summary
  now returns `feedback_breakdowns` grouped by feedback source, feedback
  channel id, and feedback violation status, with total/good/warn/bad/unknown
  counts, active warn/bad counts, silenced count, latest observation time, and
  affected reporter nodes/routes. Web-admin shows the breakdown in the Rebuild
  health panel. Verification passed: backend tests, node-agent tests,
  web-admin build, C18Z100 live smoke, and C18Z99/C18Z98/C18Z97/C18Z96/C18Z95/
  C18Z93 regressions. Artifact:
  `artifacts/c18z100-rebuild-health-feedback-breakdown-smoke-result.json`.
- C18Z101 rebuild-health feedback drilldown UI is implemented and deployed to
  `rap_web_admin`; backend remains
  `rap-backend:fabric-service-channel-0.2.281-c18z109`. Web-admin now shows
  related incident context on rebuild-health feedback breakdown rows and an
  `open ledger` action that switches to the deep rebuild ledger with
  `feedback_source`, `feedback_channel_id`, and `feedback_violation_status`
  prefilled from the selected breakdown. Verification passed: web-admin build
  and deployed asset/download checks.
- C18Z102 rebuild-health feedback drilldown audit breadcrumbs are implemented
  and deployed on docker-test as backend
  `rap-backend:fabric-service-channel-0.2.281-c18z109`; web-admin is rebuilt/
  deployed to `rap_web_admin`. The existing rebuild investigation endpoint now
  accepts feedback source/channel/violation drilldown payloads and records
  `fabric.service_channel_rebuild_feedback_breakdown.investigation_opened`
  cluster audit events before web-admin opens the filtered deep rebuild ledger.
  Verification passed: backend tests, web-admin build, C18Z102 live smoke, and
  C18Z100/C18Z99/C18Z98 regressions. Artifact:
  `artifacts/c18z102-rebuild-health-feedback-drilldown-audit-smoke-result.json`.
- C18Z103 Fabric diagnostics drilldown audit visibility is implemented and
  deployed to `rap_web_admin`; backend remains
  `rap-backend:fabric-service-channel-0.2.281-c18z109`. Web-admin now filters
  the loaded cluster audit list for rebuild incident and feedback-breakdown
  investigation events and shows recent drilldowns in the Fabric diagnostics
  panel with time, source, feedback filters, target reporter/route, actor, and
  reason. Verification passed: web-admin build and deployed asset/download
  checks.
- C18Z104 focused Fabric audit loading is implemented and deployed on
  docker-test as backend `rap-backend:fabric-service-channel-0.2.281-c18z109`;
  web-admin is rebuilt/deployed to `rap_web_admin`. The cluster audit API now
  accepts repeated or comma-separated `event_type` filters plus `target_type`
  filters, and Fabric diagnostics loads recent rebuild incident/feedback
  breakdown investigation breadcrumbs with a dedicated filtered request instead
  of depending on the generic latest-100 audit list. Verification passed:
  backend tests, web-admin build, C18Z104 live smoke, and C18Z102/C18Z100
  regressions. Artifact:
  `artifacts/c18z104-focused-fabric-audit-smoke-result.json`.
- C18Z105 Fabric drilldown breadcrumb correlation UI is implemented and
  deployed to `rap_web_admin`; backend remains
  `rap-backend:fabric-service-channel-0.2.281-c18z109`. Recent investigation
  rows in Fabric diagnostics now show whether each breadcrumb still matches a
  current rebuild-health feedback breakdown or visible rebuild incident, and
  provide an `open` action to jump back into the matching filtered ledger path.
  Verification passed: web-admin build and deployed asset/download checks.
- C18Z106 server-side Fabric drilldown breadcrumb correlation is implemented
  and deployed on docker-test as backend
  `rap-backend:fabric-service-channel-0.2.281-c18z109`; web-admin is rebuilt/
  deployed to `rap_web_admin`. Focused audit reads with
  `correlation=fabric_diagnostics` now return `correlation_hints` with current
  diagnostic status and matching rebuild-health feedback breakdown or rebuild
  incident when present. Web-admin consumes those hints and keeps local matching
  as fallback. The rebuild-health feedback breakdown window is raised to 100
  groups after the C18Z100 regression exposed that the previous cap could hide
  fresh failure classes in noisy test history. Verification passed: backend
  tests,
  web-admin build, C18Z106 live smoke, and C18Z104/C18Z100 regressions.
  Artifact: `artifacts/c18z106-audit-correlation-hints-smoke-result.json`.
- C18Z107 drilldown breadcrumb summary is implemented and deployed on
  docker-test as backend `rap-backend:fabric-service-channel-0.2.281-c18z109`;
  web-admin is rebuilt/deployed to `rap_web_admin`. Audit responses now include
  compact `audit_summary` aggregates beside `audit_events`; focused Fabric
  diagnostics uses them to show counts by current diagnostic status, feedback
  source, feedback violation status, correlated/not-visible totals, and latest
  time above the Recent investigations rows. Verification passed: backend
  tests, web-admin build, C18Z107 live smoke, and C18Z106/C18Z104 regressions.
  Artifact: `artifacts/c18z107-audit-correlation-summary-smoke-result.json`.
- C18Z108 dedicated Fabric diagnostics breadcrumbs are implemented and deployed
  on docker-test as backend `rap-backend:fabric-service-channel-0.2.281-c18z109`;
  web-admin is rebuilt/deployed to `rap_web_admin`. Backend exposes
  `GET /clusters/{clusterID}/fabric/service-channels/rebuild-investigations/breadcrumbs`
  returning `rebuild_investigation_breadcrumbs` with events and summary, so the
  operator Recent investigations workflow no longer overloads the generic
  cluster audit endpoint. Verification passed: backend tests, web-admin build,
  C18Z108 live smoke, and C18Z107/C18Z106/C18Z100 regressions. Artifact:
  `artifacts/c18z108-dedicated-breadcrumbs-smoke-result.json`.
- C18Z109 Fabric diagnostics breadcrumb freshness windows are implemented and
  deployed on docker-test as backend
  `rap-backend:fabric-service-channel-0.2.281-c18z109`; web-admin is rebuilt/
  deployed to `rap_web_admin`. The dedicated breadcrumb endpoint accepts
  `current_window_seconds` and `history_window_seconds`, annotates events with
  `correlation_hints.breadcrumb_status` (`current`, `stale`, `expired`) plus
  age/window seconds, returns current/stale/expired totals, and includes
  `counts_by_breadcrumb_status` in summary. Web-admin shows freshness chips and
  age in Recent investigations. Verification passed: backend tests, web-admin
  build, C18Z109 live smoke, and C18Z108/C18Z107/C18Z106 regressions. Artifact:
  `artifacts/c18z109-breadcrumb-freshness-window-smoke-result.json`.
- C19Q Remote Workspace mailbox guardrails are implemented and
  runtime-smoke-proven on docker-test. The adapter-session mailbox handoff now
  has unit and live coverage for invalid adapter session IDs, unknown sessions,
  invalid limits, and bounded `drain=true&limit=N` partial drain semantics.
  This remains probe-only and node-local: it does not enable RDP protocol
  forwarding, desktop frame transport, Android work, or backend relay behavior.
  Verification passed: `go test ./internal/mesh` in `agents/rap-node-agent` and
  `scripts/fabric/c19q-remote-workspace-adapter-mailbox-guardrails-smoke.ps1`.
  Artifact:
  `artifacts/c19q-remote-workspace-adapter-mailbox-guardrails-smoke-result.json`.
- C19R Remote Workspace mailbox long-poll ergonomics are implemented and
  runtime-smoke-proven on docker-test. The mailbox endpoint now accepts bounded
  `wait_ms`, returns explicit `empty`, `waited`, `wait_timeout`, and `wait_ms`
  fields, and wakes when a delayed mailbox event arrives before timeout.
  Node-agent image `rap-node-agent:codex-service-supervisor-20260512s` is built
  and deployed on `test-1/2/3`. Verification passed:
  `go test ./internal/mesh`, C19R live smoke, and C19Q regression smoke.
  Artifact:
  `artifacts/c19r-remote-workspace-mailbox-long-poll-smoke-result.json`.
- C19S Remote Workspace mailbox telemetry is implemented and
  runtime-smoke-proven on docker-test. Workload status and heartbeat telemetry
  now expose mailbox read/wait/timeout/empty-read counters plus last mailbox
  read metadata, so adapter consumer polling behavior is visible without
  enabling desktop frame transport. Node-agent image
  `rap-node-agent:codex-service-supervisor-20260512t` is built and deployed on
  `test-1/2/3`. Verification passed: `go test ./internal/mesh`, C19S live
  smoke, and C19R regression smoke. Artifact:
  `artifacts/c19s-remote-workspace-mailbox-telemetry-smoke-result.json`.
- C19T Remote Workspace mailbox consumer checkpoint/ack metadata is implemented
  and runtime-smoke-proven on docker-test. The mailbox endpoint now accepts a
  validated `consumer_id` and optional `ack_sequence`, returns consumer
  checkpoint/ack/lag/read metadata, and keeps bounded per-session node-local
  consumer cursor state. Workload status and heartbeat telemetry expose
  aggregate/current-session consumer read and ack counters. Node-agent image
  `rap-node-agent:codex-service-supervisor-20260512u` is built and deployed on
  `test-1/2/3`. Verification passed: `go test ./internal/mesh`, C19T live
  smoke, and C19S regression smoke. Artifact:
  `artifacts/c19t-remote-workspace-mailbox-consumer-checkpoint-smoke-result.json`.
- C19U Remote Workspace mailbox consumer lifecycle guardrails are implemented
  and runtime-smoke-proven on docker-test. Consumers can pass
  `reset_consumer=true` with a validated `consumer_id` to clear cursor state
  before the current read is recorded. Mailbox responses expose consumer
  count/capacity, created/reset/evicted lifecycle flags, and consumer
  timestamps; workload status and heartbeat telemetry expose consumer reset and
  eviction counters. Node-agent image
  `rap-node-agent:codex-service-supervisor-20260512v` is built and deployed on
  `test-1/2/3`. Verification passed: `go test ./internal/mesh`, C19U live
  smoke, and C19T regression smoke. Artifact:
  `artifacts/c19u-remote-workspace-mailbox-consumer-lifecycle-smoke-result.json`.
- C19V Remote Workspace mailbox consumer cursor inspection is implemented and
  runtime-smoke-proven on docker-test. Active adapter sessions now expose a
  read-only
  `/mesh/v1/remote-workspace/adapter-sessions/{adapter_session_id}/mailbox/consumers`
  endpoint with bounded cursor snapshots: consumer ids, checkpoint/ack
  sequences, lag, read/ack totals, and timestamps. Calls to the endpoint do not
  increment mailbox reads, acks, resets, or drain events. Node-agent
  image `rap-node-agent:codex-service-supervisor-20260512w` is built and
  deployed on `test-1/2/3`. Verification passed: `go test ./internal/mesh`,
  C19V live smoke, and C19U regression smoke. Artifact:
  `artifacts/c19v-remote-workspace-mailbox-consumer-snapshot-smoke-result.json`.
- C19W Remote Workspace mailbox cursor-aware resume reads are implemented and
  runtime-smoke-proven on docker-test. The mailbox endpoint now accepts
  `after_sequence` for non-destructive reads, returns `skipped_count` and
  `returned_count`, and long-polls for events newer than the requested sequence.
  `after_sequence` with `drain=true` is rejected to keep resume reads separate
  from destructive drains. Node-agent image
  `rap-node-agent:codex-service-supervisor-20260512x` is built and deployed on
  `test-1/2/3`. Verification passed: `go test ./internal/mesh`, C19W live
  smoke, and C19V regression smoke. Artifact:
  `artifacts/c19w-remote-workspace-mailbox-after-sequence-smoke-result.json`.
- C19X Remote Workspace mailbox consumer-aware resume is implemented and
  runtime-smoke-proven on docker-test. Mailbox reads with `consumer_id` can pass
  `resume_from=ack|checkpoint`; the node-agent resolves the stored cursor to
  `after_sequence` before reading and returns `resume_from`/`resume_sequence`.
  Guardrails reject mixing resume with manual `after_sequence`, drain, reset,
  missing consumers, or invalid cursor names. Node-agent image
  `rap-node-agent:codex-service-supervisor-20260512y` is built and deployed on
  `test-1/2/3`. Verification passed: `go test ./internal/mesh`, C19X live
  smoke, and C19W regression smoke. Artifact:
  `artifacts/c19x-remote-workspace-mailbox-consumer-resume-smoke-result.json`.
- C19Y Remote Workspace mailbox resume telemetry is implemented and
  runtime-smoke-proven on docker-test. Workload status and heartbeat telemetry
  now expose resume/after-sequence read totals, returned/skipped totals, and the
  last resume cursor/sequence/consumer plus returned/skipped counts for
  operator diagnostics. Session snapshots include the same per-session resume
  counters. Node-agent image
  `rap-node-agent:codex-service-supervisor-20260512z` is built and deployed on
  `test-1/2/3`. Verification passed: `go test ./internal/mesh`, C19Y live
  smoke, C19X source smoke, and C19W regression smoke. Artifact:
  `artifacts/c19y-remote-workspace-mailbox-resume-telemetry-smoke-result.json`.
- C19Z Remote Workspace adapter runtime readiness summary is implemented and
  runtime-smoke-proven on docker-test. The sink report now includes compact
  `adapter_runtime_readiness` diagnostics with session lifecycle state, mailbox
  depth, consumer cursor, resume cursor, skipped/returned counts, and
  ready/diagnostic status for operator handoff checks. Node-agent image
  `rap-node-agent:codex-service-supervisor-20260512z1` is built and deployed on
  `test-1/2/3`. Verification passed: `go test ./internal/mesh`, C19Z live
  smoke, C19X source smoke, and C19Y regression smoke. Artifact:
  `artifacts/c19z-remote-workspace-adapter-readiness-smoke-result.json`.
- C19Z1 Remote Workspace mailbox handoff preflight is implemented and
  runtime-smoke-proven on docker-test. The node-agent now exposes read-only
  `GET /mesh/v1/remote-workspace/adapter-sessions/{adapter_session_id}/mailbox/preflight`
  for `consumer_id` plus `resume_from=ack|checkpoint`; it validates the cursor
  and reports the expected next event window without reading, draining, acking,
  or mutating consumer state. Node-agent image
  `rap-node-agent:codex-service-supervisor-20260512z2` is built and deployed on
  `test-1/2/3`. Verification passed: `go test ./internal/mesh`, C19Z1 live
  smoke, C19X source smoke, and C19Z regression smoke. Artifact:
  `artifacts/c19z1-remote-workspace-mailbox-preflight-smoke-result.json`.
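
The mailbox resume guardrails from the C19W and C19X entries above can be sketched as a single query check. This is a minimal illustration, not the node-agent code: `validateMailboxQuery` and `knownConsumers` are hypothetical names, and treating "missing consumers" as "no `consumer_id` or no stored cursor for it" is an assumption. Only the query parameter names (`consumer_id`, `after_sequence`, `resume_from`, `drain`, `reset_consumer`) come from the entries themselves.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strconv"
)

// validateMailboxQuery is a hypothetical helper, not the node-agent's code.
// It checks one raw mailbox query string against the documented guardrails.
// knownConsumers stands in for the node-local consumer cursor table.
func validateMailboxQuery(rawQuery string, knownConsumers map[string]bool) error {
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return fmt.Errorf("invalid query: %w", err)
	}
	consumerID := q.Get("consumer_id")
	resumeFrom := q.Get("resume_from")
	drain := q.Get("drain") == "true"
	reset := q.Get("reset_consumer") == "true"
	afterSequence := q.Get("after_sequence")

	if afterSequence != "" {
		if _, err := strconv.ParseUint(afterSequence, 10, 64); err != nil {
			return errors.New("after_sequence must be a non-negative integer")
		}
		// C19W: resume reads stay separate from destructive drains.
		if drain {
			return errors.New("after_sequence cannot be combined with drain=true")
		}
	}
	if resumeFrom == "" {
		return nil // plain read, drain, or manual after_sequence read
	}
	// C19X: only the two stored cursor names are accepted.
	if resumeFrom != "ack" && resumeFrom != "checkpoint" {
		return errors.New("resume_from must be ack or checkpoint")
	}
	switch {
	case afterSequence != "":
		return errors.New("resume_from cannot be combined with after_sequence")
	case drain:
		return errors.New("resume_from cannot be combined with drain=true")
	case reset:
		return errors.New("resume_from cannot be combined with reset_consumer=true")
	case consumerID == "" || !knownConsumers[consumerID]:
		// Assumption: a consumer without stored cursor state counts as missing.
		return errors.New("resume_from requires an existing consumer_id")
	}
	return nil
}

func main() {
	known := map[string]bool{"adapter-a": true}
	for _, raw := range []string{
		"consumer_id=adapter-a&resume_from=ack",
		"after_sequence=12&drain=true",
		"consumer_id=adapter-a&resume_from=checkpoint&after_sequence=12",
		"consumer_id=unknown&resume_from=ack",
	} {
		fmt.Printf("%-62s -> %v\n", raw, validateMailboxQuery(raw, known))
	}
}
```

Rejecting every mixed combination up front keeps a resume read non-destructive by construction: a request either resolves a stored cursor or carries its own explicit parameters, never both.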

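The backend-fallback policy described in the C18Z92 and C18Z95 entries above can be sketched as one decision function. This is a hedged illustration under stated assumptions, not the service-channel runtime: `forwardDecision`, `decideForward`, and the transport labels are hypothetical, and the sketch assumes that any policy value other than `disabled` permits degraded fallback, because the full value set is not listed here. The `disabled` value, the 503 result, and the `fabric_route_send_failed_backend_fallback_blocked` reason are taken from the entries.

```go
package main

import (
	"fmt"
	"net/http"
)

// forwardDecision is a hypothetical result type for one working-data send.
type forwardDecision struct {
	Transport       string // "fabric_route", "backend_relay", or "" when rejected
	Status          int    // HTTP status returned to the caller
	ViolationReason string // set only when policy blocks the fallback
}

// decideForward picks the transport for working data. The fabric route is
// always preferred; backend relay is used only as a degraded fallback and
// never when the signed contract carries backend_relay_policy=disabled.
func decideForward(fabricRouteOK bool, backendRelayPolicy string) forwardDecision {
	if fabricRouteOK {
		return forwardDecision{Transport: "fabric_route", Status: http.StatusOK}
	}
	if backendRelayPolicy == "disabled" {
		// Fail visibly instead of silently relaying through the backend.
		return forwardDecision{
			Status:          http.StatusServiceUnavailable,
			ViolationReason: "fabric_route_send_failed_backend_fallback_blocked",
		}
	}
	// Assumption: every other policy value allows degraded fallback.
	return forwardDecision{Transport: "backend_relay", Status: http.StatusOK}
}

func main() {
	// "degraded_fallback" is a hypothetical policy value used for illustration.
	fmt.Printf("%+v\n", decideForward(true, "disabled"))
	fmt.Printf("%+v\n", decideForward(false, "disabled"))
	fmt.Printf("%+v\n", decideForward(false, "degraded_fallback"))
}
```

Returning the violation reason alongside the status mirrors how the blocked fallback later becomes telemetry and route feedback rather than a silent drop.
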
The current phase is NOT:
- full mesh routing implementation
- full VPN orchestration
- multi-cluster runtime traffic handling
- production data-plane migration
- complete updater rollout orchestration
- video meetings
- final native client UI redesign

Future mesh, VPN, multi-cluster, node-agent updater, and production realtime data-plane work must be introduced only through explicit, narrow, staged implementation prompts.

Always keep the project production-oriented. Do not simplify it into a toy app.