# Mesh Routing Runtime Implementation Plan

Status: Stage C17 planning is complete. The following stages are implemented
and test-proven unless noted otherwise:

- C17A: synthetic mesh runtime skeleton
- C17B: route health/failover probes
- C17C: relay semantic hardening
- C17D: non-production test-service path experiment
- C17E: historical live node-to-node synthetic HTTP transport skeleton
- C17F: scoped synthetic route config boundary
- C17G: Control Plane scoped synthetic config read boundary
- C17H: deployed multi-agent synthetic config smoke
- C17I: production forwarding gate
- C17J: production envelope contract
- C17K: production envelope observation
- C17L: bounded production observation sink
- C17M: production observation sink wiring
- C17N: production observation sink metrics
- C17O: production observation sink local metrics logging
- C17P: production observation sink change-driven metrics logging
- C17Q: production forwarding gate/runtime log boundary
- C17R: production observation sink capacity guard
- C17S: production observation panic fail-closed hardening
- C17T: production envelope payload boundary
- C17U: production envelope created-at skew boundary
- C17V: peer endpoint candidate model and NAT/connectivity hints
- C17W: peer endpoint candidate scoring model
- C17X: health-aware endpoint candidate scoring overlay
- C17Y: Platform Owner synthetic mesh visibility (build/test-proven)
- C17Z: production fabric-control direct forwarding boundary
- C17Z1 through C17Z18: route-config, peer-directory, peer-cache, endpoint
  reporting/candidates, peer connection state/recovery/intent/manager,
  rendezvous/relay lease, stale relay replacement, route/path decision, route
  generation tracker, and synthetic route-health effective-path boundaries
  (test/docker-test-runtime-proven)

This document defines the implementation plan for the future mesh routing
runtime. It does not implement code, migrations, APIs, mesh runtime traffic,
VPN/IP tunnel runtime, relay packet routing, RDP work, or service workload
execution.

Production mesh runtime implementation is not authorized by this document.
Each stage was implemented under its own narrow scope:

- C17A implemented only synthetic `fabric.probe` / `fabric.probe_ack`
  execution behind a disabled-by-default feature flag.
- C17B added synthetic `fabric.route_health` / `fabric.route_health_ack`,
  local route observations, fallback route selection, warm route promotion
  metrics, and route-cache invalidation.
- C17C added synthetic relay validation, per-channel bounded queues, QoS
  dequeue order, telemetry-only drop/backpressure, and reliable fabric/control
  rejection behavior.
- C17D added one bounded `synthetic.echo` test-service path over direct,
  single-relay, and forced fallback routes.
- C17E added one historical real-HTTP peer transport experiment and a
  disabled-by-default node-agent synthetic endpoint/smoke harness for direct
  and single-relay synthetic traffic only.
- C17F added scoped synthetic peer/route config loading and synthetic
  route-health link observation reporting.
- C17G added the Control Plane read boundary for node-scoped synthetic mesh
  config.
- C17H proved that boundary in a deployed multi-agent `docker-test` smoke.
- C17I added an explicit production-forwarding gate.
- C17J added route-bound production envelope validation.
- C17K added metadata-only local observation of accepted production envelopes.
- C17L added bounded local retention of accepted metadata-only observations.
- C17M added disabled-by-default node-agent wiring for that bounded local sink.
- C17N added local sink metrics without exposing observation records.
- C17O added local aggregate metrics logging without read APIs or Control
  Plane reporting.
- C17P suppressed repeated unchanged local metrics logs.
- C17Q separated production forwarding gate state from runtime state in local
  logs.
- C17R added a maximum local observation sink capacity guard.
- C17S made observer panic handling fail closed.
- C17T added an explicit validated fabric-control payload size boundary.
- C17U added an explicit validated created-at future-skew boundary.
- C17V added route-scoped peer endpoint candidates and NAT/connectivity hints
  to synthetic config.
- C17W added deterministic local endpoint candidate scoring.
- C17X added health-aware local endpoint candidate scoring.
- C17Y added Platform Owner visibility for node-scoped synthetic mesh config.
- C17Z added gate-controlled production `fabric.control` local delivery and
  direct next-hop forwarding while keeping service traffic unavailable.

C17K through C17Y each kept production forwarding unavailable.

## 1. Purpose

C17 planning turns the accepted C10-C16 Fabric Core foundation into a safe,
incremental mesh routing runtime plan.

Accepted foundation:

- C10: Fabric Core config distribution design
- C11: signed scoped cluster snapshots
- C12: node local state store
- C13: Fabric Storage / Config Storage
- C14: peer directory/cache
- C15: Fabric Routing Engine skeleton
- C16: secure node-to-node channel lifecycle

C17 planning defines how runtime work should begin without accidentally
creating a broad production mesh, a second source of truth, or a hidden service
transport rewrite.

## 2. Hard Non-Goals

C17 planning and C17A must not:

- carry RDP user traffic
- carry VPN/IP tunnel traffic
- carry production service workload traffic
- replace direct worker WSS
- remove backend gateway fallback
- change backend session lifecycle
- change Windows client behavior
- expose mesh topology to organizations
- implement arbitrary relay packet forwarding
- implement QUIC/WebRTC
- bypass signed snapshots or node-local policy
- allow nodes to invent routes without Fabric Routing Engine boundaries

## 3. Runtime Principle

Mesh runtime must start as a controlled fabric-internal path.

Initial runtime traffic should be limited to:

- fabric control probes
- route health probes
- synthetic test messages
- safe telemetry

Service traffic such as RDP, VNC, SSH, file transfer, video, and VPN/IP tunnel
must remain outside the first runtime skeleton.

## 4. Minimal Runtime Sequence

The first implementation should follow this sequence.

### C17A: Mesh Runtime Skeleton, Synthetic Traffic Only

Status: implemented and test-proven. Report:
`artifacts/c17a-synthetic-mesh-runtime-skeleton-report.md`.

Goal:
Prove route selection, secure node channels, hop forwarding, TTL, observability,
and rollback with synthetic fabric messages only.

Allowed:

- route request/result implementation boundary
- node-to-node secure channel use from C16
- direct path for synthetic control message
- single relay path for synthetic control message
- TTL / hop limit
- route id propagation
- structured logs and metrics
- kill-switch to disable mesh runtime immediately

Not allowed:

- RDP traffic
- VPN/IP tunnel traffic
- service workload traffic
- organization-visible topology
- production data-plane migration

### C17B: Route Health and Failover Probes

Status: implemented and test-proven. Report:
`artifacts/c17b-route-health-failover-probes-report.md`.

Goal:
Prove route health observations and failover decisions with synthetic traffic.

Allowed:

- route health probes
- warm peer promotion
- fallback route selection
- failed route marking
- route cache invalidation on policy/peer changes

Not allowed:

- production service traffic
- tenant-visible routing decisions

### C17C: Relay Runtime Hardening

Status: implemented and test-proven. Report:
`artifacts/c17c-relay-semantic-hardening-report.md`.

Goal:
Harden relay forwarding semantics before service traffic.

Allowed:

- relay envelope validation
- max hops
- loop prevention
- per-channel class queue boundaries
- QoS scheduling with synthetic channel classes
- backpressure and drop rules

Not allowed:

- general-purpose packet relay
- VPN packet forwarding
- RDP render/input migration

### C17D: Non-Production Service-Path Experiment

Status: implemented and test-proven. Report:
`artifacts/c17d-non-production-test-service-path-report.md`.

Goal:
Optionally test a non-production service flow after C17A-C17C are accepted.

Allowed only after explicit approval:

- one test service type
- one test organization
- one test cluster
- forced fallback path
- no production users

RDP must remain paused unless separately approved.

## 5. Route Execution Boundary

Route execution consumes a route result from the Fabric Routing Engine.

Route execution may:

- open an authorized node-to-node channel
- send a route-bound envelope
- forward only if route id, hop id, TTL, and channel class are valid
- report delivery/failure telemetry
- update local route cache observations

Route execution must not:

- choose a route independently
- override hard policy checks
- create shortcut connections on its own
- cross cluster boundaries without explicit trust
- mutate PostgreSQL authority
- expose topology to tenants

## 6. Mesh Envelope Boundary

Initial mesh runtime envelopes should be service-neutral.

Required envelope fields:

- `fabric_protocol_version`
- `message_id`
- `route_id`
- `cluster_id`
- `source_node_id`
- `destination_node_id`
- `current_hop_node_id`
- `next_hop_node_id`
- `channel_class`
- `message_type`
- `ttl`
- `hop_count`
- `created_at`
- `expires_at`
- `payload_length`
- `payload_hash`

Initial allowed message types:

- `fabric.probe`
- `fabric.probe_ack`
- `fabric.route_health`
- `fabric.telemetry`

Payload must remain small and bounded in C17A.

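
The envelope contract in this section can be sketched as a Go struct with a validation function. This is a minimal illustration, not the `rap-node-agent` implementation: the Go type and field names, the `ValidateEnvelope` and `exampleEnvelope` helpers, and the `maxPayloadBytes` value of 1024 are all assumptions made for the example. Only the wire field names and the message-type allow-list come from the plan.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Envelope carries the required fields from this section. The Go field
// names and types are assumptions; the plan fixes only the wire names.
type Envelope struct {
	FabricProtocolVersion, MessageID, RouteID, ClusterID string
	SourceNodeID, DestinationNodeID                      string
	CurrentHopNodeID, NextHopNodeID                      string
	ChannelClass, MessageType, PayloadHash               string
	TTL, HopCount, PayloadLength                         int
	CreatedAt, ExpiresAt                                 time.Time
}

// allowedMessageTypes is the initial allow-list from this section.
var allowedMessageTypes = map[string]bool{
	"fabric.probe": true, "fabric.probe_ack": true,
	"fabric.route_health": true, "fabric.telemetry": true,
}

// maxPayloadBytes is a hypothetical bound chosen for this sketch only.
const maxPayloadBytes = 1024

// ValidateEnvelope returns nil only when every required field is present,
// the message type is allow-listed, and TTL, expiry, and payload are in bounds.
func ValidateEnvelope(e Envelope, now time.Time) error {
	required := []struct{ name, value string }{
		{"fabric_protocol_version", e.FabricProtocolVersion},
		{"message_id", e.MessageID}, {"route_id", e.RouteID},
		{"cluster_id", e.ClusterID}, {"source_node_id", e.SourceNodeID},
		{"destination_node_id", e.DestinationNodeID},
		{"current_hop_node_id", e.CurrentHopNodeID},
		{"next_hop_node_id", e.NextHopNodeID},
		{"channel_class", e.ChannelClass}, {"message_type", e.MessageType},
		{"payload_hash", e.PayloadHash},
	}
	for _, f := range required {
		if f.value == "" {
			return fmt.Errorf("missing required field %s", f.name)
		}
	}
	switch {
	case !allowedMessageTypes[e.MessageType]:
		return fmt.Errorf("message type %q is not allowed", e.MessageType)
	case e.TTL <= 0:
		return errors.New("ttl exhausted")
	case e.HopCount < 0:
		return errors.New("negative hop_count")
	case e.CreatedAt.IsZero() || e.ExpiresAt.IsZero():
		return errors.New("missing created_at or expires_at")
	case !now.Before(e.ExpiresAt):
		return errors.New("envelope expired")
	case e.PayloadLength < 0 || e.PayloadLength > maxPayloadBytes:
		return errors.New("payload_length out of bounds")
	}
	return nil
}

// exampleEnvelope builds a valid direct-path probe and a fixed clock value.
func exampleEnvelope() (Envelope, time.Time) {
	now := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	return Envelope{
		FabricProtocolVersion: "1", MessageID: "m-1", RouteID: "r-1",
		ClusterID: "cluster-1", SourceNodeID: "node-a",
		DestinationNodeID: "node-b", CurrentHopNodeID: "node-a",
		NextHopNodeID: "node-b", ChannelClass: "fabric_control",
		MessageType: "fabric.probe", PayloadHash: "sha256:example",
		TTL: 4, HopCount: 0, PayloadLength: 16,
		CreatedAt: now, ExpiresAt: now.Add(30 * time.Second),
	}, now
}

func main() {
	e, now := exampleEnvelope()
	fmt.Println("valid probe:", ValidateEnvelope(e, now))
	e.MessageType = "rdp.render"
	fmt.Println("service type:", ValidateEnvelope(e, now))
}
```

Passing the clock in as a parameter keeps expiry checks deterministic in tests. An allow-list, rather than a deny-list, makes any service message type fail by default.
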
## 7. Relay Forwarding Boundary

Relay forwarding in early runtime is not arbitrary packet forwarding.

Relay may forward only when:

- route id is known and valid
- current node is the expected hop
- next hop is authorized
- channel class is allowed
- TTL is positive
- hop count is within limit
- route has not expired
- source and destination match the route result
- partition/degraded policy allows forwarding

Relay must reject:

- unknown route id
- wrong cluster
- wrong organization scope
- expired route
- TTL exhausted
- hop loop detected
- unauthorized channel class
- revoked peer
- stale policy version

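
A relay decision along these lines could be sketched as one function that returns either an empty string (forward) or a reject reason. The `Route` and `Hop` types, the `CheckRelay` name, and the reason labels are hypothetical. The sketch covers only a subset of the rules above: organization scope, revoked peers, stale policy versions, and partition/degraded policy are left out for brevity.

```go
package main

import (
	"fmt"
	"time"
)

// Route is a hypothetical stand-in for a Fabric Routing Engine route result.
type Route struct {
	ClusterID, Source, Destination string
	Path                           []string // ordered node IDs, source first
	AllowedChannels                map[string]bool
	ExpiresAt                      time.Time
	MaxHops                        int
}

// Hop holds the envelope fields that the relay inspects.
type Hop struct {
	RouteID, ClusterID, Source, Destination string
	CurrentHop, NextHop, ChannelClass       string
	TTL, HopCount                           int
}

// nextHopAfter returns the node that follows node in path, or "" if none.
func nextHopAfter(path []string, node string) string {
	for i := 0; i+1 < len(path); i++ {
		if path[i] == node {
			return path[i+1]
		}
	}
	return ""
}

// CheckRelay returns "" when forwarding is allowed, otherwise a reject reason.
// Any condition that is not explicitly satisfied rejects (fail closed).
func CheckRelay(h Hop, routes map[string]Route, localNode string, now time.Time) string {
	r, ok := routes[h.RouteID]
	switch {
	case !ok:
		return "unknown_route_id"
	case h.ClusterID != r.ClusterID:
		return "wrong_cluster"
	case !now.Before(r.ExpiresAt):
		return "expired_route"
	case h.Source != r.Source || h.Destination != r.Destination:
		return "route_mismatch"
	case h.CurrentHop != localNode:
		return "unexpected_hop"
	case h.NextHop == "" || nextHopAfter(r.Path, localNode) != h.NextHop:
		return "unauthorized_next_hop"
	case !r.AllowedChannels[h.ChannelClass]:
		return "unauthorized_channel_class"
	case h.TTL <= 0:
		return "ttl_exhausted"
	case h.HopCount >= r.MaxHops:
		return "hop_limit_exceeded"
	}
	return ""
}

// exampleRelay returns a single-relay route, a hop arriving at node-r, and a
// fixed clock value.
func exampleRelay() (map[string]Route, Hop, time.Time) {
	now := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	routes := map[string]Route{"r-1": {
		ClusterID: "cluster-1", Source: "node-a", Destination: "node-b",
		Path:            []string{"node-a", "node-r", "node-b"},
		AllowedChannels: map[string]bool{"fabric_control": true},
		ExpiresAt:       now.Add(time.Minute), MaxHops: 4,
	}}
	hop := Hop{
		RouteID: "r-1", ClusterID: "cluster-1", Source: "node-a",
		Destination: "node-b", CurrentHop: "node-r", NextHop: "node-b",
		ChannelClass: "fabric_control", TTL: 3, HopCount: 1,
	}
	return routes, hop, now
}

func main() {
	routes, hop, now := exampleRelay()
	fmt.Printf("accept: %q\n", CheckRelay(hop, routes, "node-r", now))
	hop.ChannelClass = "vpn_packet"
	fmt.Printf("reject: %q\n", CheckRelay(hop, routes, "node-r", now))
}
```

Deriving the authorized next hop from the route path, rather than trusting the envelope, is what keeps the relay from acting as an arbitrary packet forwarder in this sketch.
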
## 8. Loop Prevention

Required loop prevention:

- TTL
- max hop count
- visited hop set or compact loop token
- route epoch
- route id validation
- duplicate message id cache with TTL

Loop detection must fail closed and report telemetry.

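
Two of the mechanisms above, the duplicate message id cache with TTL and the visited hop set, can be illustrated with a short Go sketch. `DedupeCache`, `HasLoop`, and the 30-second TTL are hypothetical choices for the example. The cache is not safe for concurrent use and has no size bound, both of which a real runtime would need.

```go
package main

import (
	"fmt"
	"time"
)

// DedupeCache remembers message IDs for a fixed TTL. It is not safe for
// concurrent use; a real runtime would add locking and a size bound.
type DedupeCache struct {
	ttl  time.Duration
	seen map[string]time.Time // message ID -> time the entry expires
}

func NewDedupeCache(ttl time.Duration) *DedupeCache {
	return &DedupeCache{ttl: ttl, seen: make(map[string]time.Time)}
}

// Seen reports whether id is a duplicate. A first sighting is recorded and
// returns false; expired entries are purged before the lookup.
func (c *DedupeCache) Seen(id string, now time.Time) bool {
	for k, expiry := range c.seen {
		if !now.Before(expiry) {
			delete(c.seen, k)
		}
	}
	if _, dup := c.seen[id]; dup {
		return true
	}
	c.seen[id] = now.Add(c.ttl)
	return false
}

// HasLoop reports whether next already appears in the visited hop list.
func HasLoop(visited []string, next string) bool {
	for _, v := range visited {
		if v == next {
			return true
		}
	}
	return false
}

// exampleDedupe returns a 30-second cache, a start time, and the time at
// which an entry recorded at the start time expires.
func exampleDedupe() (*DedupeCache, time.Time, time.Time) {
	start := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	c := NewDedupeCache(30 * time.Second)
	return c, start, start.Add(30 * time.Second)
}

func main() {
	c, start, expired := exampleDedupe()
	fmt.Println(c.Seen("m-1", start))   // first sighting
	fmt.Println(c.Seen("m-1", start))   // duplicate within TTL
	fmt.Println(c.Seen("m-1", expired)) // entry has expired
	fmt.Println(HasLoop([]string{"node-a", "node-r"}, "node-a"))
}
```

In a fail-closed design, a `true` result from either check would reject the envelope and emit a telemetry event rather than forward it.
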
## 9. Channel Scheduling

Even synthetic runtime should model future channel priorities.

Priority order:

1. `fabric_control`
2. `input`
3. `route_control`
4. `render`
5. `clipboard`
6. `file_transfer`
7. `storage_fetch`
8. `update_fetch`
9. `vpn_packet`
10. `telemetry`

C17A should only carry `fabric_control`, `route_control`, and `telemetry`, but
the scheduler boundary must not block future channel-aware extension.

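
One way to model this ordering is a strict-priority scheduler with one bounded queue per channel class. The `Scheduler` type and its methods are hypothetical, and strict priority is only one possible policy: it can starve lower classes under sustained load, so a real scheduler might add weighting or aging.

```go
package main

import "fmt"

// priorityOrder lists channel classes from highest to lowest priority,
// following the order given in this section.
var priorityOrder = []string{
	"fabric_control", "input", "route_control", "render", "clipboard",
	"file_transfer", "storage_fetch", "update_fetch", "vpn_packet", "telemetry",
}

// Scheduler keeps one bounded FIFO queue per known channel class.
type Scheduler struct {
	capacity int
	queues   map[string][]string
}

func NewScheduler(capacity int) *Scheduler {
	s := &Scheduler{capacity: capacity, queues: make(map[string][]string)}
	for _, class := range priorityOrder {
		s.queues[class] = nil // register the class with an empty queue
	}
	return s
}

// Enqueue returns false when the class is unknown or its queue is full.
func (s *Scheduler) Enqueue(class, msg string) bool {
	q, known := s.queues[class]
	if !known || len(q) >= s.capacity {
		return false
	}
	s.queues[class] = append(q, msg)
	return true
}

// Dequeue returns the oldest message of the highest-priority non-empty class.
func (s *Scheduler) Dequeue() (class, msg string, ok bool) {
	for _, c := range priorityOrder {
		if q := s.queues[c]; len(q) > 0 {
			s.queues[c] = q[1:]
			return c, q[0], true
		}
	}
	return "", "", false
}

func main() {
	s := NewScheduler(8)
	s.Enqueue("telemetry", "t-1")
	s.Enqueue("route_control", "rc-1")
	s.Enqueue("fabric_control", "fc-1")
	for {
		class, msg, ok := s.Dequeue()
		if !ok {
			break
		}
		fmt.Println(class, msg)
	}
}
```

Registering all ten classes up front, even though C17A carries only three, is one way to keep the scheduler boundary open for later channel-aware extension.
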
## 10. Route Cache Integration

Route execution may update observations:

- route success
- route failure
- latency
- delivery time
- retry count
- failure reason
- peer health hint

Route execution must not update authoritative policy.

Cache invalidation must occur when:

- route expires
- policy version changes
- peer directory version changes
- trust/revocation changes
- route epoch changes
- repeated failures exceed threshold

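
The invalidation triggers above can be expressed as a single predicate over a cached entry and the node's current version state. The `Versions` and `CacheEntry` types, the use of integer version counters, and the `failureThreshold` of 3 are assumptions for this sketch; the plan does not specify how versions are represented.

```go
package main

import (
	"fmt"
	"time"
)

// Versions groups the version counters whose change invalidates cached routes.
// Representing them as integers is an assumption of this sketch.
type Versions struct {
	Policy, PeerDirectory, Trust, RouteEpoch uint64
}

// CacheEntry is a hypothetical node-local route cache record.
type CacheEntry struct {
	Seen                Versions // versions in force when the route was cached
	ExpiresAt           time.Time
	ConsecutiveFailures int
}

// failureThreshold is a hypothetical limit chosen for this sketch only.
const failureThreshold = 3

// ShouldInvalidate reports whether any invalidation trigger applies.
func ShouldInvalidate(e CacheEntry, current Versions, now time.Time) bool {
	return !now.Before(e.ExpiresAt) ||
		e.Seen != current ||
		e.ConsecutiveFailures > failureThreshold
}

// exampleEntry returns a fresh entry, matching versions, and a fixed clock.
func exampleEntry() (CacheEntry, Versions, time.Time) {
	now := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	v := Versions{Policy: 7, PeerDirectory: 12, Trust: 3, RouteEpoch: 1}
	return CacheEntry{Seen: v, ExpiresAt: now.Add(time.Minute)}, v, now
}

func main() {
	e, current, now := exampleEntry()
	fmt.Println("fresh entry:", ShouldInvalidate(e, current, now))
	current.Policy++
	fmt.Println("after policy change:", ShouldInvalidate(e, current, now))
}
```

The predicate only reads authoritative versions and never writes them, which matches the rule that route execution must not update authoritative policy.
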
## 11. Observability

C17 runtime must be observable before it is useful.

Required logs/metrics:

- route requested
- route selected
- route execution started
- channel opened
- envelope sent
- envelope forwarded
- envelope received
- route delivery succeeded
- route delivery failed
- route rejected with reason
- relay rejected with reason
- TTL/hop loop rejected
- fallback route used
- kill-switch activated

Metrics:

- active routes
- active channels
- route success rate
- route failure rate
- relay forwarding count
- relay rejection count
- route latency
- hop latency
- queue depth by channel class
- dropped synthetic messages

Tenant-visible views must not expose topology.

## 12. Rollback and Kill Switch

Mesh runtime must have an immediate rollback path.

Required controls:

- global feature flag: mesh runtime disabled
- cluster feature flag: mesh runtime disabled for cluster
- node feature flag: mesh runtime disabled for node
- route class flag: disable relay/multi-hop
- channel class flag: disable non-control classes

Rollback behavior:

- stop creating new mesh routes
- close synthetic runtime channels after drain or immediately by severity
- keep node enrollment/heartbeat unaffected
- keep RDP direct worker WSS and backend gateway fallback unaffected
- keep backend control plane unaffected

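
The layered controls can be sketched as a flag set whose zero value denies everything, so the runtime is disabled until each layer is explicitly enabled. The `Flags` type, its field names, and the choice to treat both `fabric_control` and `route_control` as control classes are assumptions for this example.

```go
package main

import "fmt"

// Flags models the layered kill switches. The zero value disables everything.
type Flags struct {
	Global          bool            // global mesh runtime flag
	Cluster         map[string]bool // per-cluster enablement
	Node            map[string]bool // per-node enablement
	Relay           bool            // relay/multi-hop route class
	NonControlClass bool            // channel classes other than control
}

// controlClasses is an assumption: the plan does not enumerate which
// channel classes count as control classes.
var controlClasses = map[string]bool{"fabric_control": true, "route_control": true}

// Allows reports whether a route may be created under the current flags.
// Every layer must permit the route; any disabled layer denies it.
func (f Flags) Allows(cluster, node, class string, viaRelay bool) bool {
	switch {
	case !f.Global, !f.Cluster[cluster], !f.Node[node]:
		return false
	case viaRelay && !f.Relay:
		return false
	case !controlClasses[class] && !f.NonControlClass:
		return false
	}
	return true
}

// exampleFlags enables control traffic, including relay, for one test node.
func exampleFlags() Flags {
	return Flags{
		Global:  true,
		Cluster: map[string]bool{"cluster-1": true},
		Node:    map[string]bool{"node-a": true},
		Relay:   true,
	}
}

func main() {
	f := exampleFlags()
	fmt.Println("relay probe:", f.Allows("cluster-1", "node-a", "fabric_control", true))
	f.Relay = false
	fmt.Println("relay disabled:", f.Allows("cluster-1", "node-a", "fabric_control", true))
	f.Global = false
	fmt.Println("global disabled:", f.Allows("cluster-1", "node-a", "fabric_control", false))
}
```

Checking the flags on every new route is what lets a rollback stop new mesh routes immediately while leaving enrollment, heartbeat, and the existing RDP paths untouched.
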
## 13. Smoke / Test Topology

Minimum smoke topology:

```text
control-api
  |
  | config/snapshot distribution
  |
node-a: ingress-capable test node
  |
  | direct synthetic route
  v
node-b: service/egress-capable test node

node-a
  |
  | relay synthetic route
  v
node-r: relay-capable test node
  |
  v
node-b
```

Required test roles:

- `node-a`: `can_accept_client_ingress`, `can_accept_node_ingress`
- `node-r`: `can_accept_node_ingress`, `can_route_mesh`
- `node-b`: `can_accept_node_ingress`, `can_egress_private_network` or service test role

Smoke must prove:

- direct synthetic route succeeds
- single-relay synthetic route succeeds
- wrong cluster rejected
- wrong node rejected
- unauthorized channel rejected
- expired route rejected
- TTL loop rejected
- relay disabled kill-switch works
- mesh runtime disabled kill-switch works
- RDP baseline unaffected

## 14. C17A Result

C17A implemented the smallest safe runtime skeleton:

- `rap-node-agent` synthetic runtime is disabled by default
- direct synthetic `fabric.probe` / `fabric.probe_ack` path is test-proven
- single-relay synthetic `fabric.probe` / `fabric.probe_ack` path is
  test-proven
- route id, route expiry, TTL, hop count, path validation, and loop protection
  are enforced
- wrong cluster, wrong node, unauthorized channel, expired route, TTL
  exhaustion, loop, and missing peer are rejected
- structured log and metrics boundaries exist
- existing `/mesh/v1/forward` production forwarding remains disabled
- no RDP, VPN, file, video, or production service traffic uses this skeleton

Verification:

```powershell
go test ./...
```

Run from:

```powershell
agents\rap-node-agent
```

## 15. C17B Result

C17B implemented route health and failover probes using synthetic traffic only:

- mesh feature flag remains disabled by default
- direct and single-relay synthetic probe behavior is preserved
- synthetic route health probes
- local route success/failure observations
- failed synthetic route marking in node-local runtime state
- warm peer candidate promotion only in test/smoke topology
- fallback synthetic route selection when the preferred route is unavailable
- route cache invalidation when policy, peer directory, or route version
  changes
- no service traffic
- no RDP traffic
- no VPN/IP tunnel traffic
- no organization topology exposure

Verification:

```powershell
go test ./...
```

Run from:

```powershell
agents\rap-node-agent
```

## 15.1 C17C Result

C17C implemented relay forwarding semantic hardening using synthetic channel
classes only:

- mesh feature flag remains disabled by default
- C17A direct/single-relay synthetic probes remain intact
- C17B route health/failover probes remain intact
- stricter relay envelope validation boundaries
- per-channel-class bounded queues for synthetic traffic
- QoS dequeue order for synthetic channel classes
- backpressure and drop rules for synthetic telemetry only
- reliable behavior for synthetic fabric/control health probes
- no service traffic
- no RDP traffic
- no VPN/IP tunnel traffic
- no organization topology exposure

Verification:

```powershell
go test ./...
```

Run from:

```powershell
agents\rap-node-agent
```

## 15.2 C17D Result

C17D implemented a non-production service-path experiment:

- mesh feature flag remains disabled by default
- C17A, C17B, and C17C behavior remains intact
- one test service type only: `synthetic.echo`
- one test organization only: `org-test`
- one test cluster only in tests: `cluster-1`
- bounded test payloads only
- direct test-service route proven
- single-relay test-service route proven
- forced fallback test-service route proven
- no topology exposed to organizations
- no production service traffic
- no RDP traffic
- no VPN/IP tunnel traffic

Verification:

```powershell
go test ./...
```

Run from:

```powershell
agents\rap-node-agent
```

## 15.3 C17H Result

C17H implemented a deployed multi-agent synthetic config smoke on
`docker-test`:

- five running `rap-node-agent` containers consumed backend-issued
  node-scoped synthetic config
- direct and relay synthetic route-health observations returned through the
  real backend
- Platform Owner summary reflected the C17H test cluster as healthy
- all scoped configs kept `production_forwarding=false`
- no production mesh traffic
- no service workload traffic
- no RDP/VPN/IP tunnel traffic

VPN/IP tunnel work remains a separate C18 track and must not be mixed into
C17 mesh runtime work.

## 15.4 C17E Historical Result

C17E implemented a historical live node-to-node synthetic HTTP transport
experiment while preserving the production forwarding kill-switch. This result
is retained only as test-history context; it is not the active transport
direction for the fabric runtime:

- `HTTPPeerTransport` maps explicit peer node IDs to synthetic HTTP endpoint
  URLs.
- `rap-node-agent` can start a synthetic `/mesh/v1/*` endpoint only when
  `RAP_MESH_SYNTHETIC_RUNTIME_ENABLED=true` and `RAP_MESH_LISTEN_ADDR` is set.
- peer endpoints and synthetic routes can be injected as JSON for smoke/debug
  only.
- `mesh-live-smoke` proves direct and single-relay synthetic traffic over real
  local HTTP endpoints.
- bounded `synthetic.echo` remains the only test-service payload.
- `/mesh/v1/forward` remains disabled.
- no production service traffic is authorized.

Current direction:

- active fabric runtime transport is QUIC-only
- synthetic HTTP transport is historical test-only context
- production forwarding/runtime acceptance must use QUIC route execution rather
  than HTTP peer transport

Verification:

```powershell
go test ./...
go run ./cmd/mesh-live-smoke
go build -o bin/rap-node-agent.exe ./cmd/rap-node-agent
go build -o bin/mesh-live-smoke.exe ./cmd/mesh-live-smoke
```

Run from:

```powershell
agents\rap-node-agent
```

## 15.5 C17F Result

C17F implemented scoped synthetic peer/route configuration loading and route
health reporting:

- `ScopedSyntheticConfig` validates `cluster_id`, `local_node_id`, peer
  endpoint shape, route cluster, route membership, and route expiry.
- `rap-node-agent` prefers `RAP_MESH_SYNTHETIC_CONFIG` over debug JSON route
  and peer endpoint injection.
- debug JSON remains available only as fallback for smoke/debug.
- when Fabric testing flags allow synthetic links, node-agent sends synthetic
  route-health probes and reports safe link observations to the Control Plane.
- route-health metadata explicitly marks `traffic_forwarding=false` and
  `service_workload_traffic=false`.
- C17E live direct/relay smoke remains intact.
- `/mesh/v1/forward` remains disabled.

Verification:

```powershell
go test ./...
go run ./cmd/mesh-live-smoke
go build -o bin/rap-node-agent.exe ./cmd/rap-node-agent
go build -o bin/mesh-live-smoke.exe ./cmd/mesh-live-smoke
```

Run from:

```powershell
agents\rap-node-agent
```

## 15.6 C17G Result

C17G implemented a Control Plane read boundary for node-scoped synthetic mesh
config:

- backend endpoint:
  `/clusters/{clusterID}/nodes/{nodeID}/mesh/synthetic-config`
- endpoint returns no routes/endpoints when effective testing flags do not
  allow synthetic links
- route intents remain the source for synthetic test route config
- only route intents whose path contains the requesting node are included
- unrelated peer endpoints are not returned to the requesting node
- `production_forwarding=false` is explicit in the response
- node-agent consumes Control Plane config when local
  `RAP_MESH_SYNTHETIC_CONFIG` is not set
- local scoped config file remains preferred debug fallback
- debug JSON remains last fallback only

Verification:

```powershell
go test ./...
```

Run from:

```powershell
backend
agents\rap-node-agent
```

## 15.7 C17I-C17Z18 Result
|
|
|
|
C17I through C17Z18 added the first production-forwarding boundary checks, the
|
|
endpoint candidate config/scoring foundation, narrow production
|
|
`fabric.control` forwarding, local forwarding observability, and route-config
|
|
validation plus scoped peer directory/recovery seeds and warm-peer connection
|
|
state/recovery/intent planning, a control-plane health connection manager, and
|
|
a node-scoped rendezvous/relay control-plane lease contract with lease refresh
|
|
telemetry, stale-relay replacement policy, route/path decision metadata, and
|
|
node-side route generation apply/withdraw reporting plus synthetic
|
|
route-health effective-path probing while still keeping production service
|
|
traffic unavailable:
|
|
|
|
- C17I added an explicit `RAP_MESH_PRODUCTION_FORWARDING_ENABLED` node-agent
|
|
gate.
|
|
- C17J added route-bound production envelope validation on `/mesh/v1/forward`
|
|
for `fabric_control` / `fabric.control` only.
|
|
- C17K added local metadata-only accepted-envelope observation after
|
|
validation.
|
|
- C17L added a bounded local in-memory sink for accepted metadata-only
|
|
observations.
|
|
- C17M added disabled-by-default node-agent wiring for the bounded local sink
|
|
through `RAP_MESH_PRODUCTION_OBSERVATION_SINK_CAPACITY`.
|
|
- C17N added local metrics for the bounded local sink.
|
|
- C17O added local node-agent logging for aggregate sink metrics.
|
|
- C17P added change-driven suppression for unchanged aggregate sink metrics
|
|
logs.
|
|
- C17Q added local log separation for production forwarding gate state versus
|
|
production forwarding runtime state.
|
|
- C17R added a maximum capacity guard for the local observation sink.
|
|
- C17S added panic-safe fail-closed observer handling.
|
|
- C17T added an explicit production `fabric.control` envelope payload size
|
|
boundary.
|
|
- C17U added an explicit production `fabric.control` envelope `created_at`
|
|
future-skew boundary.
|
|
- C17V added route-scoped peer endpoint candidates with transport, address,
|
|
reachability, NAT type, connectivity mode, priority, policy tags,
|
|
verification time, and metadata.
|
|
- C17W added deterministic local scoring for already-scoped peer endpoint
|
|
candidates.
|
|
- C17X added optional local health observation inputs to endpoint candidate
|
|
scoring.
|
|
- C17Y added Platform Owner Control Panel visibility for node-scoped synthetic
|
|
mesh config.
|
|
- C17Z added production `fabric.control` local delivery and direct next-hop
|
|
forwarding behind the explicit production gate.
|
|
- C17Z1 added route-path-bound production `fabric.control` multi-hop
|
|
forwarding behind the explicit production gate.
|
|
- C17Z2 added local metadata-only production `fabric.control` forwarding
|
|
event logs for accepted, forwarded, delivered, and rejected envelopes.
|
|
- C17Z3 bound production `fabric.control` forwarding to local route config
|
|
when configured routes are available.
|
|
- C17Z4 added node-scoped peer directory and explicit bounded recovery seeds
|
|
to mesh config.
|
|
- C17Z5 added node-agent peer cache runtime state and warm-peer health probes.
|
|
- C17Z6 added explicit advertised endpoint reporting and Control Plane
|
|
projection of latest reported endpoints into scoped mesh config.
|
|
- C17Z7 added multiple advertised endpoint candidates, including
|
|
private/corporate LAN endpoints, and peer-cache selection of the best
|
|
candidate address for warm health.
|
|
- C17Z8 added node-local warm-peer connection states with bounded backoff
|
|
after repeated health-probe failures.
|
|
- C17Z9 added bounded node-local peer recovery planning over peer cache and
|
|
connection states.
|
|
- C17Z10 added node-local peer connection intents and transport readiness
|
|
classification.
|
|
- C17Z11 added a node-local peer connection manager for real control-plane
|
|
health probes over reusable HTTP keep-alive transport.
|
|
- C17Z12 added node-scoped rendezvous/relay control-plane leases and
|
|
relay-control health probes for peers that would otherwise remain
|
|
`waiting_rendezvous`.
|
|
- C17Z13 added heartbeat telemetry for relay admission, peer admission,
|
|
lease renewal posture, and `relay_ready` state.
|
|
- C17Z14 added node-scoped synthetic-config refresh for renewal-needed
|
|
rendezvous leases plus stale relay withdrawal/reselection telemetry.
|
|
- C17Z15 added backend relay replacement/withdrawal policy and alternate
|
|
relay scoring for stale rendezvous relays.
|
|
- C17Z16 added Control Plane `route_path_decisions` with original/effective
|
|
hops, local next hop, selected replacement relay, generation, and boundary
|
|
flags.
|
|
- C17Z17 added node-side route generation tracking for
|
|
`route_path_decisions`, including active/applied/unchanged/withdrawn counts,
|
|
generation change state, and `withdrawn_by_replacement` reporting for stale
|
|
relay paths.
|
|
- C17Z18 applies Control Plane `route_path_decisions` to synthetic
|
|
route-health route config only, probes selected effective paths through
|
|
replacement relays, reports expected/observed hops and drift state, and keeps
|
|
latest route-health observations separate from peer connection-manager
|
|
observations.
|
|
- rejected envelopes are not observed.
|
|
- observation failure fails closed.
|
|
- the bounded sink drops the oldest observation when full and stores no payload
|
|
bodies.
|
|
- metrics expose only capacity, current depth, accepted total, and
|
|
dropped-oldest total.
|
|
- local metrics logging exposes only aggregate sink metrics and adds no read
|
|
API or Control Plane reporting.
|
|
- unchanged aggregate sink metrics are not repeatedly logged.
|
|
- production forwarding runtime is limited to `fabric.control` direct
|
|
next-hop and route-path-bound forwarding when the gate is explicitly enabled.
|
|
- `RAP_MESH_PRODUCTION_OBSERVATION_SINK_CAPACITY` is rejected above `10000`.
|
|
- observer errors and observer panics both fail closed as observation failure.
|
|
- validated production `fabric.control` envelope payloads are bounded to
  `4096` bytes.
- validated production `fabric.control` envelope `created_at` values are
  bounded to a one-minute future skew.
- backend synthetic config returns only peer endpoints and endpoint candidates
  that belong to the route path containing the requesting node.
- node-agent scoped synthetic config validates endpoint candidate shape.
- endpoint candidate scoring returns ranked candidates and reason labels only;
  it does not open connections, choose production routes, or forward payloads.
- health-aware scoring remains advisory and is not wired into route execution.
- Platform Owner visibility shows config/candidate/scoring state without
  exposing this to organization panels.
- service channels remain rejected.
- arbitrary relay forwarding is not implemented.
- `/mesh/v1/forward` still returns unavailable for missing production forward
  transport.
- local production forward event logs expose only metadata and add no read API
  or Control Plane reporting.
- configured production `fabric.control` envelopes must match local route
  config route_id, cluster, source, destination, path, next hop, allowed
  channel, expiry, max TTL, and max hop count before forwarding.
- scoped peer directory/recovery seeds feed node-local peer cache and recovery
  planning only; persistent connection management, NAT traversal, and
  relay/rendezvous runtime are not implemented.
- peer cache runtime selects bounded warm peers and probes `/mesh/v1/health`
  only; it does not maintain persistent data-plane connections or forward
  service payloads.
- dynamic endpoint reporting requires an explicit advertised endpoint; automatic
  public IP discovery and STUN/TURN/ICE NAT classification are not implemented.
- private/corporate endpoint handling is candidate/scoring/runtime-health only;
  it does not imply automatic subnet discovery or service payload forwarding.
- peer connection state is node-local metadata for warm `/mesh/v1/health`
  probes only; it does not create persistent sockets, relay/rendezvous
  runtime, or service payload forwarding.
- peer recovery planning chooses bounded health-probe candidates only; it does
  not create persistent sockets, automatic NAT traversal, relay/rendezvous
  runtime, or service payload forwarding.
- peer connection intents classify planned maintain/probe/recover work and
  transport readiness only; they do not open persistent sockets, perform
  STUN/TURN/ICE, run relay/rendezvous, or forward service payloads.
- peer connection manager probes only control-plane `/mesh/v1/health`; direct,
  private, and corporate peers are probed directly, and C17Z12 can resolve
  matching outbound-only/relay-required peers through `rendezvous_leases` as
  relay-control health probes. It does not forward service payloads.
- rendezvous lease refresh reloads node-scoped synthetic config and updates
  route/peer/lease state in the running agent, but does not forward service
  payloads.
- backend relay replacement consumes stale-relay heartbeat feedback, withdraws
  stale explicit rendezvous leases, scores alternate relay candidates, and
  returns replacement lease decisions as control-plane metadata only.
- route/path decisions publish effective control-plane paths and local next-hop
  metadata only; they do not execute service routes or forward payloads.
- no RDP, VPN, file, video, or service workload traffic is forwarded.

Verification:

```powershell
go test ./...
```

Run from:

```powershell
agents\rap-node-agent
backend
```
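
The payload-size and `created_at` bounds listed above can be illustrated with a minimal Go sketch. It is not the `rap-node-agent` implementation: the type `controlEnvelope`, the function `checkEnvelopeBounds`, and the constant and error names are hypothetical, and the sketch assumes the `4096`-byte bound is inclusive.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Bounds taken from the list above; the identifiers are illustrative only.
const (
	maxControlPayloadBytes = 4096        // payload size bound, assumed inclusive
	maxCreatedAtFutureSkew = time.Minute // allowed future skew for created_at
)

var (
	errPayloadTooLarge = errors.New("fabric.control payload exceeds 4096 bytes")
	errCreatedAtSkew   = errors.New("created_at is more than one minute in the future")
)

// controlEnvelope is a hypothetical stand-in for the production envelope.
type controlEnvelope struct {
	Payload   []byte
	CreatedAt time.Time
}

// checkEnvelopeBounds rejects oversized payloads and created_at values that
// lie more than maxCreatedAtFutureSkew ahead of now.
func checkEnvelopeBounds(env controlEnvelope, now time.Time) error {
	if len(env.Payload) > maxControlPayloadBytes {
		return errPayloadTooLarge
	}
	if env.CreatedAt.After(now.Add(maxCreatedAtFutureSkew)) {
		return errCreatedAtSkew
	}
	return nil
}

// fixedNow keeps the example deterministic.
var fixedNow = time.Date(2024, time.January, 1, 12, 0, 0, 0, time.UTC)

func main() {
	atLimit := controlEnvelope{Payload: make([]byte, maxControlPayloadBytes), CreatedAt: fixedNow}
	tooLarge := controlEnvelope{Payload: make([]byte, maxControlPayloadBytes+1), CreatedAt: fixedNow}
	fromFuture := controlEnvelope{CreatedAt: fixedNow.Add(2 * time.Minute)}

	fmt.Println("at limit:", checkEnvelopeBounds(atLimit, fixedNow))
	fmt.Println("too large:", checkEnvelopeBounds(tooLarge, fixedNow))
	fmt.Println("from future:", checkEnvelopeBounds(fromFuture, fixedNow))
}
```

Passing `now` as a parameter rather than calling `time.Now()` inside the check keeps the skew boundary testable with fixed timestamps.
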

## 16. Risks

Primary risks:

- accidentally routing service traffic too early
- creating hidden topology exposure
- bypassing route policy with direct peer links
- relay turning into an arbitrary packet forwarder
- route cache becoming authority
- missing kill-switch or rollback
- mesh runtime interfering with RDP baseline

Mitigation:

- synthetic traffic only at C17A
- strict feature flags
- route result validation at every hop
- no service adapter integration until a later approved stage
- topology-safe observability
- explicit rollback
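
As an illustration of "route result validation at every hop", the following Go sketch checks an envelope against local route config before it may be delivered or forwarded. It covers only a subset of the fields a real check would compare, and every name in it (`routeConfig`, `envelope`, `validateHop`) is hypothetical. The TTL and hop-count semantics are assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// routeConfig is a hypothetical local view of one configured route.
type routeConfig struct {
	RouteID        string
	Path           []string // ordered node IDs, source first
	AllowedChannel string
	MaxTTL         int
	MaxHops        int
}

// envelope is a hypothetical subset of the envelope metadata.
type envelope struct {
	RouteID  string
	Channel  string
	TTL      int
	HopCount int
	NextHop  string // empty when the local node is the destination
}

// validateHop fails closed: any mismatch with local route config is an error.
func validateHop(cfg routeConfig, env envelope, localNode string) error {
	if env.RouteID != cfg.RouteID {
		return errors.New("route_id does not match local route config")
	}
	if env.Channel != cfg.AllowedChannel {
		return errors.New("channel is not allowed on this route")
	}
	// Assumption: TTL must be positive and no larger than the route maximum.
	if env.TTL <= 0 || env.TTL > cfg.MaxTTL {
		return errors.New("ttl is outside the route bound")
	}
	// Assumption: hop count counts hops already taken and may not exceed the maximum.
	if env.HopCount < 0 || env.HopCount > cfg.MaxHops {
		return errors.New("hop count is outside the route bound")
	}
	for i, node := range cfg.Path {
		if node != localNode {
			continue
		}
		expected := "" // destination: local delivery, no next hop
		if i+1 < len(cfg.Path) {
			expected = cfg.Path[i+1]
		}
		if env.NextHop != expected {
			return errors.New("next hop does not follow the route path")
		}
		return nil
	}
	return errors.New("local node is not on the route path")
}

func main() {
	cfg := routeConfig{
		RouteID:        "route-1",
		Path:           []string{"node-a", "node-b", "node-c"},
		AllowedChannel: "fabric.control",
		MaxTTL:         8,
		MaxHops:        4,
	}
	env := envelope{RouteID: "route-1", Channel: "fabric.control", TTL: 4, HopCount: 1, NextHop: "node-c"}

	fmt.Println("on path:", validateHop(cfg, env, "node-b"))
	env.NextHop = "node-x"
	fmt.Println("off path:", validateHop(cfg, env, "node-b"))
}
```

Because the next hop is derived from the configured path rather than trusted from the envelope, a relay in this sketch cannot be steered to an arbitrary peer.
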

## 17. Result / Decision

Stage C17 planning defines a safe, staged implementation path for mesh routing
runtime. Stage C17A implements the first narrow runtime skeleton for synthetic
Fabric messages only. Stage C17B adds route health/failover observations using
synthetic Fabric messages only. Stage C17C adds relay semantic hardening for
synthetic channel classes only. Stage C17D adds one bounded non-production
`synthetic.echo` service-path experiment only. Stage C17E proves one
historical synthetic HTTP carrier experiment using real local endpoints only;
it is test-only and not representative of the active QUIC fabric runtime.
Stage C17F proves scoped synthetic config loading and route-health reporting
only.
Stage C17G proves Control Plane scoped synthetic config read/consume only.
Stage C17H proves deployed multi-agent Control Plane synthetic config
consumption and synthetic route-health reporting on `docker-test` only.
Stage C17I adds an explicit production-forwarding gate while keeping production
forwarding unavailable until a later approved runtime stage.
Stage C17J adds route-bound production envelope validation for fabric-control
messages only, while still keeping production forwarding unavailable.
Stage C17K adds metadata-only local observation of accepted production
envelopes, while still keeping production forwarding unavailable.
Stage C17L adds bounded local retention of accepted metadata-only observations,
while still keeping production forwarding unavailable.
Stage C17M wires that bounded local retention into node-agent only when an
explicit capacity is configured, while still keeping production forwarding
unavailable.
Stage C17N adds local sink metrics while still keeping production forwarding
unavailable.
Stage C17O logs aggregate sink metrics locally while still keeping production
forwarding unavailable.
Stage C17P suppresses repeated unchanged local aggregate sink metrics logs while
still keeping production forwarding unavailable.
Stage C17Q separates production forwarding gate state from runtime state in
local logs while still keeping production forwarding unavailable.
Stage C17R adds a maximum local observation sink capacity guard while still
keeping production forwarding unavailable.
Stage C17S makes observer panic handling fail closed while still keeping
production forwarding unavailable.
Stage C17T adds a validated production fabric-control payload size boundary
while still keeping production forwarding unavailable.
Stage C17U adds a validated production fabric-control created-at future-skew
boundary while still keeping production forwarding unavailable.
Stage C17V adds a scoped peer endpoint candidate model and NAT/connectivity
hints while still keeping production forwarding unavailable.
Stage C17W adds deterministic local endpoint candidate scoring while still
keeping production forwarding unavailable.
Stage C17X adds health-aware local endpoint candidate scoring while still
keeping production forwarding unavailable.
Stage C17Y adds Platform Owner synthetic mesh config visibility while still
keeping production forwarding unavailable.
Stage C17Z adds production fabric-control direct forwarding while still keeping
production service traffic unavailable.
Stage C17Z1 adds route-path-bound production fabric-control multi-hop
forwarding while still keeping production service traffic unavailable.
Stage C17Z2 adds local metadata-only production fabric-control forwarding
observability while still keeping production service traffic unavailable.
Stage C17Z3 adds route-config-bound production fabric-control forwarding
validation while still keeping production service traffic unavailable.
Stage C17Z4 adds scoped peer directory and bounded recovery seed config while
still keeping production service traffic unavailable.
Stage C17Z5 adds node-agent peer cache runtime and warm-peer health probes
while still keeping production service traffic unavailable.
Stage C17Z6 adds dynamic endpoint reporting/config projection while still
keeping production service traffic unavailable.
Stage C17Z7 adds private/corporate endpoint candidates and same-site scoring
while still keeping production service traffic unavailable.
Stage C17Z8 adds node-local warm-peer connection states and bounded backoff
while still keeping production service traffic unavailable.
Stage C17Z9 adds bounded node-local peer recovery planning while still keeping
production service traffic unavailable.
Stage C17Z10 adds node-local peer connection intent and transport readiness
classification while still keeping production service traffic unavailable.
Stage C17Z11 adds a real node-local peer connection manager for control-plane
health while still keeping production service traffic unavailable.
Stage C17Z12 adds node-scoped rendezvous/relay control-plane leases and
relay-control health probes while still keeping production service traffic
unavailable.
Stage C17Z13 adds rendezvous lease admission and renewal-posture telemetry
while still keeping production service traffic unavailable.
Stage C17Z14 adds rendezvous lease refresh/reload and stale relay
withdrawal/reselection telemetry while still keeping production service
traffic unavailable.
Stage C17Z15 adds backend relay replacement/withdrawal policy and alternate
relay-pool scoring for stale rendezvous relays while still keeping production
service traffic unavailable.
Stage C17Z16 adds Control Plane route/path decision artifacts for original and
effective hops while still keeping production service traffic unavailable.
Stage C17Z17 adds node-side route generation apply/withdraw tracking for
Control Plane route/path decisions while still keeping production service
traffic unavailable.
Stage C17Z18 applies those Control Plane route/path decisions to synthetic
route-health route config only, so route-health probes can verify replacement
effective paths while still keeping production service traffic unavailable.
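
The fail-closed panic handling described for Stage C17S can be sketched in Go with `recover`. This is an illustration under assumptions, not the node-agent code: `envelopeMeta`, `observer`, `observeFailClosed`, and `acceptEnvelope` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// envelopeMeta is hypothetical metadata; no payload body is carried.
type envelopeMeta struct {
	RouteID string
	Channel string
}

// observer is a hypothetical hook that records accepted-envelope metadata.
type observer func(meta envelopeMeta) error

// observeFailClosed runs the observer and converts a panic into an error,
// so a misbehaving observer can never be mistaken for a successful one.
func observeFailClosed(obs observer, meta envelopeMeta) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("observer panicked: %v", r)
		}
	}()
	return obs(meta)
}

// acceptEnvelope treats any observation failure as a rejection.
func acceptEnvelope(obs observer, meta envelopeMeta) bool {
	return observeFailClosed(obs, meta) == nil
}

func main() {
	meta := envelopeMeta{RouteID: "route-1", Channel: "fabric.control"}

	healthy := func(envelopeMeta) error { return nil }
	failing := func(envelopeMeta) error { return errors.New("sink unavailable") }
	panicking := func(envelopeMeta) error { panic("observer bug") }

	fmt.Println("healthy accepted:", acceptEnvelope(healthy, meta))
	fmt.Println("failing accepted:", acceptEnvelope(failing, meta))
	fmt.Println("panicking accepted:", acceptEnvelope(panicking, meta))
}
```

The named return value lets the deferred function overwrite the result after a panic, so the caller sees an ordinary error and rejects the envelope instead of crashing.
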

Decisions:

- C17 is planning only.
- C17A is implemented and test-proven with synthetic fabric messages only.
- C17B is implemented and test-proven with synthetic route health/failover
  messages only.
- C17C is implemented and test-proven with synthetic relay queues/QoS only.
- C17D is implemented and test-proven with one bounded `synthetic.echo`
  test-service path only.
- C17E is implemented and smoke-proven with live HTTP synthetic direct and
  single-relay paths only.
- C17F is implemented and smoke-proven with scoped synthetic route config and
  link observation reporting only.
- C17G is implemented and test-proven with backend scoped synthetic config and
  node-agent consumption only.
- C17H is implemented and runtime-proven with five deployed node-agent
  containers, backend-issued node-scoped synthetic config, direct and
  single-relay synthetic route-health observations, and production forwarding
  disabled.
- C17I is implemented and test-proven with an explicit node-agent
  production-forwarding gate. Enabling the gate still does not forward
  production payloads because no production forwarding runtime is implemented
  in this stage.
- C17J is implemented and test-proven with route-bound production envelope
  validation on `/mesh/v1/forward`. Only `fabric_control` /
  `fabric.control` is accepted for validation in this stage; service channels
  are rejected and payloads are not forwarded.
- C17K is implemented and test-proven with metadata-only accepted-envelope
  observation. Rejected envelopes are not observed, observation failure fails
  closed, and payloads are not forwarded.
- C17L is implemented and test-proven with a bounded local accepted-observation
  sink. Oldest observations are dropped when capacity is exceeded, payload
  metadata is retained, payload bodies are not stored, and payloads are not
  forwarded.
- C17M is implemented and test-proven with disabled-by-default node-agent
  wiring for the bounded local accepted-observation sink. No observation read
  API or Control Plane reporting is added in this stage.
- C17N is implemented and test-proven with local metrics for the bounded
  accepted-observation sink. Metrics expose no observation records or payload
  metadata. No observation read API or Control Plane reporting is added in this
  stage.
- C17O is implemented and test-proven with local node-agent logging for
  aggregate bounded-sink metrics. No observation read API or Control Plane
  reporting is added in this stage.
- C17P is implemented and test-proven with change-driven suppression for
  unchanged aggregate bounded-sink metrics logs. No observation read API or
  Control Plane reporting is added in this stage.
- C17Q is implemented and test-proven with local log separation between
  production forwarding gate state and production forwarding runtime state.
  Runtime state remains false.
- C17R is implemented and test-proven with a maximum local observation sink
  capacity guard.
- C17S is implemented and test-proven with panic-safe fail-closed observation
  handling.
- C17T is implemented and test-proven with an explicit validated
  fabric-control payload size boundary.
- C17U is implemented and test-proven with an explicit validated
  fabric-control created-at future-skew boundary.
- C17V is implemented and test-proven with route-scoped peer endpoint
  candidates and NAT/connectivity hints in synthetic config.
- C17W is implemented and test-proven with deterministic local endpoint
  candidate scoring.
- C17X is implemented and test-proven with health-aware endpoint candidate
  scoring.
- C17Y is implemented and build/test-proven with Platform Owner synthetic mesh
  config visibility.
- C17Z is implemented and test-proven with gate-controlled production
  `fabric.control` direct forwarding.
- C17Z1 is implemented and test-proven with gate-controlled route-path-bound
  production `fabric.control` multi-hop forwarding.
- C17Z2 is implemented and test-proven with local metadata-only production
  `fabric.control` forwarding event logs for accepted, forwarded, delivered,
  and rejected envelopes.
- C17Z3 is implemented and test-proven with route-config-bound production
  `fabric.control` forwarding validation.
- C17Z4 is implemented and test/build-proven with node-scoped peer directory
  and recovery seed config.
- C17Z5 is implemented and test-proven with node-agent peer cache runtime and
  warm-peer health probes.
- C17Z6 is implemented and test-proven with explicit advertised endpoint
  reporting and scoped config projection.
- C17Z7 is implemented and test-proven with multiple public/private/corporate
  endpoint candidates and same-site scoring.
- C17Z8 is implemented and test-proven with node-local warm-peer connection
  states and bounded backoff.
- C17Z9 is implemented and test-proven with bounded node-local peer recovery
  planning.
- C17Z10 is implemented and test-proven with node-local peer connection
  intents and transport readiness classification.
- C17Z11 is implemented and test-proven with a node-local peer connection
  manager for control-plane health.
- C17Z12 is implemented and docker-test-runtime-proven with node-scoped
  `rendezvous_leases`; matching `waiting_rendezvous` intents become
  `relay_control` health probes and record/maintain `relay_ready`.
- C17Z13 is implemented and docker-test-runtime-proven with
  `mesh_rendezvous_lease_report` heartbeat telemetry for relay admission,
  peer admission, TTL/renewal posture, and `relay_ready`.
- C17Z14 is implemented and docker-test-runtime-proven with node-scoped
  synthetic-config refresh for renewal-needed rendezvous leases, runtime
  peer cache/route/lease reload, refresh counters, and stale relay
  withdrawal/reselection telemetry.
- C17Z15 is implemented and docker-test-runtime-proven with backend
  stale-relay feedback handling, stale rendezvous lease withdrawal, alternate
  relay scoring, replacement lease issuance, and node-agent relay replacement
  telemetry.
- C17Z16 is implemented and docker-test-runtime-proven with
  `route_path_decisions` in synthetic config and
  `mesh_route_path_decision_report` heartbeat telemetry for control-plane
  route generation/effective path metadata.
- C17Z17 is implemented and docker-test-runtime-proven with
  `mesh_route_generation_report` heartbeat telemetry for active/applied/
  unchanged/withdrawn route generation state over control-plane
  `route_path_decisions`.
- C17Z18 is implemented and docker-test-runtime-proven with synthetic
  route-health effective-path probing from Control Plane
  `route_path_decisions`, route-health config telemetry, and latest-link
  preservation by observation type/route.
- No RDP, VPN, or production service traffic may use mesh after C17Z18.
- Route execution must consume Fabric Routing Engine route results.
- Relay forwarding must be route-bound, TTL-bound, hop-bound, and policy-bound.
- Observability and kill-switches are required before runtime begins.
- C17A proves direct and single-relay synthetic routes in a test topology.
- No further mesh runtime step is authorized without a new explicit staged
  prompt.
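
The bounded accepted-observation sink from the C17L decision (drop the oldest entry at capacity, retain metadata only) might look roughly like the Go sketch below. All identifiers (`observation`, `boundedSink`, `newBoundedSink`, `Observe`, `Metrics`) are hypothetical, and the real sink may differ in structure and fields.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// observation holds metadata only; the payload body is never stored.
type observation struct {
	RouteID      string
	Channel      string
	PayloadBytes int // payload size, not payload content
}

// boundedSink retains at most capacity observations, dropping the oldest.
type boundedSink struct {
	mu       sync.Mutex
	capacity int
	items    []observation
	dropped  int
}

// newBoundedSink requires an explicit positive capacity.
func newBoundedSink(capacity int) (*boundedSink, error) {
	if capacity <= 0 {
		return nil, errors.New("capacity must be positive")
	}
	return &boundedSink{capacity: capacity, items: make([]observation, 0, capacity)}, nil
}

// Observe appends an observation, evicting the oldest one when full.
func (s *boundedSink) Observe(o observation) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if len(s.items) == s.capacity {
		copy(s.items, s.items[1:]) // shift left, overwriting the oldest entry
		s.items = s.items[:len(s.items)-1]
		s.dropped++
	}
	s.items = append(s.items, o)
}

// Metrics returns aggregate counts only, never observation records.
func (s *boundedSink) Metrics() (retained, dropped int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.items), s.dropped
}

func main() {
	sink, err := newBoundedSink(2)
	if err != nil {
		panic(err)
	}
	for _, id := range []string{"route-1", "route-2", "route-3"} {
		sink.Observe(observation{RouteID: id, Channel: "fabric.control", PayloadBytes: 128})
	}
	retained, dropped := sink.Metrics()
	fmt.Println("retained:", retained, "dropped:", dropped)
}
```

Shifting with `copy` keeps the backing array at a fixed size, so memory use stays bounded by the configured capacity no matter how many envelopes are observed.
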

No RDP, data-plane, VPN, relay production traffic, or service workload
behavior is changed by any stage from C17A through C17Z18.

The only runtime code added in `rap-node-agent` is:

- disabled-by-default synthetic mesh probe
- synthetic route health/failover
- synthetic relay scheduling
- bounded `synthetic.echo` test-service execution
- live synthetic HTTP peer transport
- explicit production-forwarding gate checks
- route-bound production envelope validation
- metadata-only accepted-envelope observation
- bounded local accepted-observation retention, wiring, metrics, and local
  change-driven logging
- capacity guarding
- fail-closed observation hardening
- payload and time-boundary validation
- scoped endpoint candidate config validation and scoring
- health-aware scoring overlay

Platform Owner visibility is added in `web-admin`.

Each later stage adds only the following:

- C17Z: route-bound production `fabric.control` local delivery/direct next-hop
  forwarding behind an explicit gate
- C17Z1: route-path-bound production `fabric.control` multi-hop forwarding
- C17Z2: local metadata-only production `fabric.control` forwarding event logs
- C17Z3: local route-config validation for production `fabric.control`
  forwarding
- C17Z4: scoped peer directory and recovery seed config boundaries
- C17Z5: node-agent peer cache runtime and warm-peer health probes
- C17Z6: explicit advertised endpoint reporting and scoped config projection
- C17Z7: multiple private/corporate endpoint candidates and same-site scoring
- C17Z8: node-local warm-peer connection state tracking and bounded
  health-probe backoff
- C17Z9: bounded peer recovery planning and metadata reporting
- C17Z10: peer connection intent and transport readiness metadata
- C17Z11: control-plane health connection manager probing and metadata
- C17Z12: rendezvous/relay control-plane lease metadata and relay health probes
- C17Z13: rendezvous lease telemetry for admission, renewal posture, and
  relay-ready state
- C17Z14: node-scoped lease refresh/reload, refresh counters, and stale relay
  withdrawal/reselection telemetry
- C17Z15: backend relay replacement policy, alternate relay scoring, and
  replacement lease control-plane metadata
- C17Z16: route/path decision control-plane metadata and node heartbeat
  reporting for those decisions
- C17Z17: node-side route generation apply/withdraw metadata reporting for
  those control-plane decisions
- C17Z18: synthetic route-health effective-path probing, route-health config
  telemetry, drift metadata, and latest-link visibility separation for
  observation types/routes
|