C17Z20 is complete.

Installation Authority foundation is also complete:

- production config requires strict authority mode with Product Root public key
- first-owner bootstrap requires a signed activation manifest in strict mode
- `installation_authority` and signed `platform_role_grants` are persisted
- strict platform-admin checks ignore direct `users.platform_role` edits unless
  a valid signed grant exists
- web-admin shows installation status and first-owner bootstrap
- `scripts/installation/product-root-tool.go` can generate Ed25519 Product Root
  keys and sign activation manifests; private keys must stay outside the repo


Cluster Authority foundation is now also complete:

- every newly created cluster gets an Ed25519 `cluster_authorities` key record
- cluster authority private keys are encrypted at rest when
  `SECRET_ENCRYPTION_KEY_B64`/file is configured; production already requires
  a secret encryption key
- legacy/default clusters are backfilled lazily through `EnsureClusterAuthority`
- backend signs join-token scope material, node approval/bootstrap material,
  and node-scoped synthetic mesh config snapshots
- node-agent verifies signed Control Plane synthetic config when
  `authority_required=true` or signature fields are present
- node-agent can pin `RAP_CLUSTER_AUTHORITY_PUBLIC_KEY` and
  `RAP_CLUSTER_AUTHORITY_FINGERPRINT`, and identity state can store the same
  trust anchor after approval
- web-admin shows cluster key fingerprints on summaries, join-token output,
  approval rows, and synthetic config visibility
- docker-test lifecycle smoke is complete: fresh dev install, first-owner
  bootstrap, cluster creation, signed join token, real node-agent enrollment,
  owner approval, automatic signed bootstrap polling, authority pin
  persistence, heartbeat, and signed synthetic config verification all passed
- `rap-node-agent` desired-workload polling/status reporting is gated by
  `RAP_WORKLOAD_SUPERVISION_ENABLED`, which defaults to `false` while service
  runtime supervision remains a stub


Node enrollment bootstrap polling is also complete:

- backend exposes `/node-agents/enrollments/{requestID}/bootstrap`
- pending agents prove `cluster_id`, `node_fingerprint`, and `public_key`
  before receiving status/bootstrap material
- `rap-node-agent` stores `pending_join_request_id`, polls approval, verifies
  the signed bootstrap contract, then persists `node_id`, `identity_status`,
  and the cluster authority pin into `identity.json`
- polling is controlled by `RAP_ENROLLMENT_POLL_INTERVAL_SECONDS` and
  `RAP_ENROLLMENT_POLL_TIMEOUT_SECONDS`


Current state:

- C17Z12 added rendezvous/relay control-plane leases for peers that would
  otherwise stay in `waiting_rendezvous`.
- C17Z13-C17Z14 added lease telemetry and node-scoped synthetic-config refresh
  for renewal/stale relay recovery.
- C17Z15 added backend stale-relay replacement/withdrawal policy and alternate
  relay-pool scoring.
- C17Z16 added Control Plane `route_path_decisions`.
- C17Z17 added node-side route generation apply/withdraw tracking.
- C17Z18 applies Control Plane `route_path_decisions` to synthetic
  route-health route config only. The synthetic `fabric.route_health` runtime
  now probes the selected effective path, including replacement relay paths,
  and reports expected/observed hops plus drift state.
- C17Z19 consumes those synthetic route-health observations in backend relay
  scoring. Drift/unreachable/failure feedback marks the exact selected relay
  stale and can trigger replacement; healthy low-latency route-health boosts
  alternate relay score reasons. Migration `000022` adds the `synthetic` mesh
  service class, and web-admin marks relay policy `rh feedback`.
- C17Z20 closes the node-side feedback loop. After the node-agent reports
  synthetic route-health drift/unreachable/failure, it performs a bounded
  node-scoped synthetic-config refresh, immediately applies any returned
  replacement route decisions to route-health config, and reports
  `c17z20.mesh_route_health_feedback_refresh_report.v1`.
- Backend `mesh_latest_links` now keeps the latest observation per observation
  type and route, so `synthetic_route_health` observations are no longer
  overwritten by `peer_connection_manager` observations.
- Web-admin Fabric links now show observation type, selected relay, and
  route-health effective/observed path.
- All of this remains control-plane/synthetic route-health only. It does not
  forward RDP/VPN/service payloads, does not start a VPN runtime, and does not
  implement arbitrary relay packet forwarding.
- Cluster Authority and node enrollment bootstrap were verified by the
  docker-test lifecycle smoke in run `dev-bootstrap-20260428-201430`.
- A fresh migration replay found and fixed a PostgreSQL view replacement issue
  in `000021_cluster_authority_keys`; the migration now drops and recreates
  `cluster_admin_summaries` in both the up and down paths.


Runtime report:

- `artifacts/c17z18-route-health-effective-path-report.md`
- `artifacts/c17z19-route-health-feedback-report.md`
- `artifacts/c17z19-route-health-feedback-smoke-result.json`
- `artifacts/c17z20-route-health-feedback-refresh-report.md`
- `artifacts/dev-cluster-enrollment-bootstrap-smoke-report.md`
- Docker-test smoke command:
  `pwsh -NoProfile -ExecutionPolicy Bypass -File scripts\fabric\c17z12-rendezvous-relay-smoke-ssh.ps1 -KeepRunning`
- Dev lifecycle smoke command:
  `pwsh -NoProfile -ExecutionPolicy Bypass -File scripts\fabric\dev-cluster-enrollment-bootstrap-smoke-ssh.ps1 -KeepRunning`
- Last proven runtime run: `c17z18-20260428-221601` (legacy smoke script name,
  current C17Z20 node-agent code)
- Last proven dev lifecycle run: `dev-bootstrap-20260428-201430`
- Admin: `http://192.168.200.61:5174/`
- C17Z20 multi-agent API: `http://192.168.200.61:18120/api/v1`
- C17Z19 backend-only API: `http://192.168.200.61:18122/api/v1`
- Dev lifecycle API: `http://192.168.200.61:18121/api/v1`


Do not automatically continue into:

- RDP/VNC/SSH/file/video/service workload traffic over mesh
- VPN/IP tunnel runtime implementation
- arbitrary relay packet forwarding
- production payload forwarding for relay paths
- QUIC/WebRTC or STUN/TURN/ICE
- TUN/TAP, host route, DNS, or firewall manipulation
- backend/session lifecycle changes
- Windows client changes


Next narrow layer, if approved:

C17Z21 should tighten route-health feedback refresh dampening: if an immediate
feedback refresh returns the same config version or no replacement change, keep
a per-route/relay no-change cooldown before retrying. Keep the boundary
synthetic/control-plane only and keep RDP/VPN/service payload forwarding
untouched.
