CODEX CONTEXT
Project identity
This project is a production-grade distributed secure access platform.
It started as a custom RDP proxy with persistent server-side sessions, but the final target architecture is broader:
- distributed secure access fabric
- multi-tenant platform
- session broker for GUI and future non-GUI protocols
- cluster mesh of nodes
- connector/VPN layer
- customer-managed and platform-managed nodes
- node-agent based self-update / rollback / health supervision
Current proven foundation
The current codebase already proved the most risky low-level lifecycle assumptions for RDP:
- real FreeRDP connect works
- session state transitions to active work
- terminate works
- detach works without killing the remote session
- reattach works without recreating the remote session
- takeover works without recreating the remote session
- per-resource certificate verification policy exists
  - certificate_verification_mode = strict | ignore
  - strict is default
  - ignore works on a per-resource basis
- worker build is reproducible
- backend build is reproducible
This proven lifecycle must NOT be broken by future architecture work.
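As a rough illustration of the per-resource certificate policy listed above, the following Go sketch resolves the effective mode for a single resource. The `Resource` type, its field names, and the `ResolveCertMode` helper are hypothetical and are not taken from the codebase; only the two mode values and the strict default come from the text.

```go
package main

import (
	"errors"
	"fmt"
)

// CertVerificationMode mirrors the two values named in the text.
type CertVerificationMode string

const (
	CertStrict CertVerificationMode = "strict"
	CertIgnore CertVerificationMode = "ignore"
)

// Resource is a hypothetical per-resource record; field names are assumptions.
type Resource struct {
	Name                        string
	CertificateVerificationMode string // raw stored value; empty means "not set"
}

// ResolveCertMode returns the effective mode for one resource.
// An unset value falls back to strict, and unknown values are rejected
// instead of being silently downgraded.
func ResolveCertMode(r Resource) (CertVerificationMode, error) {
	switch CertVerificationMode(r.CertificateVerificationMode) {
	case "", CertStrict:
		return CertStrict, nil
	case CertIgnore:
		return CertIgnore, nil
	default:
		return "", errors.New("unknown certificate_verification_mode: " + r.CertificateVerificationMode)
	}
}

func main() {
	resources := []Resource{
		{Name: "rdp-a"},
		{Name: "rdp-b", CertificateVerificationMode: "ignore"},
	}
	for _, r := range resources {
		mode, err := ResolveCertMode(r)
		fmt.Println(r.Name, mode, err)
	}
}
```

The mode is read from the resource record on every resolution, so there is no process-wide switch that could turn verification off for all resources at once.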
Current architecture baseline
Current audit and baseline snapshot:
- docs/audits/PROJECT_AUDIT_2026-04-26.md
- docs/audits/CURRENT_BASELINE_MATRIX.md
Test environment
- Canonical test Docker host: 192.168.200.61
- Canonical Docker context: test-ubuntu
- Canonical SSH alias: docker-test
- Backend API for local/client smoke runs: http://192.168.200.61:8080/api/v1
- WebSocket gateway for local/client smoke runs: ws://192.168.200.61:8080/api/v1/gateway/ws
- Stage C17 planning is completed.
- C17A synthetic mesh runtime skeleton is implemented and test-proven in rap-node-agent only. It is disabled by default and carries synthetic fabric.probe/fabric.probe_ack messages only.
- C17B route health and failover probes are implemented and test-proven in rap-node-agent only. They are disabled by default and carry synthetic fabric.route_health/fabric.route_health_ack messages only.
- C17C relay semantic hardening is implemented and test-proven in rap-node-agent only. It is disabled by default and models synthetic per-channel queues/QoS/backpressure only.
- C17D non-production test-service path is implemented and test-proven in rap-node-agent only. It is disabled by default and carries only bounded synthetic.echo test payloads.
- C17E/C17F/C17G are implemented and proven for live synthetic HTTP transport, scoped synthetic route config, and Control Plane scoped synthetic config consumption.
- C17H deployed multi-agent synthetic config smoke is runtime-proven on docker-test: five running rap-node-agent containers consume backend-issued node-scoped synthetic config, direct and single-relay synthetic route-health observations return to the Control Plane, and production forwarding remains disabled.
- C17I production forwarding gate foundation is implemented and test-proven: rap-node-agent has an explicit production-forwarding gate, while /mesh/v1/forward still refuses production payload forwarding until a later approved runtime stage.
- C17J production envelope contract is implemented and test-proven: /mesh/v1/forward validates route-bound production envelopes for fabric_control/fabric.control only when the gate is enabled, rejects service channels, and still refuses production forwarding.
- C17K production envelope observation is implemented and test-proven: valid accepted envelopes can be observed locally as metadata-only records after validation; rejected envelopes are not observed, observation failure fails closed, and production forwarding remains unavailable.
- C17L bounded production observation sink is implemented and test-proven: accepted metadata-only observations can be retained locally with fixed capacity, oldest-entry drop behavior, and no payload body storage.
- C17M production observation sink wiring is implemented and test-proven: node-agent can wire the bounded local metadata-only sink when RAP_MESH_PRODUCTION_OBSERVATION_SINK_CAPACITY is explicitly greater than zero; the wiring is disabled by default and exposes no read API.
- C17N production observation sink metrics are implemented and test-proven: local sink metrics expose only capacity, current depth, accepted total, and dropped-oldest total; they expose no observation records or payload metadata.
- C17O production observation sink local metrics logging is implemented and test-proven: node-agent logs aggregate sink metrics locally when the sink is explicitly enabled; no read API or Control Plane reporting is added.
- C17P production observation sink change-driven metrics logging is implemented and test-proven: node-agent suppresses repeated identical local sink metrics logs; no read API or Control Plane reporting is added.
- C17Q production forwarding gate/runtime log boundary is implemented and test-proven: node-agent logs production forwarding gate state separately from production forwarding runtime state. Runtime state remained false until C17Z introduced gate-controlled fabric.control direct forwarding.
- C17R production observation sink capacity guard is implemented and test-proven: RAP_MESH_PRODUCTION_OBSERVATION_SINK_CAPACITY is rejected above 10000.
- C17S production observation panic fail-closed hardening is implemented and test-proven: observer errors and observer panics both fail closed as observation failure.
- C17T production envelope payload boundary is implemented and test-proven: validated production fabric.control envelope payloads are bounded to 4096 bytes and oversized envelopes are rejected before observation.
- C17U production envelope created-at skew boundary is implemented and test-proven: validated production fabric.control envelopes whose created_at is more than one minute in the future are rejected before observation.
- C17V peer endpoint candidate model is implemented and test-proven: node-scoped synthetic mesh config now carries route-scoped endpoint candidates with transport, address, reachability, NAT type, connectivity mode, priority, policy tags, verification time, and metadata. This is a model/config boundary only; no production route scoring, NAT traversal, shortcut routing, or forwarding runtime is implemented.
- C17W peer endpoint candidate scoring model is implemented and test-proven: rap-node-agent can rank already-scoped endpoint candidates using soft inputs such as transport, reachability, connectivity mode, NAT type, priority, region, policy tags, channel class, and verification age. This is a scoring helper only; it does not open connections, choose production routes, or forward payloads.
- C17X health-aware endpoint candidate scoring overlay is implemented and test-proven: endpoint candidate scoring can optionally use local health observations keyed by endpoint_id, including latency, success/failure history, recent failure reason, reliability score, and observation freshness. This remains advisory scoring only and is not wired into production route execution.
- C17Y Platform Owner synthetic mesh visibility is implemented and build/test-proven: web-admin reads node-scoped synthetic mesh config and shows config enabled state, route counts, peer endpoints, endpoint candidates, C17X advisory scoring boundary, and production_forwarding. This remains platform-owner visibility only and does not enable production forwarding.
- C17Z production fabric-control direct forwarding boundary is implemented and test-proven: when RAP_MESH_PRODUCTION_FORWARDING_ENABLED=true, /mesh/v1/forward can deliver valid route-bound fabric.control envelopes at the local destination or forward them to a direct next hop from explicit peer endpoint config. Service channels, arbitrary relay forwarding, multi-hop production route execution, and RDP/VPN/file/video/service payloads remain unavailable.
- C17Z1 production fabric-control multi-hop route-path boundary is implemented and test-proven: production fabric.control envelopes can carry route_path and visited_node_ids; relay nodes validate path position, forward only to the next path node, update TTL/hop/visited metadata, and reject loops. Service payloads remain unavailable.
- C17Z2 production fabric-control forwarding observability boundary is implemented and test-proven: node-agent emits local mesh_production_forward_event logs for accepted, forwarded, delivered, and rejected production fabric.control envelopes. Logs are metadata-only and include no payload bodies or read API.
- C17Z3 production fabric-control route-config boundary is implemented and test-proven: when scoped/control-plane mesh routes are available locally, production fabric.control envelopes must match configured route_id/path/next-hop/channel/expiry/TTL/hop limits before forwarding.
- C17Z4 scoped peer directory and recovery seeds boundary is implemented and test/build-proven: node-scoped mesh config carries scoped peer_directory and explicit bounded recovery_seeds; node-agent parses/validates them and web-admin shows counts.
- C17Z5 node-agent peer cache runtime boundary is implemented and test-proven: node-agent builds a local PeerCache, selects bounded warm peers, probes warm peers with /mesh/v1/health, and reports metadata-only mesh-link observations when synthetic mesh testing is enabled.
- C17Z6 dynamic endpoint reporting boundary is implemented and test-proven: node-agent reports explicit advertised mesh endpoint metadata in heartbeat, and Control Plane projects latest reported endpoints/candidates into node-scoped synthetic mesh config.
- C17Z7 private/corporate endpoint candidate boundary is implemented and test-proven: node-agent reports multiple advertised endpoint candidates, scoring rewards private/corporate same-site candidates, and peer cache can use the best candidate address for warm health.
- C17Z8 peer connection state machine boundary is implemented and test-proven: node-agent tracks warm-peer states disconnected, connecting, ready, degraded, and backoff, with bounded backoff after repeated health probe failures.
- C17Z9 peer recovery planner boundary is implemented and test-proven: node-agent targets a bounded stable ready-peer set, enters recovery when ready peers fall below target, and selects bounded recovery probes from warm peers, recovery seeds, and other connectable scoped peers.
- C17Z10 peer connection intent planner boundary is implemented and test-proven: node-agent classifies bounded peer work as maintain/probe/recover and classifies transport readiness as direct/private_lan/corporate_lan/outbound_only/relay_required, with rendezvous-required metadata only.
- C17Z11 peer connection manager runtime boundary is implemented and test-proven: node-agent uses a reusable HTTP keep-alive client for real control-plane health probes of direct/private/corporate peers and records waiting_rendezvous for outbound-only/relay-required peers.
- C17Z12 rendezvous/relay control-plane contract is implemented and docker-test-runtime-proven: backend issues node-scoped rendezvous_leases, node-agent resolves matching waiting_rendezvous intents into relay_control, probes relay /mesh/v1/health, records and maintains relay_ready, and keeps service payload forwarding disabled.
- C17Z13 rendezvous lease telemetry is implemented and docker-test-runtime-proven: node-agent reports mesh_rendezvous_lease_report with relay admission, peer admission, TTL/renewal posture, relay_ready, and explicit no-payload boundary flags; web-admin shows rv leases in recent heartbeat tables.
- C17Z14 rendezvous lease refresh contract is implemented and docker-test-runtime-proven: node-agent refreshes renewal-needed/stale rendezvous leases through node-scoped synthetic config reload, updates the running peer cache/route/lease state, and reports refresh plus stale relay withdrawal/reselection telemetry. Service payload forwarding remains unavailable.
- C17Z15 backend relay replacement policy is implemented and docker-test-runtime-proven: backend consumes recent stale-relay heartbeat feedback, withdraws stale explicit rendezvous leases, scores alternate relay candidates from route adjacency, endpoint priority, policy tags, and recent mesh-link health, and returns replacement leases plus rendezvous_relay_policy decisions in node-scoped synthetic config. Node-agent reports c17z15.mesh_rendezvous_lease_report.v1 and keeps stale state scoped to the exact lease/relay, so replacement leases for the same peer are not marked stale by association. Service payload forwarding remains unavailable.
- C17Z16 route/path decision artifact is implemented and docker-test-runtime-proven: backend c17z16.synthetic.v1 config includes route_path_decisions with original hops, effective hops, local previous/next hop, selected replacement relay, generation, score reasons, and no-payload boundary flags. Node-agent stores the control-plane route generation and reports c17z16.mesh_route_path_decision_report.v1 plus c17z16.mesh_rendezvous_lease_report.v1. Service payload forwarding remains unavailable.
- C17Z17 node-side route generation tracker is implemented and docker-test-runtime-proven: backend c17z17.synthetic.v1 config and node-agent mesh_route_generation_report track active/applied/unchanged/withdrawn route decisions, generation changes, total counters, and withdrawn_by_replacement records for stale relay paths when replacement is first observed. Service payload forwarding remains unavailable.
- C17Z18 synthetic route-health effective path runtime is implemented and docker-test-runtime-proven: backend c17z18.synthetic.v1 config and node-agent mesh_route_health_config_report apply Control Plane route_path_decisions to synthetic route-health route config only. The synthetic runtime probes selected effective paths through replacement relays, reports expected/observed hops and drift state, and backend latest mesh links preserve route-health observations separately from connection-manager observations. Service payload forwarding remains unavailable.
- C17Z19 synthetic route-health feedback scoring is implemented and docker-test-runtime-proven: backend consumes recent synthetic_route_health observations in relay scoring, uses drift/unreachable/failure metadata to mark the exact selected relay stale, boosts healthy low-latency relay candidates, and returns replacement leases/route decisions through the existing synthetic config contract. Migration 000022 adds the synthetic mesh service class. Service payload forwarding remains unavailable.
- C17Z20 node-side route-health feedback refresh is implemented and docker-test-runtime-proven: after reporting synthetic route-health drift/unreachable/failure, node-agent performs a bounded node-scoped synthetic-config refresh, applies returned replacement route decisions to route-health config immediately, and reports c17z20.mesh_route_health_feedback_refresh_report.v1. Service payload forwarding remains unavailable.
- Installation Authority foundation is implemented: production requires strict Product Root public key config, first-owner bootstrap uses signed Ed25519 activation manifests, installation_authority and signed platform_role_grants are persisted, and strict platform-admin checks ignore direct users.platform_role database edits without a valid signed grant. Web-admin exposes installation status/first-owner bootstrap, and scripts/installation/product-root-tool.go generates keys/manifests for offline product-root operations.
- Cluster Authority and node enrollment bootstrap are docker-test lifecycle smoke-proven in run dev-bootstrap-20260428-201430: a fresh dev install bootstrapped the first owner, created a cluster, issued a signed join token, accepted real rap-node-agent enrollment, owner-approved the join request, agent-polled signed bootstrap, persisted cluster authority pin, heartbeated, and verified signed c17z18.synthetic.v1 Control Plane config. Production service payload forwarding remains unavailable.
- Migration 000021_cluster_authority_keys drops/recreates cluster_admin_summaries because fresh replay proved PostgreSQL cannot change that view layout via CREATE OR REPLACE VIEW.
- rap-node-agent desired-workload polling/status reporting is gated by RAP_WORKLOAD_SUPERVISION_ENABLED=false by default while service runtime supervision remains a stub.
- C18 VPN/IP tunnel service target design is completed as documentation only.
- C18A VPN/IP tunnel control-plane data model foundation is implemented and backend-test-proven.
- C18B VPN/IP tunnel lease/fencing hardening is implemented and backend-test-proven.
- C18C VPN/IP tunnel node-agent desired-state consumption/reporting is implemented and backend-test-proven.
- No next platform-core implementation step is automatically authorized after C17Z20. The next mesh layer should stay limited to route-health feedback refresh dampening/no-change cooldown unless the user explicitly chooses another staged task.
- Latest RDP performance reference image: rap-rdp-worker:rdp-perf6-dirty-region
- Stage 5.2 file-download runtime artifacts remain preserved for when RDP work resumes, but they are not the active next task.
- Do not use docker.cin.su for this project unless explicitly requested for a separate one-off check.
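The bounded observation sink described for C17L, C17N, and C17R above can be pictured with a small ring buffer. This is a minimal sketch under stated assumptions, not the rap-node-agent implementation: the `Observation`, `BoundedSink`, and `SinkMetrics` types and their fields are hypothetical. Only the behaviors come from the text: fixed capacity, oldest-entry drop, no payload body storage, four aggregate metrics, and rejection of capacities above 10000.

```go
package main

import (
	"errors"
	"fmt"
)

// MaxSinkCapacity is the upper bound quoted for the C17R capacity guard.
const MaxSinkCapacity = 10000

// Observation is a hypothetical metadata-only record; it has no payload field.
type Observation struct {
	RouteID string
	Channel string
}

// SinkMetrics holds the four aggregate values listed for C17N.
type SinkMetrics struct {
	Capacity      int
	Depth         int
	Accepted      uint64
	DroppedOldest uint64
}

// BoundedSink is a fixed-capacity ring buffer that drops the oldest entry when full.
type BoundedSink struct {
	buf      []Observation
	head     int // index of the oldest retained entry
	depth    int
	accepted uint64
	dropped  uint64
}

// NewBoundedSink rejects capacities outside 1..MaxSinkCapacity.
// A caller would treat a configured capacity of zero as "sink disabled".
func NewBoundedSink(capacity int) (*BoundedSink, error) {
	if capacity <= 0 || capacity > MaxSinkCapacity {
		return nil, errors.New("sink capacity must be between 1 and 10000")
	}
	return &BoundedSink{buf: make([]Observation, capacity)}, nil
}

// Add retains one observation, overwriting the oldest entry when the sink is full.
func (s *BoundedSink) Add(o Observation) {
	if s.depth == len(s.buf) {
		s.buf[s.head] = o
		s.head = (s.head + 1) % len(s.buf)
		s.dropped++
	} else {
		s.buf[(s.head+s.depth)%len(s.buf)] = o
		s.depth++
	}
	s.accepted++
}

// Metrics exposes aggregates only; no records leave the sink.
func (s *BoundedSink) Metrics() SinkMetrics {
	return SinkMetrics{Capacity: len(s.buf), Depth: s.depth, Accepted: s.accepted, DroppedOldest: s.dropped}
}

func main() {
	sink, err := NewBoundedSink(2)
	if err != nil {
		panic(err)
	}
	for _, id := range []string{"r1", "r2", "r3"} {
		sink.Add(Observation{RouteID: id, Channel: "fabric.control"})
	}
	fmt.Printf("%+v\n", sink.Metrics())
}
```

Exposing only `Metrics` keeps the sketch aligned with the "no read API" boundary: counters are visible, records are not.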
Backend
- Go
- PostgreSQL = source of truth
- Redis = live coordination / routing only
- REST for control plane
- WebSocket for live session channel
Worker
- C++ worker
- FreeRDP integration
- worker runtime hides FreeRDP details from backend
- The C++ worker remains the primary RDP runtime.
- Target RDP performance direction: docs/architecture/RDP_SERVICE_CPP_PERFORMANCE_TARGET.md.
- The RDP performance rewrite scope is limited to C++ RDP service adapter internals. It must not redesign backend control plane, cluster transport, organizations, leases, or session lifecycle.
- The C# RDP service skeleton is inactive research scaffolding and is not the current runtime direction.
- Current RDP Adapter baseline: RDP-Perf-6 dirty-region direct binary rendering is completed and smoke-proven on docker-test. RDP work is paused by product decision; next active work is Fabric Core / cluster foundation.
- P3/P3.1 security-readiness foundation exists: production mode rejects plaintext credential-like resource metadata, requires secret_ref for RDP/VNC/SSH resources, and has an encrypted PostgreSQL-backed resource secret storage/resolver MVP. P3.2 direct-worker TLS/PKI guard exists.
- P3.3 production-like test-stand smoke is complete on docker-test: backend runs in APP_ENV=production with a test-only secret key file, a secret-backed RDP resource starts real sessions through the resolver path, metadata/audit do not contain plaintext credentials, and backend gateway fallback remains available when direct worker WSS trust is smoke_insecure.
- P3.4 production direct-worker WSS trust model is documented in docs/architecture/PRODUCTION_DIRECT_WORKER_WSS_TRUST.md; it defines platform CA/public CA behavior, worker certificate SAN/identity requirements, app-local Windows trust direction, rotation/revocation, and the future platform_ca smoke plan. No RDP runtime behavior changed in P3.4.
- P3.5 app-local platform CA trust is implemented and runtime-proven on docker-test: Windows client validates direct worker WSS with an app-local platform CA bundle, keeps hostname/SAN validation enabled, selects direct_worker_wss without insecure TLS bypass, and falls back to backend gateway for unknown CA / smoke-only production cases.
- P3.6 stale Redis worker/live event idempotency is implemented and runtime-proven: stale worker events for terminal PostgreSQL sessions are ignored, backend restart survives stale Redis events, and terminal sessions are not reopened.
- Stage 5.2 server-to-client file download core data path is runtime-proven: direct worker WSS and backend gateway fallback both download text/binary files from RAP_Transfers\ToClient with matching size/hash, and direct policy blocking is proven for disabled and client_to_server. Lifecycle blocking is also runtime-proven for detach, old-client takeover, and worker failure. Runtime report: artifacts/stage5-2-file-download-runtime-report.md.
- Stage 5.2 is not fully accepted yet. Remaining proof: Windows desktop UI download path and regression matrix for rendering/input/clipboard/upload/reconnect/takeover.
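The P3.6 idempotency rule above (stale worker events must not reopen terminal sessions) can be sketched as a pure decision function. The state names and the `ApplyWorkerEvent` helper are assumptions for illustration only. In the real system the stored state would come from PostgreSQL as the source of truth and the event from Redis.

```go
package main

import "fmt"

// SessionState values are hypothetical names, not the project's actual enum.
type SessionState string

const (
	StateActive     SessionState = "active"
	StateDetached   SessionState = "detached"
	StateTerminated SessionState = "terminated"
	StateFailed     SessionState = "failed"
)

// IsTerminal reports whether a stored session state is final.
func IsTerminal(s SessionState) bool {
	return s == StateTerminated || s == StateFailed
}

// ApplyWorkerEvent decides what a live worker event may do to the stored state.
// Events arriving for a terminal session are ignored, so replaying a stale
// event any number of times leaves the stored state unchanged.
func ApplyWorkerEvent(stored, event SessionState) (next SessionState, applied bool) {
	if IsTerminal(stored) {
		return stored, false
	}
	return event, true
}

func main() {
	next, applied := ApplyWorkerEvent(StateTerminated, StateActive)
	fmt.Println(next, applied)
	next, applied = ApplyWorkerEvent(StateActive, StateDetached)
	fmt.Println(next, applied)
}
```

Because the decision depends only on the stored state, a backend restart that replays old Redis events reaches the same result as before the restart.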
Clients
- future native clients:
  - Windows: native desktop client first
  - Linux: native desktop client later
- web UI is admin/control plane, not the primary power-user client
Final architecture direction
The long-term target architecture is documented in:
- docs/architecture/SECURE_ACCESS_FABRIC_TARGET.md
- docs/architecture/CLUSTER_NODE_ADMIN_FOUNDATION.md
- docs/architecture/WEB_INGRESS_AND_ADMIN_UI_MODEL.md
SECURE_ACCESS_FABRIC_TARGET.md defines the target Secure Access Fabric architecture only. It is not the current implementation scope and must not be used as permission to start mesh, VPN, multi-cluster, updater, or realtime data-plane migration work without an explicit staged prompt.
CLUSTER_NODE_ADMIN_FOUNDATION.md defines the next platform-core planning
baseline for clusters, node enrollment, native node-agent identity, platform
admin console, multi-cluster administration, and future organization admin
visibility. It is a staged foundation document, not permission to implement
mesh packet routing or VPN runtime.
WEB_INGRESS_AND_ADMIN_UI_MODEL.md defines WEB as HTTP/HTTPS ingress and
Admin UI presentation only. Cluster configuration remains Control Plane
ownership through scoped APIs, PostgreSQL source-of-truth mutations, and audit.
Dynamic pages must be safe schema-driven projections and must not embed
internal topology, peer caches, route caches, secrets, raw credentials, or
arbitrary executable code.
Admin endpoint placement is explicit. Fabric Storage / Config Storage nodes do not automatically host or move the cluster panel. Platform Owner Console remains global platform-owner scope. Cluster Admin Endpoint requires explicit admin/web ingress role assignment, cluster health/trust readiness, and Control Plane authorization. Organization Admin Panel remains a tenant-safe projection.
The final platform must support:
- Multi-tenancy / Organizations
  - platform has many organizations
  - each organization has isolated users, groups, resources, policies, audit, connectors
  - users may belong to multiple organizations
  - organization admins only see their organization
  - platform admins see platform scope
- Identity federation
  - local users
  - LDAP / Active Directory
  - OIDC
  - future extensibility for more identity sources
  - access mappings based on external groups / claims
- Cluster of nodes
  - no mandatory single central node
  - many nodes across many sites
  - nodes can be platform-managed or customer-managed
  - customer-managed nodes are sandboxed cluster participants, not full cluster owners
- Node agent
  - small stable always-running agent on every node
  - supervises services
  - downloads updates
  - verifies signed artifacts
  - can rollback to previous version
  - can restart crashed services
  - can work on thin or thick nodes
- Service-based node model
  Each node is not monolithic. A node has:
  - capabilities: what it can do physically/technically
  - enabled services: what it is allowed/assigned to do
  Possible services include:
  - ingress-gateway
  - mesh-router
  - relay
  - connector-host
  - vpn-adapter
  - session-worker
  - media-relay
  - file-relay
  - update-cache
  - config-replica
  - audit-sink
  - metrics-exporter
- Cluster mesh and routing
  - encrypted inter-node communication
  - dynamic topology
  - no need for full mesh
  - multi-hop routing allowed
  - route failover
  - client failover between ingress nodes
  - connector failover between nodes
- Split-brain prevention
  - quorum-based cluster behavior
  - minority partition must not become a second authoritative cluster
  - degraded / recovery / isolated modes
  - manual recovery / promote decision by platform recovery admin
- Connector / VPN layer
  - connectors are reusable network access methods
  - one connector may be used by multiple resources
  - connector placement and failover are controlled by policy
  - nodes may be allowed or disallowed to host connectors
  - direct access, VPN, relay and future egress modes must fit this model
- Future exit mode
  - split tunnel
  - full tunnel
  - internet access through cluster
  - not first implementation priority
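The split-brain rule in the list above (a minority partition must not become a second authoritative cluster) is commonly enforced with a strict-majority check. The sketch below assumes a simple majority of voting members, which the text does not specify; `HasQuorum` and its parameters are hypothetical names.

```go
package main

import "fmt"

// HasQuorum reports whether a partition that can reach `reachable` out of
// `total` voting members (counting itself) holds a strict majority.
// An exact half is not a quorum, so two equal halves can never both be
// authoritative at the same time.
func HasQuorum(reachable, total int) bool {
	if total <= 0 || reachable < 0 || reachable > total {
		return false
	}
	return reachable > total/2
}

func main() {
	for _, reachable := range []int{2, 3} {
		fmt.Printf("reachable=%d of 5 quorum=%v\n", reachable, HasQuorum(reachable, 5))
	}
}
```

A partition without quorum would fall into one of the degraded, recovery, or isolated modes and wait for the manual promote decision rather than accept authoritative writes.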
Non-negotiable design rules
- Do not rewrite proven session lifecycle carelessly.
- Do not turn Redis into a source of truth.
- Do not make certificate-ignore a global worker setting.
- Do not make customer-managed nodes platform-wide trusted by default.
- Do not create a separate cluster per organization.
- Do not assume a single permanently reachable central node.
- Do not rely on “secret protocol with no docs” as security.
- Security must come from crypto, auth, isolation, policy and observability.
- Prefer incremental evolution from current proven system.
- Do not collapse platform control plane and data plane into one vague layer.
Implementation strategy
The codebase must evolve in phases.
Current implementation focus remains:
- RDP work is paused by product decision
- preserve the accepted RDP Adapter baseline and Stage 5.x file-transfer work
- do not delete or rewrite the current RDP MVP while platform-core work starts
- C1-C9 platform-core foundations are implemented and verified: clusters, node enrollment, node-agent scaffold, platform admin console, workload supervision contract, mesh control-plane prep, mesh skeleton, multi-cluster hardening, and organization admin foundation
- C10 Fabric Core configuration distribution design is completed
- C11 signed scoped cluster snapshot model is completed
- C12 node local state store is completed
- C13 Fabric Storage / Config Storage service foundation is completed
- C14 peer directory and cache model is completed
- C15 Fabric Routing Engine skeleton is completed
- C16 secure node-to-node channel lifecycle is completed
- C17 mesh routing runtime implementation plan is completed
- C17A synthetic mesh runtime skeleton is implemented and test-proven with synthetic fabric messages only, no RDP/VPN/production service traffic
- C17B route health and failover probes are implemented and test-proven with synthetic traffic only, no RDP/VPN/production service traffic
- C17C relay semantic hardening is implemented and test-proven with synthetic channel classes only, no RDP/VPN/production service traffic
- C17D non-production test-service path is implemented and test-proven with bounded synthetic.echo traffic only, no RDP/VPN/production service traffic
- C17E live node-to-node synthetic HTTP transport is implemented and smoke-proven with synthetic traffic only
- C17F scoped synthetic route config loading and route-health reporting is implemented and smoke-proven with synthetic traffic only
- C17G Control Plane scoped synthetic config read/consume is implemented and test-proven with synthetic traffic only
- C17H deployed multi-agent synthetic config smoke is implemented and runtime-proven on docker-test with synthetic traffic only
- C17I production forwarding gate foundation is implemented and test-proven; production forwarding remains unavailable
- C17J production envelope contract validation is implemented and test-proven; production forwarding remains unavailable
- C17K production envelope observation is implemented and test-proven; production forwarding remains unavailable
- C17L bounded production observation sink is implemented and test-proven; production forwarding remains unavailable
- C17M production observation sink wiring is implemented and test-proven; production forwarding remains unavailable
- C17N production observation sink metrics are implemented and test-proven; production forwarding remains unavailable
- C17O production observation sink local metrics logging is implemented and test-proven; production forwarding remains unavailable
- C17P production observation sink change-driven metrics logging is implemented and test-proven; production forwarding remains unavailable
- C17Q production forwarding gate/runtime log boundary is implemented and test-proven; production forwarding remains unavailable
- C17R production observation sink capacity guard is implemented and test-proven; production forwarding remains unavailable
- C17S production observation panic fail-closed hardening is implemented and test-proven; production forwarding remains unavailable
- C17T production envelope payload boundary is implemented and test-proven; production forwarding remains unavailable
- C17U production envelope created-at skew boundary is implemented and test-proven; production forwarding remains unavailable
- C17V peer endpoint candidate model and NAT/connectivity hints are implemented and test-proven; production forwarding remains unavailable
- C17W peer endpoint candidate scoring model is implemented and test-proven; production forwarding remains unavailable
- C17X health-aware endpoint candidate scoring overlay is implemented and test-proven; production forwarding remains unavailable
- C17Y Platform Owner synthetic mesh visibility is implemented and build/test-proven; production forwarding remains unavailable
- C17Z production fabric-control direct forwarding is implemented and test-proven; production service traffic remains unavailable
- C17Z1 production fabric-control multi-hop route-path forwarding is implemented and test-proven; production service traffic remains unavailable
- C17Z2 production fabric-control forwarding observability is implemented and test-proven; production service traffic remains unavailable
- C17Z3 production fabric-control route-config boundary is implemented and test-proven; production service traffic remains unavailable
- C17Z4 scoped peer directory/recovery seed boundary is implemented and test/build-proven; production service traffic remains unavailable
- C17Z5 node-agent peer cache runtime boundary is implemented and test-proven; production service traffic remains unavailable
- C17Z6 dynamic endpoint reporting boundary is implemented and test-proven; production service traffic remains unavailable
- C17Z7 private/corporate endpoint candidate boundary is implemented and test-proven; production service traffic remains unavailable
- C17Z8 peer connection state machine boundary is implemented and test-proven; production service traffic remains unavailable
- C17Z9 peer recovery planner boundary is implemented and test-proven; production service traffic remains unavailable
- C17Z10 peer connection intent planner boundary is implemented and test-proven; production service traffic remains unavailable
- C17Z11 peer connection manager runtime boundary is implemented and test-proven; production service traffic remains unavailable
- C17Z12 rendezvous/relay control-plane contract is implemented and docker-test-runtime-proven; production service traffic remains unavailable
- C17Z13 rendezvous lease telemetry is implemented and docker-test-runtime-proven; production service traffic remains unavailable
- C17Z14 rendezvous lease refresh contract is implemented and docker-test-runtime-proven; production service traffic remains unavailable
- C17Z15 backend relay replacement policy is implemented and docker-test-runtime-proven; production service traffic remains unavailable
- C17Z16 route/path decision artifact is implemented and docker-test-runtime-proven; production service traffic remains unavailable
- C17Z17 node-side route generation tracker is implemented and docker-test-runtime-proven; production service traffic remains unavailable
- C17Z18 synthetic route-health effective path runtime is implemented and docker-test-runtime-proven; production service traffic remains unavailable
- C17Z19 synthetic route-health feedback scoring is implemented and docker-test-runtime-proven; production service traffic remains unavailable
- C17Z20 node-side route-health feedback refresh is implemented and docker-test-runtime-proven; production service traffic remains unavailable
- Cluster Authority plus node enrollment bootstrap polling are docker-test lifecycle-smoke-proven; fresh install migration replay is fixed for cluster_admin_summaries
- C18 VPN/IP tunnel service target design is completed as documentation only
- C18A VPN/IP tunnel control-plane data model foundation is implemented and backend-test-proven
- C18B VPN/IP tunnel lease/fencing hardening is implemented and backend-test-proven
- C18C VPN/IP tunnel node-agent desired-state consumption/reporting is implemented and backend-test-proven
- Version Storage / Update Repository is documented as a future Fabric Core service for signed release manifests, OS/arch artifacts, stable/current/candidate channels, update-cache mirroring, node-agent update supervision, rollback, and explicit data-structure migration bundles. Runtime updater behavior is not implemented.
- no next platform-core implementation step is automatically authorized after C17Z20; choose the next narrow staged prompt explicitly before continuing
- preserve the proven RDP lifecycle behavior
- keep the current backend gateway available as the active/fallback implementation path
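The production envelope boundaries recorded for C17J, C17T, and C17U can be summarized as one validation step that runs before observation or forwarding. The following is a sketch only: the `Envelope` type, the error values, and `ValidateEnvelope` are hypothetical, and the real contract validates more fields than shown here. The limits come from the text: fabric.control only, payload at most 4096 bytes, and created_at no more than one minute in the future.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

const (
	ControlChannel  = "fabric.control"
	MaxPayloadBytes = 4096
	MaxFutureSkew   = time.Minute
)

// Envelope is a hypothetical, reduced view of a production envelope.
type Envelope struct {
	RouteID   string
	Channel   string
	CreatedAt time.Time
	Payload   []byte
}

var (
	ErrGateDisabled    = errors.New("production forwarding gate is disabled")
	ErrNotRouteBound   = errors.New("envelope is not bound to a route")
	ErrChannelRejected = errors.New("only fabric.control is accepted")
	ErrPayloadTooLarge = errors.New("payload exceeds 4096 bytes")
	ErrCreatedAtSkew   = errors.New("created_at is more than one minute in the future")
)

// ValidateEnvelope returns nil only when every boundary holds.
// Any failure rejects the envelope before it is observed or forwarded.
func ValidateEnvelope(e Envelope, gateEnabled bool, now time.Time) error {
	switch {
	case !gateEnabled:
		return ErrGateDisabled
	case e.RouteID == "":
		return ErrNotRouteBound
	case e.Channel != ControlChannel:
		return ErrChannelRejected
	case len(e.Payload) > MaxPayloadBytes:
		return ErrPayloadTooLarge
	case e.CreatedAt.After(now.Add(MaxFutureSkew)):
		return ErrCreatedAtSkew
	}
	return nil
}

func main() {
	now := time.Now()
	ok := Envelope{RouteID: "route-1", Channel: ControlChannel, CreatedAt: now, Payload: []byte("ping")}
	fmt.Println(ValidateEnvelope(ok, true, now))

	service := ok
	service.Channel = "synthetic.echo"
	fmt.Println(ValidateEnvelope(service, true, now))
}
```

Passing `now` as a parameter instead of calling `time.Now()` inside the function keeps the skew check deterministic in tests.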
The current phase is NOT:
- full mesh routing implementation
- full VPN orchestration
- multi-cluster runtime traffic handling
- production data-plane migration
- updater runtime
- video meetings
- final native client UI redesign
Future mesh, VPN, multi-cluster, node-agent updater, and production realtime data-plane work must be introduced only through explicit, narrow, staged implementation prompts.
Always keep the project production-oriented. Do not simplify it into a toy app.