# Fabric Core Configuration Distribution

Status: Stage C10 result. Documentation and architecture only.

This document consolidates the Fabric Core configuration distribution model for the Secure Access Fabric platform. It does not implement mesh runtime traffic, VPN/IP tunnel runtime, relay packet routing, RDP work, service workload execution, API changes, migrations, or code changes.

## 1. Purpose

Stage C10 defines the boundaries that must exist before the project safely moves into signed snapshots, node-local storage, config/storage services, peer directories, routing skeletons, secure node channels, mesh routing, or VPN/IP tunnel runtime.

The goal is to prevent the lower fabric from growing into an accidental distributed database, accidental full-mesh topology store, or service-specific RDP/VPN routing layer.

## 2. Layer Model

The platform layer order remains:

1. Host OS
2. RAP Fabric Core
3. Secure Fabric Network
4. Service Runtime / Service Adapters
5. Access Clients / Admin UI

Fabric Core is the lower distributed runtime foundation above the host OS. It is not a real operating system. It is implemented through native `rap-node-agent`, control-plane contracts, scoped signed snapshots, node-local state, role assignment consumption, update trust, and service supervision boundaries.

RDP, VNC, SSH, VPN, video, file transfer, and internal-app access are services above Fabric Core. They consume Fabric Core identity, placement, routing, and policy; they do not define peer discovery, route selection, cluster authority, or durable configuration ownership.

## 3. Source of Truth and Cache Boundaries

PostgreSQL remains the only durable source of truth for domain state:

- platform configuration
- clusters
- organizations
- users and memberships
- node identities and enrollment state
- node role assignments
- policies
- resources
- service desired state
- audit
- trust roots and revocation state

Redis remains live coordination only:

- leases
- heartbeats
- ephemeral routing hints
- short-lived tokens
- transient queues
- runtime cache

Redis must not store durable topology, durable configuration, node identity, policy, organization data, cluster trust, or authoritative route state.

Fabric Storage / Config Storage is a distribution and cache layer. It must not:

- replace PostgreSQL
- become a general-purpose distributed database
- accept direct node writes as authoritative state
- store every cluster or organization object on every node
- expose arbitrary query capabilities
- bypass organization, cluster, role, or service isolation

Node-local state is runtime state plus signed scoped snapshots. It supports fast operation and degraded reconnect. It is not a source of truth.

## 4. Configuration Layers

Configuration is separated into layers so nodes receive only what their role requires.

Global platform configuration:

- platform trust roots
- supported protocol versions
- update trust policy
- platform-wide feature gates
- high-risk admin policy

Cluster configuration:

- cluster identity
- cluster trust roots and certificate policy
- cluster authority/partition state
- node role assignments
- QoS policy
- peer discovery policy
- route policy
- storage/config replication policy

Organization configuration:

- organization identity and status
- organization service enablement
- tenant-visible ingress/egress/service endpoints
- tenant policy references
- organization-specific resource references
- safe status projections

Service configuration:

- assigned service workload configuration
- service-specific policy subset
- resource references needed by the assigned workload
- connector or `vpn_connection` references where authorized
- runtime secret references, resolved only through approved secret resolvers

## 5. Scoped Distribution Principle

Nodes receive configuration on a need-to-know basis.

Core mesh node receives:

- scoped peer/neighbor data
- route policy
- QoS policy
- cluster version and trust metadata
- no RDP credentials
- no full organization user list
- no unrelated service configuration

Ingress node receives:

- allowed client entry policies
- token validation configuration
- entry route hints
- service endpoint mapping allowed for the ingress scope
- no full internal topology
- no unrelated organization data

Egress/service node receives:

- assigned service configs
- needed resource references
- needed connector or `vpn_connection` references
- policy for assigned services
- secrets only through approved resolver and only at runtime

Storage/config node receives:

- assigned shard/scope metadata
- replication metadata
- signed snapshot content for its assigned scope
- no unrelated organization data
- no unrestricted topology query access

Thin/mobile node receives:

- minimal bootstrap peers
- active session/tunnel policy subset
- local trust data required to reconnect
- no broad cluster topology

## 6. Signed Scoped Cluster Snapshot Boundary

C10 defines snapshot boundaries only. C11 will define the full signed scoped cluster snapshot model.

A scoped snapshot is a signed, versioned, role-limited configuration package that a node-agent can store locally.
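
The role limits listed in section 5 amount to a default-deny filter applied when the control plane compiles a scoped view. The following Python sketch illustrates that idea only; it is not a C10 deliverable, and every name in it (`ROLE_ALLOWED_SECTIONS`, `scope_config_for_role`, the role keys, and the section keys) is a hypothetical label derived from the lists above, not a defined schema.

```python
# Illustrative only: role names and section names are hypothetical labels
# taken from the section 5 lists. They are not a defined schema.
ROLE_ALLOWED_SECTIONS = {
    "core_mesh": {"peer_subset", "route_policy", "qos_policy", "cluster_trust"},
    "ingress": {
        "client_entry_policy",
        "token_validation",
        "entry_route_hints",
        "service_endpoint_map",
    },
    "egress_service": {
        "service_configs",
        "resource_refs",
        "connector_refs",
        "service_policy",
    },
    "storage_config": {"shard_scope", "replication_metadata", "snapshot_content"},
    "thin_mobile": {"bootstrap_peers", "session_policy", "local_trust"},
}


def scope_config_for_role(role: str, compiled: dict) -> dict:
    """Return only the sections a role may receive (default deny)."""
    allowed = ROLE_ALLOWED_SECTIONS.get(role)
    if allowed is None:
        # Unknown roles receive nothing rather than everything.
        raise ValueError(f"unknown role: {role}")
    return {name: value for name, value in compiled.items() if name in allowed}


# A compiled view that contains more than a core mesh node should see.
compiled_view = {
    "peer_subset": ["node-b", "node-c"],
    "route_policy": {"max_hops": 4},
    "qos_policy": {"class": "default"},
    "cluster_trust": {"version": 12},
    "service_configs": {"rdp": {"resource_ref": "res-1"}},
    "org_user_list": ["alice", "bob"],
}

core_view = scope_config_for_role("core_mesh", compiled_view)
```

In this sketch the core mesh view keeps the peer, route, QoS, and trust sections and drops the service configuration and the organization user list, because anything not explicitly allowed for the role is excluded.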

Snapshot properties:

- cluster-scoped
- role-scoped
- organization-scoped where applicable
- versioned
- signed by an authorized control-plane signing key
- bounded in size
- expires or requires refresh according to policy
- reconstructable from PostgreSQL source-of-truth state

Snapshot contents may include:

- cluster id and version
- node membership scope
- assigned roles
- allowed service workload refs
- peer directory subset
- route policy subset
- QoS policy subset
- trust roots and revocation metadata
- storage/config endpoints for refresh
- degraded-mode permissions

Snapshot contents must not include:

- unrelated organization data
- broad user lists
- raw secrets
- RDP/VNC/SSH credentials
- full cluster topology unless node role requires it
- arbitrary query permissions

## 7. Node-Local State Boundary

`rap-node-agent` local state may contain:

- node identity material and certificate metadata
- cluster membership state
- signed scoped cluster snapshot
- peer cache
- route cache
- service assignment cache
- service health/status cache
- local health state
- partition/degraded state
- last applied config version
- pending update metadata
- bounded telemetry buffer

Node-local state must not contain:

- full cluster topology unless explicitly required by role
- full organization data
- unrelated organization secrets
- durable policy authority
- durable route authority
- durable audit authority
- unrelated storage shards

Node-agent must be able to operate from local state for short degraded periods when policy allows it, but it must not authorize high-risk mutations while isolated.

## 8. Peer Directory and Cache Boundary

Peer directory data is distributed as scoped configuration, not queried from PostgreSQL on every routing decision.

Peer directory entry fields:

- `node_id`
- `cluster_id`
- endpoint candidates
- roles/capabilities
- region/location hints
- trust/certificate fingerprint
- policy scope
- config version

Node-local peer cache may add runtime observations:

- `last_success_at`
- `last_latency_ms`
- packet loss
- reliability score
- recent failure history
- observed load hints where allowed
- last seen config version

Peer selection is score-based, not latency-only. Inputs include:

- latency
- packet loss
- reliability
- region distance
- node load
- bandwidth availability
- role suitability
- policy constraints
- trust level
- recent failure history

The Fabric Routing Engine owns route selection. Service Adapters must not discover peers, select mesh routes, create shortcuts, or implement partition recovery.

## 9. Fabric Storage / Config Storage Role

Fabric Storage / Config Storage is a logical future service. It is a scoped distribution layer for configuration and signed snapshots.

Responsibilities:

- distribute signed scoped snapshots
- distribute peer directories
- cache hot configuration near service nodes
- replicate critical scoped data across failure domains
- provide nearby read access for node-agent refresh
- support cluster/org/service scope boundaries
- support version-based sync and incremental update delivery

Non-goals:

- no replacement of PostgreSQL
- no arbitrary distributed database behavior
- no direct node writes as authoritative state
- no broad ad hoc query API
- no full topology exposure to tenants
- no full organization data on every node

Placement rules:

- hot data may be placed near services that use it
- cold data may remain remote
- critical data should replicate across failure domains
- replication factor is policy-driven
- storage scope must respect cluster, organization, and service boundaries

## 10. Distribution Flow

Normal flow:

1. Control plane reads authoritative state from PostgreSQL.
2. Control plane compiles scoped configuration views.
3. Control plane signs full scoped snapshots or incremental updates.
4. Fabric Storage / Config Storage distributes and caches scoped artifacts.
5. Node-agent fetches snapshots/updates from authorized endpoints.
6. Node-agent verifies signatures, version, scope, expiry, and trust roots.
7. Node-agent applies configuration into local state.
8. Runtime components consume local state, not live backend calls, for realtime route decisions.

Realtime routing decisions must not depend on live backend availability. They should use verified local state, peer cache, route cache, and policy.

## 11. Versioning and Consistency Rules

Every snapshot and incremental update must carry:

- `cluster_id`
- scope identifiers
- monotonic config version or equivalent epoch
- issued-at timestamp
- expiry or refresh deadline
- signer id / key id
- signature
- dependency/base version for increments

Rules:

- full snapshot can establish or repair local state
- incremental update applies only to the expected base version
- version gaps require full resync
- signature mismatch rejects the update and triggers recovery
- rollback to older config is forbidden unless explicitly authorized by a signed recovery policy
- node must report last applied config version in heartbeat/status

## 12. Degraded Mode Rules

Degraded operation is allowed only when policy permits it.

Allowed examples:

- keep already-running safe services alive
- continue existing authorized routes for a short TTL
- reconnect to known active/warm/bootstrap peers
- use last signed snapshot to find config/storage endpoints
- report degraded status when connectivity returns

Forbidden while degraded:

- approve join requests
- issue node certificates
- assign roles
- change cluster policy
- change organization policy
- rotate trust roots
- promote partition authority automatically
- access secrets not already authorized for the node's current role

Degraded mode must be time-bounded and observable.

## 13. Multi-Cluster Isolation

Clusters are isolated by default.

Rules:

- clusters do not automatically trust each other
- clusters do not form one shared mesh by default
- cross-cluster routing requires explicit trust and policy
- platform owner may manage multiple clusters from one console
- organization admins see only authorized clusters/resources
- node may participate in multiple clusters only through isolated memberships
- cluster-scoped identities, certificates, tokens, storage namespaces, and policies are required

A multi-cluster node must keep separate local state per cluster:

- separate identity/certificates
- separate snapshots
- separate peer cache
- separate route cache
- separate service assignment cache
- separate storage namespace

## 14. Security Boundaries

Security requirements:

- snapshots are signed
- transport for snapshot/update distribution is authenticated and encrypted
- node-agent verifies signature, scope, expiry, signer, and trust root
- secrets are never embedded directly in broad snapshots
- secrets are resolved through approved resolvers only at runtime
- high-risk admin actions require step-up authentication
- all cluster trust and role changes are audited

High-risk actions include:

- node approval
- role assignment
- cluster trust changes
- cross-cluster trust
- partition promotion
- secrets access
- update policy changes
- signing key rotation

## 15. C11-C18 Staging Boundary

C10 is a design consolidation stage. It prepares later stages:

- C11: signed scoped cluster snapshot model
- C12: node local state store
- C13: config/storage service foundation
- C14: peer directory and cache model
- C15: Fabric Routing Engine skeleton
- C16: secure node-to-node channel lifecycle
- C17: mesh routing runtime
- C18: VPN/IP tunnel service

C10 implements none of these. Later stages must be explicit, narrow, and verified. Mesh routing and VPN/IP tunnel runtime must not start before C11-C16 foundations are accepted.

## 16. Result / Decision

Stage C10 consolidates the lower Fabric Core configuration distribution model.

Decisions:

- PostgreSQL remains the only durable source of truth.
- Redis remains live coordination only.
- Fabric Storage / Config Storage is a scoped distribution/cache layer, not a second source of truth.
- Nodes receive only role/cluster/organization scoped configuration.
- Node-local state is bounded and non-authoritative.
- Signed scoped snapshots are the required foundation for node-local operation and degraded recovery.
- Peer directory/cache data is local and scoped; routing remains Fabric-owned.
- Service Adapters remain protocol translators above Fabric Core.
- Multi-cluster membership requires isolated identities, snapshots, caches, tokens, policies, and storage namespaces.

No code, migration, API, runtime, RDP, data-plane, mesh, VPN, relay, or service workload behavior is changed by C10.
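
For readers of the consistency rules in section 11, the decision a node-agent makes on receiving a snapshot or incremental update can be pictured as a small pure function. The Python sketch below is illustrative only and is not implemented or mandated by C10. The names `Artifact`, `Decision`, and `decide` are hypothetical, and signature, scope, and expiry verification is assumed to happen elsewhere and arrive as a single boolean.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    APPLY = "apply"
    FULL_RESYNC = "full_resync"
    REJECT = "reject"


@dataclass(frozen=True)
class Artifact:
    """Hypothetical view of a received snapshot or incremental update."""
    version: int
    base_version: Optional[int]  # None means a full snapshot
    signature_ok: bool           # assumed result of verification done elsewhere
    recovery_authorized: bool = False  # assumed signed recovery policy flag


def decide(local_version: Optional[int], artifact: Artifact) -> Decision:
    # A signature mismatch rejects the update. Triggering recovery is
    # outside the scope of this sketch.
    if not artifact.signature_ok:
        return Decision.REJECT

    # Rollback to an older version is forbidden unless a signed recovery
    # policy explicitly authorizes it.
    if (
        local_version is not None
        and artifact.version < local_version
        and not artifact.recovery_authorized
    ):
        return Decision.REJECT

    # A full snapshot can establish or repair local state.
    if artifact.base_version is None:
        return Decision.APPLY

    # An incremental update applies only to the expected base version.
    # Any gap, including having no local state, requires a full resync.
    if local_version is None or artifact.base_version != local_version:
        return Decision.FULL_RESYNC

    return Decision.APPLY


# Example: a node at version 5 receives an increment built on version 7.
gap_result = decide(5, Artifact(version=8, base_version=7, signature_ok=True))
```

Here `gap_result` is `Decision.FULL_RESYNC`, because the increment's base version does not match the node's last applied version. Keeping the decision free of I/O is a design choice of the sketch that makes each rule easy to check in isolation.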