# Fabric Core Configuration Distribution

Status: Stage C10 result. Documentation and architecture only.

This document consolidates the Fabric Core configuration distribution model for
the Secure Access Fabric platform. It does not implement mesh runtime traffic,
VPN/IP tunnel runtime, relay packet routing, RDP work, service workload
execution, API changes, migrations, or code changes.

## 1. Purpose

Stage C10 defines the boundaries that must exist before the project can safely
move into signed snapshots, node-local storage, config/storage services, peer
directories, routing skeletons, secure node channels, mesh routing, or VPN/IP
tunnel runtime.

The goal is to prevent the lower fabric from growing into an accidental
distributed database, accidental full-mesh topology store, or service-specific
RDP/VPN routing layer.

## 2. Layer Model

The platform layer order remains:

1. Host OS
2. RAP Fabric Core
3. Secure Fabric Network
4. Service Runtime / Service Adapters
5. Access Clients / Admin UI

Fabric Core is the lower distributed runtime foundation above the host OS. It
is not a real operating system. It is implemented through the native
`rap-node-agent`, control-plane contracts, scoped signed snapshots, node-local
state, role assignment consumption, update trust, and service supervision
boundaries.

RDP, VNC, SSH, VPN, video, file transfer, and internal-app access are services
above Fabric Core. They consume Fabric Core identity, placement, routing, and
policy; they do not define peer discovery, route selection, cluster authority,
or durable configuration ownership.

## 3. Source of Truth and Cache Boundaries

PostgreSQL remains the only durable source of truth for domain state:

- platform configuration
- clusters
- organizations
- users and memberships
- node identities and enrollment state
- node role assignments
- policies
- resources
- service desired state
- audit
- trust roots and revocation state

Redis remains live coordination only:

- leases
- heartbeats
- ephemeral routing hints
- short-lived tokens
- transient queues
- runtime cache

Redis must not store durable topology, durable configuration, node identity,
policy, organization data, cluster trust, or authoritative route state.

Fabric Storage / Config Storage is a distribution and cache layer. It must not:

- replace PostgreSQL
- become a general-purpose distributed database
- accept direct node writes as authoritative state
- store every cluster or organization object on every node
- expose arbitrary query capabilities
- bypass organization, cluster, role, or service isolation

Node-local state is runtime state plus signed scoped snapshots. It supports
fast operation and degraded reconnect. It is not a source of truth.

## 4. Configuration Layers

Configuration is separated into layers so nodes receive only what their role
requires.

Global platform configuration:

- platform trust roots
- supported protocol versions
- update trust policy
- platform-wide feature gates
- high-risk admin policy

Cluster configuration:

- cluster identity
- cluster trust roots and certificate policy
- cluster authority/partition state
- node role assignments
- QoS policy
- peer discovery policy
- route policy
- storage/config replication policy

Organization configuration:

- organization identity and status
- organization service enablement
- tenant-visible ingress/egress/service endpoints
- tenant policy references
- organization-specific resource references
- safe status projections

Service configuration:

- assigned service workload configuration
- service-specific policy subset
- resource references needed by the assigned workload
- connector or `vpn_connection` references where authorized
- runtime secret references, resolved only through approved secret resolvers

## 5. Scoped Distribution Principle

Nodes receive configuration on a need-to-know basis.

Core mesh node receives:

- scoped peer/neighbor data
- route policy
- QoS policy
- cluster version and trust metadata
- no RDP credentials
- no full organization user list
- no unrelated service configuration

Ingress node receives:

- allowed client entry policies
- token validation configuration
- entry route hints
- service endpoint mapping allowed for the ingress scope
- no full internal topology
- no unrelated organization data

Egress/service node receives:

- assigned service configs
- needed resource references
- needed connector or `vpn_connection` references
- policy for assigned services
- secrets only through approved resolver and only at runtime

Storage/config node receives:

- assigned shard/scope metadata
- replication metadata
- signed snapshot content for its assigned scope
- no unrelated organization data
- no unrestricted topology query access

Thin/mobile node receives:

- minimal bootstrap peers
- active session/tunnel policy subset
- local trust data required to reconnect
- no broad cluster topology
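
As an illustration of the need-to-know principle above, the following is a minimal sketch of a role-based filter. The role keys, the section names, and the `scoped_view` helper are hypothetical and not part of any existing contract; the point is only that a node's view is built from an allow-list, so anything not named for the role is dropped by default.

```python
# Hypothetical role names and configuration section names, for illustration only.
ROLE_SCOPES = {
    "core_mesh": {"peer_subset", "route_policy", "qos_policy", "cluster_trust"},
    "ingress": {"client_entry_policy", "token_validation",
                "entry_route_hints", "endpoint_mapping"},
    "egress_service": {"service_configs", "resource_refs",
                       "connector_refs", "service_policy"},
    "storage_config": {"shard_metadata", "replication_metadata",
                       "scoped_snapshots"},
    "thin_mobile": {"bootstrap_peers", "session_policy", "local_trust"},
}


def scoped_view(role, full_config):
    """Return only the sections the given role is allowed to receive."""
    allowed = ROLE_SCOPES.get(role, set())  # unknown roles receive nothing
    return {name: value for name, value in full_config.items() if name in allowed}
```

With this shape, `scoped_view("core_mesh", {"route_policy": ..., "rdp_credentials": ...})` keeps only `route_policy`, because credentials are never on the core mesh allow-list.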

## 6. Signed Scoped Cluster Snapshot Boundary

C10 defines snapshot boundaries only. C11 will define the full signed scoped
cluster snapshot model.

A scoped snapshot is a signed, versioned, role-limited configuration package
that a node-agent can store locally.

Snapshot properties:

- cluster-scoped
- role-scoped
- organization-scoped where applicable
- versioned
- signed by an authorized control-plane signing key
- bounded in size
- expires or requires refresh according to policy
- reconstructable from PostgreSQL source-of-truth state

Snapshot contents may include:

- cluster id and version
- node membership scope
- assigned roles
- allowed service workload refs
- peer directory subset
- route policy subset
- QoS policy subset
- trust roots and revocation metadata
- storage/config endpoints for refresh
- degraded-mode permissions

Snapshot contents must not include:

- unrelated organization data
- broad user lists
- raw secrets
- RDP/VNC/SSH credentials
- full cluster topology unless node role requires it
- arbitrary query permissions
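
The exclusion list and the size bound above could be enforced by a check that runs before a snapshot is signed. The sketch below is an assumption-laden example: the key names, the 256 KiB limit, and the `check_snapshot_payload` helper are all hypothetical, since C11 will define the real snapshot model.

```python
import json

# Hypothetical top-level keys that must never appear in a scoped snapshot.
FORBIDDEN_KEYS = frozenset(
    {"raw_secrets", "credentials", "user_list", "query_permissions"}
)
# Assumed size bound; the real limit would come from policy.
MAX_SNAPSHOT_BYTES = 256 * 1024


def check_snapshot_payload(payload, max_bytes=MAX_SNAPSHOT_BYTES):
    """Return a list of problems; an empty list means the payload passed."""
    problems = []
    for key in sorted(FORBIDDEN_KEYS.intersection(payload)):
        problems.append(f"forbidden key: {key}")
    size = len(json.dumps(payload, sort_keys=True).encode("utf-8"))
    if size > max_bytes:
        problems.append(f"payload too large: {size} > {max_bytes}")
    return problems
```

A real check would also need to inspect nested content and scope identifiers; this sketch only looks at top-level keys and serialized size.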

## 7. Node-Local State Boundary

`rap-node-agent` local state may contain:

- node identity material and certificate metadata
- cluster membership state
- signed scoped cluster snapshot
- peer cache
- route cache
- service assignment cache
- service health/status cache
- local health state
- partition/degraded state
- last applied config version
- pending update metadata
- bounded telemetry buffer

Node-local state must not contain:

- full cluster topology unless explicitly required by role
- full organization data
- unrelated organization secrets
- durable policy authority
- durable route authority
- durable audit authority
- unrelated storage shards

Node-agent must be able to operate from local state for short degraded periods
when policy allows it, but it must not authorize high-risk mutations while
isolated.

## 8. Peer Directory and Cache Boundary

Peer directory data is distributed as scoped configuration, not queried from
PostgreSQL on every routing decision.

Peer directory entry fields:

- `node_id`
- `cluster_id`
- endpoint candidates
- roles/capabilities
- region/location hints
- trust/certificate fingerprint
- policy scope
- config version

Node-local peer cache may add runtime observations:

- `last_success_at`
- `last_latency_ms`
- packet loss
- reliability score
- recent failure history
- observed load hints where allowed
- last seen config version

Peer selection is score-based, not latency-only. Inputs include:

- latency
- packet loss
- reliability
- region distance
- node load
- bandwidth availability
- role suitability
- policy constraints
- trust level
- recent failure history

The Fabric Routing Engine owns route selection. Service Adapters must not
discover peers, select mesh routes, create shortcuts, or implement partition
recovery.
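
To make "score-based, not latency-only" concrete, here is a minimal sketch of a weighted peer score. The `PeerObservation` type, the weights, and the normalization are assumptions chosen for illustration, and only a subset of the listed inputs is modeled. Policy is treated as a hard filter rather than a weighted term.

```python
from dataclasses import dataclass


@dataclass
class PeerObservation:
    node_id: str
    last_latency_ms: float
    packet_loss: float     # fraction in 0.0..1.0
    reliability: float     # fraction in 0.0..1.0
    load: float            # fraction in 0.0..1.0
    recent_failures: int
    policy_allowed: bool


def peer_score(p):
    """Return a score in 0..1, or None if policy excludes the peer."""
    if not p.policy_allowed:
        return None
    latency_term = 1.0 / (1.0 + p.last_latency_ms / 100.0)
    failure_term = 1.0 / (1.0 + p.recent_failures)
    # Illustrative weights only; real weights would be policy-driven.
    return (0.3 * latency_term
            + 0.2 * (1.0 - p.packet_loss)
            + 0.3 * p.reliability
            + 0.1 * (1.0 - p.load)
            + 0.1 * failure_term)


def select_peer(peers):
    """Pick the highest-scoring policy-eligible peer, or None."""
    eligible = [(peer_score(p), p) for p in peers if p.policy_allowed]
    if not eligible:
        return None
    return max(eligible, key=lambda pair: pair[0])[1]
```

Under these weights, a peer with 10 ms latency but poor reliability and a recent failure streak loses to a 60 ms peer that is lightly loaded and reliable, which is the behavior a latency-only selector would miss.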

## 9. Fabric Storage / Config Storage Role

Fabric Storage / Config Storage is a logical service planned for a later stage
(C13). It is a scoped distribution layer for configuration and signed
snapshots.

Responsibilities:

- distribute signed scoped snapshots
- distribute peer directories
- cache hot configuration near service nodes
- replicate critical scoped data across failure domains
- provide nearby read access for node-agent refresh
- support cluster/org/service scope boundaries
- support version-based sync and incremental update delivery

Non-goals:

- no replacement of PostgreSQL
- no arbitrary distributed database behavior
- no direct node writes as authoritative state
- no broad ad hoc query API
- no full topology exposure to tenants
- no full organization data on every node

Placement rules:

- hot data may be placed near services that use it
- cold data may remain remote
- critical data should replicate across failure domains
- replication factor is policy-driven
- storage scope must respect cluster, organization, and service boundaries

## 10. Distribution Flow

Normal flow:

1. Control plane reads authoritative state from PostgreSQL.
2. Control plane compiles scoped configuration views.
3. Control plane signs full scoped snapshots or incremental updates.
4. Fabric Storage / Config Storage distributes and caches scoped artifacts.
5. Node-agent fetches snapshots/updates from authorized endpoints.
6. Node-agent verifies signatures, version, scope, expiry, and trust roots.
7. Node-agent applies configuration into local state.
8. Runtime components consume local state, not live backend calls, for realtime
   route decisions.

Realtime routing decisions must not depend on live backend availability. They
should use verified local state, peer cache, route cache, and policy.
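
Steps 6 and 7 of the flow can be sketched as a single verify-then-apply function on the node-agent side. This is a hedged illustration, not the agent's implementation: the envelope layout and function names are hypothetical, and HMAC-SHA256 from the Python standard library stands in for the control-plane signature scheme, which would normally be an asymmetric signature.

```python
import hashlib
import hmac
import json


def sign_payload(payload, key):
    """Stand-in signer: HMAC-SHA256 over a canonical JSON encoding."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def verify_and_apply(envelope, trusted_keys, local_state, node_scope, now):
    """Verify signer, signature, scope, expiry, and version, then apply."""
    payload = envelope["payload"]
    key = trusted_keys.get(envelope["key_id"])
    if key is None:
        return "reject: unknown signer"
    if not hmac.compare_digest(sign_payload(payload, key), envelope["signature"]):
        return "reject: bad signature"
    if payload["scope"] != node_scope:
        return "reject: scope mismatch"
    if payload["expires_at"] <= now:
        return "reject: expired"
    if payload["version"] <= local_state.get("version", 0):
        return "reject: stale version"
    # Only verified content reaches local state.
    local_state.update(version=payload["version"], config=payload["config"])
    return "applied"
```

Runtime components would then read `local_state` for route decisions and never call back to the control plane on the hot path.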

## 11. Versioning and Consistency Rules

Every snapshot and incremental update must carry:

- `cluster_id`
- scope identifiers
- monotonic config version or equivalent epoch
- issued-at timestamp
- expiry or refresh deadline
- signer id / key id
- signature
- dependency/base version for increments

Rules:

- full snapshot can establish or repair local state
- incremental update applies only to the expected base version
- version gaps require full resync
- signature verification failure rejects the update and triggers recovery
- rollback to older config is forbidden unless explicitly authorized by a
  signed recovery policy
- node must report last applied config version in heartbeat/status
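
The base-version, gap, and rollback rules above can be expressed as a small decision function. The sketch below assumes integer versions and a hypothetical update dictionary; the `recovery_authorized` flag is a placeholder for verifying a signed recovery policy, which is not modeled here. Signature checks are assumed to have happened already.

```python
def plan_update(local_version, update):
    """Decide how a node should treat an already-verified update."""
    if update["kind"] == "full":
        # Rollback is forbidden unless a signed recovery policy allows it;
        # the boolean flag here is only a stand-in for that verification.
        if update["version"] < local_version and not update.get("recovery_authorized", False):
            return "reject_rollback"
        return "apply_full"

    # Incremental updates must move forward from their declared base.
    if update["version"] <= update["base_version"]:
        return "reject_malformed"
    if update["base_version"] == local_version:
        return "apply_incremental"
    if update["base_version"] > local_version:
        return "full_resync"   # version gap: intermediate increments were missed
    return "reject_stale"      # increment targets a base the node already passed
```

A full snapshot at the current version is accepted because a full snapshot is allowed to repair local state, not only to advance it.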

## 12. Degraded Mode Rules

Degraded operation is allowed only when policy permits it.

Allowed examples:

- keep already-running safe services alive
- continue existing authorized routes for a short TTL
- reconnect to known active/warm/bootstrap peers
- use last signed snapshot to find config/storage endpoints
- report degraded status when connectivity returns

Forbidden while degraded:

- approve join requests
- issue node certificates
- assign roles
- change cluster policy
- change organization policy
- rotate trust roots
- promote partition authority automatically
- access secrets not already authorized for the node's current role

Degraded mode must be time-bounded and observable.
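
A default-deny gate is one way to express these rules. In the sketch below, the action identifiers, the time window parameter, and the `degraded_action_allowed` helper are hypothetical names chosen for illustration; the secrets rule is omitted because it depends on per-role authorization state that this example does not model.

```python
# Hypothetical action identifiers mirroring the lists above.
ALLOWED_WHILE_DEGRADED = frozenset({
    "keep_service_alive",
    "continue_existing_route",
    "reconnect_known_peer",
    "lookup_config_endpoint",
})
FORBIDDEN_WHILE_DEGRADED = frozenset({
    "approve_join",
    "issue_node_certificate",
    "assign_role",
    "change_cluster_policy",
    "change_organization_policy",
    "rotate_trust_roots",
    "promote_partition_authority",
})


def degraded_action_allowed(action, degraded_since, now,
                            max_degraded_seconds, policy_permits_degraded):
    """Return True only for explicitly allowed actions inside the time bound."""
    if not policy_permits_degraded:
        return False
    if now - degraded_since > max_degraded_seconds:
        return False  # degraded mode is time-bounded
    if action in FORBIDDEN_WHILE_DEGRADED:
        return False
    return action in ALLOWED_WHILE_DEGRADED  # unknown actions are denied
```

Denying unknown actions keeps a newly added high-risk operation from becoming available in degraded mode by omission.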

## 13. Multi-Cluster Isolation

Clusters are isolated by default.

Rules:

- clusters do not automatically trust each other
- clusters do not form one shared mesh by default
- cross-cluster routing requires explicit trust and policy
- platform owner may manage multiple clusters from one console
- organization admins see only authorized clusters/resources
- node may participate in multiple clusters only through isolated memberships
- cluster-scoped identities, certificates, tokens, storage namespaces, and
  policies are required

A multi-cluster node must keep separate local state per cluster:

- separate identity/certificates
- separate snapshots
- separate peer cache
- separate route cache
- separate service assignment cache
- separate storage namespace
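
One way to picture the per-cluster separation is a store keyed by `cluster_id`, where each membership owns its own caches and nothing is shared between entries. The class and field names below are hypothetical; a real store would also separate on-disk namespaces and key material.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterLocalState:
    cluster_id: str
    certificate_fingerprint: str
    snapshot_version: int = 0
    # default_factory gives every membership its own dict instances.
    peer_cache: dict = field(default_factory=dict)
    route_cache: dict = field(default_factory=dict)
    service_assignments: dict = field(default_factory=dict)


class NodeLocalStore:
    """Holds one isolated state object per cluster membership."""

    def __init__(self):
        self._by_cluster = {}

    def join(self, cluster_id, certificate_fingerprint):
        if cluster_id in self._by_cluster:
            raise ValueError(f"already a member of {cluster_id}")
        state = ClusterLocalState(cluster_id, certificate_fingerprint)
        self._by_cluster[cluster_id] = state
        return state

    def state_for(self, cluster_id):
        # Lookups never fall back to another cluster's state.
        return self._by_cluster[cluster_id]
```

Because there is no cross-cluster lookup path, a peer learned in one cluster cannot leak into route decisions for another.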

## 14. Security Boundaries

Security requirements:

- snapshots are signed
- transport for snapshot/update distribution is authenticated and encrypted
- node-agent verifies signature, scope, expiry, signer, and trust root
- secrets are never embedded directly in broad snapshots
- secrets are resolved through approved resolvers only at runtime
- high-risk admin actions require step-up authentication
- all cluster trust and role changes are audited

High-risk actions include:

- node approval
- role assignment
- cluster trust changes
- cross-cluster trust
- partition promotion
- secrets access
- update policy changes
- signing key rotation
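
The step-up requirement can be sketched as a gate in front of admin actions. The action identifiers and the `authorize_admin_action` helper are hypothetical, and recording every decision in the audit log is an assumption of this example; the document itself only requires that cluster trust and role changes are audited.

```python
# Hypothetical identifiers mirroring the high-risk list above.
HIGH_RISK_ACTIONS = frozenset({
    "node_approval",
    "role_assignment",
    "cluster_trust_change",
    "cross_cluster_trust",
    "partition_promotion",
    "secrets_access",
    "update_policy_change",
    "signing_key_rotation",
})


def authorize_admin_action(action, has_step_up, audit_log):
    """Allow high-risk actions only after step-up; record the decision."""
    high_risk = action in HIGH_RISK_ACTIONS
    allowed = has_step_up or not high_risk
    audit_log.append({"action": action, "high_risk": high_risk, "allowed": allowed})
    return allowed
```

Denied attempts are logged as well, so a failed role assignment without step-up still leaves a trace.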

## 15. C11-C18 Staging Boundary

C10 is a design consolidation stage. It prepares later stages:

- C11: signed scoped cluster snapshot model
- C12: node local state store
- C13: config/storage service foundation
- C14: peer directory and cache model
- C15: Fabric Routing Engine skeleton
- C16: secure node-to-node channel lifecycle
- C17: mesh routing runtime
- C18: VPN/IP tunnel service

C10 implements none of these. Later stages must be explicit, narrow, and
verified. Mesh routing and VPN/IP tunnel runtime must not start before C11-C16
foundations are accepted.

## 16. Result / Decision

Stage C10 consolidates the lower Fabric Core configuration distribution model.

Decisions:

- PostgreSQL remains the only durable source of truth.
- Redis remains live coordination only.
- Fabric Storage / Config Storage is a scoped distribution/cache layer, not a
  second source of truth.
- Nodes receive only role/cluster/organization scoped configuration.
- Node-local state is bounded and non-authoritative.
- Signed scoped snapshots are the required foundation for node-local operation
  and degraded recovery.
- Peer directory/cache data is local and scoped; routing remains Fabric-owned.
- Service Adapters remain protocol translators above Fabric Core.
- Multi-cluster membership requires isolated identities, snapshots, caches,
  tokens, policies, and storage namespaces.

No code, migration, API, runtime, RDP, data-plane, mesh, VPN, relay, or service
workload behavior is changed by C10.