# Architecture Guardrails

Status: architecture guardrails, documentation only.

This file exists so architecture documents have a stable guardrails reference
inside `docs/architecture`. The operational Codex guardrails remain in
`docs/codex/ARCHITECTURE_GUARDRAILS.md`.

## 1. Preserve the Proven RDP Baseline

The following are already proven and must remain stable:

- live FreeRDP connect
- active session state
- terminate
- detach without killing the remote session
- reattach without recreating the remote session
- takeover without recreating the remote session
- direct worker WSS data plane
- backend gateway fallback
- C++ RDP Adapter as the active RDP runtime

Architecture clarification must not silently weaken this behavior.

## 2. Source of Truth

PostgreSQL is the only durable source of truth for domain state.

Redis is live coordination only. It may hold leases, heartbeats, routing hints,
attach tokens, short-lived tokens, and ephemeral cache. It must not become a
durable source of truth for sessions, organizations, policies, cluster trust,
peer topology, durable configuration, organization data, route authority, or
node identity.
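
A minimal sketch of one way to keep Redis live-only, assuming every write must
carry an expiry and name one of the ephemeral categories listed above. A
dictionary stands in for a Redis client; `LiveStore`, the category names, and
the `category:key` format are hypothetical.

```python
import time
from typing import Callable, Optional

# Hypothetical categories mirroring the ephemeral data Redis may hold.
EPHEMERAL_CATEGORIES = {"lease", "heartbeat", "route-hint", "attach-token", "token", "cache"}


class LiveStore:
    """In-memory stand-in for Redis that only accepts expiring, ephemeral keys."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: dict[str, tuple[str, float]] = {}

    def set(self, category: str, key: str, value: str, ttl_seconds: float) -> None:
        # Durable domain state (sessions, policies, ...) is rejected outright.
        if category not in EPHEMERAL_CATEGORIES:
            raise ValueError(f"{category!r} is not live coordination data; use PostgreSQL")
        # Every live entry must expire.
        if ttl_seconds <= 0:
            raise ValueError("live data must expire; a positive TTL is required")
        self._data[f"{category}:{key}"] = (value, self._clock() + ttl_seconds)

    def get(self, category: str, key: str) -> Optional[str]:
        entry = self._data.get(f"{category}:{key}")
        if entry is None or entry[1] <= self._clock():
            return None  # missing or expired: callers fall back to durable state
        return entry[0]
```

With a real Redis client, the same rule amounts to never writing a key without
an expiry (for example, `SET key value EX seconds`), so anything that must
outlive its TTL is forced back into PostgreSQL.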

## 3. Fabric Core Before Mesh Runtime

RAP Fabric Core is the lower distributed runtime foundation above the host OS.

Fabric Core owns:

- native `rap-node-agent` identity
- enrollment
- local node state
- capability reporting
- role assignment consumption
- signed scoped configuration snapshots
- update trust
- service supervision boundary

Mesh runtime traffic must not be implemented before node identity, enrollment,
role assignment, scoped config distribution, and node-local state are
trustworthy.
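
The ordering rule can be expressed as a single gate. This sketch assumes each
prerequisite has already been reduced to a boolean computed elsewhere;
`FabricCoreState`, `missing_prerequisites`, and `may_start_mesh_runtime` are
hypothetical names, not existing `rap-node-agent` APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FabricCoreState:
    """Hypothetical snapshot of the prerequisites named in this section."""
    identity_verified: bool
    enrolled: bool
    role_assigned: bool
    scoped_config_valid: bool
    local_state_healthy: bool


def missing_prerequisites(state: FabricCoreState) -> list[str]:
    """Return the prerequisites that are not yet trustworthy."""
    checks = {
        "node identity": state.identity_verified,
        "enrollment": state.enrolled,
        "role assignment": state.role_assigned,
        "scoped config distribution": state.scoped_config_valid,
        "node-local state": state.local_state_healthy,
    }
    return [name for name, ok in checks.items() if not ok]


def may_start_mesh_runtime(state: FabricCoreState) -> bool:
    # Mesh traffic requires every Fabric Core prerequisite, not just some.
    return not missing_prerequisites(state)
```

Returning the missing prerequisites, rather than only a boolean, gives
operators a concrete reason when mesh traffic stays disabled.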

## 4. Node Identity and Service Workloads

A node is a host-level identity managed by the native `rap-node-agent`.

Service workloads are separate from node identity. They may be containerized or
native, but containers are packaging/isolation boundaries only.

Capabilities are not permissions: a capability reports what a node can run, not
what it is authorized to run. Role assignment must be explicit per cluster
and, when needed, per organization.
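
To illustrate the distinction, the sketch below keeps capabilities and
assignments in separate fields and requires an explicit assignment whose scope
matches the request exactly. `Node`, `may_serve`, and the
`(cluster, organization or None, role)` tuple shape are assumptions for
illustration; this sketch treats cluster-wide and organization-scoped
assignments as distinct scopes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    node_id: str
    # What the host can do, as reported by the node agent.
    capabilities: set[str] = field(default_factory=set)
    # Explicit grants: (cluster_id, organization_id or None, role).
    assignments: set[tuple[str, Optional[str], str]] = field(default_factory=set)


def may_serve(node: Node, role: str, cluster_id: str, org_id: Optional[str] = None) -> bool:
    """A role runs only if the node is capable of it AND explicitly assigned it."""
    if role not in node.capabilities:
        return False  # an assignment cannot make an incapable node run a role
    # Capability alone never grants a role; the exact scope must be assigned.
    return (cluster_id, org_id, role) in node.assignments
```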

## 5. Routing Ownership

Routing is owned by the Fabric layer, not by individual Service Adapters.

RDP, VNC, SSH, VPN, video, and file services may request a destination node,
resource target, egress node, or egress pool. The Fabric Routing Engine chooses
the path.

Routing decisions must not depend on live backend availability. They use
node-local state, signed scoped snapshots, peer cache, route cache, and policy.

Service Adapters must not:

- select routes, including multi-hop route selection
- discover peers or mesh topology
- manage mesh connections
- implement mesh failover
- create shortcuts or implement shortcut logic
- implement partition recovery
- implement cross-cluster routing policy
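
A sketch of that split, under stated assumptions: the adapter submits only a
destination request, and a hypothetical `FabricRouter` picks the lowest-latency
healthy path from a route cache populated from signed snapshots and peer data.
The class names, the latency-only ranking, and the per-organization service
allow-list standing in for policy are all hypothetical. The router holds no
backend client, so a backend outage cannot block a decision.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RouteRequest:
    """What a Service Adapter may ask for: a destination, never a path."""
    service: str           # e.g. "rdp", "ssh"
    destination_node: str
    org_id: str


@dataclass(frozen=True)
class CachedRoute:
    hops: tuple[str, ...]  # node IDs from the local node to the destination
    latency_ms: float
    healthy: bool


class FabricRouter:
    """Chooses paths from node-local data only; it holds no backend client."""

    def __init__(self, route_cache: dict[str, list[CachedRoute]],
                 allowed_services: dict[str, set[str]]) -> None:
        self._routes = route_cache        # filled from signed snapshots / peer cache
        self._allowed = allowed_services  # org_id -> services permitted by policy

    def choose(self, req: RouteRequest) -> Optional[tuple[str, ...]]:
        if req.service not in self._allowed.get(req.org_id, set()):
            return None  # policy denies; there is no fallback to the backend
        candidates = [r for r in self._routes.get(req.destination_node, []) if r.healthy]
        if not candidates:
            return None
        return min(candidates, key=lambda r: r.latency_ms).hops
```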

## 6. Need-to-Know Configuration

Nodes should be small, fast, and scoped.

A node receives only the configuration required for its cluster membership,
assigned role, service workload, and organization scope. It must not store full
cluster topology, unrelated organization data, unrelated storage shards, peer
caches outside its scope, or secrets it does not need.

Secrets must be delivered only through approved resolvers and only at runtime
when needed.
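
One way to picture need-to-know distribution is a filter applied before a
snapshot is signed. This sketch assumes configuration entries are flat
dictionaries with `cluster_id`, `role`, and `org_id` fields (with `org_id` set
to `None` for entries not tied to an organization) and that secrets travel as
references rather than values; `NodeScope`, `scoped_snapshot`, and the
`secret_value` field are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeScope:
    cluster_id: str
    roles: frozenset[str]
    org_ids: frozenset[str]


def scoped_snapshot(full_config: list[dict], scope: NodeScope) -> list[dict]:
    """Keep only entries for this node's cluster, roles, and organizations."""
    out = []
    for entry in full_config:
        if entry["cluster_id"] != scope.cluster_id:
            continue  # other clusters are never shipped
        if entry["role"] not in scope.roles:
            continue  # unassigned roles are never shipped
        # Entries without an org apply cluster-wide; org entries must match scope.
        if entry["org_id"] is not None and entry["org_id"] not in scope.org_ids:
            continue
        # Secret values are dropped; only references resolved at runtime remain.
        out.append({k: v for k, v in entry.items() if k != "secret_value"})
    return out
```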

## 7. Fabric Storage Boundaries

Fabric Storage / Config Storage is a future distribution and cache layer, not a
new source of truth.

The storage service must not:

- replace PostgreSQL
- become a general-purpose distributed database
- accept direct node writes as authoritative state
- store full cluster or organization data on every node
- expose arbitrary query capabilities
- bypass organization and cluster isolation

## 8. Multi-Tenancy Isolation

Every organization must be isolated by design.

Namespace and authorize:

- resources
- users-in-organization
- groups
- policies
- connectors
- sessions
- service endpoints
- audit
- secret references
- storage/cache scopes
- Redis keys where applicable

Organizations must not see intermediate mesh topology, other organizations'
routes, peer caches, nodes, storage shards, secrets, or platform trust
internals.
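
For the "Redis keys where applicable" item, a minimal sketch assumes keys are
built by one helper that always prefixes the organization and rejects separator
characters, so caller input cannot forge a key in another organization's
namespace. The `org:<org_id>:<kind>:...` layout and both function names are
hypothetical.

```python
import re

# Lowercase IDs without ':' so a segment can never inject a separator.
_SEGMENT = re.compile(r"[a-z0-9][a-z0-9-]{0,62}")


def org_key(org_id: str, kind: str, *parts: str) -> str:
    """Build a Redis key that is always prefixed by its organization."""
    for segment in (org_id, kind, *parts):
        if not _SEGMENT.fullmatch(segment):
            raise ValueError(f"invalid key segment: {segment!r}")
    return ":".join(("org", org_id, kind, *parts))


def assert_same_org(key: str, org_id: str) -> None:
    """Refuse to touch a key outside the caller's organization namespace."""
    # Safe as a prefix check because org IDs cannot contain ':'.
    if not key.startswith(f"org:{org_id}:"):
        raise PermissionError(f"key is outside organization {org_id!r}")
```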

## 9. Multi-Cluster Boundaries

A platform may manage multiple clusters, but clusters do not automatically
trust each other and do not form one shared mesh by default.

Cross-cluster routing requires explicit trust and policy.

Cluster-scoped identities, certificates, tokens, storage namespaces, and
policies are required. A node may participate in multiple clusters only through
isolated memberships.
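
A sketch of "explicit trust and policy", assuming trust is recorded as directed
edges between clusters and policy as per-organization entries. Nothing is
derived from both clusters being managed by the same platform. `PlatformTrust`
and its fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PlatformTrust:
    """Hypothetical record of explicit cross-cluster trust and policy."""
    # Directed trust edges (from_cluster, to_cluster); nothing is implied.
    trusts: set[tuple[str, str]] = field(default_factory=set)
    # (from_cluster, to_cluster, org_id) entries permitted by policy.
    policies: set[tuple[str, str, str]] = field(default_factory=set)

    def may_route(self, src_cluster: str, dst_cluster: str, org_id: str) -> bool:
        if src_cluster == dst_cluster:
            return True  # intra-cluster routing is governed elsewhere
        # Both an explicit trust edge and an explicit policy entry are required.
        return ((src_cluster, dst_cluster) in self.trusts
                and (src_cluster, dst_cluster, org_id) in self.policies)
```

Trust in one direction does not imply the reverse, and an empty record denies
every cross-cluster route.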

## 10. Split-Brain Prevention

Never allow minority partitions to become a second authoritative cluster
automatically.

Cluster-wide changes, role changes, trust changes, node approvals, policy
mutation, partition promotion, and cross-cluster trust must be restricted in
non-quorum or degraded states.
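
A minimal sketch of the quorum gate, assuming voters are counted per cluster and
quorum means a strict majority; detecting other degraded states is out of scope
here. The `Operation` enum mirrors the restricted operations above, and
`HEARTBEAT` is a hypothetical example of an operation that is not restricted.

```python
from enum import Enum


class Operation(Enum):
    CLUSTER_WIDE_CHANGE = "cluster-wide change"
    ROLE_CHANGE = "role change"
    TRUST_CHANGE = "trust change"
    NODE_APPROVAL = "node approval"
    POLICY_MUTATION = "policy mutation"
    PARTITION_PROMOTION = "partition promotion"
    CROSS_CLUSTER_TRUST = "cross-cluster trust"
    HEARTBEAT = "heartbeat"  # hypothetical routine, unrestricted operation


RESTRICTED = set(Operation) - {Operation.HEARTBEAT}


def has_quorum(reachable_voters: int, total_voters: int) -> bool:
    # Strict majority: both halves of an even split lack quorum.
    return reachable_voters * 2 > total_voters


def is_allowed(op: Operation, reachable_voters: int, total_voters: int) -> bool:
    if op in RESTRICTED and not has_quorum(reachable_voters, total_voters):
        return False  # a minority partition never promotes or reconfigures itself
    return True
```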

## 11. Control Plane vs Data Plane

The control plane owns durable state and policy:

- organizations
- users
- memberships
- roles
- resources
- policies
- nodes
- cluster membership
- service assignments
- connector/VPN desired state
- updates
- config distribution
- audit

The data plane carries authorized traffic:

- session streams
- worker traffic
- relay traffic
- connector traffic
- future VPN/IP tunnel traffic

Do not collapse the control plane and data plane into one vague layer.

## 12. Updates and Trust

Updates must support:

- Version Storage / Update Repository as the signed artifact source
- explicit Control Plane rollout policy and approval
- signed artifacts
- rejection of unsigned binaries
- staged rollout
- canary rollout
- rollback
- health checks
- local update cache where approved
- OS- and architecture-specific artifacts under signed release manifests
- explicit migration bundles when data structures change

Version Storage stores immutable release manifests, artifacts, hashes,
signatures, compatibility metadata, provenance, and approved migration bundles.
It must not become a second source of truth for rollout policy, approvals,
organization state, cluster state, or audit.

The native node-agent owns local update trust, health supervision, restart, and
recovery logic. It may update, restart, or roll back assigned local workloads
only according to signed manifests and Control Plane policy. Node-agent
self-update requires stricter staged replacement and crash-safe rollback than
ordinary workload updates.

PostgreSQL schema migrations are orchestrated by the Control Plane release
process. The node-agent must not independently invent or execute durable
PostgreSQL schema migrations. Service-local, node-local, cache, or protocol
schema migrations require signed manifest metadata, preflight checks,
rollback/fencing behavior, and explicit compatibility rules.
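
A sketch of the node-side artifact check, assuming a release manifest maps
`<os>-<arch>` keys to SHA-256 digests and that the signature covers a canonical
JSON encoding of the version and those digests. This document does not specify
the signature scheme, so verification is passed in as a callable;
`ReleaseManifest`, `manifest_payload`, and `verify_artifact` are hypothetical.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass
from typing import Callable

# Placeholder for the real signature check: (payload, signature) -> valid?
SignatureVerifier = Callable[[bytes, bytes], bool]


@dataclass(frozen=True)
class ReleaseManifest:
    version: str
    artifacts: dict[str, str]  # "<os>-<arch>" -> expected SHA-256 hex digest
    signature: bytes           # covers manifest_payload(version, artifacts)


def manifest_payload(version: str, artifacts: dict[str, str]) -> bytes:
    """Canonical bytes that the release signature is computed over."""
    doc = {"version": version, "artifacts": artifacts}
    return json.dumps(doc, sort_keys=True, separators=(",", ":")).encode("utf-8")


def verify_artifact(manifest: ReleaseManifest, platform: str, artifact: bytes,
                    verify_signature: SignatureVerifier) -> bool:
    # No unsigned manifests: an empty or invalid signature is a hard failure.
    if not manifest.signature:
        return False
    payload = manifest_payload(manifest.version, manifest.artifacts)
    if not verify_signature(payload, manifest.signature):
        return False
    # Only an artifact listed for this OS/architecture is acceptable.
    expected = manifest.artifacts.get(platform)
    if expected is None:
        return False
    # Constant-time comparison of the computed and signed digests.
    return hmac.compare_digest(hashlib.sha256(artifact).hexdigest(), expected)
```

Because the digests are part of the signed payload, swapping an artifact and its
hash together still fails verification. A real deployment would plug in an
asymmetric signature check (Ed25519, for instance) rather than a shared-secret
MAC, since otherwise every verifying node would also hold the signing key.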

## 13. Performance and Routing Awareness

Placement and routing decisions must consider:

- CPU
- RAM
- network load
- active sessions
- connector load
- relay load
- service type
- health score
- latency
- packet loss
- bandwidth availability
- policy constraints

Interactive input/control traffic must not wait behind render/video, file
transfer, telemetry, or VPN bulk traffic.
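
The last rule maps naturally onto a strict-priority queue. This sketch assumes
traffic has already been classified; the relative order of the non-interactive
classes is an assumption, because this section only requires that input/control
traffic never waits behind them. `TrafficClass` and `StrictPriorityQueue` are
hypothetical.

```python
import heapq
import itertools
from enum import IntEnum


class TrafficClass(IntEnum):
    # Lower value is served first.
    INPUT_CONTROL = 0
    RENDER_VIDEO = 1
    FILE_TRANSFER = 2
    TELEMETRY = 3
    VPN_BULK = 4


class StrictPriorityQueue:
    """Interactive input is always dequeued before any queued lower-class traffic."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, bytes]] = []
        self._seq = itertools.count()  # keeps FIFO order within a class

    def push(self, cls: TrafficClass, frame: bytes) -> None:
        heapq.heappush(self._heap, (int(cls), next(self._seq), frame))

    def pop(self) -> tuple[TrafficClass, bytes]:
        cls, _, frame = heapq.heappop(self._heap)
        return TrafficClass(cls), frame

    def __len__(self) -> int:
        return len(self._heap)
```

Strict priority can starve lower classes under sustained interactive load; a
production scheduler would likely add rate limits or weighted sharing below the
top class.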

## 14. No Runtime Expansion From Documentation

Architecture documentation does not authorize runtime implementation.

Do not start the following without an explicit staged prompt:

- RDP runtime changes
- Windows client behavior changes
- data-plane behavior changes
- backend session lifecycle changes
- mesh runtime traffic
- VPN/IP tunnel runtime
- relay packet routing
- QUIC/WebRTC
- service workload execution
- new protocol adapters

## Result / Decision

These guardrails formalize the Secure Access Fabric lower foundation:

- PostgreSQL remains authoritative
- Redis remains live-only
- Fabric Core comes before mesh runtime
- Fabric routing must not depend on live backend availability
- service adapters do not own routing
- nodes receive only need-to-know scoped configuration
- Fabric Storage/Config Storage is not a general-purpose distributed database
- organizations must not see internal mesh topology

No code, API, migration, RDP, data-plane, mesh, VPN, relay, or service workload
runtime behavior is changed by this document. Version Storage/Update Repository
is a future signed artifact and release distribution foundation; it is not an
updater runtime until a later explicit staged prompt authorizes it.