# Architecture Guardrails

Status: architecture guardrails, documentation only.

This file exists so architecture documents have a stable guardrails reference
inside `docs/architecture`. The operational Codex guardrails remain in
`docs/codex/ARCHITECTURE_GUARDRAILS.md`.

## 1. Preserve the Proven RDP Baseline

The following are already proven and must remain stable:

- live FreeRDP connect
- active session state
- terminate
- detach without killing the remote session
- reattach without recreating the remote session
- takeover without recreating the remote session
- direct worker WSS data plane
- backend gateway fallback
- C++ RDP Adapter as the active RDP runtime

Architecture clarification must not silently weaken this behavior.

## 2. Source of Truth

PostgreSQL is the only durable source of truth for domain state.

Redis is live coordination only. It may hold leases, heartbeats, routing hints,
attach tokens and other short-lived tokens, and ephemeral cache. It must not
become a durable source of truth for sessions, organizations and their data,
policies, cluster trust, peer topology, durable configuration, route authority,
or node identity.

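The "live coordination only" rule can be illustrated with a store that refuses to hold anything without an expiry. This is a minimal in-memory sketch, not a Redis client: `LiveStore`, its method names, and the injectable clock are hypothetical and exist only to show the idea that every live entry must carry a TTL and may disappear at any time.

```python
import time


class LiveStore:
    """In-memory stand-in for a Redis-like live coordination store (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (value, expiry timestamp)

    def put(self, key, value, ttl_seconds):
        # Live state must always expire; a missing or non-positive TTL is rejected.
        if ttl_seconds is None or ttl_seconds <= 0:
            raise ValueError("live coordination entries require a positive TTL")
        self._entries[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired entries vanish; the durable record lives elsewhere.
            del self._entries[key]
            return None
        return value
```

A caller that finds `None` here would have to fall back to the durable record in PostgreSQL, which is the behavior the guardrail asks for.
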
## 3. Fabric Core Before Mesh Runtime

RAP Fabric Core is the lower distributed runtime foundation above the host OS.

Fabric Core owns:

- native `rap-node-agent` identity
- enrollment
- local node state
- capability reporting
- role assignment consumption
- signed scoped configuration snapshots
- update trust
- service supervision boundary

Mesh runtime traffic must not be implemented before node identity, enrollment,
role assignment, scoped config distribution, and node-local state are
trustworthy.

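The ordering rule above amounts to a gate with five prerequisites. The sketch below expresses it as a check; `FabricCoreReadiness` and both functions are hypothetical names, and how each flag would actually be established is outside the scope of this document.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class FabricCoreReadiness:
    # Hypothetical flags, one per prerequisite named in the guardrail.
    node_identity: bool = False
    enrollment: bool = False
    role_assignment: bool = False
    scoped_config_distribution: bool = False
    node_local_state: bool = False


def missing_prerequisites(readiness):
    """Return the names of prerequisites that are not yet trustworthy."""
    return [f.name for f in fields(readiness) if not getattr(readiness, f.name)]


def mesh_runtime_allowed(readiness):
    """Mesh runtime traffic is allowed only when nothing is missing."""
    return not missing_prerequisites(readiness)
```
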
## 4. Node Identity and Service Workloads

A node is a host-level identity managed by native `rap-node-agent`.

Service workloads are separate from node identity. They may be containerized or
native, but containers are packaging/isolation boundaries only.

Capabilities are not permissions: a capability reports what a node is able to
run, not what it is allowed to run. Role assignment must be explicit per
cluster and, when needed, per organization.

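The difference between a capability and a permission can be sketched as a check that needs both a reported capability and an explicit assignment. The record shape and function name are hypothetical. The sketch also assumes that an assignment without an organization applies cluster-wide, which the guardrail does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RoleAssignment:
    # Hypothetical record shape; field names are illustrative only.
    node_id: str
    cluster_id: str
    role: str
    organization_id: Optional[str] = None  # None means cluster-wide (assumption)


def may_run_role(node_id, capabilities, assignments, role, cluster_id, organization_id=None):
    """A node may run a role only if it is capable AND explicitly assigned."""
    if role not in capabilities:
        return False
    for a in assignments:
        if (a.node_id, a.cluster_id, a.role) != (node_id, cluster_id, role):
            continue
        # A cluster-wide assignment covers any organization; an
        # organization-scoped one covers only that organization.
        if a.organization_id is None or a.organization_id == organization_id:
            return True
    return False
```

A node that reports the `relay` capability but has no matching `RoleAssignment` is rejected, which is the point of the rule.
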
## 5. Routing Ownership

Routing is owned by the Fabric layer, not individual Service Adapters.

RDP, VNC, SSH, VPN, video, and file services may request a destination node,
resource target, egress node, or egress pool. The Fabric Routing Engine chooses
the path.

Routing decisions must not depend on live backend availability. They use
node-local state, signed scoped snapshots, peer cache, route cache, and policy.

Service Adapters must not select routes, discover peers or mesh topology,
perform multi-hop route selection, manage mesh connections, create shortcuts,
implement mesh failover or partition recovery, or implement cross-cluster
routing policy.

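One way to picture this split: an adapter hands over a request that states intent, and a routing engine picks the path from data it already holds locally. The sketch below is an assumption-heavy illustration. `RouteRequest`, `FabricRoutingEngine`, the link map, and the fewest-hops breadth-first search are all hypothetical stand-ins, not the Fabric Routing Engine's actual design.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteRequest:
    # What a Service Adapter is allowed to express: intent, not a path.
    service: str
    source_node: str
    destination_node: str


class FabricRoutingEngine:
    """Chooses paths from node-local data only (hypothetical sketch)."""

    def __init__(self, peer_links, allowed_services):
        self._links = peer_links          # node -> set of neighbor nodes (peer cache)
        self._allowed = allowed_services  # policy: services permitted to route

    def choose_path(self, request):
        if request.service not in self._allowed:
            return None
        # Breadth-first search yields a fewest-hops path over the cached links.
        queue = deque([[request.source_node]])
        seen = {request.source_node}
        while queue:
            path = queue.popleft()
            if path[-1] == request.destination_node:
                return path
            for peer in sorted(self._links.get(path[-1], ())):
                if peer not in seen:
                    seen.add(peer)
                    queue.append(path + [peer])
        return None
```

Nothing in `choose_path` calls a backend. It reads only the cached links and the policy set, mirroring the requirement that routing must not depend on live backend availability.
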
## 6. Need-to-Know Configuration

Nodes should be small, fast, and scoped.

A node receives only the configuration required for its cluster membership,
assigned role, service workload, and organization scope. It must not store full
cluster topology, unrelated organization data, unrelated storage shards, peer
caches outside its scope, or secrets it does not need.

Secrets must be delivered only through approved resolvers and only at runtime
when needed.

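Need-to-know scoping can be read as a filter applied before a snapshot is handed to a node. The sketch below assumes a flat list of configuration entries tagged with cluster, role, and organization. `ConfigEntry`, `NodeScope`, and `scoped_snapshot` are hypothetical names, and signing of the resulting snapshot is deliberately left out.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ConfigEntry:
    key: str
    cluster_id: str
    role: Optional[str] = None             # None: applies to every role
    organization_id: Optional[str] = None  # None: not organization-specific


@dataclass(frozen=True)
class NodeScope:
    cluster_id: str
    roles: FrozenSet[str]
    organization_ids: FrozenSet[str]


def scoped_snapshot(entries, scope):
    """Keep only the entries this node needs to know."""
    return [
        e for e in entries
        if e.cluster_id == scope.cluster_id
        and (e.role is None or e.role in scope.roles)
        and (e.organization_id is None or e.organization_id in scope.organization_ids)
    ]
```
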
## 7. Fabric Storage Boundaries

Fabric Storage / Config Storage is a future distribution and cache layer, not a
new source of truth.

Storage service must not:

- replace PostgreSQL
- become a general-purpose distributed database
- accept direct node writes as authoritative state
- store full cluster or organization data on every node
- expose arbitrary query capabilities
- bypass organization and cluster isolation

## 8. Multi-Tenancy Isolation

Every organization must be isolated by design.

Namespace and authorize:

- resources
- users-in-organization
- groups
- policies
- connectors
- sessions
- service endpoints
- audit
- secret references
- storage/cache scopes
- Redis keys where applicable

Organizations must not see intermediate mesh topology, other organizations'
routes, peer caches, nodes, storage shards, secrets, or platform trust
internals.

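For the Redis keys item in the list above, namespacing might look like the sketch below. The `rap:` prefix, the segment order, and the allowed character set are assumptions chosen for illustration. The only point carried over from the guardrail is that a key always carries its cluster and organization scope and that the scope is checked before use.

```python
import re

# Separators and wildcards are excluded so a segment cannot span namespaces.
_SEGMENT = re.compile(r"[A-Za-z0-9_-]+")


def scoped_key(cluster_id, organization_id, kind, identifier):
    """Build a key that always carries cluster and organization scope."""
    parts = (cluster_id, organization_id, kind, identifier)
    for part in parts:
        # Rejecting ':' and '*' keeps one tenant's identifier from
        # reaching into another tenant's namespace.
        if not _SEGMENT.fullmatch(part):
            raise ValueError(f"invalid key segment: {part!r}")
    return "rap:{}:{}:{}:{}".format(*parts)


def belongs_to(key, cluster_id, organization_id):
    """Authorization check: does this key sit inside the caller's scope?"""
    return key.startswith(f"rap:{cluster_id}:{organization_id}:")
```
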
## 9. Multi-Cluster Boundaries

A platform may manage multiple clusters, but clusters do not automatically
trust each other and do not form one shared mesh by default.

Cross-cluster routing requires explicit trust and policy.

Cluster-scoped identities, certificates, tokens, storage namespaces, and
policies are required. A node may participate in multiple clusters only through
isolated memberships.

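"Explicit trust and policy" implies a default-deny check. A minimal sketch follows, assuming trust is recorded as directional grants keyed by `(source_cluster, target_cluster)`. That data shape and the function name are hypothetical.

```python
def cross_cluster_route_allowed(source_cluster, target_cluster, service, trust_grants):
    """Deny unless an explicit, directional grant names this service.

    trust_grants maps (source_cluster, target_cluster) to a set of services.
    """
    if source_cluster == target_cluster:
        # This check does not restrict intra-cluster routing, which remains
        # subject to ordinary cluster policy elsewhere.
        return True
    allowed_services = trust_grants.get((source_cluster, target_cluster), frozenset())
    return service in allowed_services
```

Because grants are directional, a grant from `c1` to `c2` does not let `c2` route into `c1`. Whether real trust should be symmetric is a policy decision this sketch does not make.
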
## 10. Split-Brain Prevention

Never allow minority partitions to become a second authoritative cluster
automatically.

Cluster-wide changes, role changes, trust changes, node approvals, policy
mutation, partition promotion, and cross-cluster trust must be restricted in
non-quorum or degraded states.

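The restriction can be sketched as a guard in front of sensitive operations. Two assumptions are made here that the guardrail does not spell out: quorum means a strict majority of voters, and "restricted" is treated as "denied". The operation names are hypothetical labels for the categories listed above.

```python
RESTRICTED_OPERATIONS = frozenset({
    "cluster_wide_change", "role_change", "trust_change", "node_approval",
    "policy_mutation", "partition_promotion", "cross_cluster_trust",
})


def has_quorum(reachable_voters, total_voters):
    """Strict majority: more than half of all voters must be reachable."""
    return total_voters > 0 and reachable_voters * 2 > total_voters


def operation_allowed(operation, reachable_voters, total_voters, degraded=False):
    """Restricted operations require quorum and a non-degraded cluster."""
    if operation not in RESTRICTED_OPERATIONS:
        return True
    return has_quorum(reachable_voters, total_voters) and not degraded
```

With an even split, such as 2 of 4 voters, neither side has a strict majority, so neither partition can promote itself.
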
## 11. Control Plane vs Data Plane

Control plane owns durable state and policy:

- organizations
- users
- memberships
- roles
- resources
- policies
- nodes
- cluster membership
- service assignments
- connector/VPN desired state
- updates
- config distribution
- audit

Data plane carries authorized traffic:

- session streams
- worker traffic
- relay traffic
- connector traffic
- future VPN/IP tunnel traffic

Do not collapse control plane and data plane into one vague layer.

## 12. Updates and Trust

Updates must support:

- Version Storage / Update Repository as the signed artifact source
- explicit Control Plane rollout policy and approval
- signed artifacts only, with no unsigned binaries
- staged rollout
- canary rollout
- rollback
- health checks
- local update cache where approved
- OS / architecture specific artifacts under signed release manifests
- explicit migration bundles when data structures change

Version Storage stores immutable release manifests, artifacts, hashes,
signatures, compatibility metadata, provenance, and approved migration bundles.
It must not become a second source of truth for rollout policy, approvals,
organization state, cluster state, or audit.

The native node-agent owns local update trust, health supervision, restart, and
recovery logic. It may update, restart, or roll back assigned local workloads
only according to signed manifests and Control Plane policy. Node-agent
self-update requires stricter staged replacement and crash-safe rollback than
ordinary workload updates.

PostgreSQL schema migrations are orchestrated by the Control Plane release
process. Node-agent must not independently invent or execute durable
PostgreSQL schema migrations. Service-local, node-local, cache, or protocol
schema migrations require signed manifest metadata, preflight checks,
rollback/fencing behavior, and explicit compatibility rules.

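The "signed manifest, then hash" chain that local update trust relies on can be sketched as follows. This is not the project's update format. The manifest layout and function names are hypothetical, and HMAC-SHA256 with a shared key is used only as a standard-library stand-in for a real asymmetric signature scheme, which an actual release process would be expected to use.

```python
import hashlib
import hmac
import json


def sign_manifest(manifest, key):
    """Stand-in signer: HMAC-SHA256 over canonical JSON (assumption, see lead-in)."""
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def artifact_trusted(artifact, name, manifest, signature, key):
    """Accept an artifact only if the manifest is authentic and the hash matches."""
    expected_sig = sign_manifest(manifest, key)
    if not hmac.compare_digest(expected_sig, signature):
        return False  # manifest was not signed by the trusted key
    expected_hash = manifest.get("artifacts", {}).get(name)
    if expected_hash is None:
        return False  # artifact is not listed in the release manifest
    actual_hash = hashlib.sha256(artifact).hexdigest()
    return hmac.compare_digest(expected_hash, actual_hash)
```

The order matters: the manifest is authenticated first, and only then is its hash entry trusted enough to compare against the artifact bytes.
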
## 13. Performance and Routing Awareness

Placement and routing decisions must consider:

- CPU
- RAM
- network load
- active sessions
- connector load
- relay load
- service type
- health score
- latency
- packet loss
- bandwidth availability
- policy constraints

Interactive input/control traffic must not wait behind render/video, file
transfer, telemetry, or VPN bulk traffic.

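The rule that interactive traffic must not wait behind bulk traffic can be illustrated with a priority queue. The traffic classes come from the sentence above. The numeric priorities, the `OutboundQueue` name, and the choice of strict priority are assumptions for this sketch.

```python
import heapq
import itertools

# Lower number means served first; the classes come from the guardrail,
# the numeric values are assumptions.
PRIORITY = {"input": 0, "render": 1, "telemetry": 2, "file_transfer": 3, "vpn_bulk": 3}


class OutboundQueue:
    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps FIFO order per priority

    def push(self, traffic_class, frame):
        heapq.heappush(self._heap, (PRIORITY[traffic_class], next(self._order), frame))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

Strict priority keeps input latency low but can starve bulk classes under sustained load. A real scheduler would likely add a fairness or minimum-share mechanism, which is not shown here.
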
## 14. No Runtime Expansion From Documentation

Architecture documentation does not authorize runtime implementation.

Do not start the following without an explicit staged prompt:

- RDP runtime changes
- Windows client behavior changes
- data-plane behavior changes
- backend session lifecycle changes
- mesh runtime traffic
- VPN/IP tunnel runtime
- relay packet routing
- QUIC/WebRTC
- service workload execution
- new protocol adapters

## Result / Decision

These guardrails formalize the Secure Access Fabric lower foundation:
PostgreSQL remains authoritative, Redis remains live-only, Fabric Core comes
before mesh runtime, Fabric routing must not depend on live backend
availability, service adapters do not own routing, nodes receive only
need-to-know scoped configuration, Fabric Storage/Config Storage is not a
general-purpose distributed database, and organizations must not see internal
mesh topology. No code, API, migration, RDP, data-plane, mesh, VPN, relay, or
service workload runtime behavior is changed by this document. Version
Storage/Update Repository is a future signed artifact and release distribution
foundation; it is not an updater runtime until a later explicit staged prompt
authorizes it.