# Architecture guardrails

These rules are mandatory.

## 1. Preserve the proven session foundation
The following are already proven and must remain stable:
- live FreeRDP connect
- active session state
- terminate
- detach without killing remote session
- reattach without recreating remote session
- takeover without recreating remote session

No architectural refactor may silently weaken this behavior.

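One way to keep refactors honest is to pin these invariants down in a small model that regression tests can assert against. The sketch below is illustrative only: the `Session` class, its field names, and the rule that `reattach` requires a detached session are assumptions, not the existing implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    """Hypothetical model of a session; not the real session object."""
    remote_session_id: str                 # identity of the remote session
    attached_client: Optional[str] = None  # None means detached
    terminated: bool = False

    def _require_live(self) -> None:
        if self.terminated:
            raise RuntimeError("session is terminated")

    def detach(self) -> None:
        # Detach drops the client only; the remote session keeps running.
        self._require_live()
        self.attached_client = None

    def reattach(self, client: str) -> None:
        # Assumption: reattach is only valid while no client is attached.
        self._require_live()
        if self.attached_client is not None:
            raise RuntimeError("already attached; use takeover instead")
        self.attached_client = client

    def takeover(self, client: str) -> None:
        # Takeover replaces the attached client on the same remote session.
        self._require_live()
        self.attached_client = client

    def terminate(self) -> None:
        self.terminated = True
        self.attached_client = None
```

The property worth asserting after every detach, reattach, and takeover is that `remote_session_id` is unchanged and `terminated` is still `False`.
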
## 2. Source of truth
- PostgreSQL is the only durable source of truth for domain state.
- Redis is only for live coordination, routing, heartbeats, leases, attach tokens, and ephemeral cache.

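A practical reading of this rule is that anything placed in Redis should be safe to lose. The sketch below uses an in-memory stand-in (not a Redis client) to show the shape of that contract: every entry carries a TTL and simply disappears when it expires. The class name `EphemeralStore`, the key format, and the "TTL is mandatory" rule are assumptions made for illustration.

```python
import time
from typing import Any, Dict, Optional, Tuple


class EphemeralStore:
    """In-memory stand-in for Redis; every entry must carry a TTL."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[Any, float]] = {}

    def set(self, key: str, value: Any, ttl_seconds: float,
            now: Optional[float] = None) -> None:
        if ttl_seconds <= 0:
            raise ValueError("ephemeral entries need a positive TTL")
        now = time.monotonic() if now is None else now
        self._entries[key] = (value, now + ttl_seconds)

    def get(self, key: str, now: Optional[float] = None) -> Any:
        now = time.monotonic() if now is None else now
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            # Expired entries vanish; nothing durable depended on them.
            del self._entries[key]
            return None
        return value
```

If a value cannot tolerate this behavior, it is domain state and belongs in PostgreSQL.
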
## 3. Control plane vs data plane
Keep them distinct.

### Control plane
- organizations
- users
- memberships
- roles
- resources
- policies
- nodes
- services
- connectors
- cluster membership
- updates
- config distribution

### Data plane
- session streams
- worker traffic
- relay traffic
- connector traffic
- future exit traffic

## 4. Multi-tenancy isolation
Every organization must be isolated by design.

Namespace by organization for:
- resources
- users-in-org
- groups
- policies
- connectors
- sessions
- audit
- secrets references
- Redis keys where applicable

No cross-org leakage of identifiers, data, logs, cache keys, or policy decisions.

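For Redis keys, namespacing can be enforced by building every key through one helper instead of formatting strings ad hoc. The following is a minimal sketch; the `org:<org_id>:<kind>:...` layout, the function name `org_key`, and the allowed character set are assumptions, not an existing convention of this project.

```python
import re

# Assumption: key segments are restricted to a conservative character set so
# that a crafted identifier cannot inject the ":" separator.
_SEGMENT = re.compile(r"[A-Za-z0-9_-]+")


def org_key(org_id: str, kind: str, *parts: str) -> str:
    """Build an organization-scoped key such as 'org:acme:session:42'."""
    segments = (org_id, kind, *parts)
    for segment in segments:
        if not _SEGMENT.fullmatch(segment):
            raise ValueError(f"invalid key segment: {segment!r}")
    return "org:" + ":".join(segments)
```

Because the organization is always the first segment after the prefix, two organizations can never produce the same key, and a scan for one organization's keys cannot match another's.
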
## 5. Customer-managed nodes
Customer-managed nodes:
- may join the common cluster,
- must remain limited to allowed scope,
- must not automatically become general-purpose relay/control nodes for other organizations.

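The scope rule can be expressed as a default-deny check at placement time. This is a sketch under stated assumptions: the `Node` type, the `owner_org` and `allowed_orgs` fields, and the idea that platform-operated nodes (no owner) may serve any organization are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Node:
    node_id: str
    # None marks a platform-operated node (assumption for this sketch).
    owner_org: Optional[str] = None
    # Other organizations explicitly granted use of this node; empty by default.
    allowed_orgs: FrozenSet[str] = frozenset()


def may_serve(node: Node, org_id: str) -> bool:
    """Return True if the node is allowed to carry traffic for org_id."""
    if node.owner_org is None:
        return True
    # Customer-managed: own organization plus explicit grants, nothing else.
    return org_id == node.owner_org or org_id in node.allowed_orgs
```

Joining the cluster changes nothing in `allowed_orgs`, so a customer-managed node never starts serving other organizations without an explicit grant.
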
## 6. Node agent design
A node agent:
- is small,
- is stable,
- is always running,
- supervises services,
- downloads signed updates,
- verifies signatures and versions,
- can roll back,
- can restart services,
- can operate on thin nodes and thick nodes.

The agent is not the same as the service workloads.

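The update-related duties above (verify signature, verify version, roll back) can be sketched as a small piece of agent-side bookkeeping. Everything named here is hypothetical: `ManagedService`, dotted numeric version strings, the "must be strictly newer" rule, and the injected `verify` callable, which stands in for whatever signature scheme is actually used.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a dotted numeric version such as '1.10.0' (assumed format)."""
    return tuple(int(part) for part in text.split("."))


@dataclass
class ManagedService:
    name: str
    current: str
    previous: Optional[str] = None

    def apply_update(self, version: str, artifact: bytes, signature: bytes,
                     verify: Callable[[bytes, bytes], bool]) -> None:
        # Check the signature first so unsigned bytes are never acted on.
        if not verify(artifact, signature):
            raise ValueError("signature check failed; update rejected")
        if parse_version(version) <= parse_version(self.current):
            raise ValueError("update is not newer than the running version")
        # Remember the running version so a rollback target always exists.
        self.previous, self.current = self.current, version

    def rollback(self) -> None:
        if self.previous is None:
            raise RuntimeError("no previous version recorded")
        self.current, self.previous = self.previous, None
```

Keeping this logic in the agent rather than in the service workloads matches the separation stated above: a broken service update cannot remove the component that is able to undo it.
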
## 7. Split-brain prevention
Never allow minority partitions to become a second authoritative cluster automatically.

Required states:
- healthy
- degraded
- recovery
- isolated / emergency

Cluster-wide changes, role changes and risky mutations must be restricted in non-quorum states.

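The quorum restriction can be sketched as a guard evaluated before any cluster-wide mutation. The state names come from the list above; how a node decides which state it is in, the voter counts, and the choice to block all mutations in the isolated state are assumptions made for this example.

```python
from enum import Enum


class ClusterState(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    RECOVERY = "recovery"
    ISOLATED = "isolated"


def has_quorum(reachable_voters: int, total_voters: int) -> bool:
    """Strict majority: exactly half is not enough, so two equal halves
    of a partitioned cluster can never both claim authority."""
    return reachable_voters > total_voters // 2


def may_apply_cluster_mutation(state: ClusterState,
                               reachable_voters: int,
                               total_voters: int) -> bool:
    # Assumption: an isolated node never applies cluster-wide changes,
    # whatever it believes about the voters it can reach.
    if state is ClusterState.ISOLATED:
        return False
    return has_quorum(reachable_voters, total_voters)
```

With five voters split three to two, only the three-voter side passes the guard; with four voters split two to two, neither side does.
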
## 8. Service model
Each node must separate:
- capabilities
- enabled services

Do not encode every function into one monolithic node role.

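A minimal sketch of this separation keeps two sets per node and only allows a service to be enabled if the node is capable of running it. The `NodeServices` type and the service names used in the usage note are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Set


@dataclass
class NodeServices:
    # What the node is able to run (hardware, network position, packages).
    capabilities: FrozenSet[str]
    # What the node is currently asked to run; always a subset of capabilities.
    enabled: Set[str] = field(default_factory=set)

    def enable(self, service: str) -> None:
        if service not in self.capabilities:
            raise ValueError(f"node is not capable of running {service!r}")
        self.enabled.add(service)

    def disable(self, service: str) -> None:
        self.enabled.discard(service)
```

For example, a node with capabilities `{"relay", "worker"}` can run only `relay` today and have `worker` switched on later, without being redefined as a different kind of node.
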
## 9. Security model
Security must be based on:
- strong crypto
- signed artifacts
- node identity
- short-lived user/session tokens
- scoped trust
- audit trails
- revocation
- least privilege

Do not depend on protocol obscurity.

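Several of these properties (short lifetime, scoped trust, revocation, least privilege) meet in the token check. The sketch below shows only the authorization decision and assumes the token's authenticity was already established cryptographically elsewhere; the `SessionToken` fields and the action name in the usage note are hypothetical.

```python
from dataclasses import dataclass
from typing import AbstractSet, FrozenSet


@dataclass(frozen=True)
class SessionToken:
    token_id: str
    org_id: str
    scope: FrozenSet[str]   # actions this token may perform, nothing more
    expires_at: float       # absolute expiry, seconds on the caller's clock


def authorize(token: SessionToken, org_id: str, action: str,
              now: float, revoked_ids: AbstractSet[str]) -> bool:
    """Default-deny check: every condition must hold for access."""
    if token.token_id in revoked_ids:
        return False            # revocation wins over everything else
    if now >= token.expires_at:
        return False            # short-lived: expired tokens are useless
    if token.org_id != org_id:
        return False            # scoped trust: never valid across orgs
    return action in token.scope
```

A token scoped to `{"session:attach"}` for one organization therefore cannot terminate a session or touch another organization, even before it expires.
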
## 10. Migration strategy
Do not force a big-bang rewrite.
Add the platform core around the current system in steps:
1. organization / membership model
2. org-scoped resource model
3. node model and node-agent control interfaces
4. connector model
5. mesh / routing evolution
6. native clients and higher-level features

## 11. Updates and rollback
Updates must support:
- manual or automatic policy
- staged rollout
- canary rollout
- rollback to previous version
- signed artifacts
- optional update mirrors / caches on selected nodes

Thin nodes may download but not store update artifacts.

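Staged and canary rollout are often implemented by assigning each node a stable bucket and widening the admitted range over time. The sketch below shows that common technique, not a prescribed design; the function names, the use of SHA-256 for bucketing, and the explicit canary set are assumptions.

```python
import hashlib
from typing import AbstractSet


def rollout_bucket(node_id: str, release: str) -> int:
    """Map a node to a stable bucket in 0..99 for a given release.

    Mixing the release into the hash means the same nodes are not always
    the first to receive every update.
    """
    digest = hashlib.sha256(f"{release}:{node_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def in_rollout(node_id: str, release: str, percent: int,
               canaries: AbstractSet[str] = frozenset()) -> bool:
    """Return True if the node should receive the release at this stage."""
    if node_id in canaries:
        return True     # canary nodes go first, even at 0 percent
    return rollout_bucket(node_id, release) < percent
```

Raising `percent` only ever adds nodes, so a rollout can be paused at any stage, and setting it back to `0` leaves just the canaries on the new release while a rollback is decided.
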
## 12. Performance and routing awareness
Placement and routing decisions must consider:
- CPU
- RAM
- network load
- active sessions
- connector load
- relay load
- service type
- health score

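One simple way to combine these inputs is a weighted load score scaled by health, with hard filters for service type and capacity. The sketch below covers a subset of the listed factors (connector and relay load are left out for brevity); the `NodeLoad` fields, the weights, and the formula are assumptions that would need tuning against real traffic.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class NodeLoad:
    node_id: str
    cpu: float              # utilization, 0.0 to 1.0
    ram: float              # utilization, 0.0 to 1.0
    network: float          # utilization, 0.0 to 1.0
    active_sessions: int
    max_sessions: int
    health: float           # 0.0 (unusable) to 1.0 (fully healthy)
    services: FrozenSet[str]


def placement_score(node: NodeLoad, service: str) -> Optional[float]:
    """Higher is better; None means the node is not eligible at all."""
    if service not in node.services:
        return None
    if node.health <= 0 or node.active_sessions >= node.max_sessions:
        return None
    session_load = node.active_sessions / node.max_sessions
    # Illustrative weights only.
    load = (0.4 * node.cpu + 0.2 * node.ram
            + 0.2 * node.network + 0.2 * session_load)
    return node.health * (1.0 - load)


def pick_node(nodes: Iterable[NodeLoad], service: str) -> Optional[NodeLoad]:
    scored = [(placement_score(n, service), n) for n in nodes]
    eligible = [(score, n) for score, n in scored if score is not None]
    if not eligible:
        return None
    return max(eligible, key=lambda pair: pair[0])[1]
```

Treating service type and capacity as filters rather than weights keeps an idle node from being chosen for a service it does not run.
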
## 13. No feature explosion before platform core
Before the platform core is modeled correctly, do not jump to:
- full collaboration/video meetings
- advanced media plane
- internet exit mode