rdp-proxy/docs/codex/ARCHITECTURE_GUARDRAILS.md

# Architecture guardrails

These rules are mandatory.

## 1. Preserve the proven session foundation
The following are already proven and must remain stable:
- live FreeRDP connect
- active session state
- terminate
- detach without killing remote session
- reattach without recreating remote session
- takeover without recreating remote session

No architectural refactor may silently weaken this behavior.

## 2. Source of truth
- PostgreSQL is the only durable source of truth for domain state.
- Redis is only for live coordination, routing, heartbeats, leases, attach tokens, and ephemeral cache.

## 3. Control plane vs data plane
Keep them distinct.

### Control plane
- organizations
- users
- memberships
- roles
- resources
- policies
- nodes
- services
- connectors
- cluster membership
- updates
- config distribution

### Data plane
- session streams
- worker traffic
- relay traffic
- connector traffic
- future exit traffic

## 4. Multi-tenancy isolation
Every organization must be isolated by design.

Namespace by organization for:
- resources
- users-in-org
- groups
- policies
- connectors
- sessions
- audit
- secret references
- Redis keys where applicable

No cross-org leakage of identifiers, data, logs, cache keys, or policy decisions.
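
One way to keep Redis keys org-scoped is to build every key through a single helper that always prefixes the organization. The sketch below is illustrative only: the `org:{org_id}:{kind}:{ident}` layout and the names `org_key` and `belongs_to_org` are assumptions, not the project's actual key scheme.

```python
import re

# Hypothetical key layout: "org:{org_id}:{kind}:{ident}".
# Segments are restricted so that a separator cannot be smuggled in.
_SEGMENT = re.compile(r"[A-Za-z0-9_-]+")


def org_key(org_id: str, kind: str, ident: str) -> str:
    """Build a Redis key that is always prefixed by its organization."""
    for segment in (org_id, kind, ident):
        # Reject empty segments and ":" so one org cannot forge a key
        # that lands in another org's namespace.
        if not _SEGMENT.fullmatch(segment):
            raise ValueError(f"invalid key segment: {segment!r}")
    return f"org:{org_id}:{kind}:{ident}"


def belongs_to_org(key: str, org_id: str) -> bool:
    """Check that a key sits inside the given organization's namespace."""
    return key.startswith(f"org:{org_id}:")
```

For example, `org_key("acme", "session", "s1")` returns `"org:acme:session:s1"`, and an `org_id` containing `:` is rejected instead of being silently embedded in the key.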

## 5. Customer-managed nodes
Customer-managed nodes:
- may join the common cluster,
- must remain limited to allowed scope,
- must not automatically become general-purpose relay/control nodes for other organizations.

## 6. Node agent design
A node agent:
- is small,
- is stable,
- is always running,
- supervises services,
- downloads signed updates,
- verifies signatures and versions,
- can roll back,
- can restart services,
- can operate on thin nodes and thick nodes.

The agent is not the same as the service workloads.

## 7. Split-brain prevention
Never allow minority partitions to become a second authoritative cluster automatically.

Required states:
- healthy
- degraded
- recovery
- isolated / emergency

Cluster-wide changes, role changes and risky mutations must be restricted in non-quorum states.
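
A minimal sketch of the quorum gate implied by this rule, assuming quorum means a strict majority of voting nodes. The names `has_quorum`, `guard_risky_mutation`, and `QuorumError` are hypothetical. How the four states map to quorum is not specified here, so the sketch gates only on the voter count.

```python
class QuorumError(RuntimeError):
    """Raised when a risky mutation is attempted without quorum."""


def has_quorum(reachable_voters: int, total_voters: int) -> bool:
    """Strict majority: an even split (e.g. 2 of 4) is not quorum."""
    return reachable_voters > total_voters // 2


def guard_risky_mutation(reachable_voters: int, total_voters: int) -> None:
    """Refuse cluster-wide or role changes on a minority partition."""
    if not has_quorum(reachable_voters, total_voters):
        raise QuorumError(
            f"only {reachable_voters} of {total_voters} voters reachable; "
            "mutation refused"
        )
```

Because the majority is strict, two halves of an evenly split cluster both fail the check, so neither can promote itself to a second authoritative cluster.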

## 8. Service model
Each node must separate:
- capabilities
- enabled services

Do not encode every function into one monolithic node role.
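
The split between what a node can run and what it is asked to run could be modeled roughly as follows. This is a sketch under assumptions: the `Node` class, its field names, and the service names in the usage note are illustrative, not an existing schema.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    # What the node is able to run (hardware, network position, trust scope).
    capabilities: frozenset
    # What the node is currently asked to run; always a subset of capabilities.
    enabled: set = field(default_factory=set)

    def enable(self, service: str) -> None:
        """Enable a service only if the node is capable of running it."""
        if service not in self.capabilities:
            raise ValueError(f"{self.node_id} cannot run {service!r}")
        self.enabled.add(service)

    def disable(self, service: str) -> None:
        """Disabling a service leaves the capability untouched."""
        self.enabled.discard(service)
```

A node created with `capabilities=frozenset({"worker", "relay"})` can have `relay` enabled and later disabled without changing what it is capable of, and a request to enable `connector` fails instead of quietly widening the node's role.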

## 9. Security model
Security must be based on:
- strong crypto
- signed artifacts
- node identity
- short-lived user/session tokens
- scoped trust
- audit trails
- revocation
- least privilege

Do not depend on protocol obscurity.

## 10. Migration strategy
Do not force a big-bang rewrite.
Add the platform core around the current system in steps:
1. organization / membership model
2. org-scoped resource model
3. node model and node-agent control interfaces
4. connector model
5. mesh / routing evolution
6. native clients and higher-level features

## 11. Updates and rollback
Updates must support:
- manual or automatic policy
- staged rollout
- canary rollout
- rollback to previous version
- signed artifacts
- optional update mirrors / caches on selected nodes

Thin nodes may download update artifacts for their own use but must not retain them as a mirror or cache for other nodes.
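
Staged and canary rollout both need a stable way to decide which nodes receive a release first. One possible approach, shown here as a sketch and not as the project's mechanism, is to hash each node into a fixed bucket per release and widen the percentage over time. The names `rollout_bucket` and `in_rollout` are hypothetical.

```python
import hashlib


def rollout_bucket(node_id: str, release: str) -> int:
    """Map a node to a stable bucket in 0..99 for a given release."""
    digest = hashlib.sha256(f"{release}:{node_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def in_rollout(node_id: str, release: str, percent: int) -> bool:
    """True if the node is inside the first `percent` of the rollout."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return rollout_bucket(node_id, release) < percent
```

Because the bucket is fixed per node and release, raising `percent` only adds nodes: a node that was in the canary stage stays in every later stage, and setting `percent` to 0 halts further rollout without extra bookkeeping.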

## 12. Performance and routing awareness
Placement and routing decisions must consider:
- CPU
- RAM
- network load
- active sessions
- connector load
- relay load
- service type
- health score
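
As an illustration, several of these signals can be folded into one placement score. The sketch below is an assumption-heavy example: the `NodeLoad` fields, the weights, and the function names are invented for illustration, and connector and relay load are left out for brevity.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class NodeLoad:
    node_id: str
    services: frozenset   # service types this node currently offers
    cpu: float            # utilization, 0.0 .. 1.0
    ram: float            # utilization, 0.0 .. 1.0
    network: float        # utilization, 0.0 .. 1.0
    active_sessions: int
    max_sessions: int
    health: float         # 0.0 (unhealthy) .. 1.0 (healthy)


def placement_score(n: NodeLoad) -> float:
    """Higher is better. The weights are illustrative, not tuned values."""
    session_util = n.active_sessions / n.max_sessions if n.max_sessions else 1.0
    load = 0.3 * n.cpu + 0.2 * n.ram + 0.2 * n.network + 0.3 * session_util
    return n.health * (1.0 - load)


def pick_node(nodes: Iterable[NodeLoad], service: str) -> Optional[NodeLoad]:
    """Pick the best-scoring healthy node that offers the requested service."""
    candidates = [n for n in nodes if service in n.services and n.health > 0]
    return max(candidates, key=placement_score, default=None)
```

Filtering by service type before scoring keeps a lightly loaded relay-only node from ever being chosen for worker placement, and multiplying by `health` lets a degraded node lose to a busier but healthy one.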

## 13. No feature explosion before platform core
Until the platform core is modeled correctly, do not jump to:
- full collaboration/video meetings
- advanced media plane
- internet exit mode