m/rdp-proxy

m 8ba0561f4f Initial project snapshot

2026-04-28 22:29:50 +03:00

Fabric Peer Directory and Cache Model

Status: Stage C14 result. Documentation and architecture only.

This document defines the Fabric peer directory and node-local peer cache model. It does not implement code, migrations, APIs, mesh runtime traffic, VPN/IP tunnel runtime, relay packet routing, RDP work, or service workload execution.

1. Purpose

The peer directory tells a node which peers it may know about and potentially connect to. The node-local peer cache stores scoped peer data plus runtime observations for fast recovery and score-based peer selection.

The model must avoid:

full-mesh assumptions
every node knowing full cluster topology
service adapters owning route selection
Redis as durable peer topology
backend calls on every realtime route decision

2. Peer Knowledge Classes

Each node maintains three peer classes:

active peers
warm candidate peers
cold/bootstrap peers

Active peers:

currently connected or recently used
participate in health, route, relay, or service traffic according to role
small bounded set

Warm candidate peers:

known good but not currently active
promoted when active peers fail or a better path is needed
refreshed less frequently than active peers

Cold/bootstrap peers:

seed or last-resort discovery peers
used when active and warm peers fail
may come from signed snapshot, local cache, storage/config service, or admin-defined seed nodes

Recommended active peer counts:

normal node: 3-5
relay/core node: 8-20
thin/mobile node: 1-3

These are policy defaults, not hardcoded limits.
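
As an illustration only, the recommended ranges could be carried as role policy data instead of constants scattered through code. The names `ActivePeerPolicy`, `DEFAULT_ACTIVE_PEER_POLICY`, and `clamp_active_target` are hypothetical; C14 does not define them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivePeerPolicy:
    """Hypothetical policy record: bounds on the active peer set for one role."""
    min_active: int
    max_active: int


# Defaults copied from the recommended counts above. They are plain data so
# that an operator policy could override them without a code change.
DEFAULT_ACTIVE_PEER_POLICY = {
    "normal": ActivePeerPolicy(3, 5),
    "relay_core": ActivePeerPolicy(8, 20),
    "thin_mobile": ActivePeerPolicy(1, 3),
}


def clamp_active_target(role, requested, policy=DEFAULT_ACTIVE_PEER_POLICY):
    """Clamp a requested active peer count into the role's policy range."""
    bounds = policy[role]
    return max(bounds.min_active, min(requested, bounds.max_active))
```

Passing a different `policy` mapping is how an override would be expressed in this sketch.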

3. Peer Directory Record

A signed peer directory entry may contain:

node_id
cluster_id
endpoint candidates
advertised roles
verified capabilities
allowed peer relationship type
region/location hints
trust/certificate fingerprint
certificate expiry metadata
policy scope
organization scope where applicable
service scope where applicable
supported transport hints
NAT/connectivity hints
last_seen_config_version

The peer directory is scoped. Ordinary nodes must not receive a full cluster peer directory unless their role explicitly requires it.
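
The scoping rule can be sketched as a filter applied before a directory is handed to a node. This is a minimal sketch, not a defined API: `PeerDirectoryEntry` carries only a subset of the fields above, `scope_directory` is a hypothetical helper, and treating an empty `organization_scope` as "shared within the cluster" is an assumption.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PeerDirectoryEntry:
    """Illustrative subset of the directory fields listed above."""
    node_id: str
    cluster_id: str
    advertised_roles: frozenset
    cert_fingerprint: str
    organization_scope: frozenset = field(default_factory=frozenset)
    last_seen_config_version: int = 0


def scope_directory(entries, cluster_id, organization_id, full_topology_role=False):
    """Return only the entries a requesting node may receive.

    Assumption: an empty organization_scope marks a peer shared by the
    whole cluster (for example a relay); otherwise the requester's
    organization must be listed.
    """
    in_cluster = [e for e in entries if e.cluster_id == cluster_id]
    if full_topology_role:
        # Only roles that explicitly require it see the full cluster view.
        return in_cluster
    return [
        e for e in in_cluster
        if not e.organization_scope or organization_id in e.organization_scope
    ]
```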

4. Endpoint Candidate Model

Endpoint candidates describe possible ways to reach a node.

Candidate fields:

endpoint id
transport type
host/IP/DNS name
port
address family
public/private reachability
region
NAT type if known
TLS/mTLS identity expectations
priority
policy tags
last verified timestamp

Transport types may include future values such as:

direct TCP/TLS
WSS
relay-assisted
outbound-only reverse channel
future QUIC/UDP where explicitly approved

This model is descriptive only. C14 does not implement new transports.
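
For readers who prefer a concrete shape, the candidate fields might be modelled as below. This is a descriptive sketch under assumptions: the enum values, the field names, and the convention that a lower `priority` number is preferred are all hypothetical, and nothing here opens a connection.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Transport(Enum):
    """Hypothetical labels for the transport types listed above."""
    DIRECT_TLS = "direct_tcp_tls"
    WSS = "wss"
    RELAY_ASSISTED = "relay_assisted"
    REVERSE_CHANNEL = "outbound_only_reverse"


@dataclass(frozen=True)
class EndpointCandidate:
    endpoint_id: str
    transport: Transport
    host: str
    port: int
    priority: int  # assumption: lower value means more preferred
    publicly_reachable: bool = True
    last_verified_at: Optional[float] = None  # epoch seconds, if known


def usable_candidates(candidates, supported_transports):
    """Keep candidates whose transport the local node supports, best first."""
    usable = [c for c in candidates if c.transport in supported_transports]
    return sorted(usable, key=lambda c: c.priority)
```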

5. Node-Local Peer Cache

The node-local peer cache contains signed directory data plus runtime observations.

Directory-derived fields:

peer identity
cluster id
endpoint candidates
roles/capabilities
trust fingerprint
policy scope
config version

Runtime observation fields:

last_success_at
last_failure_at
last_latency_ms
packet loss
jitter
reliability score
recent failure history
observed load hint where allowed
active/warm/cold state
last selected route id if applicable

Runtime observations are hints. They are not durable authority.
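
One way to keep observations as cheap, overwritable hints is a small mutable record updated after each probe. The sketch below assumes a reliability score in the range 0 to 1 maintained as an exponentially weighted moving average; the class name, the neutral starting value of 0.5, and the smoothing factor `alpha` are illustrative choices, not part of C14.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RuntimeObservation:
    """Hypothetical per-peer runtime hints; never treated as authority."""
    last_success_at: Optional[float] = None
    last_failure_at: Optional[float] = None
    last_latency_ms: Optional[float] = None
    reliability: float = 0.5  # assumption: 0..1, starts neutral

    def record(self, now, ok, latency_ms=None, alpha=0.2):
        """Fold one probe result into the hints.

        reliability moves toward 1.0 on success and toward 0.0 on failure
        by a fraction alpha of the remaining distance.
        """
        sample = 1.0 if ok else 0.0
        self.reliability = (1 - alpha) * self.reliability + alpha * sample
        if ok:
            self.last_success_at = now
            self.last_latency_ms = latency_ms
        else:
            self.last_failure_at = now
```

Because the record holds no signed data, it can be discarded at any time without affecting trust decisions.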

6. Refresh Cadence

Recommended cadence:

active peer heartbeat: 5-15 seconds
active/warm latency probes: 30-120 seconds
warm peer validation: 2-10 minutes
peer directory refresh: 5-15 minutes
cold/bootstrap validation: periodic or on demand
full peer directory resync: only on version gap, signature mismatch, or policy-triggered refresh

Cadence may vary by role:

relay/core nodes maintain richer peer sets
thin/mobile nodes probe less aggressively
egress/service nodes prioritize peers relevant to assigned services
storage/config nodes prioritize configured replica peers
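
The cadence ranges and the per-role variation could be combined roughly as follows. This is a sketch under assumptions: the task names, the `aggressiveness` parameter (1.0 probes at the short end of the range, 0.0 at the long end), and the linear interpolation are hypothetical.

```python
# Ranges in seconds, copied from the recommended cadence above.
CADENCE_SECONDS = {
    "active_heartbeat": (5, 15),
    "latency_probe": (30, 120),
    "warm_validation": (120, 600),
    "directory_refresh": (300, 900),
}


def interval_for(task, aggressiveness):
    """Pick an interval inside the recommended range for a task.

    aggressiveness is clamped to 0..1. A relay/core node might use a value
    near 1.0 and a thin/mobile node a value near 0.0 (assumption).
    """
    low, high = CADENCE_SECONDS[task]
    a = max(0.0, min(1.0, aggressiveness))
    return high - a * (high - low)
```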

7. Peer Selection Scoring

Selection is score-based, not latency-only.

Hard checks first:

cluster membership
node identity trust
certificate validity
role compatibility
allowed peer relationship
organization/service scope
partition/authority policy
transport compatibility
revocation status

Soft score inputs:

latency
packet loss
jitter
reliability
recent failure history
region distance
node load hint
bandwidth availability
role suitability
route class/channel class
policy preference

No peer should be selected if it fails hard policy checks, even if latency is excellent.
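
The ordering "hard checks first, then soft score" can be sketched as a filter followed by a ranking. The example uses only a few of the inputs listed above, and the field names, weights, and linear scoring formula are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical view of a peer at selection time."""
    node_id: str
    same_cluster: bool
    cert_valid: bool
    role_compatible: bool
    revoked: bool
    latency_ms: float
    packet_loss: float  # fraction, 0..1
    reliability: float  # 0..1


def passes_hard_checks(c):
    """Every hard check must pass; no score can compensate for a failure."""
    return c.same_cluster and c.cert_valid and c.role_compatible and not c.revoked


def soft_score(c, w_latency=1.0, w_loss=200.0, w_reliability=100.0):
    """Higher is better. The weights are illustrative, not tuned values."""
    return (w_reliability * c.reliability
            - w_latency * c.latency_ms
            - w_loss * c.packet_loss)


def select_peer(candidates):
    """Return the best eligible candidate, or None if none is eligible."""
    eligible = [c for c in candidates if passes_hard_checks(c)]
    if not eligible:
        return None
    return max(eligible, key=soft_score)
```

A revoked peer with 1 ms latency never reaches `soft_score`, which is the property the rule above requires.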

8. Recovery Order

If active peers fail, recovery order is:

retry active peers with bounded backoff
promote warm candidates
try cold/bootstrap peers
query authorized storage/config discovery endpoint
use last signed snapshot for degraded reconnect if policy allows
reconnect to control plane when available

Recovery must not authorize cluster mutation or high-risk actions.
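
The first five steps of the recovery order can be sketched as an ordered walk that stops at the first stage yielding a peer. The stage names, the `attempts` mapping of callables, and the doubling backoff are assumptions. Reconnecting to the control plane is left out because it follows peer recovery rather than producing a peer, and the function only returns a peer without performing any cluster mutation.

```python
from enum import Enum


class RecoveryStage(Enum):
    """Hypothetical stage labels, declared in the recovery order above."""
    RETRY_ACTIVE = 1
    PROMOTE_WARM = 2
    TRY_BOOTSTRAP = 3
    QUERY_DISCOVERY = 4
    USE_SIGNED_SNAPSHOT = 5


def bounded_backoff(attempt, base=1.0, cap=30.0):
    """Delay in seconds before retry number `attempt` (0-based), capped."""
    return min(cap, base * (2 ** attempt))


def recover(attempts, snapshot_allowed):
    """Walk the stages in order and return (stage, peer_id) or None.

    attempts maps a RecoveryStage to a zero-argument callable returning a
    peer id or None. The snapshot stage is skipped unless policy allows it.
    """
    for stage in RecoveryStage:  # Enum iteration follows definition order
        if stage is RecoveryStage.USE_SIGNED_SNAPSHOT and not snapshot_allowed:
            continue
        attempt = attempts.get(stage)
        peer = attempt() if attempt is not None else None
        if peer is not None:
            return stage, peer
    return None
```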

9. Channel-Aware Peer Preference

Peer choice depends on channel class.

Input/control:

lowest latency
lowest jitter
high reliability
never behind bulk traffic

Render/video:

bandwidth and jitter aware
stale-frame dropping acceptable
avoid paths with persistent queue growth

File transfer:

throughput and reliability
lower priority than input/control

Clipboard/control:

reliable bounded path
low volume

Telemetry:

low priority
lossy/sampled allowed

VPN/IP tunnel (future):

adaptive QoS
bulk traffic must not starve interactive sessions
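
Channel-aware preference can be thought of as the same metrics weighted differently per channel class. The sketch below assumes every metric has already been normalized to the range 0 to 1 with 1 as best; the channel keys and weight values are hypothetical and only meant to reflect the priorities listed above.

```python
# Hypothetical weights per channel class. Metrics are assumed normalized to
# 0..1 where 1 is best (lowest latency, lowest jitter, and so on).
CHANNEL_WEIGHTS = {
    "input_control": {"latency": 1.0, "jitter": 1.0, "reliability": 0.8, "bandwidth": 0.0},
    "render_video": {"latency": 0.3, "jitter": 0.8, "reliability": 0.4, "bandwidth": 1.0},
    "file_transfer": {"latency": 0.0, "jitter": 0.0, "reliability": 1.0, "bandwidth": 1.0},
    "telemetry": {"latency": 0.0, "jitter": 0.0, "reliability": 0.2, "bandwidth": 0.1},
}


def channel_score(channel, metrics):
    """Weighted sum of normalized metrics for one channel class."""
    weights = CHANNEL_WEIGHTS[channel]
    return sum(weights[name] * metrics[name] for name in weights)


def best_peer(channel, peers):
    """peers maps node_id -> normalized metrics; return the top node_id."""
    return max(peers, key=lambda node_id: channel_score(channel, peers[node_id]))
```

With these weights, a low-latency peer wins for input/control while a high-bandwidth peer wins for file transfer, even when both are eligible.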

10. Full-Mesh Prevention

Nodes must not attempt to connect to every known node.

Limits:

active peers are bounded by role policy
warm peers are bounded by role policy
peer directory is scoped
full topology is hidden from organizations
service adapters never request arbitrary topology

Full topology access is reserved only for roles that require it, such as platform control/admin views or selected core/route-analysis components.
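
A bounded peer set is one simple way to make full-mesh behavior impossible by construction. This is a minimal sketch with hypothetical names: known peers are admitted to the active list until the role limit is reached, then to the warm list, and otherwise stay cold; a failed active peer is replaced by the oldest warm candidate.

```python
class BoundedPeerSet:
    """Hypothetical container enforcing role-policy bounds on peer classes."""

    def __init__(self, max_active, max_warm):
        self.max_active = max_active
        self.max_warm = max_warm
        self.active = []
        self.warm = []

    def admit(self, node_id):
        """Place a known peer and return the class it ended up in."""
        if node_id in self.active:
            return "active"
        if node_id in self.warm:
            return "warm"
        if len(self.active) < self.max_active:
            self.active.append(node_id)
            return "active"
        if len(self.warm) < self.max_warm:
            self.warm.append(node_id)
            return "warm"
        return "cold"  # known, but no connection is attempted

    def replace_failed(self, node_id):
        """Drop a failed active peer and promote the oldest warm peer, if any."""
        if node_id in self.active:
            self.active.remove(node_id)
        if self.warm and len(self.active) < self.max_active:
            promoted = self.warm.pop(0)
            self.active.append(promoted)
            return promoted
        return None
```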

11. Security Boundaries

Peer cache must enforce:

cluster isolation
organization isolation
certificate fingerprint validation
revocation status
role assignment
allowed peer relationship
service scope

A compromised ordinary node should not learn full cluster topology.

Peer cache data must not include:

unrelated organization resources
raw secrets
broad user lists
arbitrary route authority
cross-cluster trust unless explicitly authorized

12. Multi-Cluster Peer Isolation

A node that belongs to more than one cluster keeps a separate peer cache for each cluster.

Per-cluster separation:

peer directory
endpoint candidates
trust roots
certificate fingerprints
active/warm/cold peer state
route observations
failure history

Cross-cluster peer discovery requires explicit trust and policy. Clusters do not form a single mesh by default.

13. Storage / Snapshot Relationship

Peer directory data is distributed through signed snapshots or Fabric Storage / Config Storage artifacts.

Rules:

peer directory version is tracked
node reports last applied peer directory version
version gap triggers refresh/full resync
signature/hash mismatch rejects the directory
revoked peers are removed or marked unusable
runtime observations are preserved only when still valid for the current directory version
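
These rules can be sketched as a single apply step. The sketch makes several assumptions: integrity is checked with a SHA-256 digest only (real signature verification is out of scope here), a version exactly one ahead is applied as a normal refresh while a larger gap is reported as a full resync, and "still valid" observations are approximated as those for peers still present in the new directory. All names are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PeerCacheState:
    applied_version: int = 0
    observations: dict = field(default_factory=dict)  # node_id -> runtime hints


def apply_directory(state, version, payload, expected_sha256, peer_ids):
    """Apply a directory snapshot and return what happened.

    Returns "rejected" on hash mismatch, "ignored" for a version that is
    not newer, "applied" for the next version, "full_resync" after a gap.
    """
    if hashlib.sha256(payload).hexdigest() != expected_sha256:
        return "rejected"  # state is left untouched
    if version <= state.applied_version:
        return "ignored"
    result = "applied" if version == state.applied_version + 1 else "full_resync"
    # Keep hints only for peers that still exist in the new directory;
    # revoked or removed peers lose their observations.
    state.observations = {
        node_id: obs for node_id, obs in state.observations.items()
        if node_id in peer_ids
    }
    state.applied_version = version  # this is the version the node reports
    return result
```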

14. Service Adapter Boundary

Service Adapters may request:

destination node
resource target
egress node
egress pool
channel class

Service Adapters must not:

enumerate peers
select mesh routes
promote warm peers
create shortcut connections
implement partition recovery
implement cross-cluster routing policy

The Fabric Routing Engine owns those decisions.
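
The boundary can be illustrated by a request type that has room only for what an adapter may ask for. This is a hypothetical shape, not the route request contract that a later stage would define: the class name, the channel class labels, and the rule that at least one target must be set are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for channel classes.
CHANNEL_CLASSES = {
    "input_control", "render_video", "file_transfer", "clipboard", "telemetry",
}


@dataclass(frozen=True)
class AdapterRouteRequest:
    """What a Service Adapter may express. There is deliberately no field
    for peers, routes, shortcuts, or recovery behavior."""
    channel_class: str
    destination_node: Optional[str] = None
    resource_target: Optional[str] = None
    egress_node: Optional[str] = None
    egress_pool: Optional[str] = None

    def __post_init__(self):
        if self.channel_class not in CHANNEL_CLASSES:
            raise ValueError(f"unknown channel class: {self.channel_class}")
        targets = (self.destination_node, self.resource_target,
                   self.egress_node, self.egress_pool)
        if not any(targets):
            # Assumption: a request must name at least one target.
            raise ValueError("request must name at least one target")
```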

15. Observability

The node agent should report safe peer/cache metrics:

active peer count
warm peer count
bootstrap peer count
peer directory version
last refresh time
average active peer latency
packet loss summary
failed peer count
recovery mode if active
selected peer class by channel type

Reports must not expose full topology to organizations.
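
A report built only from aggregates is one way to keep topology out of metrics. The sketch below assumes a hypothetical in-memory cache shape and metric key names; it returns counts and averages and never includes node identifiers or endpoints.

```python
def build_peer_report(cache, directory_version):
    """Summarize the peer cache without naming any peer.

    cache maps node_id -> {"state": "active" | "warm" | "cold",
                           "latency_ms": float or None,
                           "failed": bool}   (hypothetical shape)
    """
    peers = list(cache.values())  # node ids are dropped here on purpose
    active_latencies = [
        p["latency_ms"] for p in peers
        if p["state"] == "active" and p["latency_ms"] is not None
    ]
    return {
        "active_peer_count": sum(1 for p in peers if p["state"] == "active"),
        "warm_peer_count": sum(1 for p in peers if p["state"] == "warm"),
        "bootstrap_peer_count": sum(1 for p in peers if p["state"] == "cold"),
        "failed_peer_count": sum(1 for p in peers if p["failed"]),
        "peer_directory_version": directory_version,
        "average_active_peer_latency_ms": (
            sum(active_latencies) / len(active_latencies)
            if active_latencies else None
        ),
    }
```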

16. Future Validation Tests

Future implementation tests must prove:

peer directory scope is enforced
wrong-cluster peer is rejected
revoked peer is rejected
invalid certificate fingerprint is rejected
full topology is not distributed to ordinary nodes
active peer count stays bounded
warm peer promotion works
bootstrap recovery works
score-based selection respects hard policy checks
stale runtime observations are ignored after directory version change
service adapter cannot bypass Fabric peer selection

17. C15 Preparation

C15 must define the Fabric Routing Engine skeleton boundary.

The routing engine will consume:

peer directory/cache
route policy
QoS policy
channel class
service request metadata
cluster/organization scope
failure history

C15 must not carry production mesh traffic. It should define route request and route result boundaries before runtime routing exists.

18. Result / Decision

Stage C14 defines scoped peer discovery and peer cache behavior.

Decisions:

nodes maintain active, warm, and cold/bootstrap peer classes
nodes do not maintain full mesh connections
peer directory data is scoped and signed
peer cache combines signed directory data with runtime observations
peer selection is score-based with hard policy checks first
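recovery tries active peers, warm candidates, cold/bootstrap peers, the storage/config discovery endpoint, then the last signed snapshot if policy allows, and reconnects to the control plane when available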
service adapters do not own peer discovery or route selection
C15 must define the Fabric Routing Engine skeleton before mesh runtime

No code, migration, API, runtime, RDP, data-plane, mesh, VPN, relay, or service workload behavior is changed by C14.
