Fabric Routing Engine Skeleton
Status: Stage C15 result. Documentation and architecture only.
This document defines the Fabric Routing Engine skeleton boundary. It does not implement code, migrations, APIs, mesh runtime traffic, VPN/IP tunnel runtime, relay packet routing, RDP work, or service workload execution.
1. Purpose
The Fabric Routing Engine is the logical Fabric layer responsible for choosing authorized paths between ingress, core, egress, service, storage, and future VPN/IP-tunnel components.
C15 defines the route decision boundary before runtime mesh routing exists.
The purpose is to ensure that future routing:
- is policy-aware
- is QoS-aware
- is channel-aware
- respects cluster and organization boundaries
- uses scoped local state and peer cache
- does not depend on live backend availability for realtime decisions
- is not implemented independently by Service Adapters
2. Non-Goals
C15 does not:
- carry production mesh traffic
- implement node-to-node transport
- implement relay forwarding
- implement VPN/IP tunnel packets
- implement QUIC/WebRTC
- implement route execution
- implement service workloads
- change RDP runtime
- change backend session lifecycle
- change Windows client behavior
It defines contracts and responsibilities only.
3. Routing Engine Responsibilities
The Fabric Routing Engine owns:
- route request validation
- peer candidate filtering
- route scoring
- channel-aware path selection
- QoS class selection
- route cache lookup/update policy
- failover decision boundaries
- shortcut recommendation boundaries
- topology hiding
- policy and cluster-boundary enforcement
- service adapter routing integration boundary
The Routing Engine does not own:
- PostgreSQL source-of-truth mutation
- service protocol translation
- RDP/VNC/SSH/VPN implementation details
- raw packet forwarding
- direct secret resolution
- organization admin visibility
- node enrollment authority
4. Inputs
Routing decisions may consume:
- signed scoped cluster snapshot
- node-local peer cache
- route cache
- peer directory
- route policy
- QoS policy
- service assignment cache
- cluster membership
- organization scope
- service/resource scope
- channel class
- current health/degraded state
- partition/authority state
- failure history
- load and latency observations
Routing decisions must not require a live backend call in the realtime path.
5. Route Request Contract
A route request is a logical request for a path. It is not a packet.
Required fields:
- request_id
- cluster_id
- organization_id where applicable
- source_node_id
- source_role
- destination_kind
- destination_ref
- service_type
- channel_class
- priority_class
- policy_refs
- requested_at
Destination kinds:
- node
- egress_pool
- service_instance
- resource_target
- vpn_connection
- storage_scope
- control_plane_endpoint
Optional fields:
- session_id
- attachment_id
- resource_id
- user_id
- device_id
- region_preference
- required_capabilities
- forbidden_nodes
- preferred_nodes
- max_latency_ms
- min_bandwidth_hint
- stickiness_key
- previous_route_id
- failure_context
Service adapters may create route requests through an adapter-facing boundary, but they must not select peers or paths themselves.
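The request contract above could be modeled as an immutable record with a structural validation step. The sketch below is illustrative only and is not part of the C15 deliverable. The field types, the subset of optional fields shown, and the `validate_request` helper are assumptions; only the field names and destination kinds come from the contract.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class DestinationKind(Enum):
    NODE = "node"
    EGRESS_POOL = "egress_pool"
    SERVICE_INSTANCE = "service_instance"
    RESOURCE_TARGET = "resource_target"
    VPN_CONNECTION = "vpn_connection"
    STORAGE_SCOPE = "storage_scope"
    CONTROL_PLANE_ENDPOINT = "control_plane_endpoint"


@dataclass(frozen=True)
class RouteRequest:
    # Required fields from the contract (types are assumed).
    request_id: str
    cluster_id: str
    source_node_id: str
    source_role: str
    destination_kind: DestinationKind
    destination_ref: str
    service_type: str
    channel_class: str
    priority_class: str
    policy_refs: tuple[str, ...]
    requested_at: datetime
    # "Where applicable" in the contract, so modeled as optional here.
    organization_id: Optional[str] = None
    # A few of the optional fields; the rest are omitted for brevity.
    session_id: Optional[str] = None
    forbidden_nodes: frozenset[str] = frozenset()
    preferred_nodes: frozenset[str] = frozenset()
    max_latency_ms: Optional[int] = None
    stickiness_key: Optional[str] = None


def validate_request(req: RouteRequest) -> list[str]:
    """Return structural problems; an empty list means the request is well-formed."""
    problems = []
    for name in ("request_id", "cluster_id", "source_node_id", "destination_ref"):
        if not getattr(req, name):
            problems.append(f"{name} is empty")
    if not req.policy_refs:
        problems.append("policy_refs is empty")
    if req.forbidden_nodes & req.preferred_nodes:
        problems.append("a node is both preferred and forbidden")
    if req.max_latency_ms is not None and req.max_latency_ms <= 0:
        problems.append("max_latency_ms must be positive")
    return problems
```

The record carries preferences (`preferred_nodes`, `forbidden_nodes`) but no selected peer or path, which keeps path selection on the Routing Engine side of the boundary.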
6. Route Result Contract
A route result is a signed or locally verifiable decision artifact for a bounded time.
Required fields:
- route_id
- request_id
- cluster_id
- organization_id where applicable
- route_class
- channel_class
- selected_path
- selected_qos_class
- score
- valid_from
- expires_at
- route_epoch
- policy_version
- decision_reason
Selected path contains ordered logical hops:
- source node
- optional ingress node
- zero or more core/relay nodes
- optional egress/service node
- target/service endpoint
Optional fields:
- fallback_paths
- shortcut_candidates
- stickiness_key
- drain_after
- degraded_mode
- constraints_applied
- rejection_reason
Route results must be bounded by expiry, policy version, route epoch, and cluster authority state.
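The bounding rule above can be read as a single predicate that a consumer evaluates before using a route result. The following is a minimal sketch, not part of the C15 deliverable. It assumes that `route_epoch` and `policy_version` are integers compared for equality and that cluster authority state is reduced to one boolean; neither is specified by the contract.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class RouteResult:
    route_id: str
    request_id: str
    cluster_id: str
    route_class: str
    channel_class: str
    selected_path: tuple[str, ...]  # ordered logical hops, source first
    selected_qos_class: str
    score: float
    valid_from: datetime
    expires_at: datetime
    route_epoch: int        # assumed integer
    policy_version: int     # assumed integer
    decision_reason: str


def is_usable(result: RouteResult, now: datetime, current_epoch: int,
              current_policy_version: int, authority_allows_use: bool) -> bool:
    """A result is usable only while every bound in the contract still holds."""
    return (
        result.valid_from <= now < result.expires_at
        and result.route_epoch == current_epoch
        and result.policy_version == current_policy_version
        and authority_allows_use
    )
```

Any single failed bound makes the result unusable, so a stale policy version invalidates a route even when its expiry has not yet passed.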
7. Channel Classes
Routing is channel-aware.
Initial channel classes:
- control
- input
- render
- cursor
- clipboard
- file_transfer
- telemetry
- vpn_packet
- storage_fetch
- update_fetch
Rules:
- input and critical control prefer lowest latency and lowest jitter.
- render prefers bandwidth and bounded jitter; stale render may be dropped.
- cursor is latest-only and should use low-latency paths.
- clipboard is reliable and bounded.
- file_transfer prefers throughput but must not starve input/control/render.
- telemetry is low priority and may be sampled or dropped.
- vpn_packet uses adaptive QoS and bulk protection.
- storage_fetch and update_fetch should not consume interactive reserves.
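One way to read the "must not starve" and "interactive reserves" rules is as a capacity reserve that only interactive classes may draw on. The sketch below is a hypothetical illustration and not part of C15. The `admissible_kbps` function, its parameters, and the idea of a fixed reserve in kbps are all assumptions. The protected set is taken from the file_transfer rule; whether cursor or clipboard should join it is left open.

```python
# Classes that the file_transfer rule names as needing protection.
INTERACTIVE = frozenset({"input", "control", "render"})


def admissible_kbps(channel_class: str, link_kbps: int, used_kbps: int,
                    interactive_reserve_kbps: int) -> int:
    """Capacity a new flow of this class may claim on a link."""
    free = max(0, link_kbps - used_kbps)
    if channel_class in INTERACTIVE:
        # Interactive classes may use all free capacity, including the reserve.
        return free
    # Every other class must leave the interactive reserve untouched.
    return max(0, free - interactive_reserve_kbps)
```

For example, on a 1000 kbps link with 500 kbps in use and a 300 kbps reserve, an input flow may claim up to 500 kbps while a file_transfer flow may claim up to 200 kbps.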
8. Route Classes
Initial route classes:
- direct
- single_relay
- multi_hop
- storage_local
- storage_remote
- vpn_chained
- degraded_existing
- unavailable
direct:
- selected when source can safely reach destination directly
- trust and policy must allow it
single_relay:
- selected when one relay improves connectivity or policy requires relay
multi_hop:
- selected when direct/single relay is unavailable or policy/region requires it
storage_local / storage_remote:
- used for config/snapshot/artifact fetch decisions
vpn_chained:
- used when a managed service or IP tunnel depends on a logical vpn_connection
degraded_existing:
- keeps an already-authorized existing path alive while policy permits
unavailable:
- explicit denial or no valid route
9. Hard Policy Checks
Hard checks run before scoring.
Reject route when:
- source node is not trusted
- source node is not a member of the cluster
- destination is outside cluster scope
- cross-cluster trust is missing
- organization scope does not match
- role assignment does not permit the route
- peer certificate is invalid or revoked
- required channel is not authorized
- partition/authority state forbids new route
- destination node is draining or disabled and policy forbids placement
- route would leak topology or tenant data
No score can override hard policy rejection.
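The ordering rule above (hard checks first, scoring second) can be sketched as a decision function that never consults scores once a hard check fails. This is an illustrative sketch under stated assumptions, not part of the C15 deliverable: `RouteContext`, the reason strings, and `decide` are hypothetical names, and only a subset of the listed checks is modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RouteContext:
    # Each flag is the already-evaluated outcome of one hard check.
    source_trusted: bool
    source_in_cluster: bool
    destination_in_scope: bool
    organization_matches: bool
    role_permits_route: bool
    peer_cert_valid: bool
    channel_authorized: bool
    authority_allows_new_route: bool


# (attribute, rejection reason) pairs, evaluated in order.
HARD_CHECKS = (
    ("source_trusted", "source_not_trusted"),
    ("source_in_cluster", "source_not_cluster_member"),
    ("destination_in_scope", "destination_outside_cluster_scope"),
    ("organization_matches", "organization_scope_mismatch"),
    ("role_permits_route", "role_does_not_permit_route"),
    ("peer_cert_valid", "peer_certificate_invalid_or_revoked"),
    ("channel_authorized", "channel_not_authorized"),
    ("authority_allows_new_route", "authority_state_forbids_new_route"),
)


def hard_reject_reason(ctx: RouteContext) -> Optional[str]:
    """Return the first failing hard check, or None if all pass."""
    for attr, reason in HARD_CHECKS:
        if not getattr(ctx, attr):
            return reason
    return None


def decide(ctx: RouteContext, candidate_scores: dict[str, float]) -> tuple[str, str]:
    """Return (chosen candidate or 'unavailable', reason)."""
    reason = hard_reject_reason(ctx)
    if reason is not None:
        return ("unavailable", reason)  # scores are never consulted
    if not candidate_scores:
        return ("unavailable", "no_valid_route")
    best = max(candidate_scores, key=candidate_scores.get)
    return (best, "selected")
```

Because the rejection path returns before any score is read, a candidate with a perfect score still yields `unavailable` when a hard check fails.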
10. Scoring Inputs
Soft scoring inputs:
- latency
- jitter
- packet loss
- reliability
- recent failure history
- region distance
- load
- available bandwidth
- role suitability
- route length
- service co-location
- stickiness preference
- cost preference
- policy preference
- health score
Scoring weights are policy-driven and may differ by channel class.
Example:
- input/control heavily weight latency and jitter
- file transfer heavily weights throughput and reliability
- VPN bulk considers QoS impact on interactive routes
- storage fetch considers locality and replica freshness
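Policy-driven, per-channel weighting can be sketched as a weighted sum over normalized metrics. The weights, the three metrics chosen, and the normalization (0 to 1, higher is better) below are assumptions made for illustration; in the design above they would come from route and QoS policy rather than from constants.

```python
# Hypothetical weights per channel class; each row sums to 1.0.
WEIGHTS = {
    "input": {"latency": 0.5, "jitter": 0.4, "throughput": 0.1},
    "file_transfer": {"latency": 0.1, "jitter": 0.1, "throughput": 0.8},
}


def score(channel_class: str, metrics: dict[str, float]) -> float:
    """Weighted sum of normalized metrics (0..1, higher is better)."""
    weights = WEIGHTS[channel_class]
    return sum(weight * metrics[name] for name, weight in weights.items())


def best_path(channel_class: str, candidates: dict[str, dict[str, float]]) -> str:
    """Pick the candidate path with the highest score for this channel class."""
    return max(candidates, key=lambda name: score(channel_class, candidates[name]))


# Two hypothetical candidate paths with different strengths.
CANDIDATES = {
    "low_latency": {"latency": 0.9, "jitter": 0.9, "throughput": 0.2},
    "high_bandwidth": {"latency": 0.3, "jitter": 0.4, "throughput": 0.95},
}
```

With these numbers the same two candidates rank differently per channel: `best_path("input", CANDIDATES)` selects `low_latency`, while `best_path("file_transfer", CANDIDATES)` selects `high_bandwidth`.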
11. Route Cache Relationship
Route cache is local and bounded.
Cache key inputs:
- cluster id
- organization id
- source node
- destination kind/ref
- service type
- channel class
- policy version
- route epoch
- stickiness key
Cache entries contain:
- route result
- expiry
- score
- last success/failure
- backoff state
- fallback candidates
Cache invalidation triggers:
- policy version change
- peer directory version change
- trust/revocation update
- route epoch change
- health state change
- repeated route failure
- expiry
Route cache is a performance aid, not route authority.
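A bounded local cache keyed on the inputs listed above might look like the following sketch. It is illustrative only and not part of C15. The class and method names, the oldest-first eviction, the use of float timestamps, and storing only a route id per entry are assumptions, and only expiry plus policy-version and route-epoch invalidation are modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CacheKey:
    cluster_id: str
    organization_id: Optional[str]
    source_node_id: str
    destination: str  # destination kind/ref, e.g. "service_instance:svc-1"
    service_type: str
    channel_class: str
    policy_version: int
    route_epoch: int
    stickiness_key: Optional[str] = None


class RouteCache:
    def __init__(self, max_entries: int = 1024) -> None:
        self._max = max(1, max_entries)
        # dict preserves insertion order, so the first key is the oldest.
        self._entries: dict[CacheKey, tuple[str, float]] = {}

    def put(self, key: CacheKey, route_id: str, expires_at: float) -> None:
        self._entries.pop(key, None)  # re-insert so the entry becomes newest
        if len(self._entries) >= self._max:
            del self._entries[next(iter(self._entries))]  # evict oldest
        self._entries[key] = (route_id, expires_at)

    def get(self, key: CacheKey, now: float) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        route_id, expires_at = entry
        if now >= expires_at:
            del self._entries[key]  # expiry is an invalidation trigger
            return None
        return route_id

    def invalidate_stale(self, policy_version: int, route_epoch: int) -> int:
        """Drop entries from older policy versions or epochs; return the count."""
        stale = [k for k in self._entries
                 if k.policy_version != policy_version or k.route_epoch != route_epoch]
        for k in stale:
            del self._entries[k]
        return len(stale)
```

Because policy version and route epoch are part of the key, a lookup under a new version misses on its own; `invalidate_stale` exists so that outdated entries do not keep occupying the bounded space.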
12. Failover Boundaries
Failover decisions may:
- switch from failed active path to fallback path
- promote warm peer path
- retry through bootstrap route for recovery
- mark route unavailable
- request control-plane/config refresh when reachable
- keep degraded existing path alive if policy permits
Failover decisions must not:
- create new cluster authority
- bypass policy
- add nodes
- approve role changes
- cross cluster boundaries without explicit trust
- expose topology to organizations
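The failover boundary above can be sketched as a function that only chooses among options that were already authorized. This is a hypothetical illustration, not part of C15: `choose_failover`, its parameters, and the outcome labels are assumptions, and bootstrap-route retry and config refresh are not modeled.

```python
from typing import Optional, Sequence


def choose_failover(
    fallback_paths: Sequence[Sequence[str]],
    authorized_nodes: frozenset[str],
    failed_nodes: frozenset[str],
    degraded_path_alive: bool,
    policy_permits_degraded: bool,
) -> tuple[str, Optional[tuple[str, ...]]]:
    """Return (outcome, path). Never adds nodes or widens authorization."""
    for path in fallback_paths:
        hops = tuple(path)
        # A fallback qualifies only if every hop is already authorized and healthy.
        if hops and all(h in authorized_nodes and h not in failed_nodes for h in hops):
            return ("fallback", hops)
    if degraded_path_alive and policy_permits_degraded:
        return ("degraded_existing", None)
    return ("unavailable", None)
```

The function has no branch that extends `authorized_nodes`, which mirrors the rule that failover must not add nodes or bypass policy.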
13. Shortcut Decision Boundary
Shortcut connections are optional optimization recommendations.
A shortcut may be recommended when:
- long-lived flow exists
- current path latency/jitter is high
- direct connectivity appears possible
- trust validation succeeds
- policy allows shortcut
- shortcut improves latency, jitter, or bandwidth
- fallback path remains available
Shortcut recommendation output:
- source node
- destination node
- channel classes affected
- expected improvement
- required validation
- expiry
- fallback route id
C15 does not implement shortcut connections. It only defines when a future Routing Engine may recommend them.
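The recommendation conditions above are conjunctive, so a future engine could evaluate them as one predicate. The sketch below is illustrative and makes several assumptions: the `FlowObservation` type, the thresholds `min_flow_age_s` and `min_gain_ms`, and the use of latency alone as the improvement measure are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlowObservation:
    flow_age_s: float
    current_latency_ms: float
    estimated_direct_latency_ms: float
    direct_connectivity_likely: bool
    trust_validated: bool
    policy_allows_shortcut: bool
    fallback_route_id: Optional[str]  # None means no fallback remains


def recommend_shortcut(obs: FlowObservation,
                       min_flow_age_s: float = 60.0,
                       min_gain_ms: float = 10.0) -> bool:
    """True only when every condition holds; thresholds are hypothetical."""
    gain_ms = obs.current_latency_ms - obs.estimated_direct_latency_ms
    return (
        obs.flow_age_s >= min_flow_age_s          # long-lived flow
        and obs.direct_connectivity_likely
        and obs.trust_validated
        and obs.policy_allows_shortcut
        and obs.fallback_route_id is not None     # fallback must remain available
        and gain_ms >= min_gain_ms                # shortcut must actually improve
    )
```

The function returns a recommendation only; establishing the shortcut connection stays outside this boundary.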
14. Service Adapter Integration
Service Adapters may ask for routes using service-neutral metadata.
Examples:
- RDP Adapter requests route to RDP service/egress node or resource target.
- VNC Adapter requests route to VNC target zone.
- SSH Adapter requests route to SSH target.
- VPN/IP tunnel service requests route through vpn_connection.
- Storage fetch requests route to config/storage scope.
Service Adapters must not:
- enumerate peers
- select mesh paths
- create relay chains
- create shortcuts
- implement failover policy
- implement partition recovery
- implement cross-cluster routing trust
The adapter consumes a route result and sends/receives through the approved data-plane boundary when runtime exists.
15. Topology Hiding
Organizations see:
- allowed service endpoints
- safe ingress/egress status
- safe session/resource status
- policy-visible route dependency names where allowed
Organizations must not see:
- intermediate core mesh nodes
- full peer directory
- route cache
- shortcut candidates
- other organizations' route data
- storage shard placement
Platform owners may inspect routing internals according to audited platform policy.
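Topology hiding is easiest to keep safe when the organization-facing view is built from an allowlist rather than by removing known-sensitive fields. The following sketch illustrates that idea only; the key names and the dictionary shape of the internal record are assumptions and do not appear in the contracts above.

```python
# Hypothetical keys that an organization-facing view may contain.
TENANT_VISIBLE_KEYS = frozenset({
    "service_endpoint",
    "ingress_status",
    "egress_status",
    "session_status",
})


def tenant_view(internal: dict) -> dict:
    """Project an internal routing record onto the tenant-visible allowlist."""
    return {k: v for k, v in internal.items() if k in TENANT_VISIBLE_KEYS}
```

With an allowlist, a field added to the internal record later (for example a new list of relay hops) stays hidden from organizations unless it is deliberately exposed.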
16. Degraded and Partition Behavior
In degraded mode, Routing Engine may:
- keep existing authorized routes alive until their TTL expires
- use last signed snapshot for recovery
- select fallback among already-authorized peers
- mark route unavailable when safety cannot be proven
In degraded mode, Routing Engine must not:
- authorize new high-risk routes
- mutate cluster trust
- approve nodes
- assign roles
- promote partition authority automatically
- create cross-cluster trust
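The two lists above can be expressed as a deny-by-default gate: in degraded mode an action is permitted only if it is explicitly allowed. This is a minimal sketch with hypothetical names (`Action`, `permitted_in_degraded`); the action labels paraphrase the bullets and are not defined identifiers.

```python
from enum import Enum, auto


class Action(Enum):
    KEEP_EXISTING_ROUTE = auto()
    USE_LAST_SIGNED_SNAPSHOT = auto()
    SELECT_AUTHORIZED_FALLBACK = auto()
    MARK_ROUTE_UNAVAILABLE = auto()
    AUTHORIZE_NEW_HIGH_RISK_ROUTE = auto()
    MUTATE_CLUSTER_TRUST = auto()
    APPROVE_NODE = auto()
    ASSIGN_ROLE = auto()
    PROMOTE_PARTITION_AUTHORITY = auto()
    CREATE_CROSS_CLUSTER_TRUST = auto()


# Only the actions from the "may" list; everything else is denied.
DEGRADED_ALLOWED = frozenset({
    Action.KEEP_EXISTING_ROUTE,
    Action.USE_LAST_SIGNED_SNAPSHOT,
    Action.SELECT_AUTHORIZED_FALLBACK,
    Action.MARK_ROUTE_UNAVAILABLE,
})


def permitted_in_degraded(action: Action) -> bool:
    """Deny by default: an action not on the allowlist is refused."""
    return action in DEGRADED_ALLOWED
```

Listing the forbidden actions in the enum is for readability only; the gate refuses them because they are absent from the allowlist, not because they are named.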
17. Observability
Routing decisions should emit safe telemetry:
- route selected
- route rejected
- rejection reason
- route class
- channel class
- score bucket
- latency/jitter/packet loss summary
- failover count
- fallback used
- shortcut recommended
- policy version
- peer directory version
- route epoch
Tenant-visible telemetry must hide topology.
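Reporting a score bucket instead of the raw score is one way to keep telemetry useful without exposing scoring internals. The sketch below assumes scores normalized to the range 0 to 1; the bucket thresholds, the `route_event` helper, and the event field names are hypothetical.

```python
from typing import Optional


def score_bucket(score: float) -> str:
    """Coarsen a normalized score (0..1) into a label; thresholds are hypothetical."""
    if score < 0.33:
        return "low"
    if score < 0.66:
        return "medium"
    return "high"


def route_event(outcome: str, route_class: str, channel_class: str, score: float,
                policy_version: int, route_epoch: int,
                rejection_reason: Optional[str] = None) -> dict:
    """Build a telemetry event that carries no node identifiers or paths."""
    event = {
        "outcome": outcome,  # e.g. "selected" or "rejected"
        "route_class": route_class,
        "channel_class": channel_class,
        "score_bucket": score_bucket(score),
        "policy_version": policy_version,
        "route_epoch": route_epoch,
    }
    if rejection_reason is not None:
        event["rejection_reason"] = rejection_reason
    return event
```

The event deliberately omits the selected path and peer identifiers, so the same record could feed both platform and tenant-visible pipelines after policy review.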
18. Future Validation Tests
Future implementation tests must prove:
- route request rejects wrong cluster
- route request rejects wrong organization
- revoked peer is not selected
- unavailable route returns explicit result
- cache invalidates on policy version change
- cache invalidates on peer directory version change
- input route prefers latency over throughput
- file transfer route does not starve input class
- service adapter cannot bypass routing engine
- shortcut recommendation requires fallback path
- degraded mode does not authorize new forbidden routes
19. C16 Preparation
C16 must define the secure node-to-node channel lifecycle that can later carry route-selected traffic.
C16 must preserve:
- routing results are bounded and policy-scoped
- channels are authenticated and authorized
- trust/revocation affects active channels
- Service Adapters remain above Fabric routing
- no mesh packet routing starts before an explicit Stage C17
20. Result / Decision
Stage C15 defines Fabric Routing Engine as a skeleton boundary for route requests, route results, scoring, cache relationship, failover, shortcut recommendations, topology hiding, and Service Adapter integration.
Decisions:
- Routing belongs to Fabric, not Service Adapters.
- Route requests/results are logical contracts, not packet forwarding.
- Hard policy checks precede scoring.
- Route cache is local, bounded, and non-authoritative.
- Routing is channel-aware and QoS-aware.
- Shortcut connections are future optional recommendations, not C15 runtime.
- C16 must define secure node-to-node channels before mesh routing runtime.
No code, migration, API, runtime, RDP, data-plane, mesh, VPN, relay, or service workload behavior is changed by C15.