# Fabric Routing Engine Skeleton

Status: Stage C15 result. Documentation and architecture only.

This document defines the Fabric Routing Engine skeleton boundary. It does not implement code, migrations, APIs, mesh runtime traffic, VPN/IP tunnel runtime, relay packet routing, RDP work, or service workload execution.

## 1. Purpose

The Fabric Routing Engine is the logical Fabric layer responsible for choosing authorized paths between ingress, core, egress, service, storage, and future VPN/IP-tunnel components.

C15 defines the route decision boundary before runtime mesh routing exists. The purpose is to ensure that future routing:

- is policy-aware
- is QoS-aware
- is channel-aware
- respects cluster and organization boundaries
- uses scoped local state and peer cache
- does not depend on live backend availability for realtime decisions
- is not implemented independently by Service Adapters

## 2. Non-Goals

C15 does not:

- carry production mesh traffic
- implement node-to-node transport
- implement relay forwarding
- implement VPN/IP tunnel packets
- implement QUIC/WebRTC
- implement route execution
- implement service workloads
- change RDP runtime
- change backend session lifecycle
- change Windows client behavior

It defines contracts and responsibilities only.

## 3. Routing Engine Responsibilities

The Fabric Routing Engine owns:

- route request validation
- peer candidate filtering
- route scoring
- channel-aware path selection
- QoS class selection
- route cache lookup/update policy
- failover decision boundaries
- shortcut recommendation boundaries
- topology hiding
- policy and cluster-boundary enforcement
- service adapter routing integration boundary

The Routing Engine does not own:

- PostgreSQL source-of-truth mutation
- service protocol translation
- RDP/VNC/SSH/VPN implementation details
- raw packet forwarding
- direct secret resolution
- organization admin visibility
- node enrollment authority

## 4. Inputs

Routing decisions may consume:

- signed scoped cluster snapshot
- node-local peer cache
- route cache
- peer directory
- route policy
- QoS policy
- service assignment cache
- cluster membership
- organization scope
- service/resource scope
- channel class
- current health/degraded state
- partition/authority state
- failure history
- load and latency observations

Routing decisions must not require a live backend call in the realtime path.

## 5. Route Request Contract

A route request is a logical request for a path. It is not a packet.

Required fields:

- `request_id`
- `cluster_id`
- `organization_id` where applicable
- `source_node_id`
- `source_role`
- `destination_kind`
- `destination_ref`
- `service_type`
- `channel_class`
- `priority_class`
- `policy_refs`
- `requested_at`

Destination kinds:

- `node`
- `egress_pool`
- `service_instance`
- `resource_target`
- `vpn_connection`
- `storage_scope`
- `control_plane_endpoint`

Optional fields:

- `session_id`
- `attachment_id`
- `resource_id`
- `user_id`
- `device_id`
- `region_preference`
- `required_capabilities`
- `forbidden_nodes`
- `preferred_nodes`
- `max_latency_ms`
- `min_bandwidth_hint`
- `stickiness_key`
- `previous_route_id`
- `failure_context`

Service adapters may create route requests through an adapter-facing boundary, but they must not select peers or paths themselves.

## 6. Route Result Contract

A route result is a signed or locally verifiable decision artifact for a bounded time.

Required fields:

- `route_id`
- `request_id`
- `cluster_id`
- `organization_id` where applicable
- `route_class`
- `channel_class`
- `selected_path`
- `selected_qos_class`
- `score`
- `valid_from`
- `expires_at`
- `route_epoch`
- `policy_version`
- `decision_reason`

Selected path contains ordered logical hops:

- source node
- optional ingress node
- zero or more core/relay nodes
- optional egress/service node
- target/service endpoint

Optional fields:

- `fallback_paths`
- `shortcut_candidate`
- `stickiness_key`
- `drain_after`
- `degraded_mode`
- `constraints_applied`
- `rejection_reason`

Route results must be bounded by expiry, policy version, route epoch, and cluster authority state.

## 7. Channel Classes

Routing is channel-aware. Initial channel classes:

- `control`
- `input`
- `render`
- `cursor`
- `clipboard`
- `file_transfer`
- `telemetry`
- `vpn_packet`
- `storage_fetch`
- `update_fetch`

Rules:

- `input` and critical `control` prefer lowest latency and lowest jitter.
- `render` prefers bandwidth and bounded jitter; stale render may be dropped.
- `cursor` is latest-only and should use low-latency paths.
- `clipboard` is reliable and bounded.
- `file_transfer` prefers throughput but must not starve input/control/render.
- `telemetry` is low priority and may be sampled or dropped.
- `vpn_packet` uses adaptive QoS and bulk protection.
- `storage_fetch` and `update_fetch` should not consume interactive reserves.

## 8. Route Classes

Initial route classes:

- `direct`
- `single_relay`
- `multi_hop`
- `storage_local`
- `storage_remote`
- `vpn_chained`
- `degraded_existing`
- `unavailable`

`direct`:

- selected when source can safely reach destination directly
- trust and policy must allow it

`single_relay`:

- selected when one relay improves connectivity or policy requires relay

`multi_hop`:

- selected when direct/single relay is unavailable or policy/region requires it

`storage_local` / `storage_remote`:

- used for config/snapshot/artifact fetch decisions

`vpn_chained`:

- used when a managed service or IP tunnel depends on a logical `vpn_connection`

`degraded_existing`:

- keeps an already-authorized existing path alive while policy permits

`unavailable`:

- explicit denial or no valid route

## 9. Hard Policy Checks

Hard checks run before scoring. Reject route when:

- source node is not trusted
- source node is not a member of the cluster
- destination is outside cluster scope
- cross-cluster trust is missing
- organization scope does not match
- role assignment does not permit the route
- peer certificate is invalid or revoked
- required channel is not authorized
- partition/authority state forbids new route
- destination node is draining or disabled and policy forbids placement
- route would leak topology or tenant data

No score can override hard policy rejection.

## 10. Scoring Inputs

Soft scoring inputs:

- latency
- jitter
- packet loss
- reliability
- recent failure history
- region distance
- load
- available bandwidth
- role suitability
- route length
- service co-location
- stickiness preference
- cost preference
- policy preference
- health score

Scoring weights are policy-driven and may differ by channel class.

Example:

- input/control heavily weight latency and jitter
- file transfer heavily weights throughput and reliability
- VPN bulk considers QoS impact on interactive routes
- storage fetch considers locality and replica freshness

## 11. Route Cache Relationship

Route cache is local and bounded.

Cache key inputs:

- cluster id
- organization id
- source node
- destination kind/ref
- service type
- channel class
- policy version
- route epoch
- stickiness key

Cache entries contain:

- route result
- expiry
- score
- last success/failure
- backoff state
- fallback candidates

Cache invalidation triggers:

- policy version change
- peer directory version change
- trust/revocation update
- route epoch change
- health state change
- repeated route failure
- expiry

Route cache is a performance aid, not route authority.

## 12. Failover Boundaries

Failover decisions may:

- switch from failed active path to fallback path
- promote warm peer path
- retry through bootstrap route for recovery
- mark route unavailable
- request control-plane/config refresh when reachable
- keep degraded existing path alive if policy permits

Failover decisions must not:

- create new cluster authority
- bypass policy
- add nodes
- approve role changes
- cross cluster boundaries without explicit trust
- expose topology to organizations

## 13. Shortcut Decision Boundary

Shortcut connections are optional optimization recommendations.

A shortcut may be recommended when:

- long-lived flow exists
- current path latency/jitter is high
- direct connectivity appears possible
- trust validation succeeds
- policy allows shortcut
- shortcut improves latency, jitter, or bandwidth
- fallback path remains available

Shortcut recommendation output:

- source node
- destination node
- channel classes affected
- expected improvement
- required validation
- expiry
- fallback route id

C15 does not implement shortcut connections. It only defines when a future Routing Engine may recommend them.

## 14. Service Adapter Integration

Service Adapters may ask for routes using service-neutral metadata.

Examples:

- RDP Adapter requests route to RDP service/egress node or resource target.
- VNC Adapter requests route to VNC target zone.
- SSH Adapter requests route to SSH target.
- VPN/IP tunnel service requests route through `vpn_connection`.
- Storage fetch requests route to config/storage scope.

Service Adapters must not:

- enumerate peers
- select mesh paths
- create relay chains
- create shortcuts
- implement failover policy
- implement partition recovery
- implement cross-cluster routing trust

The adapter consumes a route result and sends/receives through the approved data-plane boundary when runtime exists.

## 15. Topology Hiding

Organizations see:

- allowed service endpoints
- safe ingress/egress status
- safe session/resource status
- policy-visible route dependency names where allowed

Organizations must not see:

- intermediate core mesh nodes
- full peer directory
- route cache
- shortcut candidates
- other organizations' route data
- storage shard placement

Platform owners may inspect routing internals according to audited platform policy.

## 16. Degraded and Partition Behavior

In degraded mode, Routing Engine may:

- keep existing authorized routes alive until TTL
- use last signed snapshot for recovery
- select fallback among already-authorized peers
- mark route unavailable when safety cannot be proven

In degraded mode, Routing Engine must not:

- authorize new high-risk routes
- mutate cluster trust
- approve nodes
- assign roles
- promote partition authority automatically
- create cross-cluster trust

## 17. Observability

Routing decisions should emit safe telemetry:

- route selected
- route rejected
- rejection reason
- route class
- channel class
- score bucket
- latency/jitter/packet loss summary
- failover count
- fallback used
- shortcut recommended
- policy version
- peer directory version
- route epoch

Tenant-visible telemetry must hide topology.

## 18. Future Validation Tests

Future implementation tests must prove:

- route request rejects wrong cluster
- route request rejects wrong organization
- revoked peer is not selected
- unavailable route returns explicit result
- cache invalidates on policy version change
- cache invalidates on peer directory version change
- input route prefers latency over throughput
- file transfer route does not starve input class
- service adapter cannot bypass routing engine
- shortcut recommendation requires fallback path
- degraded mode does not authorize new forbidden routes

## 19. C16 Preparation

C16 must define the secure node-to-node channel lifecycle that can later carry route-selected traffic.

C16 must preserve:

- routing results are bounded and policy-scoped
- channels are authenticated and authorized
- trust/revocation affects active channels
- Service Adapters remain above Fabric routing
- no mesh packet routing starts before explicit C17

## 20. Result / Decision

Stage C15 defines Fabric Routing Engine as a skeleton boundary for route requests, route results, scoring, cache relationship, failover, shortcut recommendations, topology hiding, and Service Adapter integration.

Decisions:

- Routing belongs to Fabric, not Service Adapters.
- Route requests/results are logical contracts, not packet forwarding.
- Hard policy checks precede scoring.
- Route cache is local, bounded, and non-authoritative.
- Routing is channel-aware and QoS-aware.
- Shortcut connections are future optional recommendations, not C15 runtime.
- C16 must define secure node-to-node channels before mesh routing runtime.

No code, migration, API, runtime, RDP, data-plane, mesh, VPN, relay, or service workload behavior is changed by C15.
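
The decision order recorded above (hard policy checks first, then channel-aware scoring, with an explicit `unavailable` result when nothing survives) can be pictured with a small non-normative sketch. It is an illustration only and not part of the C15 deliverable. Every name in it (`RouteRequest`, `Candidate`, `RouteResult`, `WEIGHTS`, `hard_check`, `score`, `decide`), the reduced field set, the weight values, and the scoring formula are assumptions made for this example. A real engine would carry the full request and result contracts and policy-driven weights.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RouteRequest:
    # Reduced subset of the route request contract, for illustration only.
    request_id: str
    cluster_id: str
    source_node_id: str
    channel_class: str


@dataclass(frozen=True)
class Candidate:
    # Hypothetical view of one candidate path built from local state.
    path: tuple[str, ...]      # ordered logical hops, source first
    cluster_id: str
    trusted: bool
    revoked: bool
    latency_ms: float
    bandwidth_mbps: float


@dataclass(frozen=True)
class RouteResult:
    # Reduced subset of the route result contract, for illustration only.
    request_id: str
    route_class: str
    selected_path: tuple[str, ...]
    score: float
    decision_reason: str
    rejection_reason: Optional[str] = None


# Assumed example weights; the document only says weights are policy-driven
# and may differ by channel class.
WEIGHTS = {
    "input": {"latency": 0.9, "bandwidth": 0.1},
    "file_transfer": {"latency": 0.2, "bandwidth": 0.8},
}
DEFAULT_WEIGHTS = {"latency": 0.5, "bandwidth": 0.5}


def hard_check(req: RouteRequest, cand: Candidate) -> Optional[str]:
    """Return a rejection reason, or None if the candidate may be scored."""
    if cand.cluster_id != req.cluster_id:
        return "destination_outside_cluster_scope"
    if not cand.trusted:
        return "peer_not_trusted"
    if cand.revoked:
        return "peer_certificate_revoked"
    return None


def score(req: RouteRequest, cand: Candidate) -> float:
    """Soft score in [0, 1]; higher is better. The formula is an assumption."""
    w = WEIGHTS.get(req.channel_class, DEFAULT_WEIGHTS)
    latency_term = 1.0 / (1.0 + cand.latency_ms)
    bandwidth_term = cand.bandwidth_mbps / (1.0 + cand.bandwidth_mbps)
    return w["latency"] * latency_term + w["bandwidth"] * bandwidth_term


def decide(req: RouteRequest, candidates: list[Candidate]) -> RouteResult:
    """Filter by hard checks, then pick the best-scoring survivor."""
    allowed, reasons = [], []
    for cand in candidates:
        reason = hard_check(req, cand)
        if reason is None:
            allowed.append(cand)
        else:
            reasons.append(reason)
    if not allowed:
        # Explicit denial instead of an exception or an empty value.
        return RouteResult(req.request_id, "unavailable", (), 0.0,
                           "no_valid_route",
                           reasons[0] if reasons else "no_candidates")
    best = max(allowed, key=lambda c: score(req, c))
    hops = len(best.path)
    route_class = ("direct" if hops == 2
                   else "single_relay" if hops == 3 else "multi_hop")
    return RouteResult(req.request_id, route_class, best.path,
                       score(req, best), "highest_score_after_hard_checks")
```

Because `hard_check` runs before `score`, a rejected candidate never reaches `max`, so no score can outrank a policy rejection. Swapping only the `channel_class` of a request changes which surviving path wins, which is the behavior the future validation tests in section 18 ask an implementation to prove.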