# Distributed Fabric Node Protocol Plan

This document fixes the target direction for the Secure Access Fabric after the VPN performance investigation.

The platform must not be treated as a VPN server, RDP gateway, or web console. It is a distributed overlay transport where every participating device is a fabric node, and VPN/RDP/HTTP/admin/storage are services running over that fabric.

## Core Position

Every device is a node. A phone, home server, cloud server, relay, admin-console host, storage host, and update-cache host share the same base identity model. They differ by roles, capabilities, policy, trust level, and current health.

```text
Node = identity + roles + capabilities + policy + health + local state
```

The Android VPN app is therefore not only a client. It is a mobile fabric node. It may carry VPN traffic, participate in route discovery, relay traffic when policy allows, host limited control/storage roles when approved, and report mobile-specific capacity signals such as battery, network type, NAT behavior, foreground/background state, and metered network policy.

## What Was Missing

The current implementation proves route leases and production VPN forwarding, but it still has a data-plane shape that cannot scale to high throughput:

- too much payload traffic is carried as small request/response HTTP forwarding calls;
- JSON/base64 payload envelopes add overhead and CPU cost;
- one overloaded stream can delay unrelated traffic;
- route health is visible, but the transport does not yet provide enough low-latency per-stream feedback;
- the phone behaves mostly as a service client, not as a full fabric node;
- service discovery and route execution are not yet separated cleanly enough;
- fallback paths can keep traffic alive, but can also hide architecture bottlenecks if used as the primary data plane.

For 100 Mbps per active device and future 1000+ or millions of devices, the fabric must move to a persistent, binary, multiplexed data plane with explicit route and stream semantics.

## Non-Negotiable Principles

1. Fabric is the lower transport layer. VPN, RDP, HTTP, admin console, storage, and update delivery are services above it.
2. Service adapters must not discover topology, own route selection, or invent failover logic. They request transport from the fabric.
3. Control plane and data plane are separate. API/console traffic must not be the packet transport mechanism.
4. Every data session carries many independent streams. A blocked bulk download must not stall RDP, DNS, control, or telemetry.
5. Routes are leased and replaceable. Route selection uses quality, policy, locality, role eligibility, cost, trust, and current load.
6. The fabric is distributed. Central control can coordinate, but the runtime must keep working through cached policy, peer directories, route leases, and local health when central components are degraded.
7. Mobile nodes are first-class nodes with stricter capability scoring.
8. HTTP forwarding remains a compatibility and emergency fallback, not the primary high-speed data plane.

## Node Roles

Initial role vocabulary:

- `mobile-edge`: mobile Android/iOS fabric node.
- `entry`: accepts external sessions.
- `relay`: forwards fabric traffic between nodes.
- `exit`: terminates routes into a target network or service zone.
- `service-host`: runs service adapters such as admin console, VPN exit, RDP, HTTP ingress, storage, or update-cache.
- `control-plane`: participates in control authority, policy decisions, route authority, or quorum work.
- `route-coordinator`: calculates or assists route candidates for a partition, region, or service class.
- `storage`: stores approved replicated fabric state.
- `observer`: collects telemetry and health without carrying user traffic.
- `update-cache`: mirrors signed artifacts close to nodes.


Roles are policy decisions, not binary builds. A phone can theoretically receive any role, but scheduler scoring must account for battery, OS restrictions, NAT, uplink stability, foreground state, and user cost policy.

## Capability Model

Nodes must advertise capability facts in heartbeats and peer updates:

- supported fabric protocol versions;
- supported transports: UDP/QUIC, TCP, WebSocket, HTTPS fallback;
- NAT type and reachability;
- measured RTT/loss/jitter/bandwidth to peers and entry candidates;
- CPU, memory, queue depth, file descriptor/socket pressure;
- battery state, charging state, mobile/wifi network type, metered policy;
- max relay bandwidth and allowed traffic classes;
- service roles and service capacity;
- trust tier and allowed tenant/organization scopes;
- local policy version, peer directory version, route cache version.

## Fabric Data Session V1

The first practical protocol step is a persistent binary data session. It may initially run over WebSocket/TCP for faster delivery, but the framing must be transport-neutral so the same protocol can move to QUIC/UDP.

Minimum frame set:

```text
HELLO          node identity, protocol version, capabilities
AUTH           signed session token or mTLS-bound proof
SESSION_READY  accepted limits, route epoch, peer epoch
OPEN_STREAM    stream id, service id, traffic class, route id
DATA           stream id, sequence, flags, payload
ACK            stream id, received sequence/window
PING/PONG      RTT and liveness
ROUTE_UPDATE   new route lease or alternate route set
STREAM_CREDIT  per-stream backpressure window
NODE_PRESSURE  queue/cpu/memory/network pressure signal
CLOSE_STREAM   normal stream close
RESET_STREAM   failed stream, other streams remain alive
GOAWAY         draining or protocol shutdown
```

Traffic classes:

- `control`: authorization, route updates, attach/detach, liveness.
- `dns`: small, latency-sensitive name resolution.
- `interactive`: RDP input, SSH interactive, UI control.
- `reliable`: normal web/API traffic.
- `bulk`: downloads, uploads, sync, large media.
- `droppable`: telemetry samples, optional probes, low-value background data.

Each stream has independent flow control and backpressure. Bulk can be slowed or moved to another route without blocking control or interactive streams.

## Route Model

The fabric must maintain multiple candidate routes for an active session:

```text
phone-a -> entry-1 -> home-1
phone-a -> phone-b -> relay-2 -> home-1
phone-a -> entry-2 -> relay-4 -> service-host-7
```

Route scoring inputs:

- policy and role eligibility;
- route length and failure domains;
- RTT, jitter, packet loss, bandwidth estimate;
- queue depth and retransmit pressure;
- current node CPU/memory/socket pressure;
- mobile battery/charging/metered status;
- historical reliability;
- service locality;
- tenant/organization isolation;
- cost and operator preference.

Routes are issued as short leases with route id, epoch, allowed channels, allowed service classes, hop list or next-hop policy, expiry, and fencing rules.

## Service Discovery

Services are logical names, not fixed hosts:

```text
service: admin-console
replicas: home-1, node-2, node-9
policy: active-active or leader/follower
ingress: vpn.cin.su / admin.cin.su / internal name
```

`vpn.cin.su` as an HTTP/HTTPS entry is a service endpoint. It can be hosted on any eligible service-host node. If one replica fails, another replica can accept the service lease and traffic can be routed to it.

## Scale Model

For 1000 devices, the platform needs entry pools, exit pools, route leases, session placement, and overload protection.

For millions of devices, the platform additionally needs regional route coordinators, distributed peer directories, local control partitions, telemetry sampling, policy sharding, and resource accounting.

Every device joining the system increases potential edge capacity, but only if the scheduler can safely decide when that node is allowed to relay, store, serve, or only consume.

## Security And Abuse Controls

The distributed model increases power and also risk. The following controls are required before mobile relay/control/storage roles are broadly enabled:

- node identity is cryptographic; IP address is never identity;
- all route leases are signed or locally verifiable;
- roles are scoped by organization, tenant, service, and time;
- mobile relay is opt-in by policy and user/device state;
- storage uses encrypted shards and explicit retention policy;
- control-plane participation requires trust tier and quorum policy;
- nodes never receive more topology or secret data than their role requires;
- abuse controls rate-limit relay use, route churn, and failed authentication;
- traffic accounting records who relayed what class and how much, without exposing payload contents.

## Observability

The current tests show why aggregate "VPN works" is not enough. The fabric needs per-node, per-route, and per-stream metrics:

- throughput by direction and traffic class;
- RTT, jitter, loss, retransmits, queue depth;
- frame encode/decode errors;
- stream resets and close reasons;
- route switch reason and time to recovery;
- node pressure and scheduler decisions;
- service discovery failover events;
- Android foreground/background and network transition events.

## Work Plan

### Stage FNP-0: Architecture Lock

Status: this document.

Deliverables:

- fix "every device is a node" as the model;
- separate fabric, services, control, and data plane;
- define missing protocol, route, scale, security, and observability pieces.

### Stage FNP-1: Binary Frame Contract

Deliverables:

- add a transport-neutral Go package for Fabric Data Session V1 frame types;
- encode/decode binary frames with size limits and validation;
- add tests for malformed frames, max frame size, stream ids, and frame type compatibility;
- do not connect it to production traffic yet.

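
As a rough illustration of what the FNP-1 frame contract could look like, the sketch below encodes and decodes a fixed-header binary frame with a payload size limit and basic validation. The wire layout (1-byte type, 1-byte flags, 4-byte stream id, 4-byte payload length), the 64 KiB cap, and all identifiers are assumptions for this example only; they are not the actual `fabricproto` format.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Assumed wire layout (hypothetical, not the real fabricproto format):
// type(1) | flags(1) | streamID(4, big-endian) | payloadLen(4, big-endian) | payload
const (
	headerSize = 10
	maxPayload = 64 * 1024 // assumed per-frame payload cap
)

type FrameType uint8

const (
	FrameHello FrameType = iota + 1
	FrameData
	FrameAck
	FramePing
	FramePong
	frameTypeEnd // first invalid type value
)

type Frame struct {
	Type     FrameType
	Flags    uint8
	StreamID uint32
	Payload  []byte
}

var (
	ErrUnknownType = errors.New("unknown frame type")
	ErrTooLarge    = errors.New("payload exceeds frame size limit")
	ErrTruncated   = errors.New("truncated frame")
)

func validType(t FrameType) bool { return t >= FrameHello && t < frameTypeEnd }

// Encode validates the frame and serializes it into a new byte slice.
func Encode(f Frame) ([]byte, error) {
	if !validType(f.Type) {
		return nil, ErrUnknownType
	}
	if len(f.Payload) > maxPayload {
		return nil, ErrTooLarge
	}
	buf := make([]byte, headerSize+len(f.Payload))
	buf[0] = byte(f.Type)
	buf[1] = f.Flags
	binary.BigEndian.PutUint32(buf[2:6], f.StreamID)
	binary.BigEndian.PutUint32(buf[6:10], uint32(len(f.Payload)))
	copy(buf[headerSize:], f.Payload)
	return buf, nil
}

// Decode parses one frame from the start of b. It checks the declared
// length against the limit before touching the payload bytes.
func Decode(b []byte) (Frame, error) {
	if len(b) < headerSize {
		return Frame{}, ErrTruncated
	}
	f := Frame{Type: FrameType(b[0]), Flags: b[1], StreamID: binary.BigEndian.Uint32(b[2:6])}
	if !validType(f.Type) {
		return Frame{}, ErrUnknownType
	}
	n := binary.BigEndian.Uint32(b[6:10])
	if n > maxPayload {
		return Frame{}, ErrTooLarge
	}
	if uint32(len(b)-headerSize) < n {
		return Frame{}, ErrTruncated
	}
	f.Payload = append([]byte(nil), b[headerSize:headerSize+int(n)]...)
	return f, nil
}

func main() {
	wire, err := Encode(Frame{Type: FrameData, StreamID: 7, Payload: []byte("pkt")})
	if err != nil {
		panic(err)
	}
	f, err := Decode(wire)
	fmt.Println(f.Type == FrameData, f.StreamID, string(f.Payload), err)
	_, err = Decode(wire[:5]) // malformed input: shorter than the header
	fmt.Println(err)
}
```

Because the codec works on plain byte slices, the same functions could sit behind a WebSocket, TCP, or QUIC carrier, which is the transport-neutral property the stage asks for.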
### Stage FNP-2: Persistent Session Runtime Skeleton

Status: in progress in `agents/rap-node-agent/internal/fabricproto`.

Deliverables:

- implement in-memory session runtime with streams, sequence numbers, ACK, stream credit, reset, and close;
- handle protocol frames for open/data/ack/credit/reset/close/ping/goaway;
- prove that a blocked bulk stream does not block control/interactive streams;
- expose per-stream metrics.

### Stage FNP-3: WebSocket/TCP Compatibility Transport

Status: started with a transport-neutral `io.Reader`/`io.Writer` frame loop, WebSocket frame adapter in `agents/rap-node-agent/internal/fabricproto`, and a gated/authenticated mesh smoke endpoint/client at `/mesh/v1/fabric/session/ws`.
`rap-host-agent fabric-session-smoke` provides the first operator smoke command and can pass signed fabric-session authority payload/signature headers for authority-pinned nodes.
Node-agent exposes the endpoint only when `RAP_MESH_FABRIC_SESSION_ENABLED` / `-mesh-fabric-session-enabled` is set, and reports the enabled endpoint in heartbeat metadata.
`mesh-live-smoke` includes a fabric-session `PING`/`PONG` check alongside the existing route and test-service probes.
Mesh client code now has a reusable `FabricSessionClient` for multiple frame exchanges over one WebSocket session, plus a pump mode with outbound/inbound queues for asynchronous stream traffic.
Live smoke verifies two `PING`/`PONG` round trips on the same connection.
`vpnruntime` has a binary VPN packet-batch mapper for `FrameData` payloads so packet delivery can move away from JSON production envelopes in a gated mode.
`FabricSessionPacketTransport` now adapts that mapper to the existing `PacketTransport` interface and can demultiplex inbound DATA frames into the VPN packet inbox by stream id.
`mesh-live-smoke` now sends a real VPN packet batch through `FabricSessionPacketTransport` over the WebSocket fabric session and requires a stream ACK from the remote node.
Mesh has a peer session manager that reuses one pump per peer endpoint, giving VPN transport selection a stable place to acquire long-lived fabric sessions. Node config now carries a separate gated `RAP_VPN_FABRIC_SESSION_TRANSPORT_ENABLED` switch and heartbeat report for the binary VPN packet transport, keeping endpoint exposure and VPN dataplane rollout independently controllable. When the VPN fabric-session switch is enabled, node-agent now attempts to use a long-lived peer session for gateway packet transport and falls back to the existing HTTP production envelope path when the peer session is unavailable. Peer session reuse now evicts closed pumps before reuse, so failed WebSocket sessions can be reopened on the next transport acquisition. Heartbeat telemetry includes peer session manager counters for active sessions, reuses, opens, closed-pump evictions, and explicit close operations. The mesh package now exposes a service-neutral `FabricTransport` abstraction; the current WebSocket carrier implements it as `WebSocketFabricTransport`, so future QUIC/UDP transport can be added without changing VPN/RDP/HTTP services. `QUICFabricTransport` now implements the same interface and carries the same binary `fabricproto` frames over a QUIC stream, with local smoke coverage for `PING`/`PONG` and DATA/ACK. Carrier selection understands QUIC transport labels and `quic://host:port` endpoints while preserving WebSocket as the default fallback. `QUICFabricServer` provides the matching node-side QUIC listener for accepting fabric streams and running the same session frame handler as other carriers. Node-agent can now gate the QUIC listener with `RAP_MESH_QUIC_FABRIC_ENABLED` / `RAP_MESH_QUIC_FABRIC_LISTEN_ADDR`, report it in heartbeat metadata, and pass the setting through host-agent install/update profiles. `mesh-live-smoke` verifies the QUIC carrier by starting a temporary QUIC fabric server and requiring a `PING`/`PONG` round trip over `QUICFabricTransport`. 
Nodes now advertise enabled QUIC fabric listeners as `direct_quic` fast-path endpoint candidates, and endpoint ranking prefers QUIC over WebSocket/HTTPS compatibility candidates for fabric sessions. VPN fabric-session gateway transport now consumes ranked endpoint candidates, so dataplane sessions can select QUIC fast-path candidates and fall back to legacy peer endpoints when the control plane has not published candidates yet. The temporary self-signed QUIC listener advertises its SHA-256 certificate fingerprint in endpoint metadata, and the QUIC client can pin that fingerprint instead of disabling verification while the cluster CA path is being finished. VPN fabric-session dialing now walks all ranked endpoint candidates before falling back to the legacy peer endpoint, so a failed QUIC candidate does not block WebSocket/HTTPS compatibility transport. Successful VPN fabric-session dialing logs the selected candidate, transport, certificate pin usage, and remaining fallback count for phone-side diagnostics. Heartbeat telemetry now includes VPN fabric-session dial counters for attempts, candidate failures, selected transport family, certificate pin usage, and the last selected endpoint/failure reason. VPN fabric-session dialing feeds candidate success/failure observations back into endpoint ranking, so repeated local QUIC failures can temporarily demote that endpoint while preserving it as a later fallback. Endpoint scoring no longer treats missing/zero latency on failed observations as moderate latency, preventing failed candidates from receiving a false score bonus. Endpoint health observations are now emitted as a bounded standalone heartbeat report (`rap.vpn_fabric_endpoint_health_report.v1`) so control plane can ingest candidate feedback without parsing the transport diagnostics blob. 
VPN fabric-session transport telemetry is carrier-neutral (`fabric_session_binary_frames`) and reports QUIC/WebSocket as available carriers instead of describing the dataplane as WebSocket-only. Endpoint health observations are pruned in-memory by age and count before snapshot/report generation, preventing long-running nodes from accumulating unbounded candidate history. Scoped and control-plane synthetic mesh config can now carry `peer_endpoint_observations`, and VPN fabric-session endpoint ranking merges those remote health hints with local observations using the newest signal. Endpoint health observations include source and reporter node fields so control plane can distinguish local dial feedback from aggregated or policy-generated health hints. The endpoint health heartbeat report also includes the reporter node id at the report level for simpler multi-node ingestion and diagnostics. Peer cache construction now applies endpoint health observations when ranking peer endpoint candidates, so recovery and warm-peer decisions see the same degraded-path feedback as VPN fabric-session dialing. Peer cache snapshots expose best-candidate score reasons, giving diagnostics a direct explanation for why a QUIC, WebSocket, relay, or fallback endpoint was chosen. Heartbeat capabilities now advertise that peer-cache endpoint ranking consumes health observations, allowing control plane and UI diagnostics to detect nodes running the health-aware peer selection path. VPN fabric QUIC transport now reuses QUIC connections per peer endpoint and opens logical fabric-session streams on top, with heartbeat telemetry for QUIC connection opens, reuses, evictions, and active count. Cached QUIC connections are pruned by idle TTL, preventing long-running agents from holding unused peer connections indefinitely. QUIC carrier connections now track active logical streams and enforce a per-connection stream limit, exposing stream opens/closes and limit rejects in transport telemetry. 
The per-connection QUIC stream limit is configurable through `RAP_VPN_FABRIC_QUIC_MAX_STREAMS_PER_CONN` / `-vpn-fabric-quic-max-streams-per-conn` and propagated by host-agent install profiles. QUIC stream-limit rejects are classified as capacity pressure instead of peer endpoint failure, so local health feedback does not incorrectly demote a healthy but saturated carrier. VPN fabric dial telemetry records the last capacity-limited endpoint and transport, making stream saturation visible without poisoning endpoint health observations. The same dial telemetry now keeps bounded per-endpoint capacity-pressure counters, so operators can see whether stream saturation is occasional or concentrated on a specific QUIC carrier. Fresh local capacity-pressure counters also feed endpoint ranking as a bounded penalty, spreading new fabric sessions away from a saturated carrier without declaring that carrier failed. VPN fabric-session transport now opens configurable per-class stream shards for interactive and bulk packet traffic, so heavy browser flows do not share a single logical stream with latency-sensitive RDP/control packets. Host-agent install commands for Docker, Linux, and Windows expose the same VPN fabric-session/QUIC tuning flags as install profiles, keeping manual and profile-based rollout paths aligned. Gateway runtime snapshots include the fabric-session packet transport stream layout and send counters by traffic class/stream id for load-test diagnosis. Those snapshots also summarize configured stream class/shard counts and active send class/stream counts, making sharding health visible without expanding per-stream maps. Gateway shutdown now closes all VPN fabric-session stream shards and then the underlying fabric session, preventing stale logical streams from consuming QUIC carrier capacity after reconnects or rollout restarts. 
Gateway runtime cancellation now fans out to both upload and download loops when either direction exits, so transport cleanup runs promptly on one-sided TUN or carrier failures. Fabric-session packet transport snapshots include close-frame and close-error counters for verifying that stream shard cleanup is actually happening. Outgoing VPN packet batches are split by traffic class and selected stream before they are framed, so one gateway batch containing many browser flows does not collapse onto the first packet's logical stream. `mesh-live-smoke` now sends mixed bulk and interactive VPN packets in a single fabric-session batch and requires them to remain sharded. The smoke report also exposes the mixed-batch frame fanout so regressions show up as a concrete fanout drop, not just a failed boolean. Batch fanout is bounded by configured stream shards, so a large batch with many flows cannot explode into unbounded fabric frames. Heartbeat tests assert the advertised VPN fabric stream-shard count and capability, keeping control-plane diagnostics aligned with runtime behavior. Fabric-session packet transport snapshots now report packets per stream plus last/max batch fanout, making real multi-site load distribution measurable from gateway status. Receive-side fabric-session packet counters are reported by traffic class and stream id as well, so gateway status can compare TX and RX distribution under browser/RDP load. QUIC fabric transport snapshots expose the configured stream limit, saturated connection count, and capacity pressure percentage next to stream limit rejects. Closed cached QUIC connections discovered during snapshot generation now update the transport's cumulative eviction counters, keeping successive heartbeats consistent. `mesh-live-smoke` reports QUIC fabric capacity-pressure percentage from the transport snapshot, verifying that the capacity fields are populated. 
QUIC fabric snapshots now include per cached connection pressure, endpoint, and saturation state; VPN fabric endpoint ranking consumes that live local pressure before stream-limit rejection, spreading new sessions away from already busy QUIC carriers. Per-connection QUIC snapshot entries are sorted by peer and endpoint so heartbeats and diagnostics stay stable across reports. When local live QUIC pressure and recent capacity-limit counters overlap, the ranking input keeps the stronger pressure signal rather than allowing a weak fresh sample to hide a saturated endpoint. Heartbeat VPN fabric reports now include a bounded `quic_capacity_pressure` summary sorted by busiest cached QUIC connection, making overload diagnosis visible without digging through the full carrier snapshot. VPN fabric flow-scheduler snapshots now expose bulk pressure activation plus bulk and interactive/control channel counts, making mixed browser/RDP load diagnosis explicit when bulk windows are reduced to protect interactive traffic. `mesh-live-smoke` now exercises that mixed-load scheduler path and reports bulk pressure activation plus bulk/interactive window recommendations. Flow-scheduler route recovery telemetry now records per-channel route switches, the failed route a channel recovered from, and aggregate recovered-channel / switch counts, making alternate-route recovery measurable during load tests. `mesh-live-smoke` now also exercises a primary-route failure followed by an alternate-route success and reports the resulting route switch count. The same smoke output reports measured route recovery milliseconds for the synthetic failover path. Smoke now includes max/average route recovery timing from the scheduler aggregate snapshot as well. Route recovery telemetry includes failure/switch timestamps and recovery duration in milliseconds for each recovered flow channel. 
Scheduler snapshots also aggregate route recovery max/average milliseconds across recovered channels for quick load-test health checks. Route recovery telemetry now includes normalized switch reasons and aggregate reason counts, so load tests can distinguish peer failures, timeouts, and other route-break causes. `mesh-live-smoke` reports the synthetic route-recovery reason beside recovery timing and switch count. Common route switch reasons are bucketed into stable labels such as timeout, peer_unavailable, connection_refused, connection_reset, no_route_to_host, and capacity_limited to keep heartbeat cardinality bounded. Flow-scheduler snapshots now include a machine-readable pressure level (`nominal`, `warning`, `critical`) and bounded reason list derived from drops, route failures, route recovery, slow channels, bulk pressure, and adaptive backpressure. The same pressure classification includes a bounded 0-100 score for automated route, endpoint, and node comparisons. `mesh-live-smoke` reports the mixed-load scheduler pressure level, score, and reasons. Heartbeat VPN fabric transport reports now include a compact `flow_pressure` summary with level, score, reasons, bulk pressure, route recovery timing, reason counts, and recommended per-class windows. Nodes advertise the `vpn_fabric_flow_pressure` capability when that heartbeat summary is available. When the VPN fabric ingress runtime has not been initialized yet, the heartbeat still emits a nominal `flow_pressure` summary for schema stability. Endpoint ranking treats `capacity_limited` observations as a soft pressure penalty instead of a hard recent failure, enabling load spreading without marking the carrier unhealthy. Local QUIC stream-limit pressure is now emitted as a capacity observation with no failure-count increment, allowing control plane to spread load without treating saturation as packet-path breakage. 
Cached QUIC carrier idle TTL is configurable through `RAP_VPN_FABRIC_QUIC_IDLE_TTL_SECONDS` / `-vpn-fabric-quic-idle-ttl` and propagated by host-agent install profiles.

Deliverables:

- carry binary frames over one persistent WebSocket/TCP connection;
- replace high-frequency `/mesh/v1/forward` packet POST usage for VPN routes in a gated mode;
- keep HTTP forwarding as fallback.

### Stage FNP-4: Android As Mobile Fabric Node

Deliverables:

- Android advertises node capabilities, network state, battery state, and supported transports;
- Android opens Fabric Data Session V1 to entry;
- VPN packets map to independent streams/classes;
- diagnostics can run per-stream and per-route tests.

### Stage FNP-5: Route Leases And Multipath

Deliverables:

- route result includes primary and alternate routes;
- runtime can switch new streams to a better route;
- interactive streams can recover quickly after route fencing;
- route health uses dataplane metrics, not only HTTP request success.

### Stage FNP-6: QUIC/UDP Transport

Status: started with `QUICFabricTransport` in `internal/mesh`.

Deliverables:

- implement QUIC transport for Fabric Data Session V1;
- preserve WebSocket/TCP as fallback;
- test 4G/Wi-Fi transition and NAT behavior;
- benchmark throughput, latency, and recovery against current HTTP forwarding.

### Stage FNP-7: Distributed Service Discovery

Deliverables:

- service names map to eligible service replicas;
- admin console and VPN service can move between service-host nodes;
- service failover is expressed as leases and route updates.

### Stage FNP-8: Mobile Relay And Distributed Capacity

Deliverables:

- mobile nodes can opt into relay under strict policy;
- scheduler scores battery, metered network, NAT, trust, and load;
- route planner can use mobile nodes where they are closer/faster;
- accounting and abuse controls are enforced.

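
To make the FNP-8 scoring deliverable concrete, here is a small sketch of how a scheduler might gate and score a mobile node for the relay role. Everything numeric in it is invented for illustration: the hard gates (opt-in, unmetered, trust tier of at least 2, battery of at least 30% unless charging), the penalty weights, and the names `MobileState` and `RelayScore` are assumptions, not project policy.

```go
package main

import "fmt"

// MobileState is a hypothetical snapshot of the capability facts a mobile
// node advertises. Field names are illustrative assumptions.
type MobileState struct {
	RelayOptIn     bool // user/device policy allows relaying
	Metered        bool // current network is metered
	Charging       bool
	BatteryPercent int // 0..100
	NATReachable   bool // peers can reach this node directly
	TrustTier      int  // higher means more trusted
	LoadPercent    int  // 0..100, current node load
}

// RelayScore returns a 0..100 score and whether the node is eligible at all.
// Hard gates come first; the thresholds and weights are assumptions.
func RelayScore(s MobileState) (score int, eligible bool) {
	if !s.RelayOptIn || s.Metered || s.TrustTier < 2 {
		return 0, false
	}
	if !s.Charging && s.BatteryPercent < 30 {
		return 0, false
	}
	score = 100
	score -= s.LoadPercent / 2 // busy nodes are less attractive
	if !s.NATReachable {
		score -= 20 // unreachable nodes need an extra hop
	}
	if !s.Charging {
		score -= (100 - s.BatteryPercent) / 4 // spare the battery
	}
	if score < 0 {
		score = 0
	}
	return score, true
}

func main() {
	plugged := MobileState{RelayOptIn: true, Charging: true, BatteryPercent: 80,
		NATReachable: true, TrustTier: 3, LoadPercent: 20}
	onBattery := MobileState{RelayOptIn: true, BatteryPercent: 60,
		TrustTier: 2, LoadPercent: 20}
	metered := plugged
	metered.Metered = true

	for name, s := range map[string]MobileState{"plugged": plugged, "onBattery": onBattery, "metered": metered} {
		score, ok := RelayScore(s)
		fmt.Printf("%s: eligible=%v score=%d\n", name, ok, score)
	}
}
```

Separating hard gates from soft penalties keeps policy violations (metered network, missing opt-in) from ever being outweighed by a good latency or load figure.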
### Stage FNP-9: Scale To Large Fleets

Deliverables:

- entry and route coordinator pools;
- peer directory sharding;
- telemetry sampling and aggregation;
- per-tenant quotas and fairness;
- load tests for 1000 simulated devices, then larger synthetic fleets.

## Immediate Next Action

Start Stage FNP-1 in `rap-node-agent` as a non-production protocol package. The goal is to create the binary frame contract and tests without disturbing the current VPN path. After that, wire it into a gated persistent session runtime and only then move Android/VPN traffic onto it.