# Distributed Fabric Node Protocol Plan

This document fixes the target direction for the Secure Access Fabric after the
VPN performance investigation. The platform must not be treated as a VPN
server, RDP gateway, or web console. It is a distributed overlay transport where
every participating device is a fabric node, and VPN/RDP/HTTP/admin/storage are
services running over that fabric.

## Core Position

Every device is a node.

A phone, home server, cloud server, relay, admin-console host, storage host, and
update-cache host share the same base identity model. They differ by roles,
capabilities, policy, trust level, and current health.

```text
Node = identity + roles + capabilities + policy + health + local state
```

The Android VPN app is therefore not only a client. It is a mobile fabric node.
It may carry VPN traffic, participate in route discovery, relay traffic when
policy allows, host limited control/storage roles when approved, and report
mobile-specific capacity signals such as battery, network type, NAT behavior,
foreground/background state, and metered network policy.

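As a rough illustration of the node model above, the following Go sketch maps
the `Node = identity + roles + ...` line onto a struct. Every type, field, and
function name here is hypothetical, and the relay rule in `MayRelay` is only an
assumed example policy, not the scheduler's actual logic.

```go
package main

// Role is a fabric role name. The two constants reuse names from the role
// vocabulary in this plan; the Go representation is an assumption.
type Role string

const (
	RoleMobileEdge Role = "mobile-edge"
	RoleRelay      Role = "relay"
)

// Health holds hypothetical mobile capacity signals.
type Health struct {
	BatteryPercent int  // 0-100
	Metered        bool // true while on a metered network
	Foreground     bool // true while the app is in the foreground
}

// Node = identity + roles + capabilities + policy + health + local state.
type Node struct {
	ID            string            // cryptographic identity, never an IP address
	Roles         map[Role]bool     // roles granted by policy
	Capabilities  map[string]string // advertised capability facts
	PolicyVersion uint64            // version of the locally cached policy
	Health        Health            // current health and capacity signals
	LocalState    map[string][]byte // opaque node-local state
}

// HasRole reports whether policy has granted role r to this node.
func (n Node) HasRole(r Role) bool { return n.Roles[r] }

// MayRelay applies an illustrative rule: a node needs the relay role, and a
// mobile node must additionally be unmetered and above a battery threshold.
func (n Node) MayRelay(minBattery int) bool {
	if !n.HasRole(RoleRelay) {
		return false
	}
	if !n.HasRole(RoleMobileEdge) {
		return true
	}
	return !n.Health.Metered && n.Health.BatteryPercent >= minBattery
}
```

The point of the sketch is that a phone and a server are the same type; only
the role set and health signals change the outcome of a policy check.
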
## What Was Missing

The current implementation proves route leases and production VPN forwarding,
but it still has a data-plane shape that cannot scale to high throughput:

- too much payload traffic is carried as small request/response HTTP forwarding
  calls;
- JSON/base64 payload envelopes add overhead and CPU cost;
- one overloaded stream can delay unrelated traffic;
- route health is visible, but the transport does not yet provide enough
  low-latency per-stream feedback;
- the phone behaves mostly as a service client, not as a full fabric node;
- service discovery and route execution are not yet separated cleanly enough;
- fallback paths can keep traffic alive, but can also hide architecture
  bottlenecks if used as the primary data plane.

To sustain 100 Mbps per active device across a fleet of 1000+ devices, and
eventually millions, the fabric must move to a persistent, binary, multiplexed
data plane with explicit route and stream semantics.

## Non-Negotiable Principles

1. Fabric is the lower transport layer. VPN, RDP, HTTP, admin console, storage,
   and update delivery are services above it.
2. Service adapters must not discover topology, own route selection, or invent
   failover logic. They request transport from the fabric.
3. Control plane and data plane are separate. API/console traffic must not be
   the packet transport mechanism.
4. Every data session carries many independent streams. A blocked bulk download
   must not stall RDP, DNS, control, or telemetry.
5. Routes are leased and replaceable. Route selection uses quality, policy,
   locality, role eligibility, cost, trust, and current load.
6. The fabric is distributed. Central control can coordinate, but the runtime
   must keep working through cached policy, peer directories, route leases, and
   local health when central components are degraded.
7. Mobile nodes are first-class nodes with stricter capability scoring.
8. HTTP forwarding remains a compatibility and emergency fallback, not the
   primary high-speed data plane.

## Node Roles

Initial role vocabulary:

- `mobile-edge`: mobile Android/iOS fabric node.
- `entry`: accepts external sessions.
- `relay`: forwards fabric traffic between nodes.
- `exit`: terminates routes into a target network or service zone.
- `service-host`: runs service adapters such as admin console, VPN exit, RDP,
  HTTP ingress, storage, or update-cache.
- `control-plane`: participates in control authority, policy decisions, route
  authority, or quorum work.
- `route-coordinator`: calculates or assists route candidates for a partition,
  region, or service class.
- `storage`: stores approved replicated fabric state.
- `observer`: collects telemetry and health without carrying user traffic.
- `update-cache`: mirrors signed artifacts close to nodes.

Roles are policy decisions, not binary builds. A phone can theoretically receive
any role, but scheduler scoring must account for battery, OS restrictions, NAT,
uplink stability, foreground state, and user cost policy.

## Capability Model

Nodes must advertise capability facts in heartbeats and peer updates:

- supported fabric protocol versions;
- supported transports: UDP/QUIC, TCP, WebSocket, HTTPS fallback;
- NAT type and reachability;
- measured RTT/loss/jitter/bandwidth to peers and entry candidates;
- CPU, memory, queue depth, file descriptor/socket pressure;
- battery state, charging state, mobile/wifi network type, metered policy;
- max relay bandwidth and allowed traffic classes;
- service roles and service capacity;
- trust tier and allowed tenant/organization scopes;
- local policy version, peer directory version, route cache version.

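A capability advertisement could be sketched as below. This is a minimal
illustration under assumptions: the struct covers only a subset of the facts
listed above, the JSON field names are invented for the example, and JSON is
used only because heartbeats are control-plane messages. `NegotiateVersion` is
a hypothetical helper showing how two nodes might pick the highest protocol
version they both advertise.

```go
package main

import "encoding/json"

// Capabilities is a hypothetical subset of the advertised capability facts.
type Capabilities struct {
	ProtocolVersions []int    `json:"protocol_versions"`
	Transports       []string `json:"transports"`
	NATType          string   `json:"nat_type"`
	BatteryPercent   *int     `json:"battery_percent,omitempty"` // nil on mains-powered nodes
	Metered          bool     `json:"metered"`
	MaxRelayMbps     int      `json:"max_relay_mbps"`
	PolicyVersion    uint64   `json:"policy_version"`
}

// EncodeHeartbeat serializes the capability facts for a heartbeat.
func EncodeHeartbeat(c Capabilities) ([]byte, error) { return json.Marshal(c) }

// DecodeHeartbeat parses capability facts received from a peer.
func DecodeHeartbeat(b []byte) (Capabilities, error) {
	var c Capabilities
	err := json.Unmarshal(b, &c)
	return c, err
}

// NegotiateVersion returns the highest protocol version present in both
// lists, or false when the two nodes share no version.
func NegotiateVersion(local, remote []int) (int, bool) {
	best, found := 0, false
	for _, l := range local {
		for _, r := range remote {
			if l == r && (!found || l > best) {
				best, found = l, true
			}
		}
	}
	return best, found
}
```

Making battery state a pointer keeps "not applicable" distinct from "0%", which
matters once the scheduler scores mobile and fixed nodes with the same code.
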
## Fabric Data Session V1

The first practical protocol step is a persistent binary data session. It may
initially run over WebSocket/TCP for faster delivery, but the framing must be
transport-neutral so the same protocol can move to QUIC/UDP.

Minimum frame set:

```text
HELLO          node identity, protocol version, capabilities
AUTH           signed session token or mTLS-bound proof
SESSION_READY  accepted limits, route epoch, peer epoch
OPEN_STREAM    stream id, service id, traffic class, route id
DATA           stream id, sequence, flags, payload
ACK            stream id, received sequence/window
PING/PONG      RTT and liveness
ROUTE_UPDATE   new route lease or alternate route set
STREAM_CREDIT  per-stream backpressure window
NODE_PRESSURE  queue/cpu/memory/network pressure signal
CLOSE_STREAM   normal stream close
RESET_STREAM   failed stream, other streams remain alive
GOAWAY         draining or protocol shutdown
```

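To make "transport-neutral binary framing" concrete, here is a minimal sketch
of a length-prefixed frame codec that works on byte slices, so it can sit on
top of WebSocket, TCP, or a QUIC stream. The 10-byte header layout, the 64 KiB
payload limit, the frame type numbering, and all identifiers are assumptions
made for this example; they are not the wire format of the `fabricproto`
package.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// FrameType numbers are illustrative; only a few frame kinds are modeled.
type FrameType uint8

const (
	FrameHello FrameType = iota + 1
	FrameData
	FrameAck
	FramePing
	FramePong
)

const (
	HeaderSize     = 10        // type(1) + flags(1) + stream id(4) + payload length(4)
	MaxPayloadSize = 64 * 1024 // assumed upper bound on a single payload
)

var (
	ErrIncomplete    = errors.New("incomplete frame")
	ErrFrameTooLarge = errors.New("frame payload exceeds limit")
	ErrUnknownType   = errors.New("unknown frame type")
)

type Frame struct {
	Type     FrameType
	Flags    uint8
	StreamID uint32
	Payload  []byte
}

// AppendFrame appends the encoded frame to dst and returns the extended slice.
func AppendFrame(dst []byte, f Frame) ([]byte, error) {
	if len(f.Payload) > MaxPayloadSize {
		return dst, ErrFrameTooLarge
	}
	var hdr [HeaderSize]byte
	hdr[0] = byte(f.Type)
	hdr[1] = f.Flags
	binary.BigEndian.PutUint32(hdr[2:6], f.StreamID)
	binary.BigEndian.PutUint32(hdr[6:10], uint32(len(f.Payload)))
	dst = append(dst, hdr[:]...)
	return append(dst, f.Payload...), nil
}

// DecodeFrame decodes one frame from the start of buf and returns the number
// of bytes consumed. ErrIncomplete means more bytes are needed.
func DecodeFrame(buf []byte) (Frame, int, error) {
	if len(buf) < HeaderSize {
		return Frame{}, 0, ErrIncomplete
	}
	t := FrameType(buf[0])
	if t < FrameHello || t > FramePong {
		return Frame{}, 0, ErrUnknownType
	}
	size := binary.BigEndian.Uint32(buf[6:10])
	if size > MaxPayloadSize {
		return Frame{}, 0, ErrFrameTooLarge // reject before allocating
	}
	end := HeaderSize + int(size)
	if len(buf) < end {
		return Frame{}, 0, ErrIncomplete
	}
	payload := append([]byte(nil), buf[HeaderSize:end]...) // copy out of the read buffer
	return Frame{Type: t, Flags: buf[1], StreamID: binary.BigEndian.Uint32(buf[2:6]), Payload: payload}, end, nil
}
```

Checking the declared length against the limit before allocating is the part
that matters for malformed-frame handling: a hostile peer cannot make the
decoder reserve memory for a payload it never sends.
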
Traffic classes:

- `control`: authorization, route updates, attach/detach, liveness.
- `dns`: small, latency-sensitive name resolution.
- `interactive`: RDP input, SSH interactive, UI control.
- `reliable`: normal web/API traffic.
- `bulk`: downloads, uploads, sync, large media.
- `droppable`: telemetry samples, optional probes, low-value background data.

Each stream has independent flow control and backpressure. Bulk can be slowed or
moved to another route without blocking control or interactive streams.

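Independent per-stream flow control can be sketched with a credit counter per
stream: a sender may only transmit while it holds credit, and the receiver
replenishes credit with a `STREAM_CREDIT`-style grant. The `Session` type and
its methods below are hypothetical and deliberately not concurrency-safe; they
only show why an exhausted bulk stream leaves other streams untouched.

```go
package main

import "errors"

var (
	ErrNoCredit      = errors.New("stream has no send credit")
	ErrUnknownStream = errors.New("unknown stream")
)

// stream tracks the traffic class and remaining send credit in bytes.
type stream struct {
	class  string
	credit int
}

// Session multiplexes streams; each stream owns its own credit window.
type Session struct {
	streams map[uint32]*stream
}

func NewSession() *Session { return &Session{streams: make(map[uint32]*stream)} }

// Open registers a stream with an initial credit window.
func (s *Session) Open(id uint32, class string, credit int) {
	s.streams[id] = &stream{class: class, credit: credit}
}

// Send consumes credit for one payload, or fails without touching any other
// stream when the window is exhausted.
func (s *Session) Send(id uint32, payload []byte) error {
	st, ok := s.streams[id]
	if !ok {
		return ErrUnknownStream
	}
	if len(payload) > st.credit {
		return ErrNoCredit
	}
	st.credit -= len(payload)
	return nil
}

// Grant adds credit, as a receiver would after draining its queue.
func (s *Session) Grant(id uint32, n int) error {
	st, ok := s.streams[id]
	if !ok {
		return ErrUnknownStream
	}
	st.credit += n
	return nil
}
```

In this sketch a bulk stream that runs out of credit simply returns
`ErrNoCredit` to its own sender, while a `control` stream on the same session
keeps sending against its own window.
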
## Route Model

The fabric must maintain multiple candidate routes for an active session:

```text
phone-a -> entry-1 -> home-1
phone-a -> phone-b -> relay-2 -> home-1
phone-a -> entry-2 -> relay-4 -> service-host-7
```

Route scoring inputs:

- policy and role eligibility;
- route length and failure domains;
- RTT, jitter, packet loss, bandwidth estimate;
- queue depth and retransmit pressure;
- current node CPU/memory/socket pressure;
- mobile battery/charging/metered status;
- historical reliability;
- service locality;
- tenant/organization isolation;
- cost and operator preference.

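One way to combine a few of these inputs is a weighted cost where policy
eligibility acts as a hard filter and the quality metrics are summed. The
sketch below is only an illustration: the weights are arbitrary placeholders,
only four of the listed inputs are modeled, and `RouteCandidate`, `Score`, and
`Best` are hypothetical names.

```go
package main

import "math"

// RouteCandidate carries a small subset of the scoring inputs.
type RouteCandidate struct {
	Hops        []string
	RTTMillis   float64
	LossPercent float64
	QueueDepth  int
	Eligible    bool // result of policy and role eligibility checks
}

// Score returns a cost where lower is better. Ineligible routes score +Inf so
// that no quality metric can outweigh a policy decision.
func Score(c RouteCandidate) float64 {
	if !c.Eligible {
		return math.Inf(1)
	}
	return c.RTTMillis +
		50*c.LossPercent +
		2*float64(c.QueueDepth) +
		5*float64(len(c.Hops))
}

// Best returns the lowest-cost eligible candidate, or false if none exists.
func Best(cands []RouteCandidate) (RouteCandidate, bool) {
	best, bestScore, found := RouteCandidate{}, math.Inf(1), false
	for _, c := range cands {
		if s := Score(c); s < bestScore {
			best, bestScore, found = c, s, true
		}
	}
	return best, found
}
```

Treating eligibility as a filter rather than a weight keeps a fast but
forbidden route from ever winning, which matches the rule that policy comes
before quality.
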
Routes are issued as short leases with route id, epoch, allowed channels,
allowed service classes, hop list or next-hop policy, expiry, and fencing rules.

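A lease check might look like the sketch below. It assumes a simple fencing
rule in which any lease whose epoch is at or below the highest fenced epoch is
rejected; the actual fencing rules are not specified in this plan, and the
`RouteLease` fields and helper names are hypothetical.

```go
package main

import "time"

// RouteLease is a hypothetical in-memory form of a route lease.
type RouteLease struct {
	RouteID        string
	Epoch          uint64
	AllowedClasses map[string]bool
	Hops           []string
	Expiry         time.Time
}

// IssueLease builds a lease that expires ttl after now.
func IssueLease(routeID string, epoch uint64, now time.Time, ttl time.Duration, hops []string, classes ...string) RouteLease {
	allowed := make(map[string]bool, len(classes))
	for _, c := range classes {
		allowed[c] = true
	}
	return RouteLease{
		RouteID:        routeID,
		Epoch:          epoch,
		AllowedClasses: allowed,
		Hops:           hops,
		Expiry:         now.Add(ttl),
	}
}

// Usable reports whether the lease may carry a stream of the given class.
// fencedEpoch is the highest epoch that has been fenced off (assumed rule).
func (l RouteLease) Usable(class string, now time.Time, fencedEpoch uint64) bool {
	return now.Before(l.Expiry) && l.Epoch > fencedEpoch && l.AllowedClasses[class]
}
```

Short expiry plus an epoch comparison gives two independent ways to retire a
route: waiting for the lease to lapse, or fencing it immediately.
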
## Service Discovery

Services are logical names, not fixed hosts:

```text
service: admin-console
replicas: home-1, node-2, node-9
policy: active-active or leader/follower
ingress: vpn.cin.su / admin.cin.su / internal name
```

`vpn.cin.su` as an HTTP/HTTPS entry is a service endpoint. It can be hosted on
any eligible service-host node. If one replica fails, another replica can accept
the service lease and traffic can be routed to it.

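The failover behavior described above can be illustrated with a small resolver
that maps a service name to whichever replica currently holds the service
lease. This is a sketch of the leader/follower case only, with hypothetical
types; it assumes the lease stays with the current holder until that holder
becomes unhealthy or ineligible.

```go
package main

// Replica is one candidate host for a logical service.
type Replica struct {
	Node     string
	Healthy  bool
	Eligible bool
}

// Service is a logical name backed by several replicas.
type Service struct {
	Name     string
	Replicas []Replica
	holder   string // node currently holding the service lease
}

// usable reports whether the named node is a healthy, eligible replica.
func (s *Service) usable(node string) bool {
	for _, r := range s.Replicas {
		if r.Node == node {
			return r.Healthy && r.Eligible
		}
	}
	return false
}

// Resolve returns the node that should receive traffic. The lease is sticky:
// it moves only when the current holder stops being usable.
func (s *Service) Resolve() (string, bool) {
	if s.holder != "" && s.usable(s.holder) {
		return s.holder, true
	}
	for _, r := range s.Replicas {
		if r.Healthy && r.Eligible {
			s.holder = r.Node
			return r.Node, true
		}
	}
	s.holder = ""
	return "", false
}
```

Keeping the lease sticky avoids flapping: when `home-1` recovers, traffic stays
on the replica that took over until there is a reason to move it again.
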
## Scale Model

For 1000 devices, the platform needs entry pools, exit pools, route leases,
session placement, and overload protection.

For millions of devices, the platform additionally needs regional route
coordinators, distributed peer directories, local control partitions, telemetry
sampling, policy sharding, and resource accounting.

Every device joining the system increases potential edge capacity, but only if
the scheduler can safely decide when that node is allowed to relay, store, serve,
or only consume.

## Security And Abuse Controls

The distributed model increases power and also risk. The following controls are
required before mobile relay/control/storage roles are broadly enabled:

- node identity is cryptographic; IP address is never identity;
- all route leases are signed or locally verifiable;
- roles are scoped by organization, tenant, service, and time;
- mobile relay is opt-in by policy and user/device state;
- storage uses encrypted shards and explicit retention policy;
- control-plane participation requires trust tier and quorum policy;
- nodes never receive more topology or secret data than their role requires;
- abuse controls rate-limit relay use, route churn, and failed authentication;
- traffic accounting records who relayed what class and how much, without
  exposing payload contents.

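For the "signed or locally verifiable" requirement, one possible approach is an
Ed25519 signature over a canonical encoding of the lease fields, checked by
every hop with the authority's public key. The sketch below assumes that
approach; the plan does not name a signature scheme, and the message layout and
function names are hypothetical.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"encoding/binary"
)

// leaseMessage builds the canonical bytes that get signed:
// epoch(8) + expiry unix seconds(8) + route id. The layout is an assumption.
func leaseMessage(routeID string, epoch uint64, expiryUnix int64) []byte {
	msg := make([]byte, 16, 16+len(routeID))
	binary.BigEndian.PutUint64(msg[0:8], epoch)
	binary.BigEndian.PutUint64(msg[8:16], uint64(expiryUnix))
	return append(msg, routeID...)
}

// NewAuthorityKey generates a signing key pair for a route authority.
func NewAuthorityKey() (ed25519.PublicKey, ed25519.PrivateKey, error) {
	return ed25519.GenerateKey(rand.Reader)
}

// SignLease signs the lease fields with the authority's private key.
func SignLease(priv ed25519.PrivateKey, routeID string, epoch uint64, expiryUnix int64) []byte {
	return ed25519.Sign(priv, leaseMessage(routeID, epoch, expiryUnix))
}

// VerifyLease lets any node check a lease locally with only the public key.
func VerifyLease(pub ed25519.PublicKey, routeID string, epoch uint64, expiryUnix int64, sig []byte) bool {
	return ed25519.Verify(pub, leaseMessage(routeID, epoch, expiryUnix), sig)
}
```

Because verification needs only the public key, a relay can reject a forged or
altered lease without contacting the control plane.
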
## Observability

The current tests show why aggregate "VPN works" is not enough. The fabric needs
per-node, per-route, and per-stream metrics:

- throughput by direction and traffic class;
- RTT, jitter, loss, retransmits, queue depth;
- frame encode/decode errors;
- stream resets and close reasons;
- route switch reason and time to recovery;
- node pressure and scheduler decisions;
- service discovery failover events;
- Android foreground/background and network transition events.

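A per-stream metrics registry could start as small as the sketch below, which
keys counters by route and stream and aggregates bytes by traffic class. Only
throughput and reset counts are modeled, and the type and method names are
hypothetical rather than taken from the existing agents.

```go
package main

import "sync"

// StreamKey identifies one stream on one route.
type StreamKey struct {
	RouteID  string
	StreamID uint32
}

// StreamStats holds counters for a single stream.
type StreamStats struct {
	Class    string
	BytesIn  uint64
	BytesOut uint64
	Resets   uint64
}

// Metrics is a concurrency-safe registry of per-stream counters.
type Metrics struct {
	mu      sync.Mutex
	streams map[StreamKey]*StreamStats
}

func NewMetrics() *Metrics { return &Metrics{streams: make(map[StreamKey]*StreamStats)} }

// stats returns the entry for k, creating it on first use. Caller holds mu.
func (m *Metrics) stats(k StreamKey, class string) *StreamStats {
	st, ok := m.streams[k]
	if !ok {
		st = &StreamStats{Class: class}
		m.streams[k] = st
	}
	return st
}

// RecordBytes adds inbound and outbound byte counts for a stream.
func (m *Metrics) RecordBytes(k StreamKey, class string, in, out uint64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	st := m.stats(k, class)
	st.BytesIn += in
	st.BytesOut += out
}

// RecordReset counts one stream reset.
func (m *Metrics) RecordReset(k StreamKey, class string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.stats(k, class).Resets++
}

// BytesByClass sums inbound plus outbound bytes per traffic class.
func (m *Metrics) BytesByClass() map[string]uint64 {
	m.mu.Lock()
	defer m.mu.Unlock()
	out := make(map[string]uint64)
	for _, st := range m.streams {
		out[st.Class] += st.BytesIn + st.BytesOut
	}
	return out
}
```

Keeping the route id in the key means the same counters can answer both
"which stream is slow" and "which route is slow" without a second registry.
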
## Work Plan

### Stage FNP-0: Architecture Lock

Status: this document.

Deliverables:

- fix "every device is a node" as the model;
- separate fabric, services, control, and data plane;
- define missing protocol, route, scale, security, and observability pieces.

### Stage FNP-1: Binary Frame Contract

Deliverables:

- add a transport-neutral Go package for Fabric Data Session V1 frame types;
- encode/decode binary frames with size limits and validation;
- add tests for malformed frames, max frame size, stream ids, and frame type
  compatibility;
- do not connect it to production traffic yet.

### Stage FNP-2: Persistent Session Runtime Skeleton

Status: in progress in `agents/rap-node-agent/internal/fabricproto`.

Deliverables:

- implement in-memory session runtime with streams, sequence numbers, ACK,
  stream credit, reset, and close;
- handle protocol frames for open/data/ack/credit/reset/close/ping/goaway;
- prove that a blocked bulk stream does not block control/interactive streams;
- expose per-stream metrics.

### Stage FNP-3: WebSocket/TCP Compatibility Transport

Status: started. Progress so far:

- Work began with a transport-neutral `io.Reader`/`io.Writer` frame loop, a
  WebSocket frame adapter in `agents/rap-node-agent/internal/fabricproto`, and a
  gated/authenticated mesh smoke endpoint/client at
  `/mesh/v1/fabric/session/ws`.
- `rap-host-agent fabric-session-smoke` provides the first operator smoke
  command and can pass signed fabric-session authority payload/signature
  headers for authority-pinned nodes.
- Node-agent exposes the endpoint only when `RAP_MESH_FABRIC_SESSION_ENABLED` /
  `-mesh-fabric-session-enabled` is set, and reports the enabled endpoint in
  heartbeat metadata.
- `mesh-live-smoke` includes a fabric-session `PING`/`PONG` check alongside the
  existing route and test-service probes.
- Mesh client code now has a reusable `FabricSessionClient` for multiple frame
  exchanges over one WebSocket session, plus a pump mode with outbound/inbound
  queues for asynchronous stream traffic.
- Live smoke verifies two `PING`/`PONG` round trips on the same connection.
- `vpnruntime` has a binary VPN packet-batch mapper for `FrameData` payloads so
  packet delivery can move away from JSON production envelopes in a gated mode.
- `FabricSessionPacketTransport` now adapts that mapper to the existing
  `PacketTransport` interface and can demultiplex inbound DATA frames into the
  VPN packet inbox by stream id.
- `mesh-live-smoke` now sends a real VPN packet batch through
  `FabricSessionPacketTransport` over the WebSocket fabric session and requires
  a stream ACK from the remote node.
- Mesh has a peer session manager that reuses one pump per peer endpoint, giving
  VPN transport selection a stable place to acquire long-lived fabric sessions.
- Node config now carries a separate gated
  `RAP_VPN_FABRIC_SESSION_TRANSPORT_ENABLED` switch and heartbeat report for the
  binary VPN packet transport, keeping endpoint exposure and VPN dataplane
  rollout independently controllable.
- When the VPN fabric-session switch is enabled, node-agent now attempts to use
  a long-lived peer session for gateway packet transport and falls back to the
  existing HTTP production envelope path when the peer session is unavailable.
- Peer session reuse now evicts closed pumps before reuse, so failed WebSocket
  sessions can be reopened on the next transport acquisition.
- Heartbeat telemetry includes peer session manager counters for active
  sessions, reuses, opens, closed-pump evictions, and explicit close operations.
- The mesh package now exposes a service-neutral `FabricTransport` abstraction;
  the current WebSocket carrier implements it as `WebSocketFabricTransport`, so
  future QUIC/UDP transport can be added without changing VPN/RDP/HTTP services.
- `QUICFabricTransport` now implements the same interface and carries the same
  binary `fabricproto` frames over a QUIC stream, with local smoke coverage for
  `PING`/`PONG` and DATA/ACK.
- Carrier selection understands QUIC transport labels and `quic://host:port`
  endpoints while preserving WebSocket as the default fallback.
- `QUICFabricServer` provides the matching node-side QUIC listener for accepting
  fabric streams and running the same session frame handler as other carriers.
- Node-agent can now gate the QUIC listener with
  `RAP_MESH_QUIC_FABRIC_ENABLED` / `RAP_MESH_QUIC_FABRIC_LISTEN_ADDR`, report it
  in heartbeat metadata, and pass the setting through host-agent install/update
  profiles.

Deliverables:

- carry binary frames over one persistent WebSocket/TCP connection;
- replace high-frequency `/mesh/v1/forward` packet POST usage for VPN routes in
  a gated mode;
- keep HTTP forwarding as fallback.

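The peer session reuse, closed-pump eviction, and HTTP fallback behavior
described in this stage can be pictured with the sketch below. It is not the
mesh package's code: `Pump`, `memPump`, `PeerSessionManager`, and
`SelectCarrier` are hypothetical stand-ins, and the counters only mirror the
kinds of figures the heartbeat telemetry is said to report.

```go
package main

import (
	"errors"
	"sync"
)

var ErrPeerUnavailable = errors.New("peer session unavailable")

// Pump stands in for a long-lived session pump to one peer endpoint.
type Pump interface{ Closed() bool }

// memPump is an in-memory Pump used only for illustration.
type memPump struct{ closed bool }

func (p *memPump) Closed() bool { return p.closed }

// PeerSessionManager keeps at most one pump per peer endpoint.
type PeerSessionManager struct {
	mu                       sync.Mutex
	dial                     func(endpoint string) (Pump, error)
	pumps                    map[string]Pump
	Opens, Reuses, Evictions int
}

func NewPeerSessionManager(dial func(endpoint string) (Pump, error)) *PeerSessionManager {
	return &PeerSessionManager{dial: dial, pumps: make(map[string]Pump)}
}

// Acquire reuses a live pump, evicts a closed one, and dials when needed.
func (m *PeerSessionManager) Acquire(endpoint string) (Pump, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if p, ok := m.pumps[endpoint]; ok {
		if !p.Closed() {
			m.Reuses++
			return p, nil
		}
		delete(m.pumps, endpoint) // closed pump: evict so it can be reopened
		m.Evictions++
	}
	p, err := m.dial(endpoint)
	if err != nil {
		return nil, err
	}
	m.pumps[endpoint] = p
	m.Opens++
	return p, nil
}

// SelectCarrier prefers the fabric session and names the HTTP path as the
// fallback when no peer session can be acquired.
func SelectCarrier(m *PeerSessionManager, endpoint string) string {
	if _, err := m.Acquire(endpoint); err != nil {
		return "http-fallback"
	}
	return "fabric-session"
}
```

Evicting on acquisition rather than on close keeps the manager simple: a failed
session costs nothing until the next caller asks for that peer again.
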
### Stage FNP-4: Android As Mobile Fabric Node

Deliverables:

- Android advertises node capabilities, network state, battery state, and
  supported transports;
- Android opens Fabric Data Session V1 to entry;
- VPN packets map to independent streams/classes;
- diagnostics can run per-stream and per-route tests.

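Mapping VPN packets to traffic classes could begin with a simple classifier on
the IP protocol number and destination port, as sketched below. The rule set is
an assumption made for illustration (DNS on port 53, SSH on TCP 22, RDP on
3389, everything else `reliable`); detecting `bulk` traffic needs flow-level
information that a single packet header does not carry, so it is left out.

```go
package main

// Traffic class names follow the class list in this plan.
const (
	ClassDNS         = "dns"
	ClassInteractive = "interactive"
	ClassReliable    = "reliable"
)

// IANA-assigned IP protocol numbers.
const (
	ProtoTCP uint8 = 6
	ProtoUDP uint8 = 17
)

// ClassifyPacket picks a traffic class from the IP protocol and destination
// port. The mapping is a hypothetical starting point, not a final policy.
func ClassifyPacket(protocol uint8, dstPort uint16) string {
	isTCP, isUDP := protocol == ProtoTCP, protocol == ProtoUDP
	switch {
	case (isTCP || isUDP) && dstPort == 53:
		return ClassDNS // name resolution
	case isTCP && dstPort == 22:
		return ClassInteractive // SSH
	case (isTCP || isUDP) && dstPort == 3389:
		return ClassInteractive // RDP
	default:
		return ClassReliable
	}
}
```

Each class would then be carried on its own stream, so that DNS lookups and RDP
input never queue behind ordinary web traffic from the same device.
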
### Stage FNP-5: Route Leases And Multipath

Deliverables:

- route result includes primary and alternate routes;
- runtime can switch new streams to a better route;
- interactive streams can recover quickly after route fencing;
- route health uses dataplane metrics, not only HTTP request success.

### Stage FNP-6: QUIC/UDP Transport

Status: started with `QUICFabricTransport` in `internal/mesh`.

Deliverables:

- implement QUIC transport for Fabric Data Session V1;
- preserve WebSocket/TCP as fallback;
- test 4G/Wi-Fi transition and NAT behavior;
- benchmark throughput, latency, and recovery against current HTTP forwarding.

### Stage FNP-7: Distributed Service Discovery

Deliverables:

- service names map to eligible service replicas;
- admin console and VPN service can move between service-host nodes;
- service failover is expressed as leases and route updates.

### Stage FNP-8: Mobile Relay And Distributed Capacity

Deliverables:

- mobile nodes can opt into relay under strict policy;
- scheduler scores battery, metered network, NAT, trust, and load;
- route planner can use mobile nodes where they are closer/faster;
- accounting and abuse controls are enforced.

### Stage FNP-9: Scale To Large Fleets

Deliverables:

- entry and route coordinator pools;
- peer directory sharding;
- telemetry sampling and aggregation;
- per-tenant quotas and fairness;
- load tests for 1000 simulated devices, then larger synthetic fleets.

## Immediate Next Action

Start Stage FNP-1 in `rap-node-agent` as a non-production protocol package. The
goal is to create the binary frame contract and tests without disturbing the
current VPN path. After that, wire it into a gated persistent session runtime and
only then move Android/VPN traffic onto it.