Working version, but speed is 10 Mbit/s

2026-05-22 21:46:49 +03:00
parent 469fa0e860
commit 20d361a886
280 changed files with 954890 additions and 18524 deletions
+65 -77
@@ -34,10 +34,10 @@ Implemented:
- reliable fabric/control queue rejection when full
- bounded non-production `synthetic.echo` test-service path
- direct, single-relay, and forced-fallback test-service proofs
- live HTTP peer transport for synthetic mesh envelopes
- disabled-by-default synthetic mesh HTTP endpoint in `rap-node-agent`
- live QUIC peer transport for synthetic mesh envelopes
- disabled-by-default synthetic mesh QUIC endpoint in `rap-node-agent`
- `mesh-live-smoke` harness proving direct and single-relay synthetic traffic
over real local HTTP endpoints
over real local QUIC endpoints
- scoped synthetic mesh config file loading for peer endpoints and routes
- Control Plane synthetic mesh config read fallback when no local scoped config
file is set
@@ -46,7 +46,7 @@ Implemented:
- explicit production mesh forwarding gate config; production forwarding still
has no runtime implementation and remains unavailable
- route-bound production mesh envelope contract and fail-closed validation on
`/mesh/v1/forward`
the QUIC production-forward path
- metadata-only production envelope observation hook for valid envelopes, still
without forwarding payloads
- bounded metadata-only production envelope observation sink for accepted
@@ -93,7 +93,7 @@ Implemented:
- bounded peer recovery planner over peer cache and connection states
- peer connection intent planner with transport readiness classification
- peer connection manager for real control-plane health over reusable
HTTP keep-alive transport
QUIC fabric transport
- route-health effective-path runtime through replacement relay control paths
Not implemented yet:
@@ -125,35 +125,30 @@ state directory. On Linux it also installs a systemd `update-loop` service by
default, so nodes continue to update from Control Plane policy without operator
commands on each host.
Preferred profile-based install:
Preferred fabric-native install:
```bash
rap-host-agent install \
--profile-url https://control.example.com/api/v1 \
--cluster-id <cluster_id> \
--install-token <one_time_install_token> \
--node-name docker-node-1
--bootstrap-bundle ./docker-node-1.bootstrap.json
```
The host-agent exchanges the install token for a signed control-plane install
profile, then applies Docker image, container, state-dir, mesh listen,
advertise, NAT/connectivity, and region settings from that profile. The same
token is then used by the node-agent for first enrollment, so the operator does
not need to manually pass cluster/runtime flags.
Offline/import bootstrap is also supported:
```bash
rap-host-agent install \
--bootstrap-bundle ./docker-node-1.bootstrap.json
```
The bootstrap bundle carries the signed install profile, pinned cluster
authority key, and QUIC fabric registry seeds. The host-agent applies Docker
image, container, state-dir, mesh listen, advertise, NAT/connectivity, and
region settings locally, then the node-agent enrolls through QUIC fabric.
Manual install is still supported:
```bash
rap-host-agent install \
--backend-url http://192.168.200.61:18080/api/v1 \
--cluster-id <cluster_id> \
--join-token <raw_join_token> \
--node-name docker-node-1 \
--image rap-node-agent:dev-enrollment-bootstrap-smoke \
--container-name rap-node-agent-docker-node-1 \
--state-dir /var/lib/rap/nodes/docker-node-1 \
--network host \
--replace
--bootstrap-bundle ./docker-node-1.bootstrap.json
```
The command creates or replaces only the local Docker container. The running
@@ -175,8 +170,6 @@ local updater service without recreating the node-agent container:
```bash
rap-host-agent install-updater \
--backend-url http://192.168.200.61:18080/api/v1 \
--cluster-id <cluster_id> \
--state-dir /var/lib/rap/nodes/docker-node-1 \
--container-name rap-node-agent-docker-node-1
```
@@ -191,7 +184,6 @@ container is running, and reports update phases back to the Control Plane.
```bash
rap-host-agent update \
--backend-url http://192.168.200.61:18080/api/v1 \
--cluster-id <cluster_id> \
--node-id <node_id> \
--container-name rap-node-agent-docker-node-1 \
@@ -215,7 +207,6 @@ already-installed release.
```bash
rap-host-agent update-loop \
--backend-url http://192.168.200.61:18080/api/v1 \
--cluster-id <cluster_id> \
--node-id <node_id> \
--container-name rap-node-agent-docker-node-1 \
@@ -241,7 +232,6 @@ the new binary.
```bash
rap-host-agent update-host-agent-loop \
--backend-url http://192.168.200.61:18080/api/v1 \
--cluster-id <cluster_id> \
--state-dir /var/lib/rap/nodes/docker-node-1 \
--binary-path /usr/local/bin/rap-host-agent
@@ -249,16 +239,21 @@ rap-host-agent update-host-agent-loop \
## Windows Host Agent Bootstrap And Updates
Windows uses the same Control Plane install profile, but the local placement is
a Scheduled Task instead of Docker. In `--startup-mode auto` the installer first
Windows uses the same bootstrap bundle model, but the local placement is a
Scheduled Task instead of Docker. In `--startup-mode auto` the installer first
tries an elevated `ONSTART` task running as `SYSTEM`; without admin rights it
falls back to a per-user `ONLOGON` task. The `ONSTART` mode starts after reboot
without an interactive user session. The `ONLOGON` fallback can only start after
that Windows user signs in.
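The `--startup-mode auto` fallback described above can be sketched as a small decision function. This is an illustration only, not the installer's code: the helper name `choose_task_trigger`, the `is_elevated` parameter, and the returned dictionary keys are hypothetical, and the sketch models "try elevated first" as a simple privilege check.

```python
def choose_task_trigger(startup_mode: str, is_elevated: bool) -> dict:
    """Pick the Scheduled Task trigger for `--startup-mode auto` (hypothetical helper)."""
    if startup_mode != "auto":
        # Other startup modes are not described here, so the sketch refuses them.
        raise ValueError("this sketch only models --startup-mode auto")
    if is_elevated:
        # Elevated install: ONSTART task as SYSTEM, starts after reboot
        # without an interactive user session.
        return {"trigger": "ONSTART", "run_as": "SYSTEM", "needs_user_logon": False}
    # No admin rights: per-user ONLOGON task, starts only after that user signs in.
    return {"trigger": "ONLOGON", "run_as": "current-user", "needs_user_logon": True}
```

Under these assumptions, `choose_task_trigger("auto", False)` reports that the node will not come back after a reboot until the installing user signs in.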
```cmd
powershell -NoProfile -ExecutionPolicy Bypass -Command "Invoke-WebRequest -UseBasicParsing 'http://control.example.com/downloads/rap-host-agent-windows-amd64.exe' -OutFile $env:TEMP\rap-host-agent.exe"
%TEMP%\rap-host-agent.exe install-windows --profile-url "http://control.example.com/api/v1" --cluster-id "<cluster_id>" --install-token "<one_time_install_token>" --node-name "office-win-1" --startup-mode "auto"
%TEMP%\rap-host-agent.exe install-windows --bootstrap-bundle "C:\bootstrap\office-win-1.bootstrap.json" --startup-mode "auto"
```
Offline/import bootstrap is also supported:
```cmd
%TEMP%\rap-host-agent.exe install-windows --bootstrap-bundle "C:\bootstrap\office-win-1.bootstrap.json" --startup-mode "auto"
```
`install-windows` installs two tasks:
@@ -275,9 +270,8 @@ independent from the local identity file location and is required for repair of
older Windows installs where the node is already heartbeat-healthy but the
host-agent updater has no usable identity file.
```cmd
%TEMP%\rap-host-agent.exe install-windows --backend-url "http://control.example.com/api/v1" --cluster-id "<cluster_id>" --node-id "<node_id>" --node-name "office-win-1" --replace --startup-mode "auto" --auto-update-current-version "<current_version>"
```
The repair path also reuses the local signed bootstrap/runtime state; it does
not require any backend URL.
The admin UI node details page generates a downloadable
`rap-repair-updater-<node>.cmd` for this repair path. It performs these steps:
@@ -347,14 +341,8 @@ Control Plane release artifacts for Windows must use:
Create a join token from the platform control plane, then run:
```powershell
.\bin\rap-node-agent.exe `
-backend-url http://192.168.200.61:8080/api/v1 `
-cluster-id <cluster_id> `
-join-token <raw_join_token> `
-node-name test-node-1 `
-state-dir C:\ProgramData\RapNodeAgent
```
Use a signed bootstrap bundle plus QUIC fabric registry seeds. The node
enrolls only through QUIC fabric inside the farm.
The agent submits a pending join request and exits. It does not self-activate.
A platform admin must approve the join request.
@@ -375,19 +363,18 @@ Then run the agent again:
```powershell
.\bin\rap-node-agent.exe `
-backend-url http://192.168.200.61:8080/api/v1 `
-state-dir C:\ProgramData\RapNodeAgent
```
It sends periodic heartbeats to:
It sends periodic heartbeats through the signed `control-api` service over QUIC
fabric:
```text
/api/v1/clusters/{clusterID}/nodes/{nodeID}/heartbeats
fabric control path /clusters/{clusterID}/nodes/{nodeID}/heartbeats
```
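As a minimal sketch of how the heartbeat path template above could be filled in, the helper below substitutes the two IDs. The function name `heartbeat_control_path` is hypothetical, and percent-encoding the IDs is an assumption of this sketch, not documented agent behavior.

```python
from urllib.parse import quote

# Template copied from the fabric control path shown above.
HEARTBEAT_PATH_TEMPLATE = "/clusters/{clusterID}/nodes/{nodeID}/heartbeats"


def heartbeat_control_path(cluster_id: str, node_id: str) -> str:
    """Build the heartbeat control path for one node (hypothetical helper)."""
    if not cluster_id or not node_id:
        raise ValueError("cluster_id and node_id must be non-empty")
    # Assumption: IDs are percent-encoded so they cannot inject extra path segments.
    return HEARTBEAT_PATH_TEMPLATE.format(
        clusterID=quote(cluster_id, safe=""),
        nodeID=quote(node_id, safe=""),
    )
```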
## Environment Variables
- `RAP_BACKEND_URL`
- `RAP_CLUSTER_ID`
- `RAP_CLUSTER_AUTHORITY_PUBLIC_KEY`
- `RAP_CLUSTER_AUTHORITY_FINGERPRINT`
@@ -398,8 +385,8 @@ It sends periodic heartbeats to:
- `RAP_HEARTBEAT_INTERVAL_SECONDS`
- `RAP_ENROLLMENT_POLL_INTERVAL_SECONDS`
- `RAP_ENROLLMENT_POLL_TIMEOUT_SECONDS`
- `RAP_MESH_SYNTHETIC_RUNTIME_ENABLED`
- `RAP_MESH_LISTEN_ADDR`
- `RAP_FABRIC_RUNTIME_ENABLED`
- `RAP_FABRIC_LISTEN_ADDR`
- `RAP_MESH_ADVERTISE_ENDPOINT`
- `RAP_MESH_ADVERTISE_ENDPOINTS_JSON`
- `RAP_MESH_ADVERTISE_TRANSPORT`
@@ -412,15 +399,15 @@ It sends periodic heartbeats to:
- `RAP_MESH_PRODUCTION_FORWARDING_ENABLED`
- `RAP_MESH_PRODUCTION_OBSERVATION_SINK_CAPACITY`
`RAP_MESH_SYNTHETIC_RUNTIME_ENABLED` defaults to `false`. It gates only the
`RAP_FABRIC_RUNTIME_ENABLED` defaults to `false`. It gates only the
C17A/C17B/C17C/C17D/C17E synthetic probe, route-health, relay scheduling,
bounded `synthetic.echo` test-service runtime, and live synthetic HTTP endpoint.
bounded `synthetic.echo` test-service runtime, and live synthetic QUIC endpoint.
It must not be used for RDP, VPN, file, video, or other production service
traffic.
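The default-off behavior of these gates can be illustrated with a small reader for boolean environment flags. This is a sketch under stated assumptions: the helper name `env_flag` is hypothetical, and treating only the literal value `true` (case-insensitive) as enabled is an assumption, since the accepted spellings are not listed here.

```python
import os


def env_flag(name: str, environ=None) -> bool:
    """Read a boolean gate that defaults to false (hypothetical helper)."""
    env = os.environ if environ is None else environ
    # Assumption: only "true" enables the gate; unset or anything else is false.
    return env.get(name, "").strip().lower() == "true"


def synthetic_runtime_enabled(environ=None) -> bool:
    # Gates only the synthetic probe/test runtime, never production service traffic.
    return env_flag("RAP_FABRIC_RUNTIME_ENABLED", environ)
```

Failing closed on unknown values keeps a typo such as `RAP_FABRIC_RUNTIME_ENABLED=ture` from silently enabling the runtime.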
`RAP_WORKLOAD_SUPERVISION_ENABLED` defaults to `false`. When enabled, the agent
polls node-scoped desired workloads and reports status. The current bounded
runtime reports built-in `core-mesh` and `mesh-listener` services as running
runtime reports built-in `core-mesh` and `fabric-listener` services as running
when enabled, supports the native built-in `synthetic.echo` test workload, and
keeps unsupported production workloads such as RDP workers degraded until their
supervisors are implemented.
@@ -431,8 +418,9 @@ reports the remote-workspace adapter channel contract and requires Fabric
Service Channel as the future data plane; it does not start FreeRDP, create a
remote session, or carry production RDP payloads.
`RAP_MESH_LISTEN_ADDR` starts the C17E/C17F/C17G synthetic HTTP endpoint only when
`RAP_MESH_SYNTHETIC_RUNTIME_ENABLED=true`. `RAP_MESH_SYNTHETIC_CONFIG` points to
`RAP_FABRIC_LISTEN_ADDR` names the historical synthetic listener address, but the
current runtime is QUIC-fabric-only and does not start an HTTP listener.
`RAP_MESH_SYNTHETIC_CONFIG` points to
a scoped synthetic mesh config snapshot and is preferred over debug JSON.
`RAP_MESH_PEER_ENDPOINTS_JSON` is a JSON object mapping peer node IDs to
endpoint URLs. `RAP_MESH_SYNTHETIC_ROUTES_JSON` is a JSON array of synthetic
@@ -454,10 +442,9 @@ same fields in `identity.json` are set.
`RAP_MESH_PRODUCTION_FORWARDING_ENABLED` defaults to `false`. It is a future
production-forwarding gate only. Turning it on does not enable production mesh
payload forwarding; `/mesh/v1/forward` still returns an unavailable runtime
response after validating the route-bound production envelope contract, until
a later approved production mesh stage implements route-bound, policy-bound
forwarding.
payload forwarding; the runtime still refuses service traffic after validating
the route-bound production envelope contract, until a later approved
production mesh stage implements route-bound, policy-bound forwarding.
The production envelope contract requires route, hop, TTL, expiry, payload
length, and SHA-256 payload hash fields. C17J accepts only the
@@ -522,11 +509,11 @@ recent failure reason, reliability score, and freshness/staleness signals.
The score remains advisory only and is not wired into production forwarding.
C17Z adds the first narrow production forwarding runtime. When
`RAP_MESH_PRODUCTION_FORWARDING_ENABLED=true`, `/mesh/v1/forward` can deliver
route-bound `fabric.control` envelopes at the local destination or forward them
to a direct next hop from explicit peer endpoint config. Service channels,
RDP/VPN/file/video payloads, arbitrary relay forwarding, and multi-hop
production route execution remain unavailable.
`RAP_MESH_PRODUCTION_FORWARDING_ENABLED=true`, the QUIC production-forward
handler can deliver route-bound `fabric.control` envelopes at the local
destination or forward them to a direct next hop from explicit peer endpoint
config. Service channels, RDP/VPN/file/video payloads, arbitrary relay
forwarding, and multi-hop production route execution remain unavailable.
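A fail-closed check of the production envelope contract (route, hop, TTL, expiry, payload length, SHA-256 payload hash) restricted to `fabric.control` might look like the sketch below. The field names (`route_id`, `hop`, `ttl`, `expires_at`, `payload_length`, `payload_sha256`, `channel`), the hop-versus-TTL rule, and the function name are assumptions for illustration; the real envelope schema is not reproduced here.

```python
import hashlib
import time
from typing import List, Optional

# Hypothetical field names; the contract only lists the concepts they stand for.
REQUIRED_FIELDS = (
    "channel", "route_id", "hop", "ttl",
    "expires_at", "payload_length", "payload_sha256",
)


def validate_envelope(envelope: dict, payload: bytes,
                      now: Optional[float] = None) -> List[str]:
    """Return rejection reasons; an empty list means this sketch would accept."""
    now = time.time() if now is None else now
    missing = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in envelope]
    if missing:
        return missing  # fail closed before looking at any values
    errors = []
    if envelope["channel"] != "fabric.control":
        errors.append("channel not allowed")  # service payloads stay unavailable
    if envelope["hop"] >= envelope["ttl"]:
        errors.append("ttl exhausted")  # assumed meaning of hop vs TTL
    if envelope["expires_at"] <= now:
        errors.append("expired")
    if envelope["payload_length"] != len(payload):
        errors.append("payload length mismatch")
    if envelope["payload_sha256"] != hashlib.sha256(payload).hexdigest():
        errors.append("payload hash mismatch")
    return errors
```

Returning every reason rather than the first one is a design choice of the sketch; it makes metadata-only rejection reports easier to read.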
C17Z1 adds route-path-bound multi-hop forwarding for production
`fabric.control` only. Envelopes may carry `route_path` and
@@ -559,7 +546,7 @@ C17Z5 turns scoped peer directory and recovery seed config into node-local
runtime `PeerCache` state. The cache builds a bounded warm peer set from
route-adjacent peers, recovery seeds, peer endpoints, and endpoint candidates.
When synthetic mesh testing is enabled, the node-agent probes warm peers with
`/mesh/v1/health` and reports metadata-only mesh-link observations. This is not
QUIC fabric live probes and reports metadata-only mesh-link observations. This is not
a persistent connection manager and does not forward service payloads.
C17Z6 adds advertised mesh endpoint reporting. When
@@ -602,7 +589,7 @@ persistent connection transport, STUN/TURN/ICE, NAT traversal, relay runtime,
or service payload forwarding.
C17Z11 adds the first real node-local peer connection manager for mesh
control-plane health. It uses a reusable HTTP keep-alive client to probe
control-plane health. It uses a reusable QUIC fabric transport to probe
direct/private/corporate peer endpoints selected by C17Z10 intents, updates
the shared peer connection tracker, and records `waiting_rendezvous` for
outbound-only or relay-required peers. Heartbeats include metadata-only
@@ -612,8 +599,8 @@ payload forwarding.
C17Z12 adds a node-scoped rendezvous/relay control-plane lease contract for
peers that would otherwise remain `waiting_rendezvous`. The agent consumes
`rendezvous_leases`, resolves matching intents into `relay_control`, probes the
relay node `/mesh/v1/health`, and records `relay_ready` for the peer control
`rendezvous_leases`, resolves matching intents into `relay_quic`, probes the
relay node with a QUIC fabric live probe, and records `relay_ready` for the peer control
path. This remains control-plane health only and does not enable RDP/VPN/file/
video/service payload forwarding, arbitrary relay packet forwarding,
STUN/TURN/ICE, or host networking changes.
@@ -668,17 +655,17 @@ enable service payload forwarding.
C17Z21 defines the portable inbound listener contract for Docker, Linux
service, Windows service, and future OS-specific node packages. The node-agent
does not stop when the mesh listen port cannot be bound. It keeps the outbound
Control Plane session alive and emits `c17z21.mesh_listener_report.v1` in
Control Plane session alive and emits `c17z21.fabric_listener_report.v1` in
heartbeat metadata with configured address, effective address, listen mode,
listener status, inbound reachability, one-way connectivity, failure reason,
and port-conflict diagnostics.
`RAP_MESH_LISTEN_PORT_MODE` controls behavior:
`RAP_FABRIC_LISTEN_PORT_MODE` controls behavior:
- `manual`: bind exactly `RAP_MESH_LISTEN_ADDR`; on conflict report
- `manual`: bind exactly `RAP_FABRIC_LISTEN_ADDR`; on conflict report
`listen_failed` and wait for an operator/config change.
- `auto`: try `RAP_MESH_LISTEN_ADDR`; on conflict scan
`RAP_MESH_LISTEN_AUTO_PORT_START..RAP_MESH_LISTEN_AUTO_PORT_END` and report
- `auto`: try `RAP_FABRIC_LISTEN_ADDR`; on conflict scan
`RAP_FABRIC_LISTEN_AUTO_PORT_START..RAP_FABRIC_LISTEN_AUTO_PORT_END` and report
`auto_rebound` when a free port is selected.
- `disabled`: do not open an inbound listener; the node is expected to be
outbound-only, relay/rendezvous, or Control Plane only.
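The three port modes above can be sketched as one selection function. This is a hedged illustration, not the agent's implementation: the function names, the `listening` and `disabled` status strings, the use of a UDP socket (chosen because QUIC runs over UDP), and returning `listen_failed` when the whole auto range is busy are all assumptions. Only `listen_failed` and `auto_rebound` are status names taken from the text.

```python
import socket


def _bind_udp(host: str, port: int):
    """Try to bind one UDP port; return the socket, or None on conflict."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((host, port))
    except OSError:
        sock.close()
        return None
    return sock


def open_listener(mode, listen_addr, auto_start=0, auto_end=-1, try_bind=_bind_udp):
    """Return (status, listener) for a listen port mode (hypothetical helper)."""
    if mode == "disabled":
        return "disabled", None  # outbound-only node, no inbound listener
    if mode not in ("manual", "auto"):
        raise ValueError("unknown listen port mode: %r" % (mode,))
    host, _, port = listen_addr.rpartition(":")
    candidates = [int(port)]  # always try the configured address first
    if mode == "auto":
        # Stands in for RAP_FABRIC_LISTEN_AUTO_PORT_START..END (inclusive).
        candidates.extend(range(auto_start, auto_end + 1))
    for index, candidate in enumerate(candidates):
        listener = try_bind(host, candidate)
        if listener is not None:
            return ("listening" if index == 0 else "auto_rebound"), listener
    return "listen_failed", None  # keep running and wait for a config change
```

The `try_bind` parameter exists only so the selection logic can be exercised without opening real sockets.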
@@ -694,7 +681,7 @@ C17Z22 separates outbound Control Plane presence from inbound mesh
reachability. When synthetic mesh testing is enabled, every heartbeat includes
`c17z22.mesh_outbound_session_report.v1` with node-to-control-plane direction,
keepalive transport, listener conflict state, rendezvous/relay counters, and a
flag showing whether the current outbound session can be used as a reverse
`fabric_control_endpoint` plus a flag showing whether the current outbound session can be used as a reverse
control-channel contract. This is the portable basis for Docker, Linux service,
Windows service, and future packages where a node may be behind NAT or have no
stable inbound address. It is still control-plane telemetry only and does not
@@ -715,7 +702,7 @@ and is ranked ahead of auto-discovered addresses.
C17Z25 adds per-peer endpoint fallback probing to the control-plane mesh
manager. A node no longer treats the top-ranked endpoint candidate as the only
possible address for a peer. For each warm direct/private/corporate peer, the
manager probes the ranked candidate list until one `/mesh/v1/health` endpoint
manager probes the ranked candidate list until one QUIC fabric endpoint
responds or all direct candidates fail. Heartbeat metadata includes
`c17z25.mesh_peer_connection_manager_report.v1` with `probe_results`,
`selected_candidate_id`, `selected_endpoint`, and per-candidate success/failure
@@ -733,14 +720,14 @@ Scoped synthetic config shape:
"peer_directory_version": "peers-v1",
"policy_version": "policy-v1",
"peer_endpoints": {
"node-b": "http://127.0.0.1:19002"
"node-b": "quic://127.0.0.1:19002"
},
"peer_endpoint_candidates": {
"node-b": [
{
"endpoint_id": "node-b-public",
"node_id": "node-b",
"transport": "direct_tcp_tls",
"transport": "direct_quic",
"address": "203.0.113.20:443",
"reachability": "public",
"nat_type": "restricted",
@@ -784,3 +771,4 @@ Expected:
- Production forwarding remains disabled by default and limited to
`fabric.control` when explicitly enabled.
- No privileged operations are performed by the current agent.