Working version, but speed is 10 Mbit/s
build / backend (push) Has been cancelled
build / node-agent (push) Has been cancelled
build / worker (push) Has been cancelled

2026-05-22 21:46:49 +03:00
parent 469fa0e860
commit 20d361a886
280 changed files with 954890 additions and 18524 deletions
+76 -76
@@ -95,11 +95,11 @@ Current audit and baseline snapshot:
- Canonical test Docker host: `192.168.200.61`
- Canonical Docker context: `test-ubuntu`
- Canonical SSH alias: `docker-test`
- Current external control-plane endpoint for remote/offsite node enrollment:
- Current external fabric control endpoint for remote/offsite node enrollment:
`http://94.141.118.222:19191` / `http://vpn.cin.su:19191`.
- Current port forward: `94.141.118.222:19191` -> `192.168.200.61:18080`.
- For offsite Windows/Linux nodes, install profiles should use:
`http://vpn.cin.su:19191/api/v1` as control-plane endpoint and
`http://vpn.cin.su:19191/api/v1` as fabric control endpoint and
`http://vpn.cin.su:19191/downloads` as artifact endpoint unless the user
explicitly chooses the raw IP endpoint.
- Backend API for local/client smoke runs: `http://192.168.200.61:8080/api/v1`
@@ -699,7 +699,7 @@ Current implementation focus remains:
replace their local wrapper, after which automatic polling should continue.
- Admin UI now marks missing host-agent updater reports as `repair updater` in
the node list and explains in node details / Updates when to run the Windows
repair command. The command uses the external control-plane endpoint and does
repair command. The command uses the external fabric control endpoint and does
not require a join token for already enrolled Windows nodes.
- Admin UI node details / Updates also provides a ready downloadable
`rap-repair-updater-<node>.cmd` plus copy-command action for Windows repair,
@@ -716,29 +716,28 @@ Current implementation focus remains:
Docker-test nodes `test-1/2/3` updated to `0.1.6`; existing Windows and
off-host Docker nodes still need their local updater wrappers to pick up the
0.1.6 host-agent repair path.
- C17Z30 operator-configured public mesh endpoints are implemented and
docker-test-deployed: desired `mesh-listener.advertise_endpoint` is now
projected into peer endpoint candidates for other nodes and preferred over
auto-discovered private heartbeat endpoints. `home-1`
(`8ad04829-cd30-4290-913d-1ce5c7ef7bb3`) is configured with
`listen_addr=0.0.0.0:19131`, `advertise_endpoint=http://94.141.118.222:19199`,
`connectivity_mode=direct`, `nat_type=port_restricted`, `region=home`.
`test-1` synthetic config now receives `home-1` peer endpoint
`http://94.141.118.222:19199`; internal `192.168.200.85:19131` responds with
HTTP 405 on GET, while external `94.141.118.222:19199` currently refuses TCP,
so router/firewall forwarding still needs correction outside the platform.
- C17Z31 offsite bootstrap peer selection is implemented and docker-test
deployed: operator-configured public/direct desired mesh-listener endpoints
are kept in core-mesh bootstrap even after the default warm-peer target is
reached. This fixes the case where remote Windows node
`ifcm-rufms-s-mo1cr` received only `test-*` warm peers and no `home-1`.
Its synthetic config now includes `home-1` endpoint
`http://94.141.118.222:19199` and candidates ordered as operator public,
heartbeat advertised public, then private LAN converted to relay-required for
offsite. External TCP to `94.141.118.222:19199` still failed from Codex and
docker-test checks while internal `192.168.200.85:19131` succeeds, so a real
offsite `Test-NetConnection 94.141.118.222 -Port 19199` is the next network
validation.
- C17Z30 operator-configured public mesh endpoints are implemented and
docker-test-deployed: desired `fabric-listener.advertise_endpoint` is now
projected into peer endpoint candidates for other nodes and preferred over
auto-discovered private heartbeat endpoints. `home-1`
(`8ad04829-cd30-4290-913d-1ce5c7ef7bb3`) is configured with
`listen_addr=0.0.0.0:19131`, `advertise_endpoint=quic://94.141.118.222:19199`,
`connectivity_mode=direct`, `nat_type=port_restricted`, `region=home`.
`test-1` synthetic config now receives `home-1` peer endpoint
`quic://94.141.118.222:19199`; internal `192.168.200.85:19131` responds on
the fabric listener while external `94.141.118.222:19199` still needs UDP
forwarding, so router/firewall correction remains outside the platform.
- C17Z31 offsite bootstrap peer selection is implemented and docker-test
deployed: operator-configured public/direct desired fabric-listener endpoints
are kept in core-mesh bootstrap even after the default warm-peer target is
reached. This fixes the case where remote Windows node
`ifcm-rufms-s-mo1cr` received only `test-*` warm peers and no `home-1`.
Its synthetic config now includes `home-1` endpoint
`quic://94.141.118.222:19199` and candidates ordered as operator public,
heartbeat advertised public, then private LAN converted to relay-required for
offsite. External UDP reachability to `94.141.118.222:19199` still needs
verification while internal `192.168.200.85:19131` succeeds, so the next
network validation is an offsite QUIC/UDP probe against port `19199`.
- C17Z32 native Ubuntu/Linux service install is implemented and docker-test
deployed: backend exposes `/node-agents/linux-install-profile`, host-agent
supports `install-linux`, installs `rap-node-agent` under
@@ -751,7 +750,7 @@ Current implementation focus remains:
install profile and generates profile-based `install-linux` commands.
A one-use token for `vps-ubuntu-1` is active until 2026-05-02T08:41:41Z:
`rap_join_a23Xhz63YstshWUBAPGPz5fzQ8YpHDP05RXaaYa4DoA`; scope roles are
`core-mesh` and `relay-node`, control-plane endpoint is
`core-mesh` and `relay-node`, fabric control endpoint is
`http://vpn.cin.su:19191/api/v1`, artifact endpoint is
`http://vpn.cin.su:19191/downloads`.
- Admin UI and docs now cover the full Windows updater operational workflow:
@@ -813,19 +812,19 @@ Current implementation focus remains:
`usa-los-1` (`linux_binary`) and `ifcm-rufms-s-mo1cr` (`windows_service`) now
return `action=update`, `target_version=0.2.40` instead of
`no_matching_artifact`.
- C18F production-forwarding gate work is partially live: backend
`rap-backend:test-vpn-fabric-route-0.2.42` signs node synthetic configs with
`production_forwarding=true` / `control_plane_only=false` when the node's
desired `mesh-listener` workload has `production_forwarding_enabled=true`.
`home-1` and `usa-los-1` desired mesh-listener configs have this flag enabled.
- C18F production-forwarding gate work is partially live: backend
`rap-backend:test-vpn-fabric-route-0.2.42` signs node synthetic configs with
`production_forwarding=true` / `control_plane_only=false` when the node's
desired `fabric-listener` workload has `production_forwarding_enabled=true`.
`home-1` and `usa-los-1` desired fabric-listener configs have this flag enabled.
Node-agent `0.2.44` accepts signed production-forwarding mesh configs and
host-agent `0.2.44` fixes Docker updater behavior so synthetic mesh runtime is
not disabled on Docker updates. Runtime status: `usa-los-1` reports
`mesh_production_forwarding=true`; `home-1` reports `0.2.44` and synthetic
runtime enabled, but its listener report is still `disabled/listen_addr_empty`,
so `home-1` is not yet a usable production fabric endpoint. Next action is to
repair why `home-1` is not applying the signed mesh-listener config
(`listen_addr=0.0.0.0:19131`) after Docker updater restart.
runtime enabled, but its listener report is still `disabled/listen_addr_empty`,
so `home-1` is not yet a usable production fabric endpoint. Next action is to
repair why `home-1` is not applying the signed fabric-listener config
(`listen_addr=0.0.0.0:19131`) after Docker updater restart.
- C18G VPN-over-fabric runtime path is live-tested on docker-test. Backend is
deployed as `rap-backend:test-vpn-fabric-route-0.2.43`; VPN route intents now
allow both `vpn_packet` data and `fabric_control` health probes. Node-agent
@@ -923,7 +922,7 @@ Current implementation focus remains:
`route_rebuild_recommended`, `degraded_fallback_recommended`, or repeated
consecutive failures. Fenced routes are not selected as primary or alternate;
if all selected entry/exit routes are fenced, the lease uses explicit
degraded backend fallback with reason
degraded compat fallback with reason
`fabric_routes_fenced_by_service_channel_feedback`. Live smoke created two
short-lived `test-1 -> test-2` route intents, injected a fresh
service-channel flow feedback heartbeat marking the higher-priority route as
@@ -1122,7 +1121,7 @@ Current implementation focus remains:
smoke-passed on 2026-05-07. Node-agent/host-agent `0.2.182` artifacts,
Docker image `rap-node-agent:0.2.182`, release manifests, and update
policies are published. Backend `rap-backend:fabric-service-channel-0.2.182`
is deployed on docker-test. The runtime fix is a dynamic mesh listener
is deployed on docker-test. The runtime fix is a dynamic fabric listener
handler: synthetic config refreshes now update `/mesh/v1/forward`,
service-channel ingress, production routes, delivery inbox, and forward
transport without requiring a port/listener restart. Backend route-feedback
@@ -1157,7 +1156,7 @@ Current implementation focus remains:
`rolling` target `0.2.183`, and the test containers run that image. The
runtime fix makes the entry node honor the signed service-channel lease
authority: leases with `status=degraded_fallback` or
`primary_route.status=missing_route_intent` now force backend fallback instead
`primary_route.status=missing_route_intent` now force compat fallback instead
of reusing stale generic route candidates. The same fallback rule is applied
to HTTP and WebSocket packet ingress. Script
`scripts/fabric/c18z3-live-service-channel-entry-ws-fallback-smoke.ps1`
@@ -1167,7 +1166,7 @@ Current implementation focus remains:
expiry. Result:
`artifacts/c18z3-live-service-channel-entry-ws-fallback-smoke-result.json`
run `c18z3-20260507-211402`: warm `4/4`, WebSocket packets `8`, recovery
`4/4`, backend fallback queue `0 -> 8`, route failures `0`, and all checks
`4/4`, compat fallback queue `0 -> 8`, route failures `0`, and all checks
passed. During publication the first `0.2.183` Docker tar had a malformed
entrypoint and stale size/hash metadata; it was rebuilt, the latest tar alias
was replaced, and the release artifact row was corrected to sha256
@@ -1182,7 +1181,7 @@ Current implementation focus remains:
refresh, and verifies the remaining packets use the alternate route. Result:
`artifacts/c18z4-live-service-channel-session-pressure-smoke-result.json`
run `c18z4-20260507-212748`: exit inbox depth `0 -> 384`, route failure delta
`0`, flow drop delta `0`, backend fallback queue `0 -> 0`, primary route
`0`, flow drop delta `0`, compat fallback queue `0 -> 0`, primary route
removed from entry/exit configs, alternate route selected after the switch,
and both route intents expired. This proves the shared Fabric Service Channel
can keep a service session alive while Control Plane changes the live route
@@ -1196,7 +1195,7 @@ Current implementation focus remains:
traffic over the same WebSocket. Result:
`artifacts/c18z5-live-service-channel-exit-restart-smoke-result.json` run
`c18z5-20260507-213745`: pre/outage/recovery batches `12/24/24`, total
packets `480`, route failure delta `48`, backend fallback queue `0 -> 192`,
packets `480`, route failure delta `48`, compat fallback queue `0 -> 192`,
flow drop delta `0`, and recovery exit inbox `0 -> 192`. This proves real
exit-node failure is visible as fallback/failure telemetry while the
long-lived service channel remains usable and fabric delivery resumes after
@@ -1215,7 +1214,7 @@ Current implementation focus remains:
`c18z6-20260507-214900`: pre/post batches `16/32`, total packets `384`,
exit inbox depth `0 -> 384`, Control Plane replacement route
`b2f3c510-46d2-4dce-8389-3952a99d0311`, route failure delta `0`, flow drop
delta `0`, backend fallback queue `0 -> 0`, all checks passed, and all
delta `0`, compat fallback queue `0 -> 0`, all checks passed, and all
active nodes remained healthy/current on `0.2.183`. This proves a live
service channel can apply a route-manager rebuild decision without rebuilding
the service WebSocket.
@@ -1230,7 +1229,7 @@ Current implementation focus remains:
`artifacts/c18z7-live-service-channel-concurrent-isolation-smoke-result.json`
run `c18z7-20260507-215727`: 3 sessions, 36 rounds, 288 packets per session,
864 packets total, each session exit inbox depth `288`, total exit depth
`864`, backend fallback delta `0`, route failure delta `0`, flow drop delta
`864`, compat fallback delta `0`, route failure delta `0`, flow drop delta
`0`, and all active nodes healthy/current on `0.2.183`. This proves rebuild
and route-manager state are shared correctly without one active service
session starving or poisoning the other concurrent sessions.
@@ -1246,7 +1245,7 @@ Current implementation focus remains:
run `c18z8-20260507-221347`: both interactive sessions delivered 192 packets
each, the abusive flow reached scheduler high watermark `1024`, scheduled
`1030` packets on the hottest channel, dropped `282` packets on that channel,
produced backend fallback delta `0`, route failure delta `0`, and all active
produced compat fallback delta `0`, route failure delta `0`, and all active
nodes stayed healthy/current on `0.2.183`. This proves bounded backpressure is
visible and isolated to the overloaded logical flow without starving other
active service sessions.
@@ -1267,7 +1266,7 @@ Current implementation focus remains:
direct replacement. Result:
`artifacts/c18z9-live-service-channel-route-pool-smoke-result.json` run
`c18z9-20260507-224901`: 54 batches / 432 packets sent and delivered to exit,
backend fallback delta `0`, route failure delta `0`, flow drop delta `0`, and
compat fallback delta `0`, route failure delta `0`, flow drop delta `0`, and
temporary route intents expired. Test containers `test-1/2/3` run
`rap-node-agent:0.2.184`; `usa-los-1`, `home-1`, and
`ifcm-rufms-s-mo1cr` remain healthy on `0.2.183` until their rollout policy is
@@ -1292,7 +1291,7 @@ Current implementation focus remains:
post-rebuild traffic reaches the alternate exit. Result:
`artifacts/c18z10-live-service-channel-exit-pool-smoke-result.json` run
`c18z10-20260507-232645`: 54 batches / 432 packets sent, primary exit queue
`144`, alternate exit queue `288`, backend fallback `0`, route failure delta
`144`, alternate exit queue `288`, compat fallback `0`, route failure delta
`0`, flow drop delta `0`, decision source
`service_channel_feedback_exit_pool_replacement`, and temporary route intents
expired. Backend and `test-1/2/3` are running `0.2.185`; update plans now
@@ -1317,7 +1316,7 @@ Current implementation focus remains:
verifies a refreshed lease selects `test-3`, then sends 288 more packets
through the alternate entry to the same exit. Result:
`artifacts/c18z11-live-service-channel-entry-pool-smoke-result.json` run
`c18z11-20260507-235341`: exit queue `432`, backend fallback `0`, route
`c18z11-20260507-235341`: exit queue `432`, compat fallback `0`, route
failure deltas `0/0`, flow drop deltas `0/0`, and temporary route intents
expired. This is a lease refresh/reconnect contract for entry replacement;
preserving a broken client-to-entry socket across an entry node outage is not
@@ -1355,7 +1354,7 @@ Current implementation focus remains:
`score_adjustment=90`), and a refreshed lease prefers that fast route over a
newly introduced higher-priority relay candidate. Result:
`artifacts/c18z13-live-service-channel-route-quality-smoke-result.json` run
`c18z13-20260508-001610`; backend fallback `0`, flow drops `0`, temporary
`c18z13-20260508-001610`; compat fallback `0`, flow drops `0`, temporary
route intents expired. Published release id:
`64effc62-18b6-4eeb-a1c9-f5fb8e251491`.
- C18Z14 active-session route-quality preference is implemented. Backend
@@ -1431,7 +1430,7 @@ Current implementation focus remains:
`artifacts/c18z17-live-service-channel-quality-cleanup-smoke-result.json`
run `c18z14-20260508-075750`; 60 batches / 480 packets delivered, active
quality markers `32`, stale quality markers `0`, visible preferences `3`,
backend fallback `0`, flow drops `0`, temporary route intents expired.
compat fallback `0`, flow drops `0`, temporary route intents expired.
- C18Z18 service-session-scoped flow scheduler memory is implemented.
Node-agent `0.2.193` is built, published to docker-test downloads,
registered in the stable update channel, and deployed to `test-1/2/3`;
@@ -1448,12 +1447,12 @@ Current implementation focus remains:
`scripts/fabric/c18z18-service-channel-session-scoped-fairness-smoke.ps1`
wraps the live C18Z17 quality path and verifies served live channels are
session-scoped, unscoped served `flow-NN` channels are absent, quality
markers are session-scoped, backend fallback is `0`, and flow drops are `0`.
markers are session-scoped, compat fallback is `0`, and flow drops are `0`.
Result:
`artifacts/c18z18-service-channel-session-scoped-fairness-smoke-result.json`
run `c18z14-20260508-082520`; 60 batches / 480 packets delivered, served
channels `32`, session-scoped served channels `32`, session-scoped quality
channels `32`, unscoped served channels `0`, backend fallback `0`, flow drops
channels `32`, unscoped served channels `0`, compat fallback `0`, flow drops
`0`, temporary route intents expired.
- C18Z19 bounded parallel logical-flow send window is implemented. Node-agent
`0.2.194` is built, published to docker-test downloads, registered in the
@@ -1469,12 +1468,12 @@ Current implementation focus remains:
Live script
`scripts/fabric/c18z19-service-channel-parallel-flow-window-smoke.ps1` wraps
the C18Z18 live route-quality/session-scoped path and verifies the parallel
window is enabled and observed while backend fallback and flow drops stay at
window is enabled and observed while compat fallback and flow drops stay at
zero. Result:
`artifacts/c18z19-service-channel-parallel-flow-window-smoke-result.json`
run `c18z14-20260508-084133`; 60 batches / 480 packets delivered,
`max_parallel_flow_sends=4`, `send_flow_parallel_batches=60`, served
channels `32`, session-scoped quality channels `32`, backend fallback `0`,
channels `32`, session-scoped quality channels `32`, compat fallback `0`,
flow drops `0`, temporary route intents expired.
- C18Z20 per-channel latency/retry/in-flight telemetry and adaptive recommended
send-window telemetry are implemented. Node-agent `0.2.195` is built,
@@ -1498,7 +1497,7 @@ Current implementation focus remains:
run `c18z14-20260508-085635`; 60 batches / 480 packets delivered,
`max_parallel_flow_sends=4`, `recommended_parallel_flow_sends=4`,
`scheduler_max_in_flight=4`, attempts/success/latency visible on 32 channels,
backend fallback `0`, flow drops `0`, temporary route intents expired.
compat fallback `0`, flow drops `0`, temporary route intents expired.
- C18Z21 rolling per-channel/session quality windows are implemented.
Node-agent `0.2.196` is built, published to docker-test downloads,
registered in the stable update channel, and deployed to `test-1/2/3`;
@@ -1521,7 +1520,7 @@ Current implementation focus remains:
run `c18z14-20260508-091952`; 60 batches / 480 packets delivered,
scheduler quality-window samples `480`, failures `0`, drops `0`, window
samples/success/latency visible on 32 channels, `recommended_parallel_flow_sends=4`,
backend fallback `0`, flow drops `0`, temporary route intents expired.
compat fallback `0`, flow drops `0`, temporary route intents expired.
- C18Z22 backend durable route feedback now consumes the rolling quality
window from node-agent heartbeat metadata. Backend
`rap-backend:fabric-service-channel-0.2.197` is built and deployed on
@@ -1542,7 +1541,7 @@ Current implementation focus remains:
`artifacts/c18z22-service-channel-rolling-feedback-smoke-result.json` run
`c18z14-20260508-093100`; 60 batches / 480 packets delivered, route feedback
count `1`, rolling feedback count `1`, healthy rolling feedback count `1`,
rolling payload count `1`, backend fallback `0`, flow drops `0`.
rolling payload count `1`, compat fallback `0`, flow drops `0`.
- C18Z23 recovery hysteresis is implemented for recovered service-channel
routes. Backend `rap-backend:fabric-service-channel-0.2.198` is built and
deployed on docker-test; node-agent remains `0.2.196` on `test-1/2/3`.
@@ -1845,7 +1844,7 @@ Current implementation focus remains:
update. When a cluster authority public key is pinned, the node-agent now
rejects unsigned `rap_fsc_*` service-channel requests and requires the
signed `rap.fabric_service_channel_lease_authority.v1` payload/signature
headers. Legacy unsigned tokens remain accepted only in unpinned test mode.
headers. Compat-unsigned tokens remain accepted only in unpinned test mode.
Live smoke proved unsigned POST is rejected with 403 while signed lease POST
is accepted with 202:
`artifacts/c18z47-service-channel-signed-lease-enforcement-smoke-result.json`.
@@ -1860,7 +1859,7 @@ Current implementation focus remains:
`artifacts/c18z48-service-channel-introspection-smoke-result.json`.
- C18Z49 service-channel acceptance telemetry is implemented in node-agent
`0.2.232`. Each accepted Fabric Service Channel ingress records
`accepted_by=signed|introspection|legacy_unsigned`, route preference, and
`accepted_by=signed|introspection|compat_unsigned`, route preference, and
backend-fallback state in structured node logs. HTTP packet ingress also
returns `X-RAP-Service-Channel-Accepted-By` for smoke/diagnostics.
- C18Z50 durable service-channel lease introspection is implemented. Migration
@@ -1889,7 +1888,7 @@ Current implementation focus remains:
docker-test; node-agent/host-agent `0.2.235` artifacts are published under
`/downloads`, registered as active dev releases, and deployed on
`test-1/2/3`. Node-agent now reports accepted service-channel ingress
counters by `signed`, `introspection`, and `legacy_unsigned`, including
counters by `signed`, `introspection`, and `compat_unsigned`, including
backend-fallback count and last accepted timestamp. Backend exposes
`GET /clusters/{clusterID}/fabric/service-channels/access-telemetry`,
reading telemetry observations with heartbeat metadata fallback. Web-admin
@@ -1906,7 +1905,7 @@ Current implementation focus remains:
fallback, and latest route-quality feedback when a route exists. Web-admin's
`Service-channel access` panel now shows active channel rows before per-node
counters, so operators can see whether a live service channel is using normal
route quality feedback or degraded backend fallback. Live smoke created an
route quality feedback or degraded compat fallback. Live smoke created an
active lease, sent ingress traffic through test-1, and verified active
channel correlation plus fallback visibility:
`artifacts/c18z53-service-channel-access-correlation-smoke-result.json`.
@@ -1915,14 +1914,14 @@ Current implementation focus remains:
`vpn_packets` route intent, injects healthy route-quality heartbeat
telemetry, issues a service-channel lease that selects the normal primary
route, sends ingress traffic, and verifies the access telemetry active
channel row is `ready`, not backend fallback, with `route_feedback_status`
channel row is `ready`, not compat fallback, with `route_feedback_status`
`healthy`, rolling quality counters, and last send duration:
`artifacts/c18z54-service-channel-normal-route-access-smoke-result.json`.
- C18Z55 degraded normal-route access correlation is smoke-proven on the same
backend/admin surface. The smoke first issues a lease on a normal primary
`vpn_packets` route, then injects degraded/fenced route-quality heartbeat
feedback for that already-selected route. Access telemetry correctly reports
the active channel as `ready` and `force_backend_fallback=false`, while route
the active channel as `ready` and `force_compat_fallback=false`, while route
feedback is `fenced`, rolling failure/drop/slow counters are visible, and the
aggregate access status becomes `degraded` because `degraded_route_count > 0`:
`artifacts/c18z55-service-channel-degraded-route-access-smoke-result.json`.
@@ -1931,7 +1930,7 @@ Current implementation focus remains:
docker-test; node-agent remains `0.2.235`. Active access telemetry channel
rows now include `remediation_action`, `remediation_reason`,
`remediation_route_id`, `remediation_route_status`, and an operator hint.
Decisions distinguish explicit backend fallback, degraded/fenced normal
Decisions distinguish explicit compat fallback, degraded/fenced normal
route with an authorized alternate (`prefer_alternate_route`), degraded/fenced
route needing rebuild (`rebuild_route`), and healthy route (`none`).
Web-admin shows the remediation action in the `Service-channel access`
@@ -1942,7 +1941,7 @@ Current implementation focus remains:
creates primary and authorized alternate `vpn_packets` routes, issues a lease
while primary is still healthy/selected, then injects fenced feedback for the
selected primary. Access telemetry keeps the active channel on the normal
route with `force_backend_fallback=false`, reports `route_feedback_status`
route with `force_compat_fallback=false`, reports `route_feedback_status`
`fenced`, and recommends `remediation_action=prefer_alternate_route` with the
alternate route id/status; `degraded_fallback_channel_count` stays zero:
`artifacts/c18z56-service-channel-alternate-remediation-smoke-result.json`.
@@ -1976,14 +1975,14 @@ Current implementation focus remains:
remediation command is consumed, then verifies runtime heartbeat evidence:
`last_selected_route_id` and flow-scheduler `last_route_id` move to the
replacement route, `send_successes=1`, `send_failures=0`,
`send_fallback_local=0`, and no degraded backend fallback is recommended.
`send_fallback_local=0`, and no degraded compat fallback is recommended.
Result:
`artifacts/c18z59-service-channel-remediation-traffic-smoke-result.json`.
- C18Z60 multi-flow remediation traffic proof is smoke-proven. The smoke sends
a batch of twelve IPv4/TCP-like packets that classify into multiple
independent VPN flow channels after the remediation command is consumed.
Runtime heartbeat evidence shows the replacement route selected, at least two
flow-scheduler channels on that route, no local/backend fallback, no flow
flow-scheduler channels on that route, no local/compat fallback, no flow
drops, and no route send failures. Result:
`artifacts/c18z60-service-channel-remediation-multiflow-smoke-result.json`.
- C18Z61 pressure remediation traffic proof is smoke-proven. The smoke sends a
@@ -2000,7 +1999,7 @@ Current implementation focus remains:
old default bulk channel ids. Unit tests prove priority ordering
`control > interactive > reliable > bulk > droppable`; live smoke proves a
bulk 128-packet pressure batch plus an interactive packet both move through
the remediation replacement route with no local/backend fallback, drops, or
the remediation replacement route with no local/compat fallback, drops, or
route failures. Result:
`artifacts/c18z62-service-channel-remediation-qos-smoke-result.json`.
- C18Z63 concurrent QoS isolation is implemented and unit-proven. A controlled
@@ -2046,7 +2045,7 @@ Current implementation focus remains:
remediation. Run `c18z67-20260508-213452` accepted all 6 bulk requests,
forwarded 3072 post-remediation packets, completed the interactive request in
132 ms, observed 32 bulk and 12 interactive replacement-route flow stats, and
kept local/backend fallback, route failures, flow drops, and scheduler drops
kept local/compat fallback, route failures, flow drops, and scheduler drops
at 0. Artifact:
`artifacts/c18z67-service-channel-concurrent-qos-live-smoke-result.json`.
- C18Z68 service-channel flow-health guard is implemented and deployed on
@@ -2054,7 +2053,7 @@ Current implementation focus remains:
web-admin rebuilt/deployed. Access telemetry now projects
`flow_health_status` and `flow_health_reason` at cluster, node, and
active-channel levels from traffic-class counts, queue pressure, flow drops,
backend fallback, route-quality failures/drops/slow samples, and route send
compat fallback, route-quality failures/drops/slow samples, and route send
latency. Web-admin shows explicit flow-health chips beside flow QoS so
sustained bulk pressure, degraded latency, fallback, and drops are visible
before adding user services. Verification passed:
@@ -2114,7 +2113,7 @@ Current implementation focus remains:
`rap-node-agent:0.2.245-c18z71` on `test-1/2/3`. Backend exposes audited
`GET/PUT /clusters/{clusterID}/fabric/service-channels/pool-policy` for
entry/exit pool constraints, preferred entry/exit, selection strategy,
route/entry/exit failover modes, backend fallback allowance, and sticky
route/entry/exit failover modes, compat fallback allowance, and sticky
session mode. Lease issuance now applies the effective policy before route
selection, constrains `entry_pool`/`exit_pool`, chooses policy preferred
nodes when present, embeds `pool_policy` provenance in the lease, and signs
@@ -2258,7 +2257,7 @@ Current implementation focus remains:
`rebuild_route` command to `applied` / `replacement_selected`, the entry node
reports a route-manager decision for the same `rebuild_request_id`, reports
transition `applied_rebuild`, and live service-channel packet ingress selects
the replacement route with no local/backend fallback, route failures, or flow
the replacement route with no local/compat fallback, route failures, or flow
drops. Verification passed:
`go test ./internal/modules/cluster ./internal/platform/runtime ./internal/modules/nodeagent`,
`go test ./cmd/rap-node-agent ./internal/agent ./internal/mesh ./internal/vpnruntime ./internal/config`,
@@ -2455,12 +2454,12 @@ Current implementation focus remains:
If a signed data-plane contract has `backend_relay_policy=disabled`, the
service-channel runtime no longer proxies failed/missing fabric-route working
data through backend relay; it returns a visible service unavailable result.
The live smoke temporarily disables backend fallback in pool policy, issues a
The live smoke temporarily disables compat fallback in pool policy, issues a
no-route lease, verifies `backend_relay_policy=disabled`, posts to test-1,
and proves the node rejects with 503 instead of backend relay. Verification
passed: node-agent tests, C18Z92 live smoke, and C18Z91 regression smoke.
Artifact:
`artifacts/c18z92-node-agent-disabled-backend-fallback-smoke-result.json`.
`artifacts/c18z92-node-agent-disabled-compat-fallback-smoke-result.json`.
- C18Z93 access-telemetry data-plane projection is implemented and deployed on
docker-test as `rap-backend:fabric-service-channel-0.2.268-c18z93`;
node-agent remains `rap-node-agent:0.2.267-c18z92` on `test-1/2/3`, and
@@ -2493,7 +2492,7 @@ Current implementation focus remains:
docker-test as backend `rap-backend:fabric-service-channel-0.2.270-c18z95`
and node-agent `rap-node-agent:0.2.270-c18z95` on `test-1/2/3`; web-admin is
rebuilt/deployed to `rap_web_admin`. Node-agent now reports
`backend_fallback_blocked`, `fabric_route_send_failure`, and last data-plane
`compat_fallback_blocked`, `fabric_route_send_failure`, and last data-plane
violation status/reason in `fabric_service_channel_access_report`. Backend
access telemetry projects those fields to cluster, node, and active-channel
rows, and `data_plane_contract` incidents distinguish policy-blocked fallback
@@ -2505,7 +2504,7 @@ Current implementation focus remains:
docker-test as backend `rap-backend:fabric-service-channel-0.2.281-c18z109`;
node-agent remains `rap-node-agent:0.2.270-c18z95` on `test-1/2/3`, and
web-admin remains deployed. Backend now converts heartbeat access reports
with `fabric_route_send_failed_backend_fallback_blocked` into durable fenced
with `fabric_route_send_failed_compat_fallback_blocked` into durable fenced
`fabric_service_channel_route_feedback` for the active channel primary route.
The existing route rebuild planner then selects an authorized replacement
route when one exists. Verification passed: backend tests, node-agent tests,
@@ -5402,3 +5401,4 @@ The current phase is NOT:
Future mesh, VPN, multi-cluster, node-agent updater, and production realtime data-plane work must be introduced only through explicit, narrow, staged implementation prompts.
Always keep the project production-oriented. Do not simplify it into a toy app.