Working version, but speed is 10 Mbit/s
+76 -76
@@ -95,11 +95,11 @@ Current audit and baseline snapshot:
 - Canonical test Docker host: `192.168.200.61`
 - Canonical Docker context: `test-ubuntu`
 - Canonical SSH alias: `docker-test`
-- Current external control-plane endpoint for remote/offsite node enrollment:
+- Current external fabric control endpoint for remote/offsite node enrollment:
 `http://94.141.118.222:19191` / `http://vpn.cin.su:19191`.
 - Current port forward: `94.141.118.222:19191` -> `192.168.200.61:18080`.
 - For offsite Windows/Linux nodes, install profiles should use:
-`http://vpn.cin.su:19191/api/v1` as control-plane endpoint and
+`http://vpn.cin.su:19191/api/v1` as fabric control endpoint and
 `http://vpn.cin.su:19191/downloads` as artifact endpoint unless the user
 explicitly chooses the raw IP endpoint.
 - Backend API for local/client smoke runs: `http://192.168.200.61:8080/api/v1`
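The endpoint convention in the hunk above can be sketched as a small helper. This is only an illustration under assumptions: the function name `install_profile_endpoints` and the dictionary keys are hypothetical and not taken from the project; only the `/api/v1` and `/downloads` suffixes and the `vpn.cin.su:19191` base come from the text.

```python
def install_profile_endpoints(base_url: str) -> dict:
    """Derive both install-profile endpoints from one external base URL.

    The key names are hypothetical; the path suffixes follow the text.
    """
    base = base_url.rstrip("/")  # tolerate a trailing slash
    return {
        "fabric_control_endpoint": f"{base}/api/v1",
        "artifact_endpoint": f"{base}/downloads",
    }


endpoints = install_profile_endpoints("http://vpn.cin.su:19191")
print(endpoints["fabric_control_endpoint"])
print(endpoints["artifact_endpoint"])
```

Passing the raw IP base (`http://94.141.118.222:19191`) instead would yield the alternative endpoints the text allows when the user explicitly chooses them.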
@@ -699,7 +699,7 @@ Current implementation focus remains:
 replace their local wrapper, after which automatic polling should continue.
 - Admin UI now marks missing host-agent updater reports as `repair updater` in
 the node list and explains in node details / Updates when to run the Windows
-repair command. The command uses the external control-plane endpoint and does
+repair command. The command uses the external fabric control endpoint and does
 not require a join token for already enrolled Windows nodes.
 - Admin UI node details / Updates also provides a ready downloadable
 `rap-repair-updater-<node>.cmd` plus copy-command action for Windows repair,
@@ -716,29 +716,28 @@ Current implementation focus remains:
 Docker-test nodes `test-1/2/3` updated to `0.1.6`; existing Windows and
 off-host Docker nodes still need their local updater wrappers to pick up the
 0.1.6 host-agent repair path.
-- C17Z30 operator-configured public mesh endpoints are implemented and
-docker-test-deployed: desired `mesh-listener.advertise_endpoint` is now
-projected into peer endpoint candidates for other nodes and preferred over
-auto-discovered private heartbeat endpoints. `home-1`
-(`8ad04829-cd30-4290-913d-1ce5c7ef7bb3`) is configured with
-`listen_addr=0.0.0.0:19131`, `advertise_endpoint=http://94.141.118.222:19199`,
-`connectivity_mode=direct`, `nat_type=port_restricted`, `region=home`.
-`test-1` synthetic config now receives `home-1` peer endpoint
-`http://94.141.118.222:19199`; internal `192.168.200.85:19131` responds with
-HTTP 405 on GET, while external `94.141.118.222:19199` currently refuses TCP,
-so router/firewall forwarding still needs correction outside the platform.
-- C17Z31 offsite bootstrap peer selection is implemented and docker-test
-deployed: operator-configured public/direct desired mesh-listener endpoints
-are kept in core-mesh bootstrap even after the default warm-peer target is
-reached. This fixes the case where remote Windows node
-`ifcm-rufms-s-mo1cr` received only `test-*` warm peers and no `home-1`.
-Its synthetic config now includes `home-1` endpoint
-`http://94.141.118.222:19199` and candidates ordered as operator public,
-heartbeat advertised public, then private LAN converted to relay-required for
-offsite. External TCP to `94.141.118.222:19199` still failed from Codex and
-docker-test checks while internal `192.168.200.85:19131` succeeds, so a real
-offsite `Test-NetConnection 94.141.118.222 -Port 19199` is the next network
-validation.
+- C17Z30 operator-configured public mesh endpoints are implemented and
+docker-test-deployed: desired `fabric-listener.advertise_endpoint` is now
+projected into peer endpoint candidates for other nodes and preferred over
+auto-discovered private heartbeat endpoints. `home-1`
+(`8ad04829-cd30-4290-913d-1ce5c7ef7bb3`) is configured with
+`listen_addr=0.0.0.0:19131`, `advertise_endpoint=quic://94.141.118.222:19199`,
+`connectivity_mode=direct`, `nat_type=port_restricted`, `region=home`.
+`test-1` synthetic config now receives `home-1` peer endpoint
+`quic://94.141.118.222:19199`; internal `192.168.200.85:19131` responds on
+the fabric listener while external `94.141.118.222:19199` still needs UDP
+forwarding, so router/firewall correction remains outside the platform.
+- C17Z31 offsite bootstrap peer selection is implemented and docker-test
+deployed: operator-configured public/direct desired fabric-listener endpoints
+are kept in core-mesh bootstrap even after the default warm-peer target is
+reached. This fixes the case where remote Windows node
+`ifcm-rufms-s-mo1cr` received only `test-*` warm peers and no `home-1`.
+Its synthetic config now includes `home-1` endpoint
+`quic://94.141.118.222:19199` and candidates ordered as operator public,
+heartbeat advertised public, then private LAN converted to relay-required for
+offsite. External UDP reachability to `94.141.118.222:19199` still needs
+verification while internal `192.168.200.85:19131` succeeds, so the next
+network validation is an offsite QUIC/UDP probe against port `19199`.
 - C17Z32 native Ubuntu/Linux service install is implemented and docker-test
 deployed: backend exposes `/node-agents/linux-install-profile`, host-agent
 supports `install-linux`, installs `rap-node-agent` under
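The candidate ordering described for C17Z31 (operator public, then heartbeat advertised public, then private LAN marked relay-required for offsite nodes) can be sketched as follows. This is a minimal illustration, not the project's implementation: the `source` labels, the `relay_required` field, and the function name are hypothetical, and `203.0.113.10` is a documentation-only placeholder address.

```python
# Hypothetical source labels; a lower rank is tried first.
SOURCE_RANK = {
    "operator_public": 0,
    "heartbeat_public": 1,
    "private_lan": 2,
}


def order_candidates(candidates, offsite):
    """Return copies of the candidates in preference order.

    For an offsite node, private LAN candidates are kept but flagged as
    needing a relay, mirroring the "converted to relay-required" wording.
    """
    ordered = sorted(candidates, key=lambda c: SOURCE_RANK[c["source"]])
    result = []
    for candidate in ordered:
        item = dict(candidate)  # do not mutate the caller's data
        item["relay_required"] = offsite and item["source"] == "private_lan"
        result.append(item)
    return result


candidates = [
    {"endpoint": "192.168.200.85:19131", "source": "private_lan"},
    {"endpoint": "quic://203.0.113.10:19131", "source": "heartbeat_public"},
    {"endpoint": "quic://94.141.118.222:19199", "source": "operator_public"},
]
for c in order_candidates(candidates, offsite=True):
    print(c["endpoint"], c["relay_required"])
```

Because `sorted` is stable, candidates with the same source keep their original relative order.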
@@ -751,7 +750,7 @@ Current implementation focus remains:
 install profile and generates profile-based `install-linux` commands.
 A one-use token for `vps-ubuntu-1` is active until 2026-05-02T08:41:41Z:
 `rap_join_a23Xhz63YstshWUBAPGPz5fzQ8YpHDP05RXaaYa4DoA`; scope roles are
-`core-mesh` and `relay-node`, control-plane endpoint is
+`core-mesh` and `relay-node`, fabric control endpoint is
 `http://vpn.cin.su:19191/api/v1`, artifact endpoint is
 `http://vpn.cin.su:19191/downloads`.
 - Admin UI and docs now cover the full Windows updater operational workflow:
@@ -813,19 +812,19 @@ Current implementation focus remains:
 `usa-los-1` (`linux_binary`) and `ifcm-rufms-s-mo1cr` (`windows_service`) now
 return `action=update`, `target_version=0.2.40` instead of
 `no_matching_artifact`.
-- C18F production-forwarding gate work is partially live: backend
-`rap-backend:test-vpn-fabric-route-0.2.42` signs node synthetic configs with
-`production_forwarding=true` / `control_plane_only=false` when the node's
-desired `mesh-listener` workload has `production_forwarding_enabled=true`.
-`home-1` and `usa-los-1` desired mesh-listener configs have this flag enabled.
+- C18F production-forwarding gate work is partially live: backend
+`rap-backend:test-vpn-fabric-route-0.2.42` signs node synthetic configs with
+`production_forwarding=true` / `control_plane_only=false` when the node's
+desired `fabric-listener` workload has `production_forwarding_enabled=true`.
+`home-1` and `usa-los-1` desired fabric-listener configs have this flag enabled.
 Node-agent `0.2.44` accepts signed production-forwarding mesh configs and
 host-agent `0.2.44` fixes Docker updater behavior so synthetic mesh runtime is
 not disabled on Docker updates. Runtime status: `usa-los-1` reports
 `mesh_production_forwarding=true`; `home-1` reports `0.2.44` and synthetic
-runtime enabled, but its listener report is still `disabled/listen_addr_empty`,
-so `home-1` is not yet a usable production fabric endpoint. Next action is to
-repair why `home-1` is not applying the signed mesh-listener config
-(`listen_addr=0.0.0.0:19131`) after Docker updater restart.
+runtime enabled, but its listener report is still `disabled/listen_addr_empty`,
+so `home-1` is not yet a usable production fabric endpoint. Next action is to
+repair why `home-1` is not applying the signed fabric-listener config
+(`listen_addr=0.0.0.0:19131`) after Docker updater restart.
 - C18G VPN-over-fabric runtime path is live-tested on docker-test. Backend is
 deployed as `rap-backend:test-vpn-fabric-route-0.2.43`; VPN route intents now
 allow both `vpn_packet` data and `fabric_control` health probes. Node-agent
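The C18F gate in the hunk above maps one desired-workload flag onto two signed-config fields. A minimal sketch of that mapping follows. The function name and dictionary shape are hypothetical, and the behaviour when the flag is absent or false (control-plane only) is an assumption; the text only states the enabled case.

```python
def forwarding_flags(desired_workload: dict) -> dict:
    """Map the desired fabric-listener flag onto the signed config fields.

    Assumption: a missing or false flag means control-plane-only mode.
    """
    enabled = bool(desired_workload.get("production_forwarding_enabled", False))
    return {
        "production_forwarding": enabled,
        "control_plane_only": not enabled,
    }


print(forwarding_flags({"production_forwarding_enabled": True}))
```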
@@ -923,7 +922,7 @@ Current implementation focus remains:
 `route_rebuild_recommended`, `degraded_fallback_recommended`, or repeated
 consecutive failures. Fenced routes are not selected as primary or alternate;
 if all selected entry/exit routes are fenced, the lease uses explicit
-degraded backend fallback with reason
+degraded compat fallback with reason
 `fabric_routes_fenced_by_service_channel_feedback`. Live smoke created two
 short-lived `test-1 -> test-2` route intents, injected a fresh
 service-channel flow feedback heartbeat marking the higher-priority route as
@@ -1122,7 +1121,7 @@ Current implementation focus remains:
 smoke-passed on 2026-05-07. Node-agent/host-agent `0.2.182` artifacts,
 Docker image `rap-node-agent:0.2.182`, release manifests, and update
 policies are published. Backend `rap-backend:fabric-service-channel-0.2.182`
-is deployed on docker-test. The runtime fix is a dynamic mesh listener
+is deployed on docker-test. The runtime fix is a dynamic fabric listener
 handler: synthetic config refreshes now update `/mesh/v1/forward`,
 service-channel ingress, production routes, delivery inbox, and forward
 transport without requiring a port/listener restart. Backend route-feedback
@@ -1157,7 +1156,7 @@ Current implementation focus remains:
 `rolling` target `0.2.183`, and the test containers run that image. The
 runtime fix makes the entry node honor the signed service-channel lease
 authority: leases with `status=degraded_fallback` or
-`primary_route.status=missing_route_intent` now force backend fallback instead
+`primary_route.status=missing_route_intent` now force compat fallback instead
 of reusing stale generic route candidates. The same fallback rule is applied
 to HTTP and WebSocket packet ingress. Script
 `scripts/fabric/c18z3-live-service-channel-entry-ws-fallback-smoke.ps1`
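The lease-authority rule in the hunk above reduces to a two-condition predicate. The sketch below assumes the lease is available as a plain dictionary with `status` and `primary_route.status` fields as named in the text; the function name is hypothetical and the real node-agent may represent leases differently.

```python
def must_force_compat_fallback(lease: dict) -> bool:
    """Return True when the signed lease forbids reusing stale route candidates."""
    if lease.get("status") == "degraded_fallback":
        return True
    primary = lease.get("primary_route") or {}
    return primary.get("status") == "missing_route_intent"


print(must_force_compat_fallback({"status": "degraded_fallback"}))
print(must_force_compat_fallback({"status": "ready", "primary_route": {"status": "ready"}}))
```

Applying the same predicate on both HTTP and WebSocket ingress paths keeps the two transports consistent, which is the point the text makes.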
@@ -1167,7 +1166,7 @@ Current implementation focus remains:
 expiry. Result:
 `artifacts/c18z3-live-service-channel-entry-ws-fallback-smoke-result.json`
 run `c18z3-20260507-211402`: warm `4/4`, WebSocket packets `8`, recovery
-`4/4`, backend fallback queue `0 -> 8`, route failures `0`, and all checks
+`4/4`, compat fallback queue `0 -> 8`, route failures `0`, and all checks
 passed. During publication the first `0.2.183` Docker tar had a malformed
 entrypoint and stale size/hash metadata; it was rebuilt, the latest tar alias
 was replaced, and the release artifact row was corrected to sha256
@@ -1182,7 +1181,7 @@ Current implementation focus remains:
 refresh, and verifies the remaining packets use the alternate route. Result:
 `artifacts/c18z4-live-service-channel-session-pressure-smoke-result.json`
 run `c18z4-20260507-212748`: exit inbox depth `0 -> 384`, route failure delta
-`0`, flow drop delta `0`, backend fallback queue `0 -> 0`, primary route
+`0`, flow drop delta `0`, compat fallback queue `0 -> 0`, primary route
 removed from entry/exit configs, alternate route selected after the switch,
 and both route intents expired. This proves the shared Fabric Service Channel
 can keep a service session alive while Control Plane changes the live route
@@ -1196,7 +1195,7 @@ Current implementation focus remains:
 traffic over the same WebSocket. Result:
 `artifacts/c18z5-live-service-channel-exit-restart-smoke-result.json` run
 `c18z5-20260507-213745`: pre/outage/recovery batches `12/24/24`, total
-packets `480`, route failure delta `48`, backend fallback queue `0 -> 192`,
+packets `480`, route failure delta `48`, compat fallback queue `0 -> 192`,
 flow drop delta `0`, and recovery exit inbox `0 -> 192`. This proves real
 exit-node failure is visible as fallback/failure telemetry while the
 long-lived service channel remains usable and fabric delivery resumes after
@@ -1215,7 +1214,7 @@ Current implementation focus remains:
 `c18z6-20260507-214900`: pre/post batches `16/32`, total packets `384`,
 exit inbox depth `0 -> 384`, Control Plane replacement route
 `b2f3c510-46d2-4dce-8389-3952a99d0311`, route failure delta `0`, flow drop
-delta `0`, backend fallback queue `0 -> 0`, all checks passed, and all
+delta `0`, compat fallback queue `0 -> 0`, all checks passed, and all
 active nodes remained healthy/current on `0.2.183`. This proves a live
 service channel can apply a route-manager rebuild decision without rebuilding
 the service WebSocket.
@@ -1230,7 +1229,7 @@ Current implementation focus remains:
 `artifacts/c18z7-live-service-channel-concurrent-isolation-smoke-result.json`
 run `c18z7-20260507-215727`: 3 sessions, 36 rounds, 288 packets per session,
 864 packets total, each session exit inbox depth `288`, total exit depth
-`864`, backend fallback delta `0`, route failure delta `0`, flow drop delta
+`864`, compat fallback delta `0`, route failure delta `0`, flow drop delta
 `0`, and all active nodes healthy/current on `0.2.183`. This proves rebuild
 and route-manager state are shared correctly without one active service
 session starving or poisoning the other concurrent sessions.
@@ -1246,7 +1245,7 @@ Current implementation focus remains:
 run `c18z8-20260507-221347`: both interactive sessions delivered 192 packets
 each, the abusive flow reached scheduler high watermark `1024`, scheduled
 `1030` packets on the hottest channel, dropped `282` packets on that channel,
-produced backend fallback delta `0`, route failure delta `0`, and all active
+produced compat fallback delta `0`, route failure delta `0`, and all active
 nodes stayed healthy/current on `0.2.183`. This proves bounded backpressure is
 visible and isolated to the overloaded logical flow without starving other
 active service sessions.
@@ -1267,7 +1266,7 @@ Current implementation focus remains:
 direct replacement. Result:
 `artifacts/c18z9-live-service-channel-route-pool-smoke-result.json` run
 `c18z9-20260507-224901`: 54 batches / 432 packets sent and delivered to exit,
-backend fallback delta `0`, route failure delta `0`, flow drop delta `0`, and
+compat fallback delta `0`, route failure delta `0`, flow drop delta `0`, and
 temporary route intents expired. Test containers `test-1/2/3` run
 `rap-node-agent:0.2.184`; `usa-los-1`, `home-1`, and
 `ifcm-rufms-s-mo1cr` remain healthy on `0.2.183` until their rollout policy is
@@ -1292,7 +1291,7 @@ Current implementation focus remains:
 post-rebuild traffic reaches the alternate exit. Result:
 `artifacts/c18z10-live-service-channel-exit-pool-smoke-result.json` run
 `c18z10-20260507-232645`: 54 batches / 432 packets sent, primary exit queue
-`144`, alternate exit queue `288`, backend fallback `0`, route failure delta
+`144`, alternate exit queue `288`, compat fallback `0`, route failure delta
 `0`, flow drop delta `0`, decision source
 `service_channel_feedback_exit_pool_replacement`, and temporary route intents
 expired. Backend and `test-1/2/3` are running `0.2.185`; update plans now
@@ -1317,7 +1316,7 @@ Current implementation focus remains:
 verifies a refreshed lease selects `test-3`, then sends 288 more packets
 through the alternate entry to the same exit. Result:
 `artifacts/c18z11-live-service-channel-entry-pool-smoke-result.json` run
-`c18z11-20260507-235341`: exit queue `432`, backend fallback `0`, route
+`c18z11-20260507-235341`: exit queue `432`, compat fallback `0`, route
 failure deltas `0/0`, flow drop deltas `0/0`, and temporary route intents
 expired. This is a lease refresh/reconnect contract for entry replacement;
 preserving a broken client-to-entry socket across an entry node outage is not
@@ -1355,7 +1354,7 @@ Current implementation focus remains:
 `score_adjustment=90`), and a refreshed lease prefers that fast route over a
 newly introduced higher-priority relay candidate. Result:
 `artifacts/c18z13-live-service-channel-route-quality-smoke-result.json` run
-`c18z13-20260508-001610`; backend fallback `0`, flow drops `0`, temporary
+`c18z13-20260508-001610`; compat fallback `0`, flow drops `0`, temporary
 route intents expired. Published release id:
 `64effc62-18b6-4eeb-a1c9-f5fb8e251491`.
 - C18Z14 active-session route-quality preference is implemented. Backend
@@ -1431,7 +1430,7 @@ Current implementation focus remains:
 `artifacts/c18z17-live-service-channel-quality-cleanup-smoke-result.json`
 run `c18z14-20260508-075750`; 60 batches / 480 packets delivered, active
 quality markers `32`, stale quality markers `0`, visible preferences `3`,
-backend fallback `0`, flow drops `0`, temporary route intents expired.
+compat fallback `0`, flow drops `0`, temporary route intents expired.
 - C18Z18 service-session-scoped flow scheduler memory is implemented.
 Node-agent `0.2.193` is built, published to docker-test downloads,
 registered in the stable update channel, and deployed to `test-1/2/3`;
@@ -1448,12 +1447,12 @@ Current implementation focus remains:
 `scripts/fabric/c18z18-service-channel-session-scoped-fairness-smoke.ps1`
 wraps the live C18Z17 quality path and verifies served live channels are
 session-scoped, unscoped served `flow-NN` channels are absent, quality
-markers are session-scoped, backend fallback is `0`, and flow drops are `0`.
+markers are session-scoped, compat fallback is `0`, and flow drops are `0`.
 Result:
 `artifacts/c18z18-service-channel-session-scoped-fairness-smoke-result.json`
 run `c18z14-20260508-082520`; 60 batches / 480 packets delivered, served
 channels `32`, session-scoped served channels `32`, session-scoped quality
-channels `32`, unscoped served channels `0`, backend fallback `0`, flow drops
+channels `32`, unscoped served channels `0`, compat fallback `0`, flow drops
 `0`, temporary route intents expired.
 - C18Z19 bounded parallel logical-flow send window is implemented. Node-agent
 `0.2.194` is built, published to docker-test downloads, registered in the
@@ -1469,12 +1468,12 @@ Current implementation focus remains:
 Live script
 `scripts/fabric/c18z19-service-channel-parallel-flow-window-smoke.ps1` wraps
 the C18Z18 live route-quality/session-scoped path and verifies the parallel
-window is enabled and observed while backend fallback and flow drops stay at
+window is enabled and observed while compat fallback and flow drops stay at
 zero. Result:
 `artifacts/c18z19-service-channel-parallel-flow-window-smoke-result.json`
 run `c18z14-20260508-084133`; 60 batches / 480 packets delivered,
 `max_parallel_flow_sends=4`, `send_flow_parallel_batches=60`, served
-channels `32`, session-scoped quality channels `32`, backend fallback `0`,
+channels `32`, session-scoped quality channels `32`, compat fallback `0`,
 flow drops `0`, temporary route intents expired.
 - C18Z20 per-channel latency/retry/in-flight telemetry and adaptive recommended
 send-window telemetry are implemented. Node-agent `0.2.195` is built,
@@ -1498,7 +1497,7 @@ Current implementation focus remains:
 run `c18z14-20260508-085635`; 60 batches / 480 packets delivered,
 `max_parallel_flow_sends=4`, `recommended_parallel_flow_sends=4`,
 `scheduler_max_in_flight=4`, attempts/success/latency visible on 32 channels,
-backend fallback `0`, flow drops `0`, temporary route intents expired.
+compat fallback `0`, flow drops `0`, temporary route intents expired.
 - C18Z21 rolling per-channel/session quality windows are implemented.
 Node-agent `0.2.196` is built, published to docker-test downloads,
 registered in the stable update channel, and deployed to `test-1/2/3`;
@@ -1521,7 +1520,7 @@ Current implementation focus remains:
 run `c18z14-20260508-091952`; 60 batches / 480 packets delivered,
 scheduler quality-window samples `480`, failures `0`, drops `0`, window
 samples/success/latency visible on 32 channels, `recommended_parallel_flow_sends=4`,
-backend fallback `0`, flow drops `0`, temporary route intents expired.
+compat fallback `0`, flow drops `0`, temporary route intents expired.
 - C18Z22 backend durable route feedback now consumes the rolling quality
 window from node-agent heartbeat metadata. Backend
 `rap-backend:fabric-service-channel-0.2.197` is built and deployed on
@@ -1542,7 +1541,7 @@ Current implementation focus remains:
 `artifacts/c18z22-service-channel-rolling-feedback-smoke-result.json` run
 `c18z14-20260508-093100`; 60 batches / 480 packets delivered, route feedback
 count `1`, rolling feedback count `1`, healthy rolling feedback count `1`,
-rolling payload count `1`, backend fallback `0`, flow drops `0`.
+rolling payload count `1`, compat fallback `0`, flow drops `0`.
 - C18Z23 recovery hysteresis is implemented for recovered service-channel
 routes. Backend `rap-backend:fabric-service-channel-0.2.198` is built and
 deployed on docker-test; node-agent remains `0.2.196` on `test-1/2/3`.
@@ -1845,7 +1844,7 @@ Current implementation focus remains:
 update. When a cluster authority public key is pinned, the node-agent now
 rejects unsigned `rap_fsc_*` service-channel requests and requires the
 signed `rap.fabric_service_channel_lease_authority.v1` payload/signature
-headers. Legacy unsigned tokens remain accepted only in unpinned test mode.
+headers. Compat-unsigned tokens remain accepted only in unpinned test mode.
 Live smoke proved unsigned POST is rejected with 403 while signed lease POST
 is accepted with 202:
 `artifacts/c18z47-service-channel-signed-lease-enforcement-smoke-result.json`.
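The signed-lease enforcement rule above can be sketched as a small decision function. This is a hedged illustration: the function and parameter names are hypothetical, signature verification itself is assumed to have happened elsewhere, and the 202 returned for compat-unsigned tokens in unpinned test mode is an assumption (the text only confirms 403 for unsigned and 202 for signed requests when the key is pinned).

```python
def ingress_status(authority_key_pinned: bool, has_valid_signed_lease: bool) -> int:
    """Return the HTTP status a service-channel ingress request would receive."""
    if has_valid_signed_lease:
        return 202  # signed lease accepted
    if authority_key_pinned:
        return 403  # unsigned request rejected once the authority key is pinned
    return 202  # assumption: compat-unsigned token accepted in unpinned test mode


print(ingress_status(authority_key_pinned=True, has_valid_signed_lease=False))
print(ingress_status(authority_key_pinned=True, has_valid_signed_lease=True))
```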
@@ -1860,7 +1859,7 @@ Current implementation focus remains:
 `artifacts/c18z48-service-channel-introspection-smoke-result.json`.
 - C18Z49 service-channel acceptance telemetry is implemented in node-agent
 `0.2.232`. Each accepted Fabric Service Channel ingress records
-`accepted_by=signed|introspection|legacy_unsigned`, route preference, and
+`accepted_by=signed|introspection|compat_unsigned`, route preference, and
 backend-fallback state in structured node logs. HTTP packet ingress also
 returns `X-RAP-Service-Channel-Accepted-By` for smoke/diagnostics.
 - C18Z50 durable service-channel lease introspection is implemented. Migration
@@ -1889,7 +1888,7 @@ Current implementation focus remains:
 docker-test; node-agent/host-agent `0.2.235` artifacts are published under
 `/downloads`, registered as active dev releases, and deployed on
 `test-1/2/3`. Node-agent now reports accepted service-channel ingress
-counters by `signed`, `introspection`, and `legacy_unsigned`, including
+counters by `signed`, `introspection`, and `compat_unsigned`, including
 backend-fallback count and last accepted timestamp. Backend exposes
 `GET /clusters/{clusterID}/fabric/service-channels/access-telemetry`,
 reading telemetry observations with heartbeat metadata fallback. Web-admin
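The per-mode acceptance counters described above can be illustrated with a short aggregation. The event shape (a dictionary with an `accepted_by` key) and the function name are hypothetical; only the three mode names come from the text.

```python
ACCEPTED_BY_MODES = ("signed", "introspection", "compat_unsigned")


def count_accepted(events):
    """Count accepted ingress events per acceptance mode.

    Every known mode is present in the result, even with a zero count,
    so dashboards do not have to special-case missing keys.
    """
    counts = {mode: 0 for mode in ACCEPTED_BY_MODES}
    for event in events:
        mode = event["accepted_by"]
        if mode not in counts:
            raise ValueError(f"unknown accepted_by mode: {mode}")
        counts[mode] += 1
    return counts


print(count_accepted([{"accepted_by": "signed"}, {"accepted_by": "signed"}]))
```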
@@ -1906,7 +1905,7 @@ Current implementation focus remains:
 fallback, and latest route-quality feedback when a route exists. Web-admin's
 `Service-channel access` panel now shows active channel rows before per-node
 counters, so operators can see whether a live service channel is using normal
-route quality feedback or degraded backend fallback. Live smoke created an
+route quality feedback or degraded compat fallback. Live smoke created an
 active lease, sent ingress traffic through test-1, and verified active
 channel correlation plus fallback visibility:
 `artifacts/c18z53-service-channel-access-correlation-smoke-result.json`.
@@ -1915,14 +1914,14 @@ Current implementation focus remains:
 `vpn_packets` route intent, injects healthy route-quality heartbeat
 telemetry, issues a service-channel lease that selects the normal primary
 route, sends ingress traffic, and verifies the access telemetry active
-channel row is `ready`, not backend fallback, with `route_feedback_status`
+channel row is `ready`, not compat fallback, with `route_feedback_status`
 `healthy`, rolling quality counters, and last send duration:
 `artifacts/c18z54-service-channel-normal-route-access-smoke-result.json`.
 - C18Z55 degraded normal-route access correlation is smoke-proven on the same
 backend/admin surface. The smoke first issues a lease on a normal primary
 `vpn_packets` route, then injects degraded/fenced route-quality heartbeat
 feedback for that already-selected route. Access telemetry correctly reports
-the active channel as `ready` and `force_backend_fallback=false`, while route
+the active channel as `ready` and `force_compat_fallback=false`, while route
 feedback is `fenced`, rolling failure/drop/slow counters are visible, and the
 aggregate access status becomes `degraded` because `degraded_route_count > 0`:
 `artifacts/c18z55-service-channel-degraded-route-access-smoke-result.json`.
@@ -1931,7 +1930,7 @@ Current implementation focus remains:
 docker-test; node-agent remains `0.2.235`. Active access telemetry channel
 rows now include `remediation_action`, `remediation_reason`,
 `remediation_route_id`, `remediation_route_status`, and an operator hint.
-Decisions distinguish explicit backend fallback, degraded/fenced normal
+Decisions distinguish explicit compat fallback, degraded/fenced normal
 route with an authorized alternate (`prefer_alternate_route`), degraded/fenced
 route needing rebuild (`rebuild_route`), and healthy route (`none`).
 Web-admin shows the remediation action in the `Service-channel access`
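The four remediation decisions listed in the hunk above can be sketched as one function. This is an assumption-laden illustration: the function signature is hypothetical, and the action value returned for explicit compat fallback (`"compat_fallback"`) is a placeholder because the text does not name it; `prefer_alternate_route`, `rebuild_route`, and `none` are taken from the text.

```python
def remediation_action(force_compat_fallback: bool,
                       route_feedback_status: str,
                       has_authorized_alternate: bool) -> str:
    """Pick a remediation action for one active service-channel row."""
    if force_compat_fallback:
        return "compat_fallback"  # placeholder name, not confirmed by the source
    if route_feedback_status in ("degraded", "fenced"):
        if has_authorized_alternate:
            return "prefer_alternate_route"
        return "rebuild_route"
    return "none"  # healthy route needs no remediation


print(remediation_action(False, "fenced", True))
print(remediation_action(False, "fenced", False))
print(remediation_action(False, "healthy", False))
```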
@@ -1942,7 +1941,7 @@ Current implementation focus remains:
 creates primary and authorized alternate `vpn_packets` routes, issues a lease
 while primary is still healthy/selected, then injects fenced feedback for the
 selected primary. Access telemetry keeps the active channel on the normal
-route with `force_backend_fallback=false`, reports `route_feedback_status`
+route with `force_compat_fallback=false`, reports `route_feedback_status`
 `fenced`, and recommends `remediation_action=prefer_alternate_route` with the
 alternate route id/status; `degraded_fallback_channel_count` stays zero:
 `artifacts/c18z56-service-channel-alternate-remediation-smoke-result.json`.
@@ -1976,14 +1975,14 @@ Current implementation focus remains:
 remediation command is consumed, then verifies runtime heartbeat evidence:
 `last_selected_route_id` and flow-scheduler `last_route_id` move to the
 replacement route, `send_successes=1`, `send_failures=0`,
-`send_fallback_local=0`, and no degraded backend fallback is recommended.
+`send_fallback_local=0`, and no degraded compat fallback is recommended.
 Result:
 `artifacts/c18z59-service-channel-remediation-traffic-smoke-result.json`.
 - C18Z60 multi-flow remediation traffic proof is smoke-proven. The smoke sends
 a batch of twelve IPv4/TCP-like packets that classify into multiple
 independent VPN flow channels after the remediation command is consumed.
 Runtime heartbeat evidence shows the replacement route selected, at least two
-flow-scheduler channels on that route, no local/backend fallback, no flow
+flow-scheduler channels on that route, no local/compat fallback, no flow
 drops, and no route send failures. Result:
 `artifacts/c18z60-service-channel-remediation-multiflow-smoke-result.json`.
 - C18Z61 pressure remediation traffic proof is smoke-proven. The smoke sends a
@@ -2000,7 +1999,7 @@ Current implementation focus remains:
 old default bulk channel ids. Unit tests prove priority ordering
 `control > interactive > reliable > bulk > droppable`; live smoke proves a
 bulk 128-packet pressure batch plus an interactive packet both move through
-the remediation replacement route with no local/backend fallback, drops, or
+the remediation replacement route with no local/compat fallback, drops, or
 route failures. Result:
 `artifacts/c18z62-service-channel-remediation-qos-smoke-result.json`.
 - C18Z63 concurrent QoS isolation is implemented and unit-proven. A controlled
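The priority ordering `control > interactive > reliable > bulk > droppable` quoted above can be shown with a stable sort. This is only a sketch of the ordering itself, assuming packets are dictionaries with a `traffic_class` key (a hypothetical shape); the real flow scheduler also handles queues, windows, and drops that are not modelled here.

```python
# Highest priority first, as stated in the text.
TRAFFIC_CLASS_ORDER = ["control", "interactive", "reliable", "bulk", "droppable"]
CLASS_RANK = {name: rank for rank, name in enumerate(TRAFFIC_CLASS_ORDER)}


def order_by_priority(packets):
    """Return packets ordered by traffic class, keeping arrival order within a class."""
    return sorted(packets, key=lambda p: CLASS_RANK[p["traffic_class"]])


queue = [
    {"id": 1, "traffic_class": "bulk"},
    {"id": 2, "traffic_class": "interactive"},
    {"id": 3, "traffic_class": "bulk"},
    {"id": 4, "traffic_class": "control"},
]
print([p["id"] for p in order_by_priority(queue)])
```

Python's `sorted` is stable, so the two bulk packets keep their arrival order while the control and interactive packets move ahead of them.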
@@ -2046,7 +2045,7 @@ Current implementation focus remains:
 remediation. Run `c18z67-20260508-213452` accepted all 6 bulk requests,
 forwarded 3072 post-remediation packets, completed the interactive request in
 132 ms, observed 32 bulk and 12 interactive replacement-route flow stats, and
-kept local/backend fallback, route failures, flow drops, and scheduler drops
+kept local/compat fallback, route failures, flow drops, and scheduler drops
 at 0. Artifact:
 `artifacts/c18z67-service-channel-concurrent-qos-live-smoke-result.json`.
 - C18Z68 service-channel flow-health guard is implemented and deployed on
@@ -2054,7 +2053,7 @@ Current implementation focus remains:
 web-admin rebuilt/deployed. Access telemetry now projects
 `flow_health_status` and `flow_health_reason` at cluster, node, and
 active-channel levels from traffic-class counts, queue pressure, flow drops,
-backend fallback, route-quality failures/drops/slow samples, and route send
+compat fallback, route-quality failures/drops/slow samples, and route send
 latency. Web-admin shows explicit flow-health chips beside flow QoS so
 sustained bulk pressure, degraded latency, fallback, and drops are visible
 before adding user services. Verification passed:
@@ -2114,7 +2113,7 @@ Current implementation focus remains:
 `rap-node-agent:0.2.245-c18z71` on `test-1/2/3`. Backend exposes audited
 `GET/PUT /clusters/{clusterID}/fabric/service-channels/pool-policy` for
 entry/exit pool constraints, preferred entry/exit, selection strategy,
-route/entry/exit failover modes, backend fallback allowance, and sticky
+route/entry/exit failover modes, compat fallback allowance, and sticky
 session mode. Lease issuance now applies the effective policy before route
 selection, constrains `entry_pool`/`exit_pool`, chooses policy preferred
 nodes when present, embeds `pool_policy` provenance in the lease, and signs
@@ -2258,7 +2257,7 @@ Current implementation focus remains:
 `rebuild_route` command to `applied` / `replacement_selected`, the entry node
 reports a route-manager decision for the same `rebuild_request_id`, reports
 transition `applied_rebuild`, and live service-channel packet ingress selects
-the replacement route with no local/backend fallback, route failures, or flow
+the replacement route with no local/compat fallback, route failures, or flow
 drops. Verification passed:
 `go test ./internal/modules/cluster ./internal/platform/runtime ./internal/modules/nodeagent`,
 `go test ./cmd/rap-node-agent ./internal/agent ./internal/mesh ./internal/vpnruntime ./internal/config`,
@@ -2455,12 +2454,12 @@ Current implementation focus remains:
 If a signed data-plane contract has `backend_relay_policy=disabled`, the
 service-channel runtime no longer proxies failed/missing fabric-route working
 data through backend relay; it returns a visible service unavailable result.
-The live smoke temporarily disables backend fallback in pool policy, issues a
+The live smoke temporarily disables compat fallback in pool policy, issues a
 no-route lease, verifies `backend_relay_policy=disabled`, posts to test-1,
 and proves the node rejects with 503 instead of backend relay. Verification
 passed: node-agent tests, C18Z92 live smoke, and C18Z91 regression smoke.
 Artifact:
-`artifacts/c18z92-node-agent-disabled-backend-fallback-smoke-result.json`.
+`artifacts/c18z92-node-agent-disabled-compat-fallback-smoke-result.json`.
 - C18Z93 access-telemetry data-plane projection is implemented and deployed on
 docker-test as `rap-backend:fabric-service-channel-0.2.268-c18z93`;
 node-agent remains `rap-node-agent:0.2.267-c18z92` on `test-1/2/3`, and
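The C18Z92 behaviour in the hunk above (no silent backend relay when the contract disables it) can be sketched as a delivery decision. The function name, the contract dictionary shape, and the returned labels are hypothetical; only `backend_relay_policy=disabled` and the 503 rejection come from the text, and treating any other policy value as "relay allowed" is an assumption.

```python
def delivery_decision(contract: dict, fabric_route_ok: bool) -> str:
    """Decide how working data is handled for one service-channel request."""
    if fabric_route_ok:
        return "fabric_route"
    if contract.get("backend_relay_policy") == "disabled":
        # The node answers HTTP 503 instead of proxying through backend relay.
        return "service_unavailable_503"
    return "backend_relay"  # assumption: any non-disabled policy permits relay


print(delivery_decision({"backend_relay_policy": "disabled"}, fabric_route_ok=False))
```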
@@ -2493,7 +2492,7 @@ Current implementation focus remains:
 docker-test as backend `rap-backend:fabric-service-channel-0.2.270-c18z95`
 and node-agent `rap-node-agent:0.2.270-c18z95` on `test-1/2/3`; web-admin is
 rebuilt/deployed to `rap_web_admin`. Node-agent now reports
-`backend_fallback_blocked`, `fabric_route_send_failure`, and last data-plane
+`compat_fallback_blocked`, `fabric_route_send_failure`, and last data-plane
 violation status/reason in `fabric_service_channel_access_report`. Backend
 access telemetry projects those fields to cluster, node, and active-channel
 rows, and `data_plane_contract` incidents distinguish policy-blocked fallback
@@ -2505,7 +2504,7 @@ Current implementation focus remains:
 docker-test as backend `rap-backend:fabric-service-channel-0.2.281-c18z109`;
 node-agent remains `rap-node-agent:0.2.270-c18z95` on `test-1/2/3`, and
 web-admin remains deployed. Backend now converts heartbeat access reports
-with `fabric_route_send_failed_backend_fallback_blocked` into durable fenced
+with `fabric_route_send_failed_compat_fallback_blocked` into durable fenced
 `fabric_service_channel_route_feedback` for the active channel primary route.
 The existing route rebuild planner then selects an authorized replacement
 route when one exists. Verification passed: backend tests, node-agent tests,
@@ -5402,3 +5401,4 @@ The current phase is NOT:
 Future mesh, VPN, multi-cluster, node-agent updater, and production realtime data-plane work must be introduced only through explicit, narrow, staged implementation prompts.
 
 Always keep the project production-oriented. Do not simplify it into a toy app.
+