Working version, but speed is 10 Mbit/s
@@ -62,7 +62,7 @@ Cluster Authority foundation is now also complete:
 - cluster authority private keys are encrypted at rest when
 `SECRET_ENCRYPTION_KEY_B64`/file is configured; production already requires
 a secret encryption key
-- legacy/default clusters are backfilled lazily through `EnsureClusterAuthority`
+- compat/default clusters are backfilled lazily through `EnsureClusterAuthority`
 - backend signs join-token scope material, node approval/bootstrap material,
 and node-scoped synthetic mesh config snapshots
 - node-agent verifies signed Control Plane synthetic config when
@@ -80,14 +80,14 @@ Cluster Authority foundation is now also complete:
 `RAP_WORKLOAD_SUPERVISION_ENABLED=false` by default while service runtime
 supervision remains a stub
 
-Node enrollment bootstrap polling is also complete:
-
-- backend exposes `/node-agents/enrollments/{requestID}/bootstrap`
-- pending agents prove `cluster_id`, `node_fingerprint`, and `public_key`
-before receiving status/bootstrap material
-- `rap-node-agent` stores `pending_join_request_id`, polls approval, verifies
-the signed bootstrap contract, then persists `node_id`, `identity_status`,
-and cluster authority pin into `identity.json`
+Node enrollment join polling is also complete:
+
+- backend exposes `/node-agents/enrollments/{requestID}/join`
+- pending agents prove `cluster_id`, `node_fingerprint`, and `public_key`
+before receiving status/join material
+- `rap-node-agent` stores `pending_join_request_id`, polls approval, verifies
+the signed join contract, then persists `node_id`, `identity_status`,
+and cluster authority pin into `identity.json`
 - polling is controlled by `RAP_ENROLLMENT_POLL_INTERVAL_SECONDS` and
 `RAP_ENROLLMENT_POLL_TIMEOUT_SECONDS`
 
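Editor's note: the join polling described in the hunk above can be sketched as a bounded loop. This is a minimal illustration, not the `rap-node-agent` implementation. The `fetch_status` callback, the `status` values (`pending`, `approved`, `rejected`), and the default interval and timeout are assumptions. Only the two environment variable names come from the document. Verification of the signed join contract is out of scope here.

```python
import os
import time
from typing import Callable, Optional, Tuple


def poll_settings_from_env(env=os.environ) -> Tuple[float, float]:
    """Read the poll interval and timeout; the defaults (5 s, 300 s) are assumed."""
    interval = float(env.get("RAP_ENROLLMENT_POLL_INTERVAL_SECONDS", "5"))
    timeout = float(env.get("RAP_ENROLLMENT_POLL_TIMEOUT_SECONDS", "300"))
    return interval, timeout


def poll_enrollment(
    fetch_status: Callable[[], dict],
    interval_seconds: float,
    timeout_seconds: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Poll until the join request is approved or rejected, or the timeout elapses.

    fetch_status is a hypothetical callback that would call the join endpoint
    and return the decoded response. Returns the approved response, or None
    on timeout.
    """
    deadline = clock() + timeout_seconds
    while True:
        response = fetch_status()
        status = response.get("status")
        if status == "approved":
            return response
        if status == "rejected":
            raise PermissionError("join request rejected")
        # Stop if the next poll would start after the deadline.
        if clock() + interval_seconds > deadline:
            return None
        sleep(interval_seconds)
```

Injecting `clock` and `sleep` keeps the loop testable without real waiting; a caller would pass the values from `poll_settings_from_env()` as the interval and timeout.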
@@ -157,14 +157,16 @@ Runtime report:
 - `artifacts/c18z15-live-service-channel-effective-quality-smoke-result.json`
 - `artifacts/c18z16-live-service-channel-quality-fairness-smoke-result.json`
 - `artifacts/c18z17-live-service-channel-quality-cleanup-smoke-result.json`
-- `artifacts/c18z18-service-channel-session-scoped-fairness-smoke-result.json`
-- `artifacts/c18z19-service-channel-parallel-flow-window-smoke-result.json`
-- `artifacts/c18z20-service-channel-adaptive-window-telemetry-smoke-result.json`
-- Docker-test smoke command:
-`pwsh -NoProfile -ExecutionPolicy Bypass -File scripts\fabric\c17z12-rendezvous-relay-smoke-ssh.ps1 -KeepRunning`
-- Dev lifecycle smoke command:
-`pwsh -NoProfile -ExecutionPolicy Bypass -File scripts\fabric\dev-cluster-enrollment-bootstrap-smoke-ssh.ps1 -KeepRunning`
-- Last proven runtime run: `c17z18-20260428-221601` (legacy smoke script name,
+- `artifacts/c18z18-service-channel-session-scoped-fairness-smoke-result.json`
+- `artifacts/c18z19-service-channel-parallel-flow-window-smoke-result.json`
+- `artifacts/c18z20-service-channel-adaptive-window-telemetry-smoke-result.json`
+- Active fabric standard check:
+`pwsh -NoProfile -ExecutionPolicy Bypass -File scripts\check-fabric-standard-boundary.ps1`
+- Removed docker-test smoke record:
+`removed docker-test smoke script is not part of the active tree`
+- Removed dev lifecycle smoke record:
+`removed dev lifecycle smoke script is not part of the active tree`
+- Last proven runtime run: `c17z18-20260428-221601` (compat smoke script name,
 current C17Z20 node-agent code)
 - Last proven dev lifecycle run: `dev-bootstrap-20260428-201430`
 - Admin: `http://192.168.200.61:18080/`
@@ -193,30 +195,30 @@ Node-agent image `rap-node-agent:0.2.270-c18z95` is built and deployed on
 `test-1/2/3`; web-admin is rebuilt and deployed to `rap_web_admin`.
 All three test nodes run the C18Z92 image, healthy, and current after policy
 update. Node-agent still requires signed service-channel lease authority when
-cluster authority is pinned, but if legacy clients cannot send signed lease
+cluster authority is pinned, but if compat clients cannot send signed lease
 headers it now calls backend introspection before accepting the unsigned token.
-Accepted ingress is visible as `accepted_by=signed|introspection|legacy_unsigned`
+Accepted ingress is visible as `accepted_by=signed|introspection|compat_unsigned`
 in structured node logs and via `X-RAP-Service-Channel-Accepted-By` on HTTP
 packet ingress. Durable introspection stores only `token_hash` plus a scrubbed
 lease payload, so backend restarts no longer break compatibility clients. Live
 lease maintenance now lists active/expired durable compatibility leases and runs
 bounded cleanup through the admin API/panel. Durable access telemetry now
 aggregates node-reported accepted ingress counters by signed/introspection/
-legacy path, with heartbeat metadata fallback and admin-panel visibility.
+compat path, with heartbeat metadata fallback and admin-panel visibility.
 Access telemetry now also correlates active durable service-channel leases with
-entry/exit nodes, primary route status, backend fallback, and latest
+entry/exit nodes, primary route status, compat fallback, and latest
 route-quality feedback when a route exists. Normal-route access diagnostics are
 smoke-proven with a temporary direct `vpn_packets` route and healthy rolling
 quality window. Degraded normal-route diagnostics are also smoke-proven: the
-active channel stays on a normal primary route with `force_backend_fallback=false`
+active channel stays on a normal primary route with `force_compat_fallback=false`
 while route feedback becomes `fenced` and rolling failure/drop/slow counters are
 visible. Active-channel remediation diagnostics now expose
 `remediation_action`, reason, optional alternate route id/status, and operator
-hint, with unit coverage for healthy/noop, rebuild, backend fallback, and
+hint, with unit coverage for healthy/noop, rebuild, compat fallback, and
 authorized alternate decisions. The alternate-route remediation branch is now
 live-smoke-proven: a selected primary route is degraded after lease issuance and
 access telemetry recommends `prefer_alternate_route` while keeping
-`force_backend_fallback=false`. C18Z57 turns that recommendation into a bounded
+`force_compat_fallback=false`. C18Z57 turns that recommendation into a bounded
 machine-readable `remediation_command` on the active channel row, including the
 primary route, replacement route, issued time, and command TTL capped to the
 lease lifetime. C18Z58 projects those commands into node-scoped synthetic mesh
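Editor's note: the three `accepted_by` paths named in the hunk above can be read as a small decision function. The sketch below is an interpretation under stated assumptions, not the node-agent code: it assumes SHA-256 for `token_hash`, an `introspect` callback that looks up a lease by that hash, and that the `compat_unsigned` path applies only when cluster authority is not pinned. The document confirms none of these details.

```python
import hashlib
from typing import Callable, Optional


def token_hash(token: str) -> str:
    """Hash a lease token so the raw token never has to be stored (SHA-256 assumed)."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def classify_ingress(
    authority_pinned: bool,
    signature_valid: Optional[bool],
    token: str,
    introspect: Callable[[str], bool],
) -> Optional[str]:
    """Return the accepted_by label, or None when ingress is refused.

    signature_valid is True or False when signed lease headers were sent,
    and None when the client sent no signed headers (a compat client).
    introspect is a hypothetical callback standing in for backend introspection.
    """
    if signature_valid is True:
        return "signed"
    if signature_valid is False:
        return None  # signed headers were present but failed verification
    if authority_pinned:
        # Pinned authority: an unsigned token is accepted only if the backend knows it.
        return "introspection" if introspect(token_hash(token)) else None
    return "compat_unsigned"
```

For example, with a durable store modeled as a set of hashes, `classify_ingress(True, None, token, lambda h: h in store)` yields `"introspection"` only for tokens whose hash is in the store.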
@@ -225,10 +227,10 @@ route-manager `applied` decision with source
 `service_channel_remediation_command`. C18Z59 proves active traffic follows the
 replacement route after remediation: runtime heartbeat evidence shows
 `last_selected_route_id` and flow-scheduler `last_route_id` on the replacement
-route, with no local/backend fallback and no route send failures. C18Z60 proves
+route, with no local/compat fallback and no route send failures. C18Z60 proves
 the same replacement path under multiple independent VPN flow channels: a
 twelve-packet batch is classified across multiple flow-scheduler channels, all
-observed replacement-route sends avoid local/backend fallback, flow drops, and
+observed replacement-route sends avoid local/compat fallback, flow drops, and
 route failures. C18Z61 raises that to a pressure batch of 128 IPv4/TCP-like
 packets; runtime evidence shows 32 replacement-route flow stats, scheduler
 high-watermark 5, max-in-flight 4, no fallback, no drops, and no route failures.
@@ -260,7 +262,7 @@ fallback, route failures, flow drops, and scheduler drops stayed at 0. C18Z68
 adds backend/admin flow-health guard diagnostics over that telemetry:
 `flow_health_status` and `flow_health_reason` are projected at cluster, node,
 and active-channel levels from traffic-class pressure, queue pressure, flow
-drops, backend fallback, route-quality failures/drops/slow samples, and route
+drops, compat fallback, route-quality failures/drops/slow samples, and route
 send latency. Web-admin now shows flow-health chips beside flow QoS.
 C18Z69 adds node-side adaptive response: heartbeat flow-scheduler snapshots now
 report per-class `recommended_parallel_windows` plus
@@ -319,7 +321,7 @@ C18Z79 closes the planner-to-runtime proof loop for that branch: after planner
 resolution, the entry node reports a route-manager decision with the same
 `rebuild_request_id`, the transition is `applied_rebuild`, and live
 service-channel packet traffic selects the replacement route without
-local/backend fallback, route failures, or flow drops. C18Z80 hardens that
+local/compat fallback, route failures, or flow drops. C18Z80 hardens that
 same path under sustained pressure: after planner-applied rebuild, five
 post-rebuild bursts of mixed `interactive`, `bulk`, and `reliable` VPN packet
 batches stay on the replacement route, the stale primary is not reselected, and
@@ -396,7 +398,7 @@ C18Z91 makes node-agent consume that signed/introspected data-plane contract.
 Service-channel packet ingress validates the contract, applies the preferred
 fabric route, emits data-plane mode/transport/fallback/logical-flow fields in
 access logs, and reports contract adoption in heartbeat access telemetry.
-C18Z92 enforces disabled backend fallback policy at node-agent runtime: when a
+C18Z92 enforces disabled compat fallback policy at node-agent runtime: when a
 signed lease says `backend_relay_policy=disabled`, route failure or missing
 fabric route returns a visible 503 instead of silently proxying working data
 through backend relay.
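Editor's note: the C18Z92 rule in the hunk above (refuse with a 503 rather than relay silently when the lease disables backend relay) can be sketched as follows. This is an illustrative reading, not the node-agent runtime. The function name and the `fabric_route` and `backend_relay` labels are hypothetical. `compat_fallback_blocked` borrows the telemetry name from the C18Z95 note, and treating any policy value other than `disabled` as allowing relay is an assumption.

```python
from typing import Optional, Tuple


def route_or_refuse(
    fabric_route_id: Optional[str],
    send_ok: bool,
    backend_relay_policy: str,
) -> Tuple[int, str]:
    """Decide how a service-channel packet is handled.

    Returns (http_status, outcome). A usable fabric route always wins. Without
    one, a disabled relay policy produces a visible 503 instead of a silent
    fallback through the backend relay.
    """
    if fabric_route_id is not None and send_ok:
        return 200, "fabric_route"
    if backend_relay_policy == "disabled":
        return 503, "compat_fallback_blocked"
    return 200, "backend_relay"
```

Keeping the refusal as an explicit outcome, separate from actual relay usage, mirrors the distinction the status notes draw between blocked fallback and real backend relay traffic.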
@@ -414,13 +416,13 @@ can now surface a recommended action such as restoring the fabric route instead
 of treating backend relay as normal service traffic.
 C18Z95 adds node-agent blocked-fallback telemetry. When a signed data-plane
 contract disables backend relay and the entry runtime cannot use a fabric
-route, node-agent reports `backend_fallback_blocked`, the last data-plane
+route, node-agent reports `compat_fallback_blocked`, the last data-plane
 violation status/reason, and backend/admin project those fields to cluster,
 node, channel, and `data_plane_contract` incident diagnostics. Disabled-policy
 refusal is now separate from real backend relay usage.
 C18Z96 wires normal-route send failure with disabled backend relay into the
 existing route feedback and rebuild planner path. When heartbeat access
-telemetry reports `fabric_route_send_failed_backend_fallback_blocked`, backend
+telemetry reports `fabric_route_send_failed_compat_fallback_blocked`, compat
 correlates the entry node's active service-channel leases, records fenced
 `fabric_service_channel_route_feedback` for the selected primary route, and the
 existing planner can select an alternate/replacement route. This keeps blocked
@@ -570,7 +572,7 @@ artifacts:
 `artifacts/c18z89-service-channel-access-decision-resurface-action-smoke-result.json`, and
 `artifacts/c18z90-service-channel-data-plane-contract-smoke-result.json`, and
 `artifacts/c18z91-node-agent-data-plane-contract-enforcement-smoke-result.json`, and
-`artifacts/c18z92-node-agent-disabled-backend-fallback-smoke-result.json`, and
+`artifacts/c18z92-node-agent-disabled-compat-fallback-smoke-result.json`, and
 `artifacts/c18z93-access-telemetry-data-plane-contract-smoke-result.json`, and
 `artifacts/c18z94-data-plane-contract-incident-smoke-result.json`, and
 `artifacts/c18z95-node-agent-blocked-fallback-telemetry-smoke-result.json`, and