2026-05-18 21:33:39 +03:00
parent 5096155d83
commit 469fa0e860
94 changed files with 8761 additions and 8003 deletions
@@ -256,9 +256,11 @@ The first backend contract slice is implemented:
observations, and degraded backend relay usage. These incidents keep backend
relay visible as degraded compatibility behavior rather than hidden steady
state.
-- Node-agent access telemetry distinguishes backend relay actually used from
-backend relay blocked by signed data-plane policy. Blocked fallback reports
-include `backend_fallback_blocked` and the last violation status/reason, and
+- Node-agent access telemetry distinguishes degraded compatibility requested
+from degraded compatibility blocked by signed data-plane policy. Blocked
+compatibility reports include `degraded_compatibility_blocked` and the last
+violation status/reason, while preserving the original raw violation code in
+a separate field for historical correlation, and
backend projects them to access telemetry plus `data_plane_contract`
incidents.
- Backend correlates access-report send failures with active service-channel
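As a rough illustration of the blocked-compatibility report described in this hunk, the sketch below builds a report that carries `degraded_compatibility_blocked`, the last violation status/reason, and the raw violation code in its own field. Only `degraded_compatibility_blocked` comes from the text; the class name, the other field names, and the sample values are assumptions made for illustration, not the node-agent's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccessReport:
    # Only `degraded_compatibility_blocked` is named in the document.
    # All other field names are hypothetical placeholders.
    degraded_compatibility_requested: bool = False
    degraded_compatibility_blocked: bool = False
    last_violation_status: Optional[str] = None
    last_violation_reason: Optional[str] = None
    raw_violation_code: Optional[str] = None  # kept verbatim for historical correlation


def build_report(requested: bool, policy_allows: bool,
                 status: Optional[str] = None,
                 reason: Optional[str] = None,
                 raw_code: Optional[str] = None) -> AccessReport:
    """Separate 'requested' from 'blocked by signed data-plane policy'."""
    if requested and not policy_allows:
        return AccessReport(
            degraded_compatibility_requested=True,
            degraded_compatibility_blocked=True,
            last_violation_status=status,
            last_violation_reason=reason,
            raw_violation_code=raw_code,
        )
    # Not blocked: no violation details are attached.
    return AccessReport(degraded_compatibility_requested=requested)
```

Keeping the raw code in a separate field means older incidents recorded under a previous name can still be matched after the rename.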
@@ -421,8 +423,8 @@ The first backend contract slice is implemented:
keeps failing outside manual retry cooldown creates a bounded rebuild
request. If an unfenced alternate is available, Control Plane marks the
rebuild `applied` and selects that route generation; if no alternate exists,
-it records `pending_degraded_fallback` and keeps backend relay as the
-explicit degraded path until a new route appears. The compatibility release
+it records `pending_degraded_route_state` and keeps the channel in explicit
+degraded route state until a new route appears. The compatibility release
`0.2.175` keeps node/host-agent signed-config models aligned with these new
fields.
- C18U moves rebuild metadata into node-agent runtime behavior. Node-agent
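The rebuild outcome described in this hunk (mark the rebuild `applied` when an unfenced alternate exists, otherwise record `pending_degraded_route_state`) can be sketched as follows. This is a minimal sketch under assumptions: the `Route` and `RebuildDecision` types and their field names are hypothetical, and the real Control Plane selection may weigh more than fencing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Route:
    # Hypothetical route shape for illustration only.
    route_id: str
    generation: int
    fenced: bool


@dataclass
class RebuildDecision:
    status: str  # "applied" or "pending_degraded_route_state"
    selected_route_id: Optional[str] = None
    selected_generation: Optional[int] = None


def decide_rebuild(failing_route_id: str, candidates: List[Route]) -> RebuildDecision:
    """Pick the first unfenced alternate, or fall into explicit degraded state."""
    for route in candidates:
        if route.route_id != failing_route_id and not route.fenced:
            return RebuildDecision("applied", route.route_id, route.generation)
    # No unfenced alternate: stay in explicit degraded route state.
    return RebuildDecision("pending_degraded_route_state")
```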
@@ -437,10 +439,10 @@ The first backend contract slice is implemented:
- C18V adds route-manager transition telemetry and churn coverage. Node-agent
`0.2.177` reports `route_manager_transition` alongside the current manager
snapshot, including previous/current generation, status, decision count,
-withdrawn route count, restored route count, pending-degraded fallback count,
+withdrawn route count, restored route count, pending degraded route-state count,
rebuild applied count, and any cached selected route cleared because Control
Plane withdrew it. Coverage verifies three service-neutral lifecycle cases:
-applied rebuild replacement, pending degraded fallback when no alternate is
+applied rebuild replacement, pending degraded route state when no alternate is
available, and rollback/restoration when a fresh config removes the rebuild
decision.
- C18W adds a live docker-test verification loop for that telemetry. The smoke
@@ -973,8 +975,8 @@ The first backend contract slice is implemented:
in C18Z45; rebuild snapshot maintenance health with overdue/runtime-evidence
visibility landed in C18Z46; node-agent signed service-channel lease
enforcement when cluster authority is pinned landed in C18Z47; backend
-introspection fallback for unsigned compatibility clients landed in C18Z48;
-accepted-by telemetry for signed/introspection/legacy ingress landed in
+introspection fallback for token-authorized compatibility clients landed in C18Z48;
+accepted-by telemetry for signed/introspection/token-authorized ingress landed in
C18Z49; durable lease introspection across backend restarts landed in C18Z50;
bounded durable lease cleanup and admin visibility landed in C18Z51; durable
accepted-by access telemetry aggregation with heartbeat fallback and admin
@@ -983,9 +985,9 @@ The first backend contract slice is implemented:
visibility landed in C18Z53; C18Z54 smoke proves the same diagnostics on a
normal non-fallback primary route with healthy rolling route-quality feedback;
C18Z55 smoke proves degraded/fenced normal-route feedback is shown separately
-from explicit backend fallback; C18Z56 adds active-channel remediation
+from explicit degraded compatibility requests; C18Z56 adds active-channel remediation
diagnostics (`none`, `rebuild_route`, `prefer_alternate_route`,
-`use_backend_fallback`) to make the next runtime action explicit, and its
+`hold_degraded_route_state`) to make the next runtime action explicit, and its
alternate-route branch is live-smoke-proven with backend fallback kept off.
C18Z57 adds the bounded machine-readable `remediation_command` contract to
active access telemetry rows so route-manager can consume a short-lived
@@ -1058,7 +1060,7 @@ The first backend contract slice is implemented:
`rebuild_request_recorded` or `rebuild_request_rejected` for the active
channel. C18Z76 adds node-side acknowledgement for the allowed
`rebuild_route` branch: node-agent consumes the command as a route-manager
-`pending_degraded_fallback` decision with source
+`pending_degraded_route_state` decision with source
`service_channel_remediation_command`, while guarded commands remain ignored.
Backend access telemetry correlates that heartbeat evidence with the durable
ledger and reports `rebuild_request_recorded_node_pending`. C18Z77 resolves
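The C18Z76 node-side acknowledgement in this hunk might look roughly like the sketch below: the allowed `rebuild_route` command becomes a `pending_degraded_route_state` decision with source `service_channel_remediation_command`, and anything else is ignored. The command shape (`action`, `channel_id`) and the function name are assumptions. The text does not say which commands are guarded, so this sketch simply treats every command other than `rebuild_route` as ignored.

```python
from typing import Dict, Optional

# Only `rebuild_route` is described as an allowed branch in the text.
ALLOWED_COMMANDS = ("rebuild_route",)


def acknowledge_remediation(command: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Turn an allowed remediation command into a route-manager decision.

    Returns None for guarded or unknown commands, which stay ignored.
    The `action` and `channel_id` keys are hypothetical.
    """
    if command.get("action") not in ALLOWED_COMMANDS:
        return None
    return {
        "decision": "pending_degraded_route_state",
        "source": "service_channel_remediation_command",
        "channel_id": command.get("channel_id", ""),
    }
```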
@@ -1089,7 +1091,7 @@ The first backend contract slice is implemented:
reselecting the degraded replacement or adding fallback/failure/drop deltas.
C18Z82 proves the no-safe-recovery branch: if that replacement is also fenced
and no safe recovery route exists, synthetic config reports
-`service_channel_feedback_no_alternate` / `pending_degraded_fallback` with
+`service_channel_feedback_no_alternate` / `pending_degraded_route_state` with
`no_unfenced_alternate_route` instead of silently keeping a bad route.
C18Z83 projects that route-manager decision into active access telemetry and
web-admin active-channel diagnostics, including decision source, route id,
@@ -1124,7 +1126,8 @@ The first backend contract slice is implemented:
`data_plane` is present in the lease, authority payload, introspection
response, and lease-maintenance/admin list. It declares backend API as
control-plane transport, fabric service channel/fabric route as working
-data/steady-state transport, backend relay as degraded fallback only, and
+data/steady-state transport, degraded compatibility relay as an explicit
+compatibility state only, and
service-neutral protocol-agnostic isolated logical flows as the runtime
contract for VPN, Remote Workspace, files, video, and future services. C18Z91
makes node-agent consume the signed/introspected data-plane contract, apply
@@ -1187,12 +1190,13 @@ channel class, selected entry node, allowed flow isolation, and data-plane
contract on `remote-workspaces/{resource_id}/streams/{channel_class}`. Empty
probe requests return `202` with a remote-workspace ingress probe contract and
access telemetry; real RDP frame forwarding remains deliberately
-`not_implemented` until the service adapter work begins.
+`validated_only` for empty probes until the service adapter work begins.
C19E adds a narrow frame-batch probe on that boundary. The adapter contract
advertises `rap.remote_workspace_frame_batch.v1`, and entry-node accepts
non-empty payloads only when they are JSON probe batches with `probe_only=true`,
valid remote-workspace logical channels, valid directions, and bounded payload
-metadata. Accepted probes return `payload_flow=validated_probe_only`; production
+metadata. Accepted frame probes return `payload_flow=validated_probe_only`, while
+empty/control probes return `payload_flow=validated_only`; production
frame forwarding is still not enabled.
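The C19E acceptance rules above can be sketched as a small classifier. This is a hedged illustration, not the entry-node implementation: the channel and direction vocabularies, the size limits, the `frames`/`channel`/`direction`/`payload_bytes` keys, and the `rejected` label are all hypothetical. Only `probe_only=true` and the two `payload_flow` values come from the text.

```python
import json

# Hypothetical vocabularies and bounds; the document does not list the real ones.
ALLOWED_CHANNELS = ("display", "input")
ALLOWED_DIRECTIONS = ("client_to_worker", "worker_to_client")
MAX_FRAMES = 16
MAX_DECLARED_BYTES = 65536


def classify_probe(body: bytes) -> str:
    """Return the payload_flow for a probe body, or 'rejected' (hypothetical)."""
    if not body:
        return "validated_only"  # empty/control probe
    try:
        batch = json.loads(body)
    except ValueError:  # covers malformed JSON and invalid UTF-8
        return "rejected"
    if not isinstance(batch, dict) or batch.get("probe_only") is not True:
        return "rejected"
    frames = batch.get("frames")
    if not isinstance(frames, list) or not 0 < len(frames) <= MAX_FRAMES:
        return "rejected"
    for frame in frames:
        if not isinstance(frame, dict):
            return "rejected"
        if frame.get("channel") not in ALLOWED_CHANNELS:
            return "rejected"
        if frame.get("direction") not in ALLOWED_DIRECTIONS:
            return "rejected"
        size = frame.get("payload_bytes")
        # bool is a subclass of int, so exclude it explicitly.
        if not isinstance(size, int) or isinstance(size, bool):
            return "rejected"
        if not 0 <= size <= MAX_DECLARED_BYTES:
            return "rejected"
    return "validated_probe_only"
```

Neither accepted outcome forwards frames. Both only report that the probe passed validation, which matches the statement that production frame forwarding is not enabled.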
C19F connects that validated probe to a node-agent local adapter sink. The
in-memory `node_agent_rdp_worker_contract_probe` sink accepts only validated