rap-node-agent
Native node agent MVP for the Secure Access Fabric.
Status: Stage C17Z18 synthetic route-health effective path boundary.
This agent is intentionally native. Containers may package service workloads,
but the host-level node identity belongs to rap-node-agent.
Current Scope
Implemented:
- config loading from flags/environment
- local identity state file
- enrollment request client
- heartbeat client
- capability/facts payload
- status-only service reporting payload
- mesh control-channel skeleton
- route-health message skeleton
- relay skeleton that refuses production payload forwarding
- disabled-by-default synthetic mesh runtime for fabric.probe/fabric.probe_ack
- direct and single-relay synthetic route tests
- synthetic fabric.route_health/fabric.route_health_ack
- local route success/failure observations
- fallback route selection for test topology
- route cache invalidation on version changes
- synthetic relay envelope validation
- per-channel bounded queues for synthetic traffic
- QoS dequeue order: fabric_control, then route_control, then telemetry
- telemetry-only stale message drop under backpressure
- reliable fabric/control queue rejection when full
- bounded non-production synthetic.echo test-service path
- direct, single-relay, and forced-fallback test-service proofs
- live QUIC peer transport for synthetic mesh envelopes
- disabled-by-default synthetic mesh QUIC endpoint in rap-node-agent
- mesh-live-smoke harness proving direct and single-relay synthetic traffic over real local QUIC endpoints
- scoped synthetic mesh config file loading for peer endpoints and routes
- Control Plane synthetic mesh config read fallback when no local scoped config file is set
- synthetic route-health observations reported to the Control Plane when test flags allow synthetic links
- explicit production mesh forwarding gate config; production forwarding still has no runtime implementation and remains unavailable
- route-bound production mesh envelope contract and fail-closed validation on the QUIC production-forward path
- metadata-only production envelope observation hook for valid envelopes, still without forwarding payloads
- bounded metadata-only production envelope observation sink for accepted observations
- disabled-by-default node-agent wiring for the bounded observation sink
- local metrics for the bounded observation sink without exposing observation records
- local node-agent logging for bounded observation sink metrics
- change-driven suppression for unchanged bounded observation sink metrics logs
- explicit local log distinction between production forwarding gate state and production forwarding runtime state
- node-scoped rendezvous lease refresh through Control Plane synthetic config
- stale relay withdrawal/reselection telemetry
- relay replacement contract reporting for stale rendezvous relays
- route/path decision contract reporting for control-plane route generations
- route generation apply/withdraw tracking for control-plane path decisions
- synthetic route-health route config refresh from Control Plane path decisions
- route-health expected/observed effective path drift reporting
- host-agent Docker update plan executor with artifact checksum/size verification, container replacement, health check, status reporting, and rollback attempt
- host-agent update loop for service/timer placement
- host-agent binary self-update loop for the updater service itself
- maximum capacity guard for the local production observation sink
- panic-safe fail-closed production envelope observation wrapper
- explicit 4096-byte payload boundary for validated production fabric-control envelopes
- explicit future-skew boundary for validated production envelope created_at
- scoped synthetic peer endpoint candidate config with reachability, NAT/connectivity hints, priority, policy tags, and metadata
- deterministic local peer endpoint candidate scoring model for synthetic config candidates
- optional local health observation overlay for endpoint candidate scoring
- gate-controlled production fabric.control direct next-hop delivery
- route-path-bound production fabric.control multi-hop forwarding
- local metadata-only production fabric.control forwarding event logs
- route-config-bound production fabric.control forwarding validation
- scoped peer directory and bounded recovery seed config parsing/validation
- node-local peer cache with bounded warm peer health probes
- advertised mesh endpoint reporting through heartbeat metadata
- multiple advertised endpoint candidates, including private/corporate LAN
- peer connection state machine for warm-peer health
- bounded peer recovery planner over peer cache and connection states
- peer connection intent planner with transport readiness classification
- peer connection manager for real control-plane health over reusable QUIC fabric transport
- route-health effective-path runtime through replacement relay control paths
Not implemented yet:
- mesh packet routing
- production mesh service traffic
- VPN runtime
- production workload supervision
- certificate issuance/rotation
- in-agent native updater runtime
- privileged host route/firewall control
Build
cd agents\rap-node-agent
go test ./...
go build -o bin\rap-node-agent.exe .\cmd\rap-node-agent
go build -buildvcs=false -o bin\rap-host-agent.exe .\cmd\rap-host-agent
go build -o bin\mesh-live-smoke.exe .\cmd\mesh-live-smoke
Docker Host Agent Bootstrap
rap-host-agent is the first host-level installer/updater boundary for Docker
placement. It does not join the mesh itself. It applies the cluster's install
intent locally by running the rap-node-agent container with a persistent host
state directory. On Linux it also installs a systemd update-loop service by
default, so nodes continue to update from Control Plane policy without operator
commands on each host.
Preferred fabric-native install:
rap-host-agent install \
--bootstrap-bundle ./docker-node-1.bootstrap.json
Offline/import bootstrap is also supported:
rap-host-agent install \
--bootstrap-bundle ./docker-node-1.bootstrap.json
The bootstrap bundle carries the signed install profile, pinned cluster authority key, and QUIC fabric registry seeds. The host-agent applies Docker image, container, state-dir, mesh listen, advertise, NAT/connectivity, and region settings locally, then the node-agent enrolls through QUIC fabric.
Manual install is still supported:
rap-host-agent install \
--bootstrap-bundle ./docker-node-1.bootstrap.json
The command creates or replaces only the local Docker container. The running
node-agent submits the join request, waits for owner approval, stores its
identity in the mounted state directory, and then sends heartbeats. Re-running
with --replace updates the container while preserving node identity. Pass
--auto-update-enabled=false only for lab/debug installs where the local
systemd updater must not be registered.
Useful checks:
rap-host-agent status --container-name rap-node-agent-docker-node-1
docker logs -f rap-node-agent-docker-node-1
For a node that was installed before the updater existed, register only the local updater service without recreating the node-agent container:
rap-host-agent install-updater \
--state-dir /var/lib/rap/nodes/docker-node-1 \
--container-name rap-node-agent-docker-node-1
Docker Host Agent Updates
rap-host-agent update applies one Control Plane update plan for an already
enrolled Docker node. The host-agent fetches the plan, downloads the selected
Docker image tar, verifies size and sha256, loads the image, recreates the
node-agent container from the existing Docker runtime settings, checks that the
container is running, and reports update phases back to the Control Plane.
rap-host-agent update \
--cluster-id <cluster_id> \
--node-id <node_id> \
--container-name rap-node-agent-docker-node-1 \
--current-version 0.1.0-c17z26
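The size and sha256 verification step can be sketched as follows. The function name and parameters are hypothetical; the host-agent's actual implementation is not shown in this README.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// verifyArtifact checks downloaded bytes against the size and SHA-256 digest
// declared by an update plan. Parameter names are illustrative.
func verifyArtifact(data []byte, wantSize int64, wantSHA256Hex string) error {
	if int64(len(data)) != wantSize {
		return fmt.Errorf("size mismatch: got %d bytes, want %d", len(data), wantSize)
	}
	sum := sha256.Sum256(data)
	got := hex.EncodeToString(sum[:])
	// Hex digests are compared case-insensitively.
	if !strings.EqualFold(got, wantSHA256Hex) {
		return fmt.Errorf("sha256 mismatch: got %s, want %s", got, wantSHA256Hex)
	}
	return nil
}

func main() {
	data := []byte("hello")
	sum := sha256.Sum256(data)
	digest := hex.EncodeToString(sum[:])
	fmt.Println(verifyArtifact(data, 5, digest)) // matching size and digest
	fmt.Println(verifyArtifact(data, 6, digest)) // size mismatch is reported
}
```

For a large image tar, streaming the file through sha256.New() with io.Copy would avoid holding the whole artifact in memory.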
rap-host-agent update-loop is the per-node executor and health boundary. It
does not need to poll for normal releases: the node-agent receives an
rap.node_update_hint.v1 subscription hint from Control Plane or the assigned
update-cache service during heartbeat, writes <state-dir>/update-trigger.json,
and the host-agent wakes immediately. The interval is an emergency fallback for
missed hints, service migration, or a dead update-cache service; keep it long
in production. The loop keeps running after transient errors by default and
advances its in-process current version after a successful update so it does
not repeatedly apply the same plan. When started without --node-id it reads
<state-dir>/identity.json and waits until the approved node identity appears,
which lets the updater service start immediately during first install. It also
persists the last applied node-agent version in
<state-dir>/host-update-state.json so a service restart does not reapply an
already-installed release.
rap-host-agent update-loop \
--cluster-id <cluster_id> \
--node-id <node_id> \
--container-name rap-node-agent-docker-node-1 \
--current-version 0.1.0-c17z26 \
--interval-seconds 21600 \
--jitter 0.15
Update-cache nodes are ordinary cluster nodes with the update-cache role.
Control Plane assigns a healthy update-cache node in the heartbeat hint. If the
assigned service disappears, the next hint returns control_plane_fallback or a
new service assignment; the local updater stays subscribed and only uses the
long fallback timer as a last resort.
rap-host-agent update-host-agent-loop updates the host-agent binary itself.
Only one global systemd unit is installed per Docker host:
rap-host-agent-self-updater.service. It uses one approved local node identity
to ask Control Plane for product rap-host-agent with install type
linux_binary, verifies the downloaded binary size and sha256, atomically
replaces /usr/local/bin/rap-host-agent, and reports status. The already
running process continues until systemd restarts it, while new invocations use
the new binary.
rap-host-agent update-host-agent-loop \
--cluster-id <cluster_id> \
--state-dir /var/lib/rap/nodes/docker-node-1 \
--binary-path /usr/local/bin/rap-host-agent
Windows Host Agent Bootstrap And Updates
Windows uses the same bootstrap bundle model, but the local placement is a
Scheduled Task instead of Docker. In --startup-mode auto the installer first
tries an elevated ONSTART task running as SYSTEM; without admin rights it
falls back to a per-user ONLOGON task. The ONSTART mode starts after reboot
without an interactive user session. The ONLOGON fallback can only start after
that Windows user signs in.
%TEMP%\rap-host-agent.exe install-windows --bootstrap-bundle "C:\bootstrap\office-win-1.bootstrap.json" --startup-mode "auto"
Offline/import bootstrap is also supported:
%TEMP%\rap-host-agent.exe install-windows --bootstrap-bundle "C:\bootstrap\office-win-1.bootstrap.json" --startup-mode "auto"
install-windows installs two tasks:
- RAP Node Agent <node> runs rap-node-agent.exe.
- RAP Host Agent Updater <node> runs rap-host-agent update-loop for product rap-node-agent, install type windows_service, and replaces the local rap-node-agent.exe from signed release artifacts.
During first bootstrap the updater can read <state-dir>\identity.json and
will wait until the join request is approved. For an already-enrolled Windows
node, prefer passing --node-id explicitly. That makes the updater wrapper
independent from the local identity file location and is required for repair of
older Windows installs where the node is already heartbeat-healthy but the
host-agent updater has no usable identity file.
The repair path also reuses the local signed bootstrap/runtime state; it does not require any backend URL.
The admin UI node details page generates a downloadable
rap-repair-updater-<node>.cmd for this repair path. It performs these steps:
- prints schtasks /Query diagnostics for the node-agent and updater tasks;
- prints the local rap-*.exe* files;
- downloads the current rap-host-agent.exe;
- reinstalls the Windows updater wrapper with --node-id;
- runs a foreground one-shot update-loop --max-runs 1;
- applies rap-host-agent.exe.next if the running host-agent could not replace itself;
- restarts RAP Host Agent Updater <node>;
- prints post-repair diagnostics.
Expected successful updater reports in the admin panel:
rap-node-agent <target> -> <target> plan/noop
rap-host-agent <target> -> <target> plan/noop
If the latest host-agent report is apply/staged, the new host-agent binary
was downloaded as rap-host-agent.exe.next but the running process still held
the old executable. End and run the updater task once, or rerun the generated
repair command:
schtasks /End /TN "RAP Host Agent Updater office-win-1"
schtasks /Run /TN "RAP Host Agent Updater office-win-1"
Windows Reboot / Autostart Verification
After installation or repair, verify the service survives a reboot:
- Reboot the Windows host, or at minimum restart both scheduled tasks.
- Confirm the tasks exist:
schtasks /Query /TN "RAP Node Agent office-win-1" /V /FO LIST
schtasks /Query /TN "RAP Host Agent Updater office-win-1" /V /FO LIST
- Confirm the admin panel shows:
heartbeat: fresh
rap-node-agent: plan/noop
rap-host-agent: plan/noop
node version_state: current
Without admin rights, install-windows --startup-mode auto may fall back to
user-task. That node can still heartbeat and update after the user logs in,
but it will not start before logon after a reboot. Use an elevated shell for
production Windows nodes that must recover unattended.
Control Plane release artifacts for Windows must use:
- product=rap-node-agent
- os=windows
- arch=amd64
- install_type=windows_service
- kind=binary
First Enrollment
Create a join token from the platform control plane, then run:
Use a signed bootstrap bundle plus QUIC fabric registry seeds. The node enrolls only through QUIC fabric inside the farm.
The agent submits a pending join request and exits. It does not self-activate. A platform admin must approve the join request.
Enrollment Approval
When the agent enrolls, it stores the returned pending_join_request_id and
polls the Control Plane bootstrap endpoint until the platform owner approves
the request or the enrollment timeout expires. After approval, the agent
verifies the signed bootstrap contract and writes the approved node_id,
cluster_id, identity_status=active, cluster_authority_public_key, and
cluster_authority_fingerprint into identity.json.
Future C3 hardening can add signed node certificates and automatic secure certificate material exchange.
Then run the agent again:
.\bin\rap-node-agent.exe `
-state-dir C:\ProgramData\RapNodeAgent
It sends periodic heartbeats through the signed control-api service over QUIC
fabric:
fabric control path /clusters/{clusterID}/nodes/{nodeID}/heartbeats
Environment Variables
- RAP_CLUSTER_ID
- RAP_CLUSTER_AUTHORITY_PUBLIC_KEY
- RAP_CLUSTER_AUTHORITY_FINGERPRINT
- RAP_JOIN_TOKEN
- RAP_NODE_NAME
- RAP_NODE_STATE_DIR
- RAP_WORKLOAD_SUPERVISION_ENABLED
- RAP_HEARTBEAT_INTERVAL_SECONDS
- RAP_ENROLLMENT_POLL_INTERVAL_SECONDS
- RAP_ENROLLMENT_POLL_TIMEOUT_SECONDS
- RAP_FABRIC_RUNTIME_ENABLED
- RAP_FABRIC_LISTEN_ADDR
- RAP_MESH_ADVERTISE_ENDPOINT
- RAP_MESH_ADVERTISE_ENDPOINTS_JSON
- RAP_MESH_ADVERTISE_TRANSPORT
- RAP_MESH_CONNECTIVITY_MODE
- RAP_MESH_NAT_TYPE
- RAP_MESH_REGION
- RAP_MESH_SYNTHETIC_CONFIG
- RAP_MESH_PEER_ENDPOINTS_JSON
- RAP_MESH_SYNTHETIC_ROUTES_JSON
- RAP_MESH_PRODUCTION_FORWARDING_ENABLED
- RAP_MESH_PRODUCTION_OBSERVATION_SINK_CAPACITY
RAP_FABRIC_RUNTIME_ENABLED defaults to false. It gates only the
C17A/C17B/C17C/C17D/C17E synthetic probe, route-health, relay scheduling,
bounded synthetic.echo test-service runtime, and live synthetic QUIC endpoint.
It must not be used for RDP, VPN, file, video, or other production service
traffic.
RAP_WORKLOAD_SUPERVISION_ENABLED defaults to false. When enabled, the agent
polls node-scoped desired workloads and reports status. The current bounded
runtime reports built-in core-mesh and fabric-listener services as running
when enabled, supports the native built-in synthetic.echo test workload, and
keeps unsupported production workloads such as RDP workers degraded until their
supervisors are implemented.
For Remote Workspace/RDP integration work, the native rdp-worker desired
workload supports only an explicit adapter_contract_probe mode. That mode
reports the remote-workspace adapter channel contract and requires Fabric
Service Channel as the future data plane; it does not start FreeRDP, create a
remote session, or carry production RDP payloads.
RAP_FABRIC_LISTEN_ADDR names the historical synthetic listener address, but the
current runtime is QUIC-fabric-only and does not start an HTTP listener.
RAP_MESH_SYNTHETIC_CONFIG points to
a scoped synthetic mesh config snapshot and is preferred over debug JSON.
RAP_MESH_PEER_ENDPOINTS_JSON is a JSON object mapping peer node IDs to
endpoint URLs. RAP_MESH_SYNTHETIC_ROUTES_JSON is a JSON array of synthetic
route objects. If no local scoped config file is set, the agent asks the
Control Plane for:
/clusters/{clusterID}/nodes/{nodeID}/mesh/synthetic-config
The JSON variables are debug fallback only.
Control Plane synthetic config with authority_required=true must include a
signed authority_payload / authority_signature envelope and a
cluster_authority descriptor. The agent verifies the signature, validates the
config hash, and rejects mismatched pinned authority values when
RAP_CLUSTER_AUTHORITY_PUBLIC_KEY, RAP_CLUSTER_AUTHORITY_FINGERPRINT, or the
same fields in identity.json are set.
RAP_MESH_PRODUCTION_FORWARDING_ENABLED defaults to false. It is a future
production-forwarding gate only. Turning it on does not enable production mesh
payload forwarding; the runtime still refuses service traffic after validating
the route-bound production envelope contract, until a later approved
production mesh stage implements route-bound, policy-bound forwarding.
The production envelope contract requires route, hop, TTL, expiry, payload
length, and SHA-256 payload hash fields. C17J accepts only the
fabric_control channel class and fabric.control message type for
validation. RDP, VPN, render, file, video, and service workload channels are
rejected.
C17K adds a local metadata-only observation hook after successful production envelope validation. Observations include route/message/hop/channel metadata and payload length/hash, not the payload body. Observation failure fails closed, and the endpoint still does not forward payloads.
C17L adds a bounded in-memory observation sink for accepted metadata-only observations. The sink drops the oldest observation when full and still stores no payload bodies.
RAP_MESH_PRODUCTION_OBSERVATION_SINK_CAPACITY defaults to 0. When set above
zero, C17M wires the bounded metadata-only sink into the node-agent mesh server.
This remains local-only, exposes no read API, stores no payload bodies, and
does not enable production forwarding. C17R rejects values above 10000.
C17N adds local sink metrics: configured capacity, current depth, accepted total, and dropped-oldest total. Metrics do not expose observation records, route IDs, message IDs, hashes, payload metadata, or payload bodies.
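A bounded drop-oldest sink with the four aggregate metrics named above can be sketched as a ring buffer. The type and field names are assumptions; the 10000 ceiling mirrors the C17R limit, and a capacity of 0 is treated here as "sink not created".

```go
package main

import "fmt"

const maxCapacity = 10000 // C17R upper bound

// Observation carries metadata only; there is deliberately no payload field.
type Observation struct {
	RouteID       string
	PayloadLength int
	PayloadSHA256 string
}

// Metrics are aggregate counters that expose no observation records.
type Metrics struct {
	Capacity           int
	Depth              int
	AcceptedTotal      uint64
	DroppedOldestTotal uint64
}

// Sink is a fixed-capacity ring buffer that overwrites the oldest entry.
type Sink struct {
	buf      []Observation
	head     int // index of the oldest entry
	depth    int
	accepted uint64
	dropped  uint64
}

func NewSink(capacity int) (*Sink, error) {
	if capacity <= 0 || capacity > maxCapacity {
		return nil, fmt.Errorf("capacity %d outside 1..%d", capacity, maxCapacity)
	}
	return &Sink{buf: make([]Observation, capacity)}, nil
}

// Accept stores an observation, dropping the oldest one when the sink is full.
func (s *Sink) Accept(o Observation) {
	if s.depth == len(s.buf) {
		s.buf[s.head] = o // overwrite the oldest slot
		s.head = (s.head + 1) % len(s.buf)
		s.dropped++
	} else {
		s.buf[(s.head+s.depth)%len(s.buf)] = o
		s.depth++
	}
	s.accepted++
}

func (s *Sink) Metrics() Metrics {
	return Metrics{
		Capacity:           len(s.buf),
		Depth:              s.depth,
		AcceptedTotal:      s.accepted,
		DroppedOldestTotal: s.dropped,
	}
}

func main() {
	sink, err := NewSink(2)
	if err != nil {
		panic(err)
	}
	for i := 0; i < 3; i++ {
		sink.Accept(Observation{RouteID: fmt.Sprintf("route-%d", i)})
	}
	fmt.Printf("%+v\n", sink.Metrics())
}
```

The sketch is not safe for concurrent use; a mesh server calling Accept from several goroutines would need a mutex.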
C17O logs those aggregate metrics locally from the node-agent loop when the sink is explicitly enabled. This does not add a read API or Control Plane reporting.
C17P logs aggregate sink metrics only when they change, so steady heartbeat loops do not repeat identical local metrics lines.
C17Q logs production_forwarding_gate_enabled separately from
production_forwarding_runtime_enabled. The runtime field remains false;
turning on the gate still does not enable production forwarding.
C17S makes production envelope observation panic-safe. Observer errors and observer panics both fail closed as observation failure; forwarding remains unavailable.
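A panic-safe, fail-closed wrapper of this kind is commonly written in Go with defer and recover. The sketch below uses hypothetical names and only illustrates the pattern.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrObservationFailed is a hypothetical sentinel for the fail-closed outcome.
var ErrObservationFailed = errors.New("observation failed")

// observeSafely runs the observer and converts both returned errors and
// panics into ErrObservationFailed, so the caller can reject the envelope.
func observeSafely(observe func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%w: observer panic: %v", ErrObservationFailed, r)
		}
	}()
	if obsErr := observe(); obsErr != nil {
		return fmt.Errorf("%w: %v", ErrObservationFailed, obsErr)
	}
	return nil
}

func main() {
	fmt.Println(observeSafely(func() error { return nil }))
	fmt.Println(observeSafely(func() error { return errors.New("sink closed") }))
	fmt.Println(observeSafely(func() error { panic("boom") }))
}
```

The named return value is what lets the deferred function replace the result after a panic has been recovered.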
C17T limits validated production fabric.control envelope payloads to 4096
bytes. Oversized envelopes are rejected before observation.
C17U rejects production fabric.control envelopes whose created_at is more
than one minute in the future.
C17V adds scoped peer endpoint candidates to synthetic mesh config. Candidate entries describe possible per-node endpoints with transport, address, reachability, NAT type, connectivity mode, priority, policy tags, verification time, and metadata. They are model/config hints only; no production route scoring, NAT traversal, shortcut routing, or forwarding runtime is implemented.
C17W adds deterministic local scoring for scoped endpoint candidates. Scoring uses transport, reachability, connectivity mode, NAT type, priority, preferred region, policy tags, channel class, and verification age. It returns ranked candidates and reason labels only; it does not select production routes, open connections, perform NAT traversal, or forward payloads.
C17X extends candidate scoring with optional local health observations keyed by
endpoint_id. Observations can contribute latency, success/failure history,
recent failure reason, reliability score, and freshness/staleness signals.
The score remains advisory only and is not wired into production forwarding.
C17Z adds the first narrow production forwarding runtime. When
RAP_MESH_PRODUCTION_FORWARDING_ENABLED=true, the QUIC production-forward
handler can deliver route-bound fabric.control envelopes at the local
destination or forward them to a direct next hop from explicit peer endpoint
config. Service channels, RDP/VPN/file/video payloads, arbitrary relay
forwarding, and multi-hop production route execution remain unavailable.
C17Z1 adds route-path-bound multi-hop forwarding for production
fabric.control only. Envelopes may carry route_path and
visited_node_ids; each relay validates its path position, forwards only to
the next route-path node, updates TTL/hop/visited metadata, and rejects loops.
Service payloads remain unavailable.
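The per-relay checks in C17Z1 can be sketched as a pure function. It assumes visited_node_ids lists the nodes already traversed, in order, so the local node's expected position equals the number of visited nodes. That convention and the names below are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// Envelope holds only the routing metadata relevant to this sketch.
type Envelope struct {
	RoutePath      []string
	VisitedNodeIDs []string
	TTL            int
	HopCount       int
}

// nextHop validates the local node's position in route_path and returns the
// next node plus an updated envelope. An empty next hop means local delivery.
func nextHop(e Envelope, localID string) (string, Envelope, error) {
	for _, v := range e.VisitedNodeIDs {
		if v == localID {
			return "", e, errors.New("loop: node already visited")
		}
	}
	pos := len(e.VisitedNodeIDs)
	if pos >= len(e.RoutePath) || e.RoutePath[pos] != localID {
		return "", e, errors.New("local node is not at the expected route_path position")
	}
	for i, v := range e.VisitedNodeIDs {
		if e.RoutePath[i] != v {
			return "", e, errors.New("visited_node_ids diverge from route_path")
		}
	}
	out := e
	// Copy before appending so the caller's slice is never mutated.
	out.VisitedNodeIDs = append(append([]string(nil), e.VisitedNodeIDs...), localID)
	if pos == len(e.RoutePath)-1 {
		return "", out, nil // local node is the destination
	}
	if e.TTL <= 1 {
		return "", e, errors.New("ttl exhausted")
	}
	out.TTL--
	out.HopCount++
	return e.RoutePath[pos+1], out, nil
}

func main() {
	e := Envelope{
		RoutePath:      []string{"node-a", "node-r", "node-b"},
		VisitedNodeIDs: []string{"node-a"},
		TTL:            4,
		HopCount:       1,
	}
	next, out, err := nextHop(e, "node-r")
	fmt.Println(next, out.TTL, out.HopCount, err)
}
```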
C17Z2 emits local mesh_production_forward_event logs for production
fabric.control forwarding outcomes: accepted, forwarded, delivered, and
rejected. Logs include route/message/hop/channel/status/reason/TTL/hop count/
route path length/visited count/payload length metadata only. Payload bodies
are not logged, no observation read API is added, and service payloads remain
unavailable.
C17Z3 binds production fabric.control forwarding to loaded scoped or
Control Plane route config when routes are available locally. Configured
envelopes must match route_id, cluster, source, destination, route path,
next hop, allowed channel, expiry, max TTL, and max hop count before
forwarding. If no route config is present, existing C17Z1 behavior is
preserved. Service payloads remain unavailable.
C17Z4 adds scoped peer directory and recovery seed config. peer_directory
describes only peers needed by the node-scoped mesh config. recovery_seeds
is an explicit, bounded bootstrap/recovery list and is not a full cluster node
list. The node-agent parses and validates these fields, but does not yet
implement a persistent connection manager, NAT traversal, or
relay/rendezvous runtime.
C17Z5 turns scoped peer directory and recovery seed config into node-local
runtime PeerCache state. The cache builds a bounded warm peer set from
route-adjacent peers, recovery seeds, peer endpoints, and endpoint candidates.
When synthetic mesh testing is enabled, the node-agent probes warm peers with
QUIC fabric live probes and reports metadata-only mesh-link observations. This is not
a persistent connection manager and does not forward service payloads.
C17Z6 adds advertised mesh endpoint reporting. When
RAP_MESH_ADVERTISE_ENDPOINT is set, node-agent includes a
mesh_endpoint_report in heartbeat metadata with transport, connectivity mode,
NAT hint, region, observed time, and endpoint candidate metadata. Control Plane
can project the latest reported endpoint into node-scoped synthetic mesh config
for route-path peers. This does not perform automatic public IP discovery,
STUN/TURN/ICE NAT classification, or service payload forwarding.
C17Z7 adds RAP_MESH_ADVERTISE_ENDPOINTS_JSON for multiple advertised
endpoints per node. Candidates can describe public, private, corporate/LAN,
outbound, or relay-style addresses. Endpoint scoring rewards private-lan,
corp-lan, and same-site policy tags, and peer cache can use the best
candidate address for warm-peer health probes. This supports corporate-network
cluster segments without enabling service payload forwarding.
C17Z8 adds a node-local peer connection state machine on top of warm-peer
health probes. Warm peers move through disconnected, connecting, ready,
degraded, and backoff; repeated probe failures enter bounded backoff, and
successful probes recover to ready. Mesh-link observations include
metadata-only connection state. This is not a persistent socket/session manager
and does not forward service payloads.
C17Z9 adds a node-local peer recovery planner. The node targets a bounded
stable ready-peer set, defaulting to three connectable peers when available,
instead of probing every known cluster node. When ready peers fall below target,
the planner selects bounded recovery probes from warm peers, recovery seeds,
and other connectable scoped peers, skipping active backoff entries. Heartbeats
include metadata-only mesh_peer_recovery_report state. This is not persistent
connection transport, NAT traversal, relay/rendezvous runtime, or service
payload forwarding.
C17Z10 adds a node-local peer connection intent planner over the C17Z9 recovery
plan. It classifies bounded peer work as maintain, probe, or recover,
and classifies transport readiness as direct, private_lan,
corporate_lan, outbound_only, or relay_required. Heartbeats include
metadata-only mesh_peer_connection_intent_report counts. This is not
persistent connection transport, STUN/TURN/ICE, NAT traversal, relay runtime,
or service payload forwarding.
C17Z11 adds the first real node-local peer connection manager for mesh
control-plane health. It uses a reusable QUIC fabric transport to probe
direct/private/corporate peer endpoints selected by C17Z10 intents, updates
the shared peer connection tracker, and records waiting_rendezvous for
outbound-only or relay-required peers. Heartbeats include metadata-only
mesh_peer_connection_manager_report state. This is not STUN/TURN/ICE,
relay/rendezvous runtime, route lease generation, VPN runtime, or service
payload forwarding.
C17Z12 adds a node-scoped rendezvous/relay control-plane lease contract for
peers that would otherwise remain waiting_rendezvous. The agent consumes
rendezvous_leases, resolves matching intents into relay_quic, probes the
relay node over QUIC fabric live probe, and records relay_ready for the peer control
path. This remains control-plane health only and does not enable RDP/VPN/file/
video/service payload forwarding, arbitrary relay packet forwarding,
STUN/TURN/ICE, or host networking changes.
C17Z13 adds heartbeat telemetry for rendezvous lease admission and renewal
posture. The agent emits mesh_rendezvous_lease_report with local role,
relay/peer admission counts, TTL, renewal-after time, renewal-needed status,
relay_ready, and explicit no-payload boundary flags. This remains
metadata-only control-plane telemetry and does not enable service payload
forwarding.
C17Z14 adds a control-plane refresh contract for rendezvous leases. When a lease is renewal-needed, expired, invalid, or tied to a stale relay state, the agent reloads node-scoped synthetic config from Control Plane, updates the running peer cache/route/lease state, and reports refresh counters plus stale relay withdrawal/reselection fields. This remains control-plane health only and does not enable service payload forwarding.
C17Z15 adds the node side of backend relay replacement policy. The agent
advertises the relay replacement contract capability and emits
c17z15.mesh_rendezvous_lease_report.v1; stale relay state is matched to the
exact rendezvous lease/relay when that metadata is present, so an alternate
replacement lease for the same peer is not treated as stale by association.
This remains control-plane health only and does not enable service payload
forwarding.
C17Z16 adds route/path decision reporting. The agent consumes
route_path_decisions from Control Plane synthetic config, keeps the latest
control-plane generation in local state, and emits
c17z18.mesh_route_path_decision_report.v1 with effective hops, previous/next
hop, selected replacement relay, generation, and no-payload boundary flags.
This remains metadata-only route planning and does not enable service payload
forwarding.
C17Z17 adds node-side route generation tracking for Control Plane
route_path_decisions. The agent emits
c17z18.mesh_route_generation_report.v1 with active, applied, unchanged, and
withdrawn decision counts, total counters, generation change state, active
decision details, and withdrawn decision details. When the first observed
config already contains a stale relay replacement, the tracker emits a
withdrawn_by_replacement record for the old relay path. This remains
metadata-only route planning and does not enable service payload forwarding.
C17Z18 applies Control Plane route_path_decisions to synthetic route-health
route config only. The agent keeps base routes separate from route-health
routes, periodically refreshes scoped config, emits
c17z18.mesh_route_health_config_report.v1, and reports route-health
observations with expected/observed hops and drift status. This probes
replacement relay effective paths for control-plane health only and does not
enable service payload forwarding.
C17Z21 defines the portable inbound listener contract for Docker, Linux
service, Windows service, and future OS-specific node packages. The node-agent
does not stop when the mesh listen port cannot be bound. It keeps the outbound
Control Plane session alive and emits c17z21.fabric_listener_report.v1 in
heartbeat metadata with configured address, effective address, listen mode,
listener status, inbound reachability, one-way connectivity, failure reason,
and port-conflict diagnostics.
RAP_FABRIC_LISTEN_PORT_MODE controls behavior:
- manual: bind exactly RAP_FABRIC_LISTEN_ADDR; on conflict report listen_failed and wait for an operator/config change.
- auto: try RAP_FABRIC_LISTEN_ADDR; on conflict scan RAP_FABRIC_LISTEN_AUTO_PORT_START..RAP_FABRIC_LISTEN_AUTO_PORT_END and report auto_rebound when a free port is selected.
- disabled: do not open an inbound listener; the node is expected to be outbound-only, relay/rendezvous, or Control Plane only.
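The three port modes can be sketched as a small policy function. The bind attempt is injected so the policy runs without opening sockets. The status strings listen_failed and auto_rebound come from the text above, while listening, disabled, and all Go names are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strconv"
)

var errInUse = errors.New("address in use")

// Result mirrors two listener report fields; the field names are assumptions.
type Result struct {
	EffectiveAddr string
	Status        string
}

// resolveListenAddr applies the port mode. tryBind reports whether an address
// can be bound; a real agent would attempt a UDP bind for QUIC here.
func resolveListenAddr(mode, addr string, portStart, portEnd int, tryBind func(string) error) Result {
	switch mode {
	case "disabled":
		return Result{Status: "disabled"}
	case "manual", "auto":
		if tryBind(addr) == nil {
			return Result{EffectiveAddr: addr, Status: "listening"}
		}
		if mode == "manual" {
			return Result{Status: "listen_failed"}
		}
		host, _, err := net.SplitHostPort(addr)
		if err != nil {
			return Result{Status: "listen_failed"}
		}
		// Scan the configured range and take the first free port.
		for port := portStart; port <= portEnd; port++ {
			candidate := net.JoinHostPort(host, strconv.Itoa(port))
			if tryBind(candidate) == nil {
				return Result{EffectiveAddr: candidate, Status: "auto_rebound"}
			}
		}
	}
	return Result{Status: "listen_failed"}
}

func main() {
	busy := map[string]bool{"0.0.0.0:19001": true}
	tryBind := func(addr string) error {
		if busy[addr] {
			return errInUse
		}
		return nil
	}
	fmt.Println(resolveListenAddr("manual", "0.0.0.0:19001", 0, 0, tryBind))
	fmt.Println(resolveListenAddr("auto", "0.0.0.0:19001", 19002, 19010, tryBind))
}
```

Keeping the bind attempt behind a function value makes the conflict paths easy to exercise in unit tests.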
For RAP_MESH_CONNECTIVITY_MODE=outbound_only, inbound listener failure is not
treated as node death. The heartbeat remains healthy with
mesh_one_way_connectivity=true and listener diagnostics. For direct/private
LAN modes, a listener failure degrades the node so the admin panel can show
that the node is alive but cannot accept inbound mesh traffic. Service payload
forwarding is still not enabled by this contract.
C17Z22 separates outbound Control Plane presence from inbound mesh
reachability. When synthetic mesh testing is enabled, every heartbeat includes
c17z22.mesh_outbound_session_report.v1 with node-to-control-plane direction,
keepalive transport, listener conflict state, rendezvous/relay counters, and a
fabric_control_endpoint plus a flag showing whether the current outbound session can be used as a reverse
control-channel contract. This is the portable basis for Docker, Linux service,
Windows service, and future packages where a node may be behind NAT or have no
stable inbound address. It is still control-plane telemetry only and does not
carry RDP/VPN/service payload traffic.
C17Z24 separates the listener bind address from advertised mesh endpoints. The
agent never advertises loopback addresses discovered from the local listener;
127.0.0.1/::1 are test-only bind details, not cluster reachability data.
When the listener is active, the agent enumerates active non-loopback host
interfaces and reports usable endpoint candidates with interface metadata,
address family, reachability, NAT/connectivity hints, and priority. Container
bridge/veth interfaces and link-local addresses are filtered by default, while
physical and VPN-style interfaces are kept so different cluster segments can
choose the address that matches their network. Operator-provided
RAP_MESH_ADVERTISE_ENDPOINT or endpoint-candidate JSON remains authoritative
and is ranked ahead of auto-discovered addresses.
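The address-level part of this filter (loopback and link-local exclusion) can be sketched with the standard net/netip package. The function name is hypothetical, and filtering by interface name for bridge/veth devices is left out of the sketch.

```go
package main

import (
	"fmt"
	"net/netip"
)

// isAdvertisable reports whether an IP literal is usable as cluster
// reachability data: loopback, unspecified, and link-local addresses are not.
func isAdvertisable(ip string) bool {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return false
	}
	addr = addr.Unmap() // treat ::ffff:a.b.c.d like its IPv4 form
	switch {
	case addr.IsLoopback(), addr.IsUnspecified():
		return false
	case addr.IsLinkLocalUnicast(), addr.IsLinkLocalMulticast():
		return false
	}
	return true
}

func main() {
	for _, ip := range []string{"127.0.0.1", "::1", "169.254.10.1", "192.168.1.20", "203.0.113.20"} {
		fmt.Println(ip, isAdvertisable(ip))
	}
}
```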
C17Z25 adds per-peer endpoint fallback probing to the control-plane mesh
manager. A node no longer treats the top-ranked endpoint candidate as the only
possible address for a peer. For each warm direct/private/corporate peer, the
manager probes the ranked candidate list until one QUIC fabric endpoint
responds or all direct candidates fail. Heartbeat metadata includes
c17z25.mesh_peer_connection_manager_report.v1 with probe_results,
selected_candidate_id, selected_endpoint, and per-candidate success/failure
details. This is still control-plane health and address selection telemetry; it
does not forward RDP/VPN/service payloads.
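The fallback loop can be sketched as "walk the ranked candidates, stop at the first one that answers, record every attempt". The probe is injected as a function. The Score field and the Go type names are assumptions, while the report field names echo the ones listed above.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

var errUnreachable = errors.New("unreachable")

type Candidate struct {
	EndpointID string
	Address    string
	Score      int // higher is better; the scoring itself is out of scope here
}

type ProbeResult struct {
	EndpointID string
	OK         bool
	Reason     string
}

type Selection struct {
	SelectedCandidateID string
	SelectedEndpoint    string
	ProbeResults        []ProbeResult
}

// probePeer tries candidates in descending score order until one responds.
func probePeer(candidates []Candidate, probe func(address string) error) Selection {
	ranked := append([]Candidate(nil), candidates...)
	sort.SliceStable(ranked, func(i, j int) bool { return ranked[i].Score > ranked[j].Score })

	var sel Selection
	for _, c := range ranked {
		if err := probe(c.Address); err != nil {
			sel.ProbeResults = append(sel.ProbeResults, ProbeResult{EndpointID: c.EndpointID, Reason: err.Error()})
			continue
		}
		sel.ProbeResults = append(sel.ProbeResults, ProbeResult{EndpointID: c.EndpointID, OK: true})
		sel.SelectedCandidateID = c.EndpointID
		sel.SelectedEndpoint = c.Address
		break
	}
	return sel
}

func main() {
	candidates := []Candidate{
		{EndpointID: "node-b-lan", Address: "10.0.0.5:443", Score: 50},
		{EndpointID: "node-b-public", Address: "203.0.113.20:443", Score: 80},
	}
	probe := func(address string) error {
		if address == "10.0.0.5:443" {
			return nil
		}
		return errUnreachable
	}
	fmt.Printf("%+v\n", probePeer(candidates, probe))
}
```

An empty SelectedCandidateID means every direct candidate failed, with the per-candidate reasons preserved in ProbeResults.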
Scoped synthetic config shape:
{
  "schema_version": "c17z18.synthetic.v1",
  "cluster_id": "cluster-1",
  "local_node_id": "node-a",
  "config_version": "config-v1",
  "peer_directory_version": "peers-v1",
  "policy_version": "policy-v1",
  "peer_endpoints": {
    "node-b": "quic://127.0.0.1:19002"
  },
  "peer_endpoint_candidates": {
    "node-b": [
      {
        "endpoint_id": "node-b-public",
        "node_id": "node-b",
        "transport": "direct_quic",
        "address": "203.0.113.20:443",
        "reachability": "public",
        "nat_type": "restricted",
        "connectivity_mode": "direct",
        "priority": 10
      }
    ]
  },
  "routes": [],
  "route_path_decisions": {
    "schema_version": "c17z18.route_path_decisions.v1",
    "decisions": []
  }
}
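A config of this shape can be decoded with encoding/json. The struct below covers only a subset of the keys shown above, and the Go type names and the minimal required-field check are assumptions rather than the agent's own loader.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

type EndpointCandidate struct {
	EndpointID       string `json:"endpoint_id"`
	NodeID           string `json:"node_id"`
	Transport        string `json:"transport"`
	Address          string `json:"address"`
	Reachability     string `json:"reachability"`
	NATType          string `json:"nat_type"`
	ConnectivityMode string `json:"connectivity_mode"`
	Priority         int    `json:"priority"`
}

// SyntheticConfig models a subset of the scoped synthetic config.
type SyntheticConfig struct {
	SchemaVersion          string                         `json:"schema_version"`
	ClusterID              string                         `json:"cluster_id"`
	LocalNodeID            string                         `json:"local_node_id"`
	ConfigVersion          string                         `json:"config_version"`
	PeerEndpoints          map[string]string              `json:"peer_endpoints"`
	PeerEndpointCandidates map[string][]EndpointCandidate `json:"peer_endpoint_candidates"`
}

// parseConfig decodes the snapshot and applies a minimal sanity check.
func parseConfig(raw []byte) (SyntheticConfig, error) {
	var cfg SyntheticConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return cfg, fmt.Errorf("decode synthetic config: %w", err)
	}
	if cfg.ClusterID == "" || cfg.LocalNodeID == "" {
		return cfg, errors.New("cluster_id and local_node_id are required")
	}
	return cfg, nil
}

const sample = `{
  "schema_version": "c17z18.synthetic.v1",
  "cluster_id": "cluster-1",
  "local_node_id": "node-a",
  "peer_endpoints": {"node-b": "quic://127.0.0.1:19002"},
  "peer_endpoint_candidates": {
    "node-b": [{"endpoint_id": "node-b-public", "address": "203.0.113.20:443", "priority": 10}]
  }
}`

func main() {
	cfg, err := parseConfig([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.LocalNodeID, cfg.PeerEndpoints["node-b"], cfg.PeerEndpointCandidates["node-b"][0].EndpointID)
}
```

Unknown keys such as routes and route_path_decisions are ignored by json.Unmarshal, so the subset struct still accepts the full document.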
C17E Live Synthetic Smoke
Run:
cd agents\rap-node-agent
go run .\cmd\mesh-live-smoke
Expected:
- scoped synthetic config loads
- direct node-a -> node-b synthetic probe succeeds
- relay node-a -> node-r -> node-b synthetic probe succeeds
- bounded synthetic.echo test-service succeeds
- production_forwarding=false
Safety Rules
- The agent never assigns roles to itself.
- The agent reports capabilities only.
- Platform policy assigns roles.
- No RDP/VPN/production service traffic is carried by the C17A-C17Z22 staged mesh runtime.
- Production forwarding remains disabled by default and limited to fabric.control when explicitly enabled.
- No privileged operations are performed by the current agent.