Distributed Authority Audit 2026-05-16

Status: the target architecture is distributed, but the live test cluster still contains bootstrap central-authority components that must be removed before the cluster can be trusted in production.

Fixed Requirements

  • No single management/API/storage/update service is allowed to own cluster truth.
  • Control, storage, update, route authority, observer, and update-cache are node roles in the fabric.
  • A service endpoint can serve signed state, but cannot create trusted state by itself.
  • Node identity is cryptographic. IP addresses, DNS names, and NAT addresses are endpoint candidates only.
  • Nodes must publish real signed candidates for reachable interfaces, STUN/ICE-reflexive addresses, passive reverse channels, and relay fallback.
  • Nodes must verify signed control data locally before applying it.
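
As a rough illustration of the identity and candidate requirements above, the sketch below keys candidates by node identity rather than by address and orders them from most direct to relay fallback. The type names, the node_id format, and the preference order are assumptions made for this example; they are not taken from the fabric contracts. Signature checking is left out here and would have to run before any candidate is used.

```python
from dataclasses import dataclass
from enum import IntEnum


class CandidateKind(IntEnum):
    # Lower value is tried first. This ordering is an assumption for the sketch.
    INTERFACE = 0        # directly reachable interface address
    REFLEXIVE = 1        # STUN/ICE-reflexive address
    REVERSE_CHANNEL = 2  # passive reverse channel
    RELAY = 3            # relay fallback


@dataclass(frozen=True)
class EndpointCandidate:
    node_id: str      # hypothetical: fingerprint of the node's public key
    kind: CandidateKind
    address: str      # e.g. "quic://192.168.200.85:18080"
    signature: bytes  # node-key signature over the other fields (not checked here)


def dial_order(candidates, node_id):
    """Return the candidates published for one node identity, most direct first."""
    own = [c for c in candidates if c.node_id == node_id]
    return sorted(own, key=lambda c: c.kind)
```

The point of the shape is that an address never identifies a node: two candidates with different addresses belong together only because they share a node_id.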

Live Cluster Findings

  • The live cluster has one active cluster_authorities row: rap-ca-ed25519-09877466aa9b6b58b0f312b0b313ea33.
  • Its metadata says storage=database_signer and production_target=external_cluster_signer_or_hsm.
  • Release metadata for recent node-agent versions is signed, but signed by the same database-backed authority.
  • Synthetic mesh configs are signed and node-agent verifies them against the pinned cluster authority.
  • Node enrollment pins cluster authority into identity.json.
  • Before this audit, host-agent update plans carried signatures, but host-agent did not locally reject unsigned plans even when a pinned authority was present.

Changes Made In This Audit

  • The fabric docs now declare distributed authority and quorum as mandatory.
  • Node/fabric endpoints must be explicit host:port; DNS-only service names are rejected as fabric endpoints.
  • home-1 no longer advertises smoke.cin.su as a fabric endpoint. It now advertises its real interface candidate quic://192.168.200.85:18080.
  • Host-agent now verifies node_update_plan authority signatures when identity.json contains a pinned cluster authority public key.
  • Unsigned update plans are rejected in that pinned-authority mode.
  • Added rap.cluster_authority.quorum.v1 and rap.cluster_authority.quorum_envelope.v1 contracts to both agent and backend authority packages.
  • Host-agent can now verify quorum-signed update plans when identity.json contains a pinned quorum descriptor.
  • Backend update plans now include an authority_quorum envelope when the cluster authority metadata contains a quorum descriptor. If that configured quorum cannot be satisfied, the update plan is not issued.
  • Node bootstrap now carries cluster_authority_quorum; the approval authority payload signs the quorum descriptor hash, and node-agent persists the descriptor into identity.json after verifying the signed hash.
  • Published rap-node-agent and rap-host-agent release 0.2.284-quorumauthority.
  • Canaried home-1 to rap-node-agent 0.2.284-quorumauthority and rap-host-agent 0.2.284-quorumauthority; both reported healthy/noop after update.
  • Published rap-node-agent and rap-host-agent release 0.2.285-quorumbootstrap.
  • Canaried home-1 to rap-node-agent 0.2.285-quorumbootstrap and rap-host-agent 0.2.285-quorumbootstrap; both reported current=target/noop.
  • ifcm-rufms-s-mo1cr was intentionally not updated because it is behind NAT and still needs fabric/update-cache artifact reachability before further rollout.
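
The update-plan checks described in this list can be sketched as a single acceptance function. This is a minimal sketch, not the host-agent code: the class and field names are hypothetical, signature verification is injected as a callable so that no particular Ed25519 library is assumed, and two behaviours are assumptions because the audit does not state them (a pinned quorum takes precedence over a pinned single authority, and plans are accepted when nothing is pinned).

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# verify(public_key, message, signature) -> bool, supplied by the caller.
Verifier = Callable[[bytes, bytes, bytes], bool]


@dataclass
class QuorumDescriptor:
    threshold: int     # M: minimum number of distinct valid signers
    signer_keys: dict  # key_id -> public key bytes (the N known signers)


@dataclass
class PinnedIdentity:
    authority_key: Optional[bytes] = None      # pinned single authority
    quorum: Optional[QuorumDescriptor] = None  # pinned quorum descriptor


@dataclass
class UpdatePlan:
    payload: bytes
    authority_signature: Optional[bytes] = None
    quorum_signatures: dict = field(default_factory=dict)  # key_id -> signature


def accept_update_plan(plan: UpdatePlan, pinned: PinnedIdentity, verify: Verifier) -> bool:
    if pinned.quorum is not None:
        # Count distinct known signers whose signature verifies over the payload.
        valid = {
            key_id
            for key_id, sig in plan.quorum_signatures.items()
            if key_id in pinned.quorum.signer_keys
            and verify(pinned.quorum.signer_keys[key_id], plan.payload, sig)
        }
        return len(valid) >= pinned.quorum.threshold
    if pinned.authority_key is not None:
        if plan.authority_signature is None:
            return False  # unsigned plans are rejected in pinned-authority mode
        return verify(pinned.authority_key, plan.payload, plan.authority_signature)
    # Assumption: with nothing pinned, enforcement does not apply.
    return True
```

Signatures from key IDs outside the pinned descriptor are ignored rather than counted, so an attacker cannot reach the threshold by adding signers of their own.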

Remaining Production Blockers

  • Replace database_signer with a quorum authority: M-of-N signatures from nodes or hardware/offline keys that hold control-authority / update-authority roles.
  • Store authority descriptors and role certificates as replicated signed state, not only database rows.
  • Require quorum envelopes for the remaining high-risk mutations: role mutation, release creation, update policy mutation, route lease issuance, relay/rendezvous lease issuance, storage placement, and authority rotation. Node update plans and bootstrap quorum pinning now have the first contract hooks, but production still needs real M-of-N signers.
  • Add node-side verification of release metadata in addition to update-plan verification; update-plan verification is now enforced by host-agent when a pinned authority or pinned quorum descriptor exists.
  • Add update-cache mirror selection through fabric endpoint candidates instead of a single HTTP origin.
  • Add signed endpoint-candidate epochs so peer directory gossip can survive API replica loss.
  • Add revocation/fencing epochs for compromised authority keys, nodes, and update artifacts.
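
One possible shape for the revocation/fencing item is a per-subject epoch counter that only moves forward. The sketch below is an assumption about how such fencing could work, not a description of an existing contract: the class name, the subject/key_id pairing, and the rule that a revocation also raises the minimum acceptable epoch are all hypothetical.

```python
class EpochFence:
    """Tracks the highest epoch seen per subject and rejects anything older."""

    def __init__(self):
        self._latest = {}      # subject -> highest accepted epoch
        self._revoked = set()  # (subject, key_id) pairs that are fenced off

    def revoke(self, subject, key_id, epoch):
        # Fence the key and raise the floor so pre-revocation state cannot be replayed.
        self._revoked.add((subject, key_id))
        self._latest[subject] = max(self._latest.get(subject, 0), epoch)

    def admit(self, subject, key_id, epoch):
        if (subject, key_id) in self._revoked:
            return False  # signed by a revoked key
        if epoch < self._latest.get(subject, 0):
            return False  # stale epoch: older than state already accepted
        self._latest[subject] = epoch
        return True
```

A subject here could be an authority, a node, or an update artifact; the same monotonic check applies to each.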

Acceptance Rule

The cluster is not production-trust-ready while a single database_signer can create authoritative cluster mutations. It may remain as a development bootstrap signer only when every signed payload clearly identifies it as bootstrap and nodes can be configured to reject it in production mode.
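
The acceptance rule could be enforced on the node side roughly as follows. This is a sketch under stated assumptions: the bootstrap flag, the SignerInfo type, and the production_mode switch are hypothetical names, and only the storage value database_signer comes from the live cluster metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignerInfo:
    key_id: str
    storage: str     # e.g. "database_signer", as reported in authority metadata
    bootstrap: bool  # hypothetical flag carried inside every signed payload


def signer_allowed(signer: SignerInfo, production_mode: bool) -> bool:
    if signer.storage == "database_signer" and not signer.bootstrap:
        return False  # a database signer must always identify itself as bootstrap
    if production_mode and signer.bootstrap:
        return False  # production nodes reject the bootstrap signer outright
    return True
```

With this check a development cluster keeps working with the bootstrap signer, while a node switched to production mode refuses every payload that signer produces.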