Distributed Authority Audit 2026-05-16
Status: target architecture is distributed, but the live test cluster still has bootstrap central authority pieces that must be removed before production trust.
Fixed Requirements
- No single management/API/storage/update service is allowed to own cluster truth.
- Control, storage, update, route authority, observer, and update-cache are node roles in the fabric.
- A service endpoint can serve signed state, but cannot create trusted state by itself.
- Node identity is cryptographic. IP addresses, DNS names, and NAT addresses are endpoint candidates only.
- Nodes must publish real signed candidates for reachable interfaces, STUN/ICE-reflexive addresses, passive reverse channels, and relay fallback.
- Nodes must verify signed control data locally before applying it.
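The last requirement, verifying locally before applying, can be sketched as a small gate in front of any apply step. This is only an illustration under stated assumptions: the envelope fields `payload` and `signature`, the canonical JSON encoding, and the function names are hypothetical, and the `verify` callable stands in for a real Ed25519 verification routine rather than naming any actual agent API.

```python
import json
from typing import Callable

# Assumed verifier shape: (public_key, message, signature) -> bool.
# In practice this would wrap a real Ed25519 verify call; it is injected here
# so the sketch does not depend on any particular crypto library.
Verifier = Callable[[bytes, bytes, bytes], bool]


def canonical_bytes(payload: dict) -> bytes:
    """Serialize deterministically so signer and verifier see the same bytes."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def accept_control_data(envelope: dict, pinned_key: bytes, verify: Verifier) -> dict:
    """Return the payload if its signature checks out against the pinned key.

    Raises PermissionError for unsigned or badly signed envelopes, so the
    caller never reaches its apply step with unverified data.
    """
    payload = envelope.get("payload")      # hypothetical field name
    signature = envelope.get("signature")  # hypothetical field name
    if not isinstance(payload, dict):
        raise PermissionError("envelope has no payload")
    if not signature:
        raise PermissionError("unsigned control data rejected")
    if not verify(pinned_key, canonical_bytes(payload), signature):
        raise PermissionError("signature does not match pinned authority")
    return payload
```

The serving endpoint does not appear anywhere in the decision: only the pinned key and the signature matter, which is the property the requirements above ask for.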
Live Cluster Findings
- The live cluster has one active `cluster_authorities` row: `rap-ca-ed25519-09877466aa9b6b58b0f312b0b313ea33`.
- Its metadata says `storage=database_signer` and `production_target=external_cluster_signer_or_hsm`.
- Release metadata for recent node-agent versions is signed, but signed by the same database-backed authority.
- Synthetic mesh configs are signed and node-agent verifies them against the pinned cluster authority.
- Node enrollment pins cluster authority into `identity.json`.
- Before this audit, host-agent update plans were carried with signatures but host-agent did not locally reject unsigned plans when a pinned authority was present.
Changes Made In This Audit
- The fabric docs now declare distributed authority and quorum as mandatory.
- Node/fabric endpoints must be explicit `host:port`; DNS-only service names are rejected as fabric endpoints.
- `home-1` no longer advertises `smoke.cin.su` as a fabric endpoint. It now advertises its real interface candidate `quic://192.168.200.85:18080`.
- Host-agent now verifies `node_update_plan` authority signatures when `identity.json` contains a pinned cluster authority public key.
- Unsigned update plans are rejected in that pinned-authority mode.
- Added `rap.cluster_authority.quorum.v1` and `rap.cluster_authority.quorum_envelope.v1` contracts to both agent and backend authority packages.
- Host-agent can now verify quorum-signed update plans when `identity.json` contains a pinned quorum descriptor.
- Backend update plans now include an `authority_quorum` envelope when the cluster authority metadata contains a quorum descriptor. If that configured quorum cannot be satisfied, the update plan is not issued.
- Node bootstrap now carries `cluster_authority_quorum`; the approval authority payload signs the quorum descriptor hash, and node-agent persists the descriptor into `identity.json` after verifying the signed hash.
- Published `rap-node-agent` and `rap-host-agent` release `0.2.284-quorumauthority`.
- Canaried `home-1` to `rap-node-agent 0.2.284-quorumauthority` and `rap-host-agent 0.2.284-quorumauthority`; both reported healthy/noop after update.
- Published `rap-node-agent` and `rap-host-agent` release `0.2.285-quorumbootstrap`.
- Canaried `home-1` to `rap-node-agent 0.2.285-quorumbootstrap` and `rap-host-agent 0.2.285-quorumbootstrap`; both reported current=target/noop.
- `ifcm-rufms-s-mo1cr` was intentionally not updated because it is behind NAT and still needs fabric/update-cache artifact reachability before further rollout.
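The explicit-endpoint rule from this list can be illustrated with a small validator. This is a sketch, not the agents' actual parser: the function name is hypothetical, and it assumes the rule means "a host plus an explicit port, with an optional scheme such as `quic://`". Whether a DNS name that does carry a port is acceptable is not stated in this audit, so the sketch only rejects candidates without a port.

```python
from urllib.parse import urlsplit


def parse_fabric_endpoint(candidate: str) -> tuple[str, int]:
    """Return (host, port) for an explicit endpoint, or raise ValueError.

    Accepts both "host:port" and "scheme://host:port". A bare service name
    such as "smoke.cin.su" has no port and is rejected.
    """
    text = candidate.strip()
    # urlsplit only fills in the network location after "//", so add it
    # for bare "host:port" strings.
    if "://" not in text:
        text = "//" + text
    parts = urlsplit(text)
    try:
        port = parts.port  # raises ValueError for non-numeric or out-of-range ports
    except ValueError as exc:
        raise ValueError(f"invalid port in fabric endpoint {candidate!r}") from exc
    if not parts.hostname or port is None:
        raise ValueError(f"fabric endpoint must be explicit host:port: {candidate!r}")
    return parts.hostname, port
```

With this check, `quic://192.168.200.85:18080` parses to a host and port, while `smoke.cin.su` raises `ValueError`.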
Remaining Production Blockers
- Replace `database_signer` with quorum authority: M-of-N signatures from nodes or hardware/offline keys with `control-authority`/`update-authority` roles.
- Store authority descriptors and role certificates as replicated signed state, not only database rows.
- Require quorum envelopes for the remaining high-risk mutations: role mutation, release creation, update policy mutation, route lease issuance, relay/rendezvous lease issuance, storage placement, and authority rotation. Node update plans and bootstrap quorum pinning now have the first contract hooks, but production still needs real M-of-N signers.
- Add node-side verification of release metadata in addition to update-plan verification; update-plan verification is now enforced by host-agent when a pinned authority or pinned quorum descriptor exists.
- Add update-cache mirror selection through fabric endpoint candidates instead of a single HTTP origin.
- Add signed endpoint-candidate epochs so peer directory gossip can survive API replica loss.
- Add revocation/fencing epochs for compromised authority keys, nodes, and update artifacts.
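The M-of-N rule behind the first blocker can be sketched as a threshold count over distinct, known signers. The descriptor fields below are hypothetical and are not the `rap.cluster_authority.quorum.v1` contract; the `verify` callable is again an assumed stand-in for a real signature check.

```python
from dataclasses import dataclass
from typing import Callable, Mapping

# Assumed verifier shape: (public_key, message, signature) -> bool.
Verifier = Callable[[bytes, bytes, bytes], bool]


@dataclass(frozen=True)
class QuorumDescriptor:
    """Hypothetical descriptor: M required signatures out of N pinned signers."""
    threshold: int                # M
    signers: Mapping[str, bytes]  # key_id -> public key (N entries)


def quorum_satisfied(
    descriptor: QuorumDescriptor,
    message: bytes,
    signatures: Mapping[str, bytes],  # key_id -> signature over message
    verify: Verifier,
) -> bool:
    """Return True when at least M distinct pinned signers signed the message."""
    if not 1 <= descriptor.threshold <= len(descriptor.signers):
        raise ValueError("threshold must be between 1 and the number of signers")
    valid_key_ids = set()
    for key_id, signature in signatures.items():
        public_key = descriptor.signers.get(key_id)
        if public_key is None:
            continue  # signatures from unknown signers never count
        if verify(public_key, message, signature):
            valid_key_ids.add(key_id)
    return len(valid_key_ids) >= descriptor.threshold
```

Counting key IDs rather than raw signatures means an unknown signer cannot pad the total. A production check would also need to reject a descriptor that lists the same public key under several key IDs, and to take the revocation and fencing epochs above into account.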
Acceptance Rule
The cluster is not production-trust-ready while a single database_signer can
create authoritative cluster mutations. It may remain as a development bootstrap
signer only when every signed payload clearly identifies it as bootstrap and
nodes can be configured to reject it in production mode.
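A node-side policy matching this rule might look like the sketch below. The `signer_kind` payload field, its values, and the mode names are assumptions made for illustration; the audit does not specify how a payload marks itself as bootstrap-signed.

```python
# Hypothetical policy table: which signer kinds a node accepts in each mode.
ALLOWED_SIGNER_KINDS = {
    "production": {"quorum"},
    "development": {"quorum", "bootstrap"},
}


def signer_allowed(payload: dict, mode: str) -> bool:
    """Decide whether a verified payload's signer kind is acceptable in this mode.

    Fails closed: a payload that does not label its signer is rejected in
    every mode, since the rule requires bootstrap payloads to say so clearly.
    """
    if mode not in ALLOWED_SIGNER_KINDS:
        raise ValueError(f"unknown node mode: {mode!r}")
    kind = payload.get("signer_kind")  # hypothetical field name
    return kind in ALLOWED_SIGNER_KINDS[mode]
```

This check is meant to run after signature verification, not instead of it: the label only matters once the payload is known to come from the signer it names.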