# Distributed Authority Audit 2026-05-16

Status: the target architecture is distributed, but the live test cluster still
has bootstrap-era central authority components that must be removed before the
cluster can be trusted in production.

## Fixed Requirements

- No single management/API/storage/update service is allowed to own cluster
  truth.
- Control, storage, update, route authority, observer, and update-cache are node
  roles in the fabric.
- A service endpoint can serve signed state, but cannot create trusted state by
  itself.
- Node identity is cryptographic. IP addresses, DNS names, and NAT addresses are
  endpoint candidates only.
- Nodes must publish real signed endpoint candidates for reachable interfaces,
  STUN/ICE-reflexive addresses, passive reverse channels, and relay fallback.
- Nodes must verify signed control data locally before applying it.
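The identity/candidate split above can be sketched in Python. This is a minimal sketch, not the node-agent's schema: `EndpointCandidate`, `CANDIDATE_KINDS`, and `parse_fabric_endpoint` are hypothetical names, signature checking of candidates is left out, and the explicit `host:port` check follows this audit's rule that DNS-only service names are rejected as fabric endpoints by treating any value without a port as DNS-only. Whether a DNS host with an explicit port is acceptable is not decided here.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import urlsplit

# Hypothetical candidate kinds, one per source named in the requirements.
CANDIDATE_KINDS = frozenset({"interface", "reflexive", "reverse", "relay"})


@dataclass(frozen=True)
class EndpointCandidate:
    # Trust is anchored in node_id (e.g. a public-key fingerprint); the URL
    # is only a hint about where that identity might currently be reachable.
    node_id: str
    kind: str
    url: str


def parse_fabric_endpoint(url: str) -> tuple[str, str, int]:
    """Return (scheme, host, port) or raise ValueError.

    Anything without an explicit port, such as a bare DNS service name,
    is rejected as a fabric endpoint.
    """
    # urlsplit only parses host:port out of a "//"-prefixed authority part.
    parts = urlsplit(url if "://" in url else "//" + url)
    try:
        port = parts.port  # raises ValueError on a malformed or out-of-range port
    except ValueError as exc:
        raise ValueError(f"invalid port in fabric endpoint {url!r}") from exc
    if not parts.hostname or port is None:
        raise ValueError(f"fabric endpoint must be explicit host:port: {url!r}")
    return parts.scheme, parts.hostname, port


def validate_candidate(candidate: EndpointCandidate) -> None:
    """Reject unknown candidate kinds and non-explicit endpoints."""
    if candidate.kind not in CANDIDATE_KINDS:
        raise ValueError(f"unknown candidate kind {candidate.kind!r}")
    parse_fabric_endpoint(candidate.url)
```

With this sketch, `parse_fabric_endpoint("quic://192.168.200.85:18080")` returns `("quic", "192.168.200.85", 18080)`, while `parse_fabric_endpoint("smoke.cin.su")` raises `ValueError`.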

## Live Cluster Findings

- The live cluster has one active `cluster_authorities` row:
  `rap-ca-ed25519-09877466aa9b6b58b0f312b0b313ea33`.
- Its metadata says `storage=database_signer` and
  `production_target=external_cluster_signer_or_hsm`.
- Release metadata for recent node-agent versions is signed, but by the same
  database-backed authority.
- Synthetic mesh configs are signed, and node-agent verifies them against the
  pinned cluster authority.
- Node enrollment pins the cluster authority into `identity.json`.
- Before this audit, host-agent update plans carried signatures, but host-agent
  did not locally reject unsigned plans when a pinned authority was present.
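The missing host-agent check can be sketched as a gate that runs before an update plan is applied. This is a minimal Python sketch assuming a hypothetical layout: the `cluster_authority_public_key`, `payload`, and `authority_signature` keys are illustrative, and `verify` is an injected callable that a real agent would back with an Ed25519 verifier.

```python
from typing import Callable, Optional

# verify(public_key, message, signature) -> bool. In a real agent this would
# wrap an Ed25519 verifier; it is injected to keep the sketch self-contained.
Verifier = Callable[[bytes, bytes, bytes], bool]


class PlanRejected(Exception):
    """Raised when an update plan must not be applied."""


def gate_update_plan(identity: dict, plan: dict, verify: Verifier) -> bytes:
    """Return the plan payload if it may be applied, else raise PlanRejected.

    identity: parsed identity.json; the key name used below is hypothetical.
    plan: {"payload": bytes, "authority_signature": bytes or None}, also hypothetical.
    """
    pinned: Optional[bytes] = identity.get("cluster_authority_public_key")
    payload: bytes = plan["payload"]
    signature: Optional[bytes] = plan.get("authority_signature")

    if pinned is None:
        # No pinned authority: this gate has nothing to verify against.
        return payload
    if not signature:
        # Pinned-authority mode: an unsigned plan is refused outright.
        raise PlanRejected("unsigned update plan while an authority is pinned")
    if not verify(pinned, payload, signature):
        raise PlanRejected("update plan signature does not match pinned authority")
    return payload
```

The design choice is that the pin, not the plan, decides the mode: once `identity.json` carries an authority key, a missing signature is treated as a rejection rather than as a plan that merely skips verification.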

## Changes Made In This Audit

- The fabric docs now declare distributed authority and quorum as mandatory.
- Node/fabric endpoints must be explicit `host:port`; DNS-only service names are
  rejected as fabric endpoints.
- `home-1` no longer advertises `smoke.cin.su` as a fabric endpoint. It now
  advertises its real interface candidate `quic://192.168.200.85:18080`.
- Host-agent now verifies `node_update_plan` authority signatures when
  `identity.json` contains a pinned cluster authority public key.
- Unsigned update plans are rejected in that pinned-authority mode.
- Added `rap.cluster_authority.quorum.v1` and
  `rap.cluster_authority.quorum_envelope.v1` contracts to both agent and
  backend authority packages.
- Host-agent can now verify quorum-signed update plans when `identity.json`
  contains a pinned quorum descriptor.
- Backend update plans now include an `authority_quorum` envelope when the
  cluster authority metadata contains a quorum descriptor. If that configured
  quorum cannot be satisfied, the update plan is not issued.
- Node bootstrap now carries `cluster_authority_quorum`; the approval authority
  payload signs the quorum descriptor hash, and node-agent persists the
  descriptor into `identity.json` after verifying the signed hash.
- Published `rap-node-agent` and `rap-host-agent` release
  `0.2.284-quorumauthority`.
- Canaried `home-1` to `rap-node-agent 0.2.284-quorumauthority` and
  `rap-host-agent 0.2.284-quorumauthority`; both reported healthy/noop after
  update.
- Published `rap-node-agent` and `rap-host-agent` release
  `0.2.285-quorumbootstrap`.
- Canaried `home-1` to `rap-node-agent 0.2.285-quorumbootstrap` and
  `rap-host-agent 0.2.285-quorumbootstrap`; both reported current=target/noop.
  `ifcm-rufms-s-mo1cr` was intentionally not updated because it is behind NAT
  and still needs fabric/update-cache artifact reachability before further
  rollout.
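The bootstrap pinning step described above can be sketched as follows, assuming SHA-256 over canonical JSON as the descriptor hash and a hypothetical `quorum_descriptor_sha256` field in the approval payload; neither is confirmed by this audit, and `verify` again stands in for an Ed25519 check. Storing the descriptor under `cluster_authority_quorum` in `identity.json` is likewise an assumption. The ordering is the point: signature first, hash match second, persistence last.

```python
import hashlib
import json
from typing import Callable

# verify(public_key, message, signature) -> bool, injected (e.g. an Ed25519 check).
Verifier = Callable[[bytes, bytes, bytes], bool]


def descriptor_hash(descriptor: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the quorum descriptor.

    Sorted keys and fixed separators make signer and verifier hash the
    same bytes; this canonicalization is an assumption of the sketch.
    """
    canonical = json.dumps(descriptor, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def pin_quorum_descriptor(
    identity: dict,
    descriptor: dict,
    approval_payload: bytes,
    approval_signature: bytes,
    authority_key: bytes,
    verify: Verifier,
) -> dict:
    """Return a copy of identity with the descriptor pinned, or raise ValueError."""
    # 1. The approval payload itself must carry a valid authority signature.
    if not verify(authority_key, approval_payload, approval_signature):
        raise ValueError("approval payload signature is invalid")
    # 2. The descriptor delivered alongside it must match the signed hash.
    signed_hash = json.loads(approval_payload).get("quorum_descriptor_sha256")
    if signed_hash != descriptor_hash(descriptor):
        raise ValueError("quorum descriptor does not match the signed hash")
    # 3. Only now is the descriptor pinned (the caller writes identity.json).
    pinned = dict(identity)
    pinned["cluster_authority_quorum"] = descriptor
    return pinned
```

Because only the hash is signed, any change to the delivered descriptor (for example lowering its threshold) fails step 2 even though the approval signature itself is still valid.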

## Remaining Production Blockers

- Replace `database_signer` with quorum authority:
  M-of-N signatures from nodes or hardware/offline keys with
  `control-authority` / `update-authority` roles.
- Store authority descriptors and role certificates as replicated signed state,
  not only database rows.
- Require quorum envelopes for the remaining high-risk mutations: role
  mutation, release creation, update policy mutation, route lease issuance,
  relay/rendezvous lease issuance, storage placement, and authority rotation.
  Node update plans and bootstrap quorum pinning now have the first contract
  hooks, but production still needs real M-of-N signers.
- Add node-side verification of release metadata in addition to update-plan
  verification; update-plan verification is now enforced by host-agent when a
  pinned authority or pinned quorum descriptor exists.
- Add update-cache mirror selection through fabric endpoint candidates instead
  of a single HTTP origin.
- Add signed endpoint-candidate epochs so peer directory gossip can survive API
  replica loss.
- Add revocation/fencing epochs for compromised authority keys, nodes, and
  update artifacts.

## Acceptance Rule

The cluster is not production-trust-ready while a single `database_signer` can
create authoritative cluster mutations. It may remain as a development bootstrap
signer only when every signed payload clearly identifies it as bootstrap and
nodes can be configured to reject it in production mode.
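A node-side check for the acceptance rule might look like the sketch below. It assumes a hypothetical `authority_mode` field inside every signed payload, with `bootstrap` marking `database_signer` output and `quorum` marking production-grade output, and it must run only after the payload's signature has verified.

```python
import json

# Hypothetical values for a label carried inside every signed payload.
KNOWN_MODES = frozenset({"bootstrap", "quorum"})


def check_authority_mode(verified_payload: bytes, production_mode: bool) -> dict:
    """Apply the acceptance rule to a payload whose signature already verified.

    Reading the label from the signed bytes (not from an envelope header)
    means nothing in the delivery path can strip or alter it.
    """
    doc = json.loads(verified_payload)
    mode = doc.get("authority_mode")
    if mode not in KNOWN_MODES:
        # Fail closed: a payload that does not say who signed it is refused.
        raise ValueError(f"missing or unknown authority_mode: {mode!r}")
    if mode == "bootstrap" and production_mode:
        raise ValueError("bootstrap (database_signer) payload refused in production mode")
    return doc
```

Rejecting payloads with no `authority_mode` at all is a deliberate fail-closed choice: otherwise an unlabeled `database_signer` payload would be indistinguishable from a production-grade one.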