# Distributed Authority Audit 2026-05-16
Status: the target architecture is distributed, but the live test cluster still
relies on centralized bootstrap authority components that must be removed
before it can be trusted in production.
## Fixed Requirements
- No single management/API/storage/update service is allowed to own cluster
truth.
- Control, storage, update, route authority, observer, and update-cache are node
roles in the fabric.
- A service endpoint can serve signed state, but cannot create trusted state by
itself.
- Node identity is cryptographic. IP addresses, DNS names, and NAT addresses are
endpoint candidates only.
- Nodes must publish real signed candidates for reachable interfaces,
STUN/ICE-reflexive addresses, passive reverse channels, and relay fallback.
- Nodes must verify signed control data locally before applying it.
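
The split between identity and endpoint candidates can be sketched as a small data model. This is an illustration only: the type names, the candidate kinds, and the preference order are assumptions made for the example, not the fabric's actual schema, and signing of candidates is left out. The explicit `host:port` check mirrors the endpoint rule recorded under the changes made in this audit.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

# Hypothetical candidate kinds, listed from most to least preferred.
# The ordering is an assumption for this sketch.
KIND_ORDER = ("interface", "reflexive", "reverse", "relay")


@dataclass(frozen=True)
class EndpointCandidate:
    kind: str  # one of KIND_ORDER
    url: str   # e.g. "quic://192.168.200.85:18080"


@dataclass
class NodeRecord:
    # Identity comes from the node's public key, never from an address.
    node_key_fingerprint: str
    candidates: list


def has_explicit_host_port(url: str) -> bool:
    """Return True only if the URL has a scheme, a host, and an explicit port."""
    try:
        parts = urlsplit(url)
        return bool(parts.scheme and parts.hostname and parts.port is not None)
    except ValueError:
        # urlsplit or .port raises ValueError on malformed input.
        return False


def usable_candidates(node: NodeRecord) -> list:
    """Drop malformed candidates and sort the rest by preference."""
    valid = [
        c for c in node.candidates
        if c.kind in KIND_ORDER and has_explicit_host_port(c.url)
    ]
    return sorted(valid, key=lambda c: KIND_ORDER.index(c.kind))
```

A bare name such as `smoke.cin.su` fails the check because it carries no port, while `quic://192.168.200.85:18080` passes. The node's identity stays the same whichever candidate is used to reach it.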
## Live Cluster Findings
- The live cluster has one active `cluster_authorities` row:
`rap-ca-ed25519-09877466aa9b6b58b0f312b0b313ea33`.
- Its metadata says `storage=database_signer` and
`production_target=external_cluster_signer_or_hsm`.
- Release metadata for recent node-agent versions is signed, but signed by the
same database-backed authority.
- Synthetic mesh configs are signed and node-agent verifies them against the
pinned cluster authority.
- Node enrollment pins cluster authority into `identity.json`.
- Before this audit, update plans reached host-agent with signatures attached,
but host-agent did not locally reject unsigned plans even when a pinned
authority was present.
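
The gap in the last finding comes down to a missing local gate. A minimal sketch of such a gate follows, assuming a hypothetical plan layout with `payload` and `signature` fields and an injected `verify` callback standing in for the real signature check. The behavior when no authority is pinned is also an assumption; the audit only specifies the pinned case.

```python
from typing import Callable, Optional

# verify(public_key, payload, signature) -> bool
# Injected so the sketch does not depend on any particular crypto library.
Verifier = Callable[[bytes, bytes, bytes], bool]


def should_apply_plan(
    plan: dict,
    pinned_pubkey: Optional[bytes],
    verify: Verifier,
) -> bool:
    """Decide locally whether an update plan may be applied."""
    if pinned_pubkey is None:
        # Assumption: without a pinned authority there is nothing to
        # verify against, so the plan is passed through unchanged.
        return True

    payload = plan.get("payload")      # hypothetical field name
    signature = plan.get("signature")  # hypothetical field name
    if payload is None or not signature:
        # Pinned-authority mode: unsigned plans are rejected outright.
        return False

    return verify(pinned_pubkey, payload, signature)
```

The important property is that the decision is taken on the node with the key from its own `identity.json`, so a service that merely relays plans cannot make an unsigned plan acceptable.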
## Changes Made In This Audit
- The fabric docs now declare distributed authority and quorum as mandatory.
- Node/fabric endpoints must be explicit `host:port`; DNS-only service names are
rejected as fabric endpoints.
- `home-1` no longer advertises `smoke.cin.su` as a fabric endpoint. It now
advertises its real interface candidate `quic://192.168.200.85:18080`.
- Host-agent now verifies `node_update_plan` authority signatures when
`identity.json` contains a pinned cluster authority public key.
- Unsigned update plans are rejected in that pinned-authority mode.
- Added `rap.cluster_authority.quorum.v1` and
`rap.cluster_authority.quorum_envelope.v1` contracts to both agent and
backend authority packages.
- Host-agent can now verify quorum-signed update plans when `identity.json`
contains a pinned quorum descriptor.
- Backend update plans now include an `authority_quorum` envelope when the
cluster authority metadata contains a quorum descriptor. If that configured
quorum cannot be satisfied, the update plan is not issued.
- Node bootstrap now carries `cluster_authority_quorum`; the approval authority
payload signs the quorum descriptor hash, and node-agent persists the
descriptor into `identity.json` after verifying the signed hash.
- Published `rap-node-agent` and `rap-host-agent` release
`0.2.284-quorumauthority`.
- Canaried `home-1` to `rap-node-agent 0.2.284-quorumauthority` and
`rap-host-agent 0.2.284-quorumauthority`; both reported healthy/noop after
update.
- Published `rap-node-agent` and `rap-host-agent` release
`0.2.285-quorumbootstrap`.
- Canaried `home-1` to `rap-node-agent 0.2.285-quorumbootstrap` and
`rap-host-agent 0.2.285-quorumbootstrap`; both reported current=target/noop.
`ifcm-rufms-s-mo1cr` was intentionally not updated because it is behind NAT
and still needs fabric/update-cache artifact reachability before further
rollout.
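
The quorum check described in the list above can be sketched roughly as follows. The real `rap.cluster_authority.quorum.v1` and `rap.cluster_authority.quorum_envelope.v1` schemas are not reproduced in this audit, so every field name here (`threshold`, `members`, `descriptor_hash`, `signatures`, `key_id`) is hypothetical, as are the JSON canonicalization and the use of SHA-256 for the descriptor hash. Signature checking is again an injected callback.

```python
import hashlib
import json
from typing import Callable

# verify(public_key, payload, signature) -> bool, injected by the caller.
Verifier = Callable[[str, bytes, str], bool]


def descriptor_hash(descriptor: dict) -> str:
    """Hash a canonical JSON form so every node derives the same digest."""
    canonical = json.dumps(
        descriptor, sort_keys=True, separators=(",", ":")
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def quorum_satisfied(
    descriptor: dict,
    envelope: dict,
    payload: bytes,
    verify: Verifier,
) -> bool:
    """Check that at least `threshold` distinct pinned members signed payload."""
    # The envelope must refer to exactly the descriptor this node pinned.
    if envelope.get("descriptor_hash") != descriptor_hash(descriptor):
        return False

    members = descriptor["members"]  # key_id -> public key
    valid_signers = set()
    for entry in envelope.get("signatures", []):
        key_id = entry.get("key_id")
        public_key = members.get(key_id)
        if public_key is None:
            continue  # signer is not part of the pinned quorum
        if verify(public_key, payload, entry.get("signature", "")):
            valid_signers.add(key_id)  # a set counts each member once

    return len(valid_signers) >= descriptor["threshold"]
```

Counting distinct `key_id` values rather than raw signatures matters: a single member repeating its signature must not be able to reach the threshold on its own.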
## Remaining Production Blockers
- Replace `database_signer` with a quorum authority: M-of-N signatures from
nodes or hardware/offline keys that hold `control-authority` /
`update-authority` roles.
- Store authority descriptors and role certificates as replicated signed state,
not only database rows.
- Require quorum envelopes for the remaining high-risk mutations: role
mutation, release creation, update policy mutation, route lease issuance,
relay/rendezvous lease issuance, storage placement, and authority rotation.
Node update plans and bootstrap quorum pinning now have the first contract
hooks, but production still needs real M-of-N signers.
- Add node-side verification of release metadata in addition to update-plan
verification; update-plan verification is now enforced by host-agent when a
pinned authority or pinned quorum descriptor exists.
- Add update-cache mirror selection through fabric endpoint candidates instead
of a single HTTP origin.
- Add signed endpoint-candidate epochs so peer directory gossip can survive API
replica loss.
- Add revocation/fencing epochs for compromised authority keys, nodes, and
update artifacts.
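
The epoch-based blockers (signed endpoint-candidate epochs and revocation/fencing epochs) share one mechanism: a receiver tracks the highest epoch it has seen per subject and refuses anything older or fenced. The sketch below illustrates that idea only. The class and method names are hypothetical, signature verification of the record carrying the epoch is assumed to have happened already, and accepting a repeated epoch as an idempotent redelivery is a design choice made for the example.

```python
class EpochFence:
    """Tracks per-subject epochs for gossip freshness and revocation fencing."""

    def __init__(self) -> None:
        self._latest = {}  # subject -> highest epoch accepted so far
        self._fenced = {}  # subject -> minimum epoch still allowed

    def fence(self, subject: str, min_epoch: int) -> None:
        """Revoke everything for `subject` below `min_epoch`; never lowers a fence."""
        current = self._fenced.get(subject, 0)
        self._fenced[subject] = max(current, min_epoch)

    def accept(self, subject: str, epoch: int) -> bool:
        """Return True if a verified record at `epoch` may be applied."""
        if epoch < self._fenced.get(subject, 0):
            return False  # signed before the revocation point
        if epoch < self._latest.get(subject, 0):
            return False  # stale gossip, a newer record was already seen
        self._latest[subject] = epoch
        return True
```

Because the state is local to each node, peers can keep exchanging and ordering candidate records among themselves even when no API replica is reachable.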
## Acceptance Rule
The cluster is not production-trust-ready while a single `database_signer` can
create authoritative cluster mutations. It may remain as a development bootstrap
signer only when every signed payload clearly identifies it as bootstrap and
nodes can be configured to reject it in production mode.