# Fabric Storage / Config Storage Service

Status: Stage C13 result. Documentation and architecture only.

This document defines the Fabric Storage / Config Storage service foundation.
It does not implement code, migrations, APIs, mesh runtime traffic, VPN/IP
tunnel runtime, relay packet routing, RDP work, or service workload execution.

## 1. Purpose

Fabric Storage / Config Storage is the scoped distribution layer for Fabric
Core configuration artifacts.

It distributes and caches:

- signed scoped cluster snapshots
- incremental snapshot updates
- peer directories
- trust bundles
- revocation metadata
- update artifact metadata
- service assignment/config artifacts

It exists so nodes can refresh local state quickly and reliably without asking
the backend database for every realtime routing or supervision decision.

## 2. Non-Goals

Fabric Storage / Config Storage is not:

- a replacement for PostgreSQL
- a second source of truth
- a general-purpose distributed database
- an arbitrary query engine
- a tenant-visible topology database
- a place for raw secrets
- a durable runtime lease store
- a high-rate realtime data-plane relay

Nodes must not write authoritative configuration directly into Fabric Storage.

## 3. Authority Model

Authoritative flow:

```text
PostgreSQL
  -> control-plane config compiler
  -> signed scoped artifact
  -> Fabric Storage / Config Storage distribution
  -> node-agent local state
```

Only the control plane, or a tightly scoped config compiler operating under
control-plane authority, may produce authoritative signed configuration
artifacts.

Fabric Storage may replicate and serve artifacts. It does not decide policy.

## 4. Artifact Types

Supported target artifact families:

- `cluster_snapshot`
- `snapshot_increment`
- `peer_directory`
- `trust_bundle`
- `revocation_list`
- `service_assignment_bundle`
- `route_policy_bundle`
- `qos_policy_bundle`
- `update_manifest`
- `storage_directory`

Each artifact must carry:

- artifact id
- artifact type
- cluster id
- scope ids
- config version
- authority epoch
- issued at
- expires at or refresh deadline
- signer key id
- content hash
- signature or signature reference
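
The required envelope fields above can be sketched as an immutable record. This is a minimal Python sketch, not a defined schema. The names `ArtifactEnvelope`, `compute_content_hash`, `hash_matches`, and `past_deadline` are hypothetical. Using hex-encoded SHA-256 as the content-hash algorithm is an assumption, because the document does not fix one.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class ArtifactType(str, Enum):
    """Artifact families listed above."""
    CLUSTER_SNAPSHOT = "cluster_snapshot"
    SNAPSHOT_INCREMENT = "snapshot_increment"
    PEER_DIRECTORY = "peer_directory"
    TRUST_BUNDLE = "trust_bundle"
    REVOCATION_LIST = "revocation_list"
    SERVICE_ASSIGNMENT_BUNDLE = "service_assignment_bundle"
    ROUTE_POLICY_BUNDLE = "route_policy_bundle"
    QOS_POLICY_BUNDLE = "qos_policy_bundle"
    UPDATE_MANIFEST = "update_manifest"
    STORAGE_DIRECTORY = "storage_directory"


@dataclass(frozen=True)
class ArtifactEnvelope:
    """Metadata every artifact must carry (hypothetical field names)."""
    artifact_id: str
    artifact_type: ArtifactType
    cluster_id: str
    scope_ids: tuple[str, ...]
    config_version: int
    authority_epoch: int
    issued_at: datetime
    expires_at: datetime   # expiry or refresh deadline
    signer_key_id: str
    content_hash: str      # assumed: hex-encoded SHA-256 of the payload
    signature: bytes       # detached signature, or a reference to one


def compute_content_hash(payload: bytes) -> str:
    return hashlib.sha256(payload).hexdigest()


def hash_matches(env: ArtifactEnvelope, payload: bytes) -> bool:
    """True when the payload is exactly the content the envelope describes."""
    return env.content_hash == compute_content_hash(payload)


def past_deadline(env: ArtifactEnvelope, now: datetime) -> bool:
    """True once the expiry/refresh deadline has passed and a refresh is due."""
    return now >= env.expires_at
```

The record is `frozen` so that a cached envelope cannot be mutated in place. This mirrors the rule that artifacts are immutable by content hash.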

## 5. Scope and Namespace Rules

Storage namespaces must be scoped by:

- platform
- cluster
- organization where applicable
- service where applicable
- role where applicable
- node where applicable
- artifact family

Example logical namespace:

```text
platform/<platform_id>/
  cluster/<cluster_id>/
    trust/
    snapshots/node/<node_id>/
    snapshots/role/<role>/
    peers/scope/<scope_id>/
    services/<service_type>/
    updates/
```

No node should receive access to namespaces outside its assigned cluster, role,
service, and organization scope.
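
A minimal sketch of scope-bound namespace access, assuming the path layout from the example namespace. `NodeScope`, `allowed_prefixes`, and `may_access` are hypothetical names. Two further points are assumptions, not stated in this document:

- `trust/` and `updates/` are readable by every cluster member.
- Organization scoping is folded into the scope ids, because the example tree has no organization level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeScope:
    """Scope assignment for one node (hypothetical structure)."""
    platform_id: str
    cluster_id: str
    node_id: str
    roles: frozenset[str]
    scope_ids: frozenset[str]       # assumed to carry organization scoping too
    service_types: frozenset[str]


def allowed_prefixes(s: NodeScope) -> list[str]:
    """Namespace prefixes this node may read; everything else is denied."""
    base = f"platform/{s.platform_id}/cluster/{s.cluster_id}/"
    prefixes = [
        base + "trust/",                        # assumed cluster-wide readable
        base + "updates/",                      # assumed cluster-wide readable
        base + f"snapshots/node/{s.node_id}/",
    ]
    prefixes += [base + f"snapshots/role/{r}/" for r in sorted(s.roles)]
    prefixes += [base + f"peers/scope/{x}/" for x in sorted(s.scope_ids)]
    prefixes += [base + f"services/{t}/" for t in sorted(s.service_types)]
    return prefixes


def may_access(s: NodeScope, path: str) -> bool:
    # Reject relative segments so a path cannot climb out of an allowed prefix.
    if any(seg in ("..", ".") for seg in path.split("/")):
        return False
    # Trailing "/" on every prefix keeps node "n1" from matching node "n10".
    return any(path.startswith(p) for p in allowed_prefixes(s))
```

Deny-by-default prefix matching keeps the check simple. Access is granted only to paths the node's assignment explicitly names.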

## 6. Replication Policy

Replication is policy-driven.

Inputs:

- artifact criticality
- cluster size
- failure domains
- region placement
- node role
- organization isolation
- service locality
- update frequency
- recovery time objective

Rules:

- critical trust and revocation artifacts replicate across failure domains
- hot peer directories should be near entry/core/service nodes that use them
- service config should be near assigned service nodes
- organization-scoped artifacts must not replicate to unrelated org scopes
- thin/mobile nodes should not become broad storage replicas
- storage nodes may hold only assigned shards/scopes
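
Several of the replication rules above can be combined into a placement function. This is a minimal sketch under stated assumptions, not the defined policy engine:

- `StorageCandidate` and `place_replicas` are hypothetical names.
- Round-robin spreading over failure domains is one possible strategy.
- "Across failure domains" is assumed to mean at least two distinct domains.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StorageCandidate:
    """A node that could hold a replica (hypothetical structure)."""
    node_id: str
    failure_domain: str
    org_scopes: frozenset[str]   # organization scopes the node is assigned
    thin_or_mobile: bool


def place_replicas(candidates: list[StorageCandidate], replica_count: int,
                   org_id: Optional[str] = None, critical: bool = False) -> list[str]:
    # Thin/mobile nodes never become replicas; org-scoped artifacts only go to
    # nodes assigned that organization scope.
    eligible = [
        c for c in candidates
        if not c.thin_or_mobile and (org_id is None or org_id in c.org_scopes)
    ]
    by_domain: dict[str, list[StorageCandidate]] = {}
    for c in sorted(eligible, key=lambda c: (c.failure_domain, c.node_id)):
        by_domain.setdefault(c.failure_domain, []).append(c)

    # Round-robin over failure domains so replicas spread before they stack.
    chosen: list[StorageCandidate] = []
    while len(chosen) < replica_count and any(by_domain.values()):
        for domain in sorted(by_domain):
            if by_domain[domain] and len(chosen) < replica_count:
                chosen.append(by_domain[domain].pop(0))

    if critical and len({c.failure_domain for c in chosen}) < 2:
        raise ValueError("critical artifact cannot span two failure domains")
    return [c.node_id for c in chosen]
```

The function may return fewer replicas than requested when too few eligible nodes exist. Reporting that shortfall is the job of the replication health signals described under observability.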

## 7. Distribution Flows

Full snapshot refresh:

1. node-agent reports current config versions
2. storage service returns available version metadata
3. node-agent downloads full scoped snapshot if needed
4. node-agent verifies signature and scope
5. node-agent applies locally
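
The five full-refresh steps can be sketched from the node-agent side. This is a minimal sketch, assuming hypothetical `download` and `verify` callables that stand in for the fetch call and the signature/scope check. `LocalState` and `full_refresh` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LocalState:
    """Node-agent local state (hypothetical structure)."""
    versions: dict[str, int] = field(default_factory=dict)
    snapshots: dict[str, bytes] = field(default_factory=dict)


def full_refresh(
    state: LocalState,
    available: dict[str, int],                  # step 2: versions the service offers
    download: Callable[[str, int], bytes],      # step 3: hypothetical fetch call
    verify: Callable[[str, int, bytes], bool],  # step 4: signature + scope check
) -> list[str]:
    """Return the artifact families that were refreshed and applied."""
    applied = []
    for family, version in sorted(available.items()):
        # Step 1: compare against the versions the node reported.
        if state.versions.get(family, -1) >= version:
            continue
        blob = download(family, version)
        if not verify(family, version, blob):
            continue  # keep the previous snapshot; never apply unverified data
        # Step 5: apply locally only after verification succeeded.
        state.snapshots[family] = blob
        state.versions[family] = version
        applied.append(family)
    return applied
```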

Incremental update:

1. node-agent reports base version
2. storage service returns matching increment chain
3. node-agent verifies every increment
4. node-agent applies only if base versions match
5. version gap triggers full snapshot refresh
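
A minimal sketch of applying an increment chain, assuming each increment records the version it builds on and the version it produces. `Increment`, `VersionGap`, `InvalidIncrement`, and `apply_increments` are hypothetical names. One design choice here is not stated in the document: the whole chain is checked before any link is applied.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Increment:
    base_version: int
    new_version: int
    payload: bytes


class VersionGap(Exception):
    """The chain does not start at, or skips past, the node's version."""


class InvalidIncrement(Exception):
    """An increment failed verification."""


def apply_increments(
    current_version: int,
    chain: list[Increment],
    verify: Callable[[Increment], bool],
    apply: Callable[[Increment], None],
) -> int:
    # Check continuity and verify every increment before applying any of them,
    # so a bad link leaves local state untouched.
    expected = current_version
    for inc in chain:
        if inc.base_version != expected:
            raise VersionGap(f"expected base {expected}, got {inc.base_version}")
        if not verify(inc):
            raise InvalidIncrement(f"increment to {inc.new_version} failed verification")
        expected = inc.new_version
    for inc in chain:
        apply(inc)
    return expected
```

A caller would catch `VersionGap` and fall back to the full snapshot refresh flow.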

Trust/revocation update:

1. node-agent checks trust bundle/revocation version frequently
2. storage service serves signed trust artifacts
3. node-agent verifies using existing trust path
4. revoked identities/keys immediately affect local validation

## 8. Consistency and Invalidation

Artifacts are immutable by content hash.

New versions are published as new artifacts plus index updates.

Rules:

- node must validate content hash
- node must reject stale authority epoch
- node must reject invalid signature
- node must reject wrong scope
- storage index may cache version metadata but not override signatures
- deletion/tombstone artifacts must be signed
- revoked artifacts must not be served as current versions

Cache invalidation is version-based, not best-effort string deletion.
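
The node-side rejection rules above can be sketched as one ordered check, assuming a few things the document does not fix:

- The signature covers a canonical encoding of the metadata plus the content hash.
- An artifact matches if any of its scope ids is assigned to the node.
- The revocation list blocks signer keys.

`Envelope`, `signed_bytes`, `Rejected`, and `validate` are hypothetical. `verify_signature` is an injected callable, not a specific crypto library API.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Envelope:
    artifact_id: str
    cluster_id: str
    scope_ids: tuple[str, ...]
    config_version: int
    authority_epoch: int
    signer_key_id: str
    content_hash: str
    signature: bytes


class Rejected(Exception):
    """The artifact must not be applied on this node."""


def signed_bytes(env: Envelope) -> bytes:
    # Assumed canonical encoding: the signature covers the metadata and the
    # content hash, so a storage node cannot rewrite scope or epoch undetected.
    fields = {
        "artifact_id": env.artifact_id,
        "cluster_id": env.cluster_id,
        "scope_ids": sorted(env.scope_ids),
        "config_version": env.config_version,
        "authority_epoch": env.authority_epoch,
        "signer_key_id": env.signer_key_id,
        "content_hash": env.content_hash,
    }
    return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode()


def validate(
    env: Envelope,
    payload: bytes,
    *,
    cluster_id: str,
    node_scopes: set[str],
    min_epoch: int,
    revoked_key_ids: set[str],
    verify_signature: Callable[[str, bytes, bytes], bool],
) -> None:
    """Raise Rejected unless the artifact passes every rule."""
    if hashlib.sha256(payload).hexdigest() != env.content_hash:
        raise Rejected("content hash mismatch")
    if env.signer_key_id in revoked_key_ids:
        raise Rejected("signer key revoked")
    if not verify_signature(env.signer_key_id, signed_bytes(env), env.signature):
        raise Rejected("invalid signature")
    if env.authority_epoch < min_epoch:
        raise Rejected("stale authority epoch")
    if env.cluster_id != cluster_id or not (set(env.scope_ids) & node_scopes):
        raise Rejected("wrong scope")
```

The cheap hash check runs first. The signature check runs before any metadata (epoch, scope) is trusted, because that metadata is only meaningful once it is known to be authentic.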

## 9. Storage Node Behavior

A storage/config node may:

- cache assigned artifacts
- replicate assigned artifacts
- serve artifacts to authorized nodes
- report artifact availability
- report replication health
- evict cold artifacts according to policy

A storage/config node must not:

- modify artifact content
- sign artifacts
- invent new config versions
- widen scope
- bypass authorization
- serve unrelated org/cluster data
- accept node writes as authoritative config
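
The may/must-not lists above suggest a content-addressed cache with the following properties:

- There is no update operation.
- It has no signing key.
- It refuses artifacts outside its assigned scopes.
- It evicts the least recently used artifacts unless they belong to a pinned family.

This is a minimal sketch. `ArtifactCache` is a hypothetical name. Capacity is counted in artifacts rather than bytes, and pinning families such as `trust_bundle` is an assumed eviction policy.

```python
import hashlib
from collections import OrderedDict
from typing import Iterable, Optional


class ArtifactCache:
    """Content-addressed cache for one storage node (sketch)."""

    def __init__(self, assigned_scopes: Iterable[str], capacity: int,
                 pinned_families: Iterable[str]) -> None:
        self._assigned = frozenset(assigned_scopes)
        self._capacity = capacity
        self._pinned = frozenset(pinned_families)
        # content_hash -> (scope, family, blob), least recently used first
        self._items: "OrderedDict[str, tuple[str, str, bytes]]" = OrderedDict()

    def store(self, content_hash: str, scope: str, family: str, blob: bytes) -> None:
        """Cache or replicate an artifact; content is never modified in place."""
        if scope not in self._assigned:
            raise PermissionError("scope not assigned to this node")
        if hashlib.sha256(blob).hexdigest() != content_hash:
            raise ValueError("blob does not match its content hash")
        self._items[content_hash] = (scope, family, blob)
        self._items.move_to_end(content_hash)
        self._evict()

    def fetch(self, content_hash: str) -> Optional[bytes]:
        """Return cached bytes; authorization is checked by the caller."""
        item = self._items.get(content_hash)
        if item is None:
            return None
        self._items.move_to_end(content_hash)
        return item[2]

    def _evict(self) -> None:
        # Drop the least recently used artifact that is not in a pinned family.
        while len(self._items) > self._capacity:
            victim = next((h for h, (_, fam, _) in self._items.items()
                           if fam not in self._pinned), None)
            if victim is None:
                break  # only pinned artifacts remain; keep them
            del self._items[victim]
```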

## 10. Authorization

Artifact fetch authorization must check:

- node identity
- cluster membership
- role assignment
- artifact scope
- organization scope where applicable
- artifact type
- trust/revocation status
- partition/degraded policy

Storage service authorization may use:

- mTLS node identity
- short-lived scoped tokens
- signed node snapshot claims
- control-plane issued fetch grants

Tenant users and organization admins must not directly query internal storage
namespaces. They see safe status projections through control-plane APIs.
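
A minimal sketch of the fetch authorization checklist, assuming the following:

- Node identity has already been established, for example via mTLS or a verified token, and turned into `NodeClaims`.
- The type-to-role mapping and the degraded-mode allow list come from control-plane policy.

All names here are hypothetical. The function returns a reason string so that denials can feed the "unauthorized fetch denied" audit event.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeClaims:
    """Verified identity claims for the requesting node (hypothetical)."""
    node_id: str
    cluster_id: str
    roles: frozenset[str]
    scope_ids: frozenset[str]
    org_ids: frozenset[str]
    identity_revoked: bool


@dataclass(frozen=True)
class FetchRequest:
    cluster_id: str
    artifact_type: str
    scope_id: str
    org_id: Optional[str]  # None for artifacts that are not organization-scoped


def authorize(claims: NodeClaims, req: FetchRequest,
              type_roles: dict[str, frozenset[str]], degraded: bool,
              degraded_allowed_types: frozenset[str]) -> tuple[bool, str]:
    if claims.identity_revoked:
        return False, "identity revoked"
    if claims.cluster_id != req.cluster_id:
        return False, "not a cluster member"
    if not (claims.roles & type_roles.get(req.artifact_type, frozenset())):
        return False, "role not permitted for artifact type"
    if req.scope_id not in claims.scope_ids:
        return False, "artifact scope not assigned"
    if req.org_id is not None and req.org_id not in claims.org_ids:
        return False, "organization scope mismatch"
    if degraded and req.artifact_type not in degraded_allowed_types:
        return False, "blocked by degraded policy"
    return True, "ok"
```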

## 11. Failure and Degraded Behavior

If the local storage service is unavailable, the node-agent recovery order is:

1. try alternate local/nearby storage endpoint
2. try active peers that advertise config/storage availability
3. try bootstrap/config endpoints from last signed snapshot
4. contact control plane if reachable
5. continue from last valid local snapshot only if degraded policy allows it

A storage service outage must not grant new authority.

Nodes must not perform high-risk actions based on missing or stale storage data.
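
The recovery order can be sketched as an ordered list of sources tried in turn. This is a minimal sketch, assuming each source is a hypothetical zero-argument fetch callable that either returns bytes or raises `OSError` (which includes `ConnectionError` and `TimeoutError`). Any result still has to pass the normal artifact verification. `recover_snapshot` is an illustrative name.

```python
from typing import Callable, Optional


def recover_snapshot(
    sources: list[tuple[str, Callable[[], Optional[bytes]]]],  # steps 1-4, in order
    verify: Callable[[bytes], bool],
    last_valid: Optional[bytes],
    degraded_allows_last_valid: bool,
) -> Optional[tuple[str, bytes]]:
    for name, fetch in sources:
        try:
            blob = fetch()
        except OSError:
            continue  # unreachable source: move on to the next one in order
        # A reachable source grants no authority; its data must still verify.
        if blob is not None and verify(blob):
            return name, blob
    # Step 5: fall back only when degraded policy allows it. The caller should
    # treat this as degraded mode and avoid high-risk actions.
    if degraded_allows_last_valid and last_valid is not None:
        return "last_valid_local", last_valid
    return None
```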

## 12. Operational Observability

The storage service should report:

- artifact family health
- replication lag
- missing replica count
- stale shard count
- fetch latency
- fetch failures
- authorization denials
- version gaps
- signature/hash validation failures reported by nodes
- storage capacity
- eviction stats

Audit/control-plane events should include:

- artifact published
- artifact revoked/tombstoned
- replication policy changed
- storage role assigned/removed
- unauthorized fetch denied
- critical artifact under-replicated
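
The link between the "missing replica count" metric and the "critical artifact under-replicated" event can be sketched as one pass over replica status. This is a minimal sketch under stated assumptions:

- `ReplicaStatus` and `replication_report` are hypothetical names.
- The event dictionary layout and the event name string are illustrative, not a defined audit format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaStatus:
    artifact_id: str
    critical: bool
    target_replicas: int
    healthy_replicas: int


def replication_report(statuses: list[ReplicaStatus]) -> tuple[int, list[dict]]:
    """Return the total missing replica count and audit events to emit."""
    missing = {s.artifact_id: max(0, s.target_replicas - s.healthy_replicas)
               for s in statuses}
    events = [
        {"event": "critical_artifact_under_replicated",  # hypothetical name
         "artifact_id": s.artifact_id,
         "missing": missing[s.artifact_id]}
        for s in statuses
        if s.critical and missing[s.artifact_id] > 0
    ]
    return sum(missing.values()), events
```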

## 13. Security Requirements

Required:

- encrypted node-to-storage transport
- authenticated node identity
- scoped fetch authorization
- immutable signed artifacts
- hash verification
- no raw secrets in broad artifacts
- namespace isolation
- audit for high-risk storage/admin actions

The blast radius of a compromised storage node must be limited:

- it cannot sign valid new artifacts
- it cannot serve data outside assigned scopes
- it cannot modify signed content without detection
- it cannot become authoritative truth
- nodes reject invalid signatures/hashes

## 14. Relationship to Runtime State

Fabric Storage is for configuration and distribution, not realtime runtime
coordination.

Runtime state remains elsewhere:

- PostgreSQL for durable lifecycle/audit/state
- Redis for live coordination/leases/heartbeats/ephemeral routing hints
- node-local state for local cache/runtime observations

Do not store high-rate render frames, input streams, VPN packets, or relay
traffic in Fabric Storage.

## 15. C14 Preparation

C14 must define the peer directory and peer cache model that Fabric Storage may
distribute and node-agent may store locally.

C14 must preserve:

- storage service is distribution/cache only
- peer directories are scoped
- nodes do not learn full topology unless role requires it
- routing decisions belong to Fabric Routing Engine, not Service Adapters

## 16. Result / Decision

Stage C13 defines Fabric Storage / Config Storage as a scoped distribution and
cache service for signed Fabric Core artifacts.

Decisions:

- Fabric Storage distributes signed artifacts but does not author them
- PostgreSQL remains authoritative
- artifacts are immutable by content hash
- invalidation is version-based
- replication is policy-driven and scope-bound
- storage nodes may cache and serve only assigned scopes
- storage service is not a realtime data-plane relay
- storage service is not a general-purpose database
- C14 must define the peer directory/cache artifacts and local runtime use

No code, migration, API, runtime, RDP, data-plane, mesh, VPN, relay, or service
workload behavior is changed by C13.