# Fabric Storage / Config Storage Service

Status: Stage C13 result. Documentation and architecture only.

This document defines the Fabric Storage / Config Storage service foundation.
It does not implement code, migrations, or APIs, and it does not change mesh
runtime traffic, the VPN/IP tunnel runtime, relay packet routing, RDP work, or
service workload execution.

## 1. Purpose

Fabric Storage / Config Storage is the scoped distribution layer for Fabric
Core configuration artifacts.

It distributes and caches:

- signed scoped cluster snapshots
- incremental snapshot updates
- peer directories
- trust bundles
- revocation metadata
- update artifact metadata
- service assignment/config artifacts

It exists so nodes can refresh local state quickly and reliably without asking
the backend database for every realtime routing or supervision decision.

## 2. Non-Goals

Fabric Storage / Config Storage is not:

- a replacement for PostgreSQL
- a second source of truth
- a general-purpose distributed database
- an arbitrary query engine
- a tenant-visible topology database
- a place for raw secrets
- a durable runtime lease store
- a high-rate realtime data-plane relay

Nodes must not write authoritative configuration directly into Fabric Storage.

## 3. Authority Model

Authoritative flow:

```text
PostgreSQL
-> control-plane config compiler
-> signed scoped artifact
-> Fabric Storage / Config Storage distribution
-> node-agent local state
```

Only the control plane, or a tightly scoped config compiler operating under
control-plane authority, may produce authoritative signed configuration
artifacts.

Fabric Storage may replicate and serve artifacts. It does not decide policy.

## 4. Artifact Types

Supported target artifact families:

- `cluster_snapshot`
- `snapshot_increment`
- `peer_directory`
- `trust_bundle`
- `revocation_list`
- `service_assignment_bundle`
- `route_policy_bundle`
- `qos_policy_bundle`
- `update_manifest`
- `storage_directory`

Each artifact must carry:

- artifact id
- artifact type
- cluster id
- scope ids
- config version
- authority epoch
- issued at
- expires at or refresh deadline
- signer key id
- content hash
- signature or signature reference

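
A minimal Python sketch of an envelope carrying these fields, assuming SHA-256 content hashes and a single refresh-deadline field standing in for "expires at or refresh deadline". `ArtifactEnvelope`, `content_hash`, and `envelope_problems` are hypothetical names; signature verification is deliberately left out because it depends on the signing scheme.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple

# Artifact families listed in this section.
ARTIFACT_TYPES = frozenset({
    "cluster_snapshot", "snapshot_increment", "peer_directory",
    "trust_bundle", "revocation_list", "service_assignment_bundle",
    "route_policy_bundle", "qos_policy_bundle", "update_manifest",
    "storage_directory",
})


@dataclass(frozen=True)
class ArtifactEnvelope:
    """Hypothetical envelope holding the required per-artifact fields."""
    artifact_id: str
    artifact_type: str
    cluster_id: str
    scope_ids: Tuple[str, ...]
    config_version: int
    authority_epoch: int
    issued_at: datetime
    refresh_deadline: datetime   # "expires at or refresh deadline"
    signer_key_id: str
    content_hash: str            # hex SHA-256 of the payload (assumed algorithm)
    signature: bytes             # inline signature or a signature reference


def content_hash(payload: bytes) -> str:
    return hashlib.sha256(payload).hexdigest()


def envelope_problems(env: ArtifactEnvelope, payload: bytes,
                      now: Optional[datetime] = None) -> List[str]:
    """Structural checks only; an empty list means the envelope is well-formed."""
    now = now or datetime.now(timezone.utc)
    problems = []
    if env.artifact_type not in ARTIFACT_TYPES:
        problems.append("unknown artifact type")
    if not env.scope_ids:
        problems.append("missing scope ids")
    if env.issued_at > env.refresh_deadline:
        problems.append("issued after refresh deadline")
    if now > env.refresh_deadline:
        problems.append("past refresh deadline")
    if content_hash(payload) != env.content_hash:
        problems.append("content hash mismatch")
    return problems
```

Making the envelope frozen mirrors the rule that artifacts are immutable once published.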
## 5. Scope and Namespace Rules

Storage namespaces must be scoped by:

- platform
- cluster
- organization where applicable
- service where applicable
- role where applicable
- node where applicable
- artifact family

Example logical namespace:

```text
platform/<platform_id>/
cluster/<cluster_id>/
trust/
snapshots/node/<node_id>/
snapshots/role/<role>/
peers/scope/<scope_id>/
services/<service_type>/
updates/
```

No node should receive access to namespaces outside its assigned cluster, role,
service, and organization scope.

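
A minimal Python sketch of a prefix-based read check, assuming the example paths nest as `platform/<platform_id>/cluster/<cluster_id>/...` and that every node in a cluster may read `trust/` and `updates/`. `NodeScope`, `allowed_prefixes`, and `may_read` are hypothetical names; organization scoping is omitted because the example layout has no organization level.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NodeScope:
    """Hypothetical scope assignment for one node."""
    platform_id: str
    cluster_id: str
    node_id: str
    roles: frozenset = frozenset()
    scope_ids: frozenset = frozenset()
    service_types: frozenset = frozenset()


def allowed_prefixes(scope: NodeScope) -> List[str]:
    """Prefixes the node may read; trailing slashes stop node n1 matching n10."""
    base = f"platform/{scope.platform_id}/cluster/{scope.cluster_id}/"
    prefixes = [base + "trust/", base + "updates/",
                base + f"snapshots/node/{scope.node_id}/"]
    prefixes += [base + f"snapshots/role/{r}/" for r in sorted(scope.roles)]
    prefixes += [base + f"peers/scope/{s}/" for s in sorted(scope.scope_ids)]
    prefixes += [base + f"services/{t}/" for t in sorted(scope.service_types)]
    return prefixes


def may_read(scope: NodeScope, path: str) -> bool:
    # Reject traversal segments before the prefix comparison.
    if ".." in path.split("/"):
        return False
    return any(path.startswith(p) for p in allowed_prefixes(scope))
```

Deriving an allow-list from the node's assignment, rather than a deny-list, means a new namespace is unreadable until a scope explicitly grants it.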
## 6. Replication Policy

Replication is policy-driven.

Inputs:

- artifact criticality
- cluster size
- failure domains
- region placement
- node role
- organization isolation
- service locality
- update frequency
- recovery time objective

Rules:

- critical trust and revocation artifacts replicate across failure domains
- hot peer directories should be near entry/core/service nodes that use them
- service config should be near assigned service nodes
- organization-scoped artifacts must not replicate to unrelated org scopes
- thin/mobile nodes should not become broad storage replicas
- storage nodes may hold only assigned shards/scopes

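
Some of these rules can be illustrated with a hypothetical placement function in Python. It covers only failure-domain spread, organization isolation, and the exclusion of thin/mobile nodes; `StorageNode` and `place_replicas` are illustrative names, and the greedy one-node-per-domain strategy is an assumption, not a specified algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StorageNode:
    """Hypothetical view of a candidate replica holder."""
    node_id: str
    failure_domain: str
    org_scopes: frozenset   # org scopes this node is assigned to serve
    thin: bool = False      # thin/mobile nodes are not broad replicas


def place_replicas(candidates: List[StorageNode], replicas: int,
                   org_scope: Optional[str] = None) -> List[StorageNode]:
    """Pick up to `replicas` holders, preferring distinct failure domains."""
    eligible = [n for n in candidates
                if not n.thin and (org_scope is None or org_scope in n.org_scopes)]
    eligible.sort(key=lambda n: (n.failure_domain, n.node_id))  # deterministic order
    chosen, seen_domains = [], set()
    for n in eligible:                  # first pass: one node per failure domain
        if len(chosen) == replicas:
            break
        if n.failure_domain not in seen_domains:
            chosen.append(n)
            seen_domains.add(n.failure_domain)
    for n in eligible:                  # second pass: fill any remaining slots
        if len(chosen) == replicas:
            break
        if n not in chosen:
            chosen.append(n)
    return chosen
```

When an organization scope has fewer eligible nodes than the target count, the function returns what it can rather than widening scope; reporting the shortfall is left to observability.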
## 7. Distribution Flows

Full snapshot refresh:

1. node-agent reports current config versions
2. storage service returns available version metadata
3. node-agent downloads full scoped snapshot if needed
4. node-agent verifies signature and scope
5. node-agent applies locally

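
A minimal Python sketch of the full refresh, assuming versions are tracked as one integer per artifact family and that `download`, `verify`, and `apply` are hypothetical hooks supplied by the node-agent. Skipping a family that fails verification, so the previous local state stays in place, is an assumption.

```python
from typing import Callable, Dict, List


def families_needing_refresh(reported: Dict[str, int],
                             available: Dict[str, int]) -> List[str]:
    """Steps 1-2: families where storage advertises a newer version than the node has."""
    return sorted(f for f, v in available.items() if v > reported.get(f, -1))


def full_refresh(reported: Dict[str, int], available: Dict[str, int],
                 download: Callable[[str], bytes],
                 verify: Callable[[str, bytes], bool],
                 apply: Callable[[str, bytes], None]) -> List[str]:
    applied = []
    for family in families_needing_refresh(reported, available):
        blob = download(family)           # step 3: full scoped snapshot
        if not verify(family, blob):      # step 4: signature and scope
            continue                      # keep the previous local state
        apply(family, blob)               # step 5
        applied.append(family)
    return applied
```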
Incremental update:

1. node-agent reports base version
2. storage service returns matching increment chain
3. node-agent verifies every increment
4. node-agent applies only if base versions match
5. version gap triggers full snapshot refresh

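
A minimal Python sketch of the increment chain, assuming each increment records the version it applies to (`base_version`) and the version it produces. `Increment`, `VersionGap`, `apply_increments`, and `refresh` are hypothetical names, and `verify`, `apply`, and `full_refresh` are caller-supplied hooks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Increment:
    """Hypothetical increment record taking state from base_version to version."""
    base_version: int
    version: int
    payload: bytes


class VersionGap(Exception):
    """The chain does not continue from the node's current version."""


def apply_increments(current_version, chain, verify, apply):
    # Step 3: verify every increment before touching local state.
    for inc in chain:
        if not verify(inc) or inc.version <= inc.base_version:
            raise ValueError(f"rejected increment {inc.base_version}->{inc.version}")
    # Step 4: the chain must be contiguous from the current version.
    expected = current_version
    for inc in chain:
        if inc.base_version != expected:
            raise VersionGap(f"expected base {expected}, got {inc.base_version}")
        expected = inc.version
    for inc in chain:
        apply(inc)
    return expected


def refresh(current_version, chain, verify, apply, full_refresh):
    try:
        return apply_increments(current_version, chain, verify, apply)
    except VersionGap:
        return full_refresh()   # step 5: a version gap triggers a full snapshot refresh
```

Checking verification and contiguity before applying anything means a bad or gapped chain leaves local state untouched.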
Trust/revocation update:

1. node-agent checks trust bundle/revocation version frequently
2. storage service serves signed trust artifacts
3. node-agent verifies using existing trust path
4. revoked identities/keys immediately affect local validation

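
A minimal Python sketch of node-local trust state, assuming revocation lists carry an integer version and that verification through the existing trust path has already happened (passed in as `verified`). `TrustState` and its methods are hypothetical; merging revocations by union, so a newer list never silently un-revokes a key, is a design assumption.

```python
from dataclasses import dataclass, field


@dataclass
class TrustState:
    """Hypothetical node-local trust view."""
    trusted_keys: set = field(default_factory=set)
    revoked: set = field(default_factory=set)     # revoked key ids and node identities
    revocation_version: int = 0

    def apply_revocations(self, version: int, revoked_ids, verified: bool) -> bool:
        """Accept only a verified, strictly newer revocation list."""
        if not verified or version <= self.revocation_version:
            return False
        self.revocation_version = version
        self.revoked |= set(revoked_ids)          # union: never drop an earlier revocation
        return True

    def is_acceptable_signer(self, key_id: str) -> bool:
        # Revocation takes effect on the very next validation call.
        return key_id in self.trusted_keys and key_id not in self.revoked
```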
## 8. Consistency and Invalidation

Artifacts are immutable by content hash.

New versions are published as new artifacts plus index updates.

Rules:

- node must validate content hash
- node must reject stale authority epoch
- node must reject invalid signature
- node must reject wrong scope
- storage index may cache version metadata but not override signatures
- deletion/tombstone artifacts must be signed
- revoked artifacts must not be served as current versions

Cache invalidation is version-based, not best-effort string deletion.

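
The node-side rules can be sketched in Python as a single check that returns the first failure. Assumptions: SHA-256 content hashes, a signature computed over the content hash, and an artifact counting as in scope only when all of its scope ids are assigned to the node. `Received`, `rejection_reason`, and the injected `verify_sig` callable are hypothetical; a real deployment would plug in its own signature scheme.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Received:
    """Hypothetical subset of artifact metadata needed for validation."""
    artifact_type: str
    scope_ids: frozenset
    authority_epoch: int
    content_hash: str
    signer_key_id: str
    signature: bytes


def rejection_reason(art: Received, payload: bytes, *, min_epoch: int,
                     node_scopes: frozenset,
                     verify_sig: Callable[[str, bytes, bytes], bool]) -> Optional[str]:
    """Return None if the artifact is acceptable, else the first failed rule."""
    if hashlib.sha256(payload).hexdigest() != art.content_hash:
        return "content hash mismatch"
    if art.authority_epoch < min_epoch:
        return "stale authority epoch"
    # The signature covers the content hash, so index metadata cannot override it.
    if not verify_sig(art.signer_key_id, art.content_hash.encode(), art.signature):
        return "invalid signature"
    if not art.scope_ids or not art.scope_ids <= node_scopes:
        return "wrong scope"
    return None
```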
## 9. Storage Node Behavior

A storage/config node may:

- cache assigned artifacts
- replicate assigned artifacts
- serve artifacts to authorized nodes
- report artifact availability
- report replication health
- evict cold artifacts according to policy

A storage/config node must not:

- modify artifact content
- sign artifacts
- invent new config versions
- widen scope
- bypass authorization
- serve unrelated org/cluster data
- accept node writes as authoritative config

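
A minimal Python sketch of a storage-node cache that can hold, serve, and evict artifacts but cannot alter them, assuming SHA-256 content addressing and one scope id per artifact. `ArtifactCache` and its methods are hypothetical names.

```python
import hashlib


class ArtifactCache:
    """Hypothetical storage-node cache keyed by content hash.

    Bytes are stored exactly as received; the cache never re-signs or rewrites them.
    """

    def __init__(self, assigned_scopes):
        self._assigned = frozenset(assigned_scopes)
        self._blobs = {}  # content hash -> (scope_id, payload bytes)

    def put(self, content_hash, scope_id, payload):
        if scope_id not in self._assigned:
            return False          # not an assigned shard/scope: refuse to hold it
        if hashlib.sha256(payload).hexdigest() != content_hash:
            return False          # bytes do not match their address: refuse
        self._blobs.setdefault(content_hash, (scope_id, payload))  # first copy wins
        return True

    def get(self, content_hash, requester_scopes):
        entry = self._blobs.get(content_hash)
        if entry is None or entry[0] not in requester_scopes:
            return None           # missing, or outside the requester's scope
        return entry[1]

    def evict(self, content_hash):
        # Eviction drops a cached copy; it never changes what the hash names.
        self._blobs.pop(content_hash, None)
```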
## 10. Authorization

Artifact fetch authorization must check:

- node identity
- cluster membership
- role assignment
- artifact scope
- organization scope where applicable
- artifact type
- trust/revocation status
- partition/degraded policy

Storage service authorization may use:

- mTLS node identity
- short-lived scoped tokens
- signed node snapshot claims
- control-plane issued fetch grants

Tenant users and organization admins must not directly query internal storage
namespaces. They see safe status projections through control-plane APIs.

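
A hypothetical Python check mirroring the fetch authorization list, assuming node identity has already been authenticated (for example via mTLS) and that artifact type is enforced through a per-artifact set of allowed roles. `Requester`, `ArtifactRef`, `authorize_fetch`, and the `degraded_allowed` parameter are illustrative; the document does not define which artifact types remain fetchable under degraded policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Requester:
    """Hypothetical authenticated caller (identity already established, e.g. via mTLS)."""
    node_id: str
    cluster_id: str
    roles: frozenset
    scopes: frozenset
    org_ids: frozenset
    revoked: bool = False


@dataclass(frozen=True)
class ArtifactRef:
    """Hypothetical authorization view of one artifact."""
    artifact_type: str
    cluster_id: str
    scope_id: str
    org_id: str = ""                          # empty when not organization-scoped
    allowed_roles: frozenset = frozenset()    # empty means any role in scope


def authorize_fetch(req: Requester, art: ArtifactRef,
                    degraded_allowed: Optional[frozenset] = None) -> bool:
    """Deny unless every check passes; degraded_allowed is None outside degraded mode."""
    if req.revoked:                                                # trust/revocation status
        return False
    if req.cluster_id != art.cluster_id:                           # cluster membership
        return False
    if art.allowed_roles and not (req.roles & art.allowed_roles):  # role / artifact type
        return False
    if art.scope_id not in req.scopes:                             # artifact scope
        return False
    if art.org_id and art.org_id not in req.org_ids:               # organization scope
        return False
    if degraded_allowed is not None and art.artifact_type not in degraded_allowed:
        return False                                               # partition/degraded policy
    return True
```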
## 11. Failure and Degraded Behavior

If the local storage service is unavailable, the node-agent recovery order is:

1. try alternate local/nearby storage endpoint
2. try active peers that advertise config/storage availability
3. try bootstrap/config endpoints from last signed snapshot
4. contact control plane if reachable
5. continue from last valid local snapshot only if degraded policy allows it

Storage service outage must not grant new authority.

Nodes must not perform high-risk actions based on missing or stale storage data.

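
The recovery order can be sketched in Python as an ordered fallback loop. `fetch_with_fallback` and the fetcher callables are hypothetical; each fetcher is assumed to return already verified snapshot bytes, `None` when it has nothing usable, or raise `OSError` when unreachable.

```python
from typing import Callable, Iterable, Optional, Tuple

Fetcher = Callable[[], Optional[bytes]]


def fetch_with_fallback(sources: Iterable[Tuple[str, Fetcher]],
                        last_valid_local: Optional[bytes],
                        degraded_policy_allows: bool):
    """Return (origin, snapshot), or (None, None) when nothing usable is available.

    `sources` holds steps 1-4 in order: alternate storage endpoint, advertising
    peers, bootstrap endpoints from the last signed snapshot, control plane.
    """
    for name, fetch in sources:
        try:
            snapshot = fetch()
        except OSError:            # unreachable endpoint: move to the next source
            continue
        if snapshot is not None:
            return name, snapshot
    # Step 5: fall back to cached state only when degraded policy permits it.
    if degraded_policy_allows and last_valid_local is not None:
        return "local", last_valid_local
    return None, None
```

Returning `(None, None)` instead of a default snapshot keeps an outage from silently granting the node anything new to act on.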
## 12. Operational Observability

Storage service should report:

- artifact family health
- replication lag
- missing replica count
- stale shard count
- fetch latency
- fetch failures
- authorization denials
- version gaps
- signature/hash validation failures reported by nodes
- storage capacity
- eviction stats

Audit/control-plane events should include:

- artifact published
- artifact revoked/tombstoned
- replication policy changed
- storage role assigned/removed
- unauthorized fetch denied
- critical artifact under-replicated

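
A small Python sketch of how missing replica counts and the "critical artifact under-replicated" event might be derived, assuming the replication policy yields a target replica count per artifact. `replication_report`, its input shapes, and the two-failure-domain threshold for critical artifacts are assumptions.

```python
from typing import Dict, List, Set, Tuple


def replication_report(expected: Dict[str, int],
                       holders: Dict[str, List[Tuple[str, str]]],
                       critical: Set[str], min_domains: int = 2):
    """expected: artifact id -> target replicas; holders: id -> [(node_id, domain)]."""
    report, events = {}, []
    for art_id, target in expected.items():
        held = holders.get(art_id, [])
        missing = max(0, target - len(held))
        domains = len({domain for _, domain in held})
        report[art_id] = {"missing_replicas": missing, "failure_domains": domains}
        # Critical artifacts must also span enough failure domains.
        if art_id in critical and (missing > 0 or domains < min_domains):
            events.append(("critical_artifact_under_replicated", art_id))
    return report, events
```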
## 13. Security Requirements

Required:

- encrypted node-to-storage transport
- authenticated node identity
- scoped fetch authorization
- immutable signed artifacts
- hash verification
- no raw secrets in broad artifacts
- namespace isolation
- audit for high-risk storage/admin actions

The blast radius of a compromised storage node must be limited:

- it cannot sign valid new artifacts
- it cannot serve data outside assigned scopes
- it cannot modify signed content without detection
- it cannot become authoritative truth
- nodes reject invalid signatures/hashes

## 14. Relationship to Runtime State

Fabric Storage is for configuration and distribution, not realtime runtime
coordination.

Runtime state remains elsewhere:

- PostgreSQL for durable lifecycle/audit/state
- Redis for live coordination/leases/heartbeats/ephemeral routing hints
- node-local state for local cache/runtime observations

Do not store high-rate render frames, input streams, VPN packets, or relay
traffic in Fabric Storage.

## 15. C14 Preparation

C14 must define the peer directory and peer cache model that Fabric Storage may
distribute and node-agent may store locally.

C14 must preserve:

- storage service is distribution/cache only
- peer directories are scoped
- nodes do not learn full topology unless role requires it
- routing decisions belong to the Fabric Routing Engine, not Service Adapters

## 16. Result / Decision

Stage C13 defines Fabric Storage / Config Storage as a scoped distribution and
cache service for signed Fabric Core artifacts.

Decisions:

- Fabric Storage distributes signed artifacts but does not author them
- PostgreSQL remains authoritative
- artifacts are immutable by content hash
- invalidation is version-based
- replication is policy-driven and scope-bound
- storage nodes may cache and serve only assigned scopes
- storage service is not a realtime data-plane relay
- storage service is not a general-purpose database
- C14 must define the peer directory/cache artifacts and local runtime use

No code, migration, API, runtime, RDP, data-plane, mesh, VPN, relay, or service
workload behavior is changed by C13.