rdp-proxy/docs/architecture/FABRIC_STORAGE_CONFIG_SERVICE.md
2026-04-28 22:29:50 +03:00


# Fabric Storage / Config Storage Service

Status: Stage C13 result. Documentation and architecture only.

This document defines the Fabric Storage / Config Storage service foundation.
It does not implement code, migrations, APIs, mesh runtime traffic, VPN/IP
tunnel runtime, relay packet routing, RDP work, or service workload execution.

## 1. Purpose
Fabric Storage / Config Storage is the scoped distribution layer for Fabric
Core configuration artifacts.
It distributes and caches:
- signed scoped cluster snapshots
- incremental snapshot updates
- peer directories
- trust bundles
- revocation metadata
- update artifact metadata
- service assignment/config artifacts
It exists so nodes can refresh local state quickly and reliably without asking
the backend database for every realtime routing or supervision decision.
## 2. Non-Goals
Fabric Storage / Config Storage is not:
- a replacement for PostgreSQL
- a second source of truth
- a general-purpose distributed database
- an arbitrary query engine
- a tenant-visible topology database
- a place for raw secrets
- a durable runtime lease store
- a high-rate realtime data-plane relay
Nodes must not write authoritative configuration directly into Fabric Storage.
## 3. Authority Model
Authoritative flow:
```text
PostgreSQL
-> control-plane config compiler
-> signed scoped artifact
-> Fabric Storage / Config Storage distribution
-> node-agent local state
```
Only the control plane, or a tightly scoped config compiler operating under
control-plane authority, may produce authoritative signed configuration
artifacts.
Fabric Storage may replicate and serve artifacts. It does not decide policy.
## 4. Artifact Types
Supported target artifact families:
- `cluster_snapshot`
- `snapshot_increment`
- `peer_directory`
- `trust_bundle`
- `revocation_list`
- `service_assignment_bundle`
- `route_policy_bundle`
- `qos_policy_bundle`
- `update_manifest`
- `storage_directory`
Each artifact must carry:
- artifact id
- artifact type
- cluster id
- scope ids
- config version
- authority epoch
- issued at
- expires at or refresh deadline
- signer key id
- content hash
- signature or signature reference
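
The required fields above can be pictured as a single immutable envelope. The following is a minimal sketch only: the class name `ArtifactEnvelope`, the helper `build_envelope`, the field names, the placeholder values, and the choice of SHA-256 for the content hash are all assumptions made for illustration, not part of the C13 definition.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Tuple


@dataclass(frozen=True)
class ArtifactEnvelope:
    """Hypothetical envelope; fields mirror the required list above."""
    artifact_id: str
    artifact_type: str
    cluster_id: str
    scope_ids: Tuple[str, ...]
    config_version: int
    authority_epoch: int
    issued_at: datetime
    expires_at: datetime      # or refresh deadline
    signer_key_id: str
    content_hash: str         # hex SHA-256 of the payload (assumed algorithm)
    signature: str            # signature value or a reference to it

    def is_expired(self, now: datetime) -> bool:
        # An artifact counts as expired once its deadline has been reached.
        return now >= self.expires_at


def build_envelope(payload: bytes, issued_at: datetime) -> ArtifactEnvelope:
    """Fill an envelope for a demo payload; every value is a placeholder."""
    return ArtifactEnvelope(
        artifact_id="art-0001",
        artifact_type="peer_directory",
        cluster_id="cluster-a",
        scope_ids=("scope-1",),
        config_version=42,
        authority_epoch=7,
        issued_at=issued_at,
        expires_at=issued_at + timedelta(hours=1),
        signer_key_id="key-2026-01",
        content_hash=hashlib.sha256(payload).hexdigest(),
        signature="<signature-placeholder>",
    )


issued = datetime(2026, 1, 1, tzinfo=timezone.utc)
envelope = build_envelope(b'{"peers": []}', issued)
print(envelope.artifact_type, envelope.config_version, envelope.is_expired(issued))
```

Freezing the dataclass reflects the rule that artifacts are immutable: a new version would be a new envelope rather than an edit to an existing one.
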
## 5. Scope and Namespace Rules
Storage namespaces must be scoped by:
- platform
- cluster
- organization where applicable
- service where applicable
- role where applicable
- node where applicable
- artifact family
Example logical namespace:
```text
platform/<platform_id>/
  cluster/<cluster_id>/
    trust/
    snapshots/node/<node_id>/
    snapshots/role/<role>/
    peers/scope/<scope_id>/
    services/<service_type>/
    updates/
```
No node should receive access to namespaces outside its assigned cluster, role,
service, and organization scope.
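
One way to enforce this rule is a prefix check against the namespaces a node has been assigned. The sketch below is illustrative: `namespace_prefix` and `is_within` are hypothetical helpers, and the flat `platform/.../cluster/.../<family>/` path layout is an assumption derived from the example namespace above.

```python
from typing import Iterable, Optional


def namespace_prefix(platform_id: str, cluster_id: str, family: str,
                     qualifier: Optional[str] = None) -> str:
    """Build a namespace prefix that always ends with '/'."""
    parts = ["platform", platform_id, "cluster", cluster_id, family]
    if qualifier:
        parts.append(qualifier)
    return "/".join(parts) + "/"


def is_within(path: str, allowed_prefixes: Iterable[str]) -> bool:
    """Return True only if path sits under one of the allowed prefixes."""
    # Reject traversal segments outright instead of trying to normalize them.
    if ".." in path.split("/"):
        return False
    # The trailing '/' on each prefix prevents node/n1 from matching node/n10.
    return any(path.startswith(prefix) for prefix in allowed_prefixes)


allowed = [
    namespace_prefix("p1", "c1", "trust"),
    namespace_prefix("p1", "c1", "snapshots", "node/n1"),
]

print(is_within("platform/p1/cluster/c1/snapshots/node/n1/v42", allowed))
print(is_within("platform/p1/cluster/c1/snapshots/node/n10/v42", allowed))
print(is_within("platform/p1/cluster/c2/trust/bundle", allowed))
```

A prefix check like this is only a namespace guard. It does not replace the fetch authorization checks on identity, role, and revocation status.
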
## 6. Replication Policy
Replication is policy-driven.
Inputs:
- artifact criticality
- cluster size
- failure domains
- region placement
- node role
- organization isolation
- service locality
- update frequency
- recovery time objective
Rules:
- critical trust and revocation artifacts replicate across failure domains
- hot peer directories should be near entry/core/service nodes that use them
- service config should be near assigned service nodes
- organization-scoped artifacts must not replicate to unrelated org scopes
- thin/mobile nodes should not become broad storage replicas
- storage nodes may hold only assigned shards/scopes
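
A few of these rules can be sketched as a replica selection function. This is a simplified illustration under stated assumptions: `StorageNode`, `pick_replicas`, the `"thin"` role label, and the one-replica-per-failure-domain strategy are hypothetical, and a real policy would also weigh region placement, service locality, update frequency, and recovery time objective.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class StorageNode:
    node_id: str
    failure_domain: str
    role: str                      # e.g. "storage" or "thin" (assumed labels)
    assigned_scopes: FrozenSet[str]


def pick_replicas(nodes: Iterable[StorageNode], scope_id: str,
                  min_failure_domains: int) -> List[StorageNode]:
    """Pick one eligible node per failure domain for the given scope."""
    chosen = {}
    for node in sorted(nodes, key=lambda n: n.node_id):
        if node.role == "thin":
            continue               # thin/mobile nodes are not broad replicas
        if scope_id not in node.assigned_scopes:
            continue               # never replicate into unassigned scopes
        chosen.setdefault(node.failure_domain, node)
    if len(chosen) < min_failure_domains:
        raise ValueError("under-replicated: not enough failure domains")
    return list(chosen.values())


nodes = [
    StorageNode("s1", "fd-a", "storage", frozenset({"org-a"})),
    StorageNode("s2", "fd-a", "storage", frozenset({"org-a"})),
    StorageNode("s3", "fd-b", "storage", frozenset({"org-a"})),
    StorageNode("m1", "fd-c", "thin", frozenset({"org-a"})),
    StorageNode("s4", "fd-c", "storage", frozenset({"org-b"})),
]

replicas = pick_replicas(nodes, "org-a", min_failure_domains=2)
print([node.node_id for node in replicas])
```

Raising when too few failure domains are available makes under-replication an explicit condition that can be reported, rather than a silent degradation.
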
## 7. Distribution Flows
Full snapshot refresh:
1. node-agent reports current config versions
2. storage service returns available version metadata
3. node-agent downloads full scoped snapshot if needed
4. node-agent verifies signature and scope
5. node-agent applies locally
Incremental update:
1. node-agent reports base version
2. storage service returns matching increment chain
3. node-agent verifies every increment
4. node-agent applies only if base versions match
5. version gap triggers full snapshot refresh
Trust/revocation update:
1. node-agent checks trust bundle/revocation version frequently
2. storage service serves signed trust artifacts
3. node-agent verifies using existing trust path
4. node-agent applies revocations to local validation immediately, rejecting revoked identities and keys
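
The incremental update flow can be sketched as a chain walk that stops at the first base-version mismatch. `Increment` and `apply_increments` are hypothetical names, the dictionary-shaped state is an assumption, and signature verification of each increment is assumed to have happened before this function is called.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Increment:
    base_version: int
    target_version: int
    changes: Dict[str, object]


def apply_increments(state: Dict[str, object], local_version: int,
                     chain: List[Increment]) -> Optional[Tuple[Dict[str, object], int]]:
    """Apply a chain of increments to a copy of the local state.

    Returns (new_state, new_version), or None when a version gap is found,
    in which case the caller should fall back to a full snapshot refresh.
    """
    new_state = dict(state)        # never mutate the last valid local state
    version = local_version
    for increment in chain:
        if increment.base_version != version:
            return None            # gap or mismatch: full refresh required
        new_state.update(increment.changes)
        version = increment.target_version
    return new_state, version


local = {"peer_a": "10.0.0.1"}
chain = [
    Increment(5, 6, {"peer_b": "10.0.0.2"}),
    Increment(6, 7, {"peer_a": "10.0.0.9"}),
]

print(apply_increments(local, 5, chain))
print(apply_increments(local, 4, chain))
```

Working on a copy keeps the update all-or-nothing: if any increment in the chain fails the base check, the node-agent still holds its previous valid state.
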
## 8. Consistency and Invalidation
Artifacts are immutable by content hash.
New versions are published as new artifacts plus index updates.
Rules:
- node must validate content hash
- node must reject stale authority epoch
- node must reject invalid signature
- node must reject wrong scope
- storage index may cache version metadata but not override signatures
- deletion/tombstone artifacts must be signed
- revoked artifacts must not be served as current versions
Cache invalidation is version-based: a cached artifact is superseded when a newer
signed version is published, not by best-effort string deletion.
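
The node-side rejection rules can be sketched as one validation function. Everything named here is an assumption for illustration: the `meta` dictionary layout, `signing_input`, `validate_artifact`, SHA-256 as the hash, and the canonical string that the signature covers. The demo verifier uses HMAC purely as a standard-library stand-in. A real deployment would presumably use an asymmetric signature scheme so that storage nodes and node-agents cannot sign artifacts themselves.

```python
import hashlib
import hmac
from typing import Callable, Iterable

# Hypothetical verifier shape: (signer_key_id, message, signature) -> bool
Verifier = Callable[[str, bytes, str], bool]


def signing_input(meta: dict) -> bytes:
    """Assumed canonical form: signature covers hash, epoch, and scopes."""
    scopes = ",".join(sorted(meta["scope_ids"]))
    return f'{meta["content_hash"]}|{meta["authority_epoch"]}|{scopes}'.encode()


def validate_artifact(meta: dict, payload: bytes, node_scopes: Iterable[str],
                      min_epoch: int, verify: Verifier) -> None:
    """Raise ValueError if the artifact breaks any rejection rule."""
    actual = hashlib.sha256(payload).hexdigest()
    if not hmac.compare_digest(actual, meta["content_hash"]):
        raise ValueError("content hash mismatch")
    if meta["authority_epoch"] < min_epoch:
        raise ValueError("stale authority epoch")
    if not verify(meta["signer_key_id"], signing_input(meta), meta["signature"]):
        raise ValueError("invalid signature")
    if not set(meta["scope_ids"]) <= set(node_scopes):
        raise ValueError("wrong scope")


# Demo-only stand-in for a real signature scheme (assumption, see lead-in).
DEMO_KEY = b"demo-only-key"


def demo_sign(message: bytes) -> str:
    return hmac.new(DEMO_KEY, message, hashlib.sha256).hexdigest()


def demo_verify(key_id: str, message: bytes, signature: str) -> bool:
    return hmac.compare_digest(demo_sign(message), signature)


def make_meta(payload: bytes, epoch: int, scopes: Iterable[str]) -> dict:
    meta = {
        "content_hash": hashlib.sha256(payload).hexdigest(),
        "authority_epoch": epoch,
        "scope_ids": list(scopes),
        "signer_key_id": "demo-key",
    }
    meta["signature"] = demo_sign(signing_input(meta))
    return meta


payload = b"peer-directory-v42"
meta = make_meta(payload, epoch=7, scopes=["scope-1"])
validate_artifact(meta, payload, {"scope-1", "scope-2"}, 7, demo_verify)
print("artifact accepted")
```

Covering the epoch and scope list in the signed input means a storage node cannot widen scope or replay an old epoch by editing metadata alone; the signature check would fail.
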
## 9. Storage Node Behavior
A storage/config node may:
- cache assigned artifacts
- replicate assigned artifacts
- serve artifacts to authorized nodes
- report artifact availability
- report replication health
- evict cold artifacts according to policy
A storage/config node must not:
- modify artifact content
- sign artifacts
- invent new config versions
- widen scope
- bypass authorization
- serve unrelated org/cluster data
- accept node writes as authoritative config
## 10. Authorization
Artifact fetch authorization must check:
- node identity
- cluster membership
- role assignment
- artifact scope
- organization scope where applicable
- artifact type
- trust/revocation status
- partition/degraded policy
Storage service authorization may use:
- mTLS node identity
- short-lived scoped tokens
- signed node snapshot claims
- control-plane issued fetch grants
Tenant users and organization admins must not directly query internal storage
namespaces. They see safe status projections through control-plane APIs.
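
The fetch checks can be sketched as an ordered series of deny rules. `FetchRequest`, `authorize_fetch`, the role-to-artifact-type table, and the reason strings are hypothetical. The sketch assumes node identity has already been authenticated (for example via mTLS) and leaves out partition/degraded policy.

```python
from dataclasses import dataclass, replace
from typing import Dict, FrozenSet, Optional, Set, Tuple


@dataclass(frozen=True)
class FetchRequest:
    node_id: str
    node_cluster_id: str
    node_role: str
    node_org_id: Optional[str]
    node_scopes: FrozenSet[str]
    artifact_type: str
    artifact_cluster_id: str
    artifact_org_id: Optional[str]   # None for artifacts with no org scope
    artifact_scope_ids: FrozenSet[str]


def authorize_fetch(req: FetchRequest, revoked_nodes: Set[str],
                    types_by_role: Dict[str, FrozenSet[str]]) -> Tuple[bool, str]:
    """Return (allowed, reason); any failed check denies the fetch."""
    if req.node_id in revoked_nodes:
        return False, "node identity revoked"
    if req.node_cluster_id != req.artifact_cluster_id:
        return False, "cluster mismatch"
    if req.artifact_type not in types_by_role.get(req.node_role, frozenset()):
        return False, "artifact type not allowed for role"
    if req.artifact_org_id is not None and req.artifact_org_id != req.node_org_id:
        return False, "organization mismatch"
    if not req.artifact_scope_ids <= req.node_scopes:
        return False, "scope not assigned to node"
    return True, "ok"


TYPES_BY_ROLE = {
    "entry": frozenset({"peer_directory", "trust_bundle", "revocation_list"}),
}

request = FetchRequest(
    node_id="n1", node_cluster_id="c1", node_role="entry", node_org_id="org-a",
    node_scopes=frozenset({"scope-1"}),
    artifact_type="peer_directory", artifact_cluster_id="c1",
    artifact_org_id="org-a", artifact_scope_ids=frozenset({"scope-1"}),
)

print(authorize_fetch(request, set(), TYPES_BY_ROLE))
print(authorize_fetch(replace(request, artifact_org_id="org-b"), set(), TYPES_BY_ROLE))
```

Returning a reason alongside the decision gives the storage service something concrete to record when it reports authorization denials.
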
## 11. Failure and Degraded Behavior
If the local storage service is unavailable, the node-agent recovery order is:
1. try alternate local/nearby storage endpoint
2. try active peers that advertise config/storage availability
3. try bootstrap/config endpoints from last signed snapshot
4. contact control plane if reachable
5. continue from last valid local snapshot only if degraded policy allows it
Storage service outage must not grant new authority.
Nodes must not perform high-risk actions based on missing or stale storage data.
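
The recovery order can be sketched as an ordered list of fetch attempts with a policy-gated local fallback. `recover_snapshot`, the source labels, and the convention that a fetcher returns `None` or raises `OSError` when unavailable are assumptions for illustration. Anything fetched this way would still need the usual signature, hash, epoch, and scope validation before use.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical fetcher: returns snapshot bytes, or None if nothing is available.
Fetcher = Callable[[], Optional[bytes]]


def recover_snapshot(sources: Sequence[Tuple[str, Fetcher]],
                     last_valid_local: Optional[bytes],
                     degraded_allowed: bool) -> Tuple[str, bytes]:
    """Try each source in order, then fall back to local state if allowed."""
    for name, fetch in sources:
        try:
            result = fetch()
        except OSError:            # covers ConnectionError and timeouts
            result = None
        if result is not None:
            return name, result
    if degraded_allowed and last_valid_local is not None:
        return "local-degraded", last_valid_local
    raise RuntimeError("no config source available and degraded mode not allowed")


def unreachable() -> Optional[bytes]:
    raise ConnectionError("endpoint down")


sources = [
    ("alternate-storage", unreachable),
    ("peer", lambda: None),
    ("bootstrap", lambda: b"snapshot-v42"),
    ("control-plane", lambda: b"snapshot-v43"),
]

source, data = recover_snapshot(sources, last_valid_local=b"snapshot-v40",
                                degraded_allowed=True)
print(source, data)
```

Tagging the result with its source lets the caller treat `local-degraded` differently, for example by refusing high-risk actions while running on stale state.
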
## 12. Operational Observability
Storage service should report:
- artifact family health
- replication lag
- missing replica count
- stale shard count
- fetch latency
- fetch failures
- authorization denials
- version gaps
- signature/hash validation failures reported by nodes
- storage capacity
- eviction stats
Audit/control-plane events should include:
- artifact published
- artifact revoked/tombstoned
- replication policy changed
- storage role assigned/removed
- unauthorized fetch denied
- critical artifact under-replicated
## 13. Security Requirements
Required:
- encrypted node-to-storage transport
- authenticated node identity
- scoped fetch authorization
- immutable signed artifacts
- hash verification
- no raw secrets in broad artifacts
- namespace isolation
- audit for high-risk storage/admin actions
The blast radius of a compromised storage node must be limited:
- it cannot sign valid new artifacts
- it cannot serve data outside assigned scopes
- it cannot modify signed content without detection
- it cannot become authoritative truth
- nodes reject invalid signatures/hashes
## 14. Relationship to Runtime State
Fabric Storage is for configuration and distribution, not realtime runtime
coordination.
Runtime state remains elsewhere:
- PostgreSQL for durable lifecycle/audit/state
- Redis for live coordination/leases/heartbeats/ephemeral routing hints
- node-local state for local cache/runtime observations
Do not store high-rate render frames, input streams, VPN packets, or relay
traffic in Fabric Storage.
## 15. C14 Preparation
C14 must define the peer directory and peer cache model that Fabric Storage may
distribute and node-agent may store locally.
C14 must preserve:
- storage service is distribution/cache only
- peer directories are scoped
- nodes do not learn full topology unless role requires it
- routing decisions belong to Fabric Routing Engine, not Service Adapters
## 16. Result / Decision
Stage C13 defines Fabric Storage / Config Storage as a scoped distribution and
cache service for signed Fabric Core artifacts.
Decisions:
- Fabric Storage distributes signed artifacts but does not author them
- PostgreSQL remains authoritative
- artifacts are immutable by content hash
- invalidation is version-based
- replication is policy-driven and scope-bound
- storage nodes may cache and serve only assigned scopes
- storage service is not a realtime data-plane relay
- storage service is not a general-purpose database
- C14 must define the peer directory/cache artifacts and local runtime use
No code, migration, API, runtime, RDP, data-plane, mesh, VPN, relay, or service
workload behavior is changed by C13.