rdp-proxy/docs/architecture/FABRIC_STORAGE_CONFIG_SERVICE.md
2026-04-28 22:29:50 +03:00

Fabric Storage / Config Storage Service

Status: Stage C13 result. Documentation and architecture only.

This document defines the Fabric Storage / Config Storage service foundation. It does not implement code, migrations, APIs, mesh runtime traffic, VPN/IP tunnel runtime, relay packet routing, RDP work, or service workload execution.

1. Purpose

Fabric Storage / Config Storage is the scoped distribution layer for Fabric Core configuration artifacts.

It distributes and caches:

  • signed scoped cluster snapshots
  • incremental snapshot updates
  • peer directories
  • trust bundles
  • revocation metadata
  • update artifact metadata
  • service assignment/config artifacts

It exists so nodes can refresh local state quickly and reliably without asking the backend database for every realtime routing or supervision decision.

2. Non-Goals

Fabric Storage / Config Storage is not:

  • a replacement for PostgreSQL
  • a second source of truth
  • a general-purpose distributed database
  • an arbitrary query engine
  • a tenant-visible topology database
  • a place for raw secrets
  • a durable runtime lease store
  • a high-rate realtime data-plane relay

Nodes must not write authoritative configuration directly into Fabric Storage.

3. Authority Model

Authoritative flow:

PostgreSQL
  -> control-plane config compiler
  -> signed scoped artifact
  -> Fabric Storage / Config Storage distribution
  -> node-agent local state

Only the control plane, or a tightly scoped config compiler operating under control-plane authority, may produce authoritative signed configuration artifacts.

Fabric Storage may replicate and serve artifacts. It does not decide policy.

4. Artifact Types

Supported target artifact families:

  • cluster_snapshot
  • snapshot_increment
  • peer_directory
  • trust_bundle
  • revocation_list
  • service_assignment_bundle
  • route_policy_bundle
  • qos_policy_bundle
  • update_manifest
  • storage_directory

Each artifact must carry:

  • artifact id
  • artifact type
  • cluster id
  • scope ids
  • config version
  • authority epoch
  • issued at
  • expires at or refresh deadline
  • signer key id
  • content hash
  • signature or signature reference
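
As an illustration of the required metadata, the fields above could be modeled as an immutable envelope. This is a minimal sketch only: the class name, field names, field types, and the choice of SHA-256 for the content hash are assumptions, not part of the C13 decision.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ArtifactEnvelope:
    """Metadata carried by every artifact (names and types are illustrative)."""
    artifact_id: str
    artifact_type: str          # e.g. "trust_bundle", "peer_directory"
    cluster_id: str
    scope_ids: tuple[str, ...]
    config_version: int
    authority_epoch: int
    issued_at: datetime
    expires_at: datetime        # or a refresh deadline
    signer_key_id: str
    content_hash: str           # assumed: hex SHA-256 of the payload
    signature: bytes            # or a reference to a detached signature

    def is_expired(self, now: datetime) -> bool:
        return now >= self.expires_at


def content_hash_of(payload: bytes) -> str:
    """Hash the payload; the algorithm is an assumption of this sketch."""
    return hashlib.sha256(payload).hexdigest()


# Hypothetical example values.
issued = datetime(2026, 1, 1, tzinfo=timezone.utc)
envelope = ArtifactEnvelope(
    artifact_id="art-0001",
    artifact_type="trust_bundle",
    cluster_id="cluster-a",
    scope_ids=("cluster:cluster-a",),
    config_version=7,
    authority_epoch=3,
    issued_at=issued,
    expires_at=issued + timedelta(hours=1),
    signer_key_id="key-1",
    content_hash=content_hash_of(b"payload"),
    signature=b"",              # placeholder; signing is out of scope here
)
```

Freezing the dataclass mirrors the rule that artifacts are never modified after publication: a new version is a new envelope.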

5. Scope and Namespace Rules

Storage namespaces must be scoped by:

  • platform
  • cluster
  • organization where applicable
  • service where applicable
  • role where applicable
  • node where applicable
  • artifact family

Example logical namespace:

platform/<platform_id>/
  cluster/<cluster_id>/
    trust/
    snapshots/node/<node_id>/
    snapshots/role/<role>/
    peers/scope/<scope_id>/
    services/<service_type>/
    updates/

No node should receive access to namespaces outside its assigned cluster, role, service, and organization scope.
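
The namespace layout and the access rule above can be sketched as a path builder plus a prefix check. The function names and the idea of representing a node's grants as a list of namespace prefixes are assumptions made for illustration.

```python
def namespace_path(platform_id: str, cluster_id: str, *segments: str) -> str:
    """Build a logical namespace such as platform/p1/cluster/c1/trust/."""
    parts = ["platform", platform_id, "cluster", cluster_id, *segments]
    for part in parts:
        # Reject empty or path-like segments so an id cannot escape its scope.
        if not part or "/" in part or part in (".", ".."):
            raise ValueError(f"invalid namespace segment: {part!r}")
    return "/".join(parts) + "/"


def is_allowed(granted_prefixes: list[str], requested: str) -> bool:
    """A request is allowed only if it falls under a granted prefix.

    Every path ends with "/", so a grant for node "n1" does not match "n10".
    """
    return any(requested.startswith(prefix) for prefix in granted_prefixes)


# Hypothetical grants for one node.
grants = [
    namespace_path("p1", "c1", "trust"),
    namespace_path("p1", "c1", "snapshots", "node", "n1"),
]
own_snapshot = namespace_path("p1", "c1", "snapshots", "node", "n1")
other_snapshot = namespace_path("p1", "c1", "snapshots", "node", "n10")
other_cluster = namespace_path("p1", "c2", "trust")
```

With these grants, `is_allowed(grants, own_snapshot)` is true, while the other node's snapshot and the other cluster's trust namespace are both refused.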

6. Replication Policy

Replication is policy-driven.

Inputs:

  • artifact criticality
  • cluster size
  • failure domains
  • region placement
  • node role
  • organization isolation
  • service locality
  • update frequency
  • recovery time objective

Rules:

  • critical trust and revocation artifacts replicate across failure domains
  • hot peer directories should be near entry/core/service nodes that use them
  • service config should be near assigned service nodes
  • organization-scoped artifacts must not replicate to unrelated org scopes
  • thin/mobile nodes should not become broad storage replicas
  • storage nodes may hold only assigned shards/scopes
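
One way to read the replication rules is as a placement function over candidate storage nodes. The sketch below covers only three of the rules (failure-domain spread for critical families, organization isolation, and exclusion of thin nodes). The `StorageNode` type, the `pick_replicas` function, and the assumption that trust bundles and revocation lists are the critical families are all illustrative.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed set of families treated as critical.
CRITICAL_FAMILIES = {"trust_bundle", "revocation_list"}


@dataclass(frozen=True)
class StorageNode:
    node_id: str
    failure_domain: str
    org_scopes: frozenset[str]   # organizations this node is assigned to hold
    thin: bool = False           # thin/mobile nodes are not broad replicas


def pick_replicas(family: str, org_id: Optional[str],
                  nodes: list[StorageNode], want: int) -> list[StorageNode]:
    # Organization-scoped artifacts only go to nodes assigned to that org.
    eligible = [n for n in nodes
                if not n.thin and (org_id is None or org_id in n.org_scopes)]
    if family not in CRITICAL_FAMILIES:
        return eligible[:want]
    # Critical artifacts: take one node per failure domain first.
    chosen: list[StorageNode] = []
    seen_domains: set[str] = set()
    for node in eligible:
        if node.failure_domain not in seen_domains:
            chosen.append(node)
            seen_domains.add(node.failure_domain)
    # Then fill any remaining slots from the rest.
    for node in eligible:
        if len(chosen) >= want:
            break
        if node not in chosen:
            chosen.append(node)
    return chosen[:want]


nodes = [
    StorageNode("s1", "fd-1", frozenset({"org-a"})),
    StorageNode("s2", "fd-1", frozenset({"org-b"})),
    StorageNode("s3", "fd-2", frozenset({"org-a"})),
    StorageNode("m1", "fd-3", frozenset({"org-a"}), thin=True),
]
trust_replicas = pick_replicas("trust_bundle", None, nodes, want=2)
org_replicas = pick_replicas("service_assignment_bundle", "org-a", nodes, want=3)
```

A real policy would also weigh locality, update frequency, and the recovery time objective, which this sketch leaves out.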

7. Distribution Flows

Full snapshot refresh:

  1. node-agent reports current config versions
  2. storage service returns available version metadata
  3. node-agent downloads full scoped snapshot if needed
  4. node-agent verifies signature and scope
  5. node-agent applies locally

Incremental update:

  1. node-agent reports base version
  2. storage service returns matching increment chain
  3. node-agent verifies every increment
  4. node-agent applies only if base versions match
  5. version gap triggers full snapshot refresh
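
The incremental update steps can be sketched as follows. The `Increment` shape, the `VersionGap` exception, and the convention that a `None` value deletes a key are assumptions. The caller is expected to fall back to a full snapshot refresh when `VersionGap` is raised.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class VersionGap(Exception):
    """The chain does not continue from the local version."""


@dataclass(frozen=True)
class Increment:
    base_version: int
    target_version: int
    changes: dict[str, Optional[str]]  # assumed convention: None deletes the key


def apply_increments(state: dict[str, str], version: int,
                     chain: list[Increment],
                     verify: Callable[[Increment], bool]) -> tuple[dict[str, str], int]:
    # Verify every increment before touching any state.
    for inc in chain:
        if not verify(inc):
            raise ValueError(
                f"increment {inc.base_version}->{inc.target_version} failed verification")
    # Apply to a copy so a failure leaves the current state untouched.
    new_state = dict(state)
    for inc in chain:
        if inc.base_version != version:
            raise VersionGap(f"have {version}, increment expects {inc.base_version}")
        for key, value in inc.changes.items():
            if value is None:
                new_state.pop(key, None)
            else:
                new_state[key] = value
        version = inc.target_version
    return new_state, version


chain = [
    Increment(4, 5, {"peer/a": "10.0.0.1"}),
    Increment(5, 6, {"peer/b": None}),
]
# verify is a stand-in here; a real node-agent would check signature and scope.
state, version = apply_increments({"peer/b": "10.0.0.2"}, 4, chain,
                                  verify=lambda inc: True)
```

Working on a copy keeps the update all-or-nothing: either the whole chain applies or the node-agent keeps its previous state and version.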

Trust/revocation update:

  1. node-agent checks trust bundle/revocation version frequently
  2. storage service serves signed trust artifacts
  3. node-agent verifies using existing trust path
  4. node-agent rejects revoked identities/keys in local validation as soon as the update is applied

8. Consistency and Invalidation

Artifacts are immutable by content hash.

New versions are published as new artifacts plus index updates.

Rules:

  • node must validate content hash
  • node must reject stale authority epoch
  • node must reject invalid signature
  • node must reject wrong scope
  • storage index may cache version metadata but not override signatures
  • deletion/tombstone artifacts must be signed
  • revoked artifacts must not be served as current versions

Cache invalidation is version-based: a cached artifact is superseded when a newer signed version is published, not by best-effort deletion of cache keys.
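
The node-side rejection rules can be sketched as a single validation pass that collects every failure. SHA-256 as the hash, the `verify_signature` callback, and the interpretation of "wrong scope" as "the artifact's scope ids are not all within the node's assigned scopes" are assumptions of this sketch.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Artifact:
    payload: bytes
    content_hash: str            # assumed: hex SHA-256 of payload
    authority_epoch: int
    scope_ids: frozenset[str]
    signer_key_id: str
    signature: bytes


def validate(artifact: Artifact, *, min_epoch: int,
             node_scopes: frozenset[str],
             verify_signature: Callable[[Artifact], bool]) -> list[str]:
    """Return the list of rejection reasons; an empty list means accept."""
    errors = []
    actual = hashlib.sha256(artifact.payload).hexdigest()
    if not hmac.compare_digest(actual, artifact.content_hash):
        errors.append("content_hash_mismatch")
    if artifact.authority_epoch < min_epoch:
        errors.append("stale_authority_epoch")
    if not verify_signature(artifact):
        errors.append("invalid_signature")
    if not artifact.scope_ids <= node_scopes:
        errors.append("wrong_scope")
    return errors


good = Artifact(
    payload=b"peers",
    content_hash=hashlib.sha256(b"peers").hexdigest(),
    authority_epoch=5,
    scope_ids=frozenset({"cluster:c1"}),
    signer_key_id="key-1",
    signature=b"sig",
)
# Stand-in verifier for the example; a real one checks the signer key's signature.
accept_all = lambda a: True
ok_errors = validate(good, min_epoch=5,
                     node_scopes=frozenset({"cluster:c1", "role:entry"}),
                     verify_signature=accept_all)
stale_errors = validate(good, min_epoch=6,
                        node_scopes=frozenset({"cluster:c2"}),
                        verify_signature=accept_all)
```

Collecting all reasons rather than stopping at the first makes the result usable for the validation failure reporting described under observability.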

9. Storage Node Behavior

A storage/config node may:

  • cache assigned artifacts
  • replicate assigned artifacts
  • serve artifacts to authorized nodes
  • report artifact availability
  • report replication health
  • evict cold artifacts according to policy

A storage/config node must not:

  • modify artifact content
  • sign artifacts
  • invent new config versions
  • widen scope
  • bypass authorization
  • serve unrelated org/cluster data
  • accept node writes as authoritative config

10. Authorization

Artifact fetch authorization must check:

  • node identity
  • cluster membership
  • role assignment
  • artifact scope
  • organization scope where applicable
  • artifact type
  • trust/revocation status
  • partition/degraded policy

Storage service authorization may use:

  • mTLS node identity
  • short-lived scoped tokens
  • signed node snapshot claims
  • control-plane issued fetch grants

Tenant users and organization admins must not directly query internal storage namespaces. They see safe status projections through control-plane APIs.
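
A fetch authorization check along the lines of the list above might look like the sketch below. It assumes node identity has already been authenticated (for example by mTLS) before this function runs. The type names, reason strings, and the specific degraded policy (only trust and revocation artifacts may be fetched while degraded) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeContext:
    node_id: str                 # already authenticated by the transport
    cluster_id: str
    roles: frozenset[str]
    org_id: Optional[str]
    revoked: bool = False


@dataclass(frozen=True)
class ArtifactScope:
    artifact_type: str
    cluster_id: str
    allowed_roles: frozenset[str]
    org_id: Optional[str] = None  # None: not organization-scoped


# Assumed degraded policy: only these types may be fetched while degraded.
DEGRADED_ALLOWED_TYPES = frozenset({"trust_bundle", "revocation_list"})


def authorize_fetch(node: NodeContext, artifact: ArtifactScope,
                    degraded: bool = False) -> tuple[bool, str]:
    if node.revoked:
        return False, "node_revoked"
    if node.cluster_id != artifact.cluster_id:
        return False, "cluster_mismatch"
    if not (node.roles & artifact.allowed_roles):
        return False, "role_not_allowed"
    if artifact.org_id is not None and artifact.org_id != node.org_id:
        return False, "org_mismatch"
    if degraded and artifact.artifact_type not in DEGRADED_ALLOWED_TYPES:
        return False, "denied_by_degraded_policy"
    return True, "ok"


entry_node = NodeContext("n1", "c1", frozenset({"entry"}), org_id="org-a")
peer_dir = ArtifactScope("peer_directory", "c1", frozenset({"entry", "core"}))
org_b_config = ArtifactScope("service_assignment_bundle", "c1",
                             frozenset({"entry"}), org_id="org-b")
```

Returning a reason alongside the decision gives the audit event for a denied fetch something concrete to record.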

11. Failure and Degraded Behavior

If the local storage service is unavailable, the node-agent recovery order is:

  1. try alternate local/nearby storage endpoint
  2. try active peers that advertise config/storage availability
  3. try bootstrap/config endpoints from last signed snapshot
  4. contact control plane if reachable
  5. continue from last valid local snapshot only if degraded policy allows it

Storage service outage must not grant new authority.

Nodes must not perform high-risk actions based on missing or stale storage data.
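
The recovery order can be sketched as an ordered fallback over fetch sources, with the last valid local snapshot used only when degraded policy allows it. The `recover_snapshot` function, the source labels, and the convention that a fetcher returns `None` or raises `OSError` when it has nothing to offer are assumptions.

```python
from typing import Callable, Optional, Sequence

Fetcher = Callable[[], Optional[bytes]]


def recover_snapshot(sources: Sequence[tuple[str, Fetcher]],
                     last_valid: Optional[bytes],
                     degraded_allowed: bool) -> tuple[str, bytes]:
    """Try each source in order; return (source name, snapshot bytes)."""
    for name, fetch in sources:
        try:
            data = fetch()
        except OSError:          # includes ConnectionError and TimeoutError
            continue
        if data is not None:
            return name, data
    # Last resort: keep running on the last valid local snapshot, if permitted.
    if last_valid is not None and degraded_allowed:
        return "last_valid_local_snapshot", last_valid
    raise RuntimeError("no config source reachable and degraded operation not allowed")


def unreachable() -> Optional[bytes]:
    raise ConnectionError("endpoint down")


# Hypothetical sources, listed in the documented recovery order.
sources = [
    ("alternate_storage_endpoint", unreachable),
    ("peer_with_storage_availability", lambda: None),
    ("bootstrap_endpoint", lambda: b"signed-snapshot"),
    ("control_plane", unreachable),
]
source_name, snapshot = recover_snapshot(sources, last_valid=b"old", degraded_allowed=False)
```

Whatever source answers, the returned bytes still need the usual signature, hash, epoch, and scope checks before use. The fallback changes where an artifact comes from, not whether it is trusted.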

12. Operational Observability

Storage service should report:

  • artifact family health
  • replication lag
  • missing replica count
  • stale shard count
  • fetch latency
  • fetch failures
  • authorization denials
  • version gaps
  • signature/hash validation failures reported by nodes
  • storage capacity
  • eviction stats

Audit/control-plane events should include:

  • artifact published
  • artifact revoked/tombstoned
  • replication policy changed
  • storage role assigned/removed
  • unauthorized fetch denied
  • critical artifact under-replicated

13. Security Requirements

Required:

  • encrypted node-to-storage transport
  • authenticated node identity
  • scoped fetch authorization
  • immutable signed artifacts
  • hash verification
  • no raw secrets in broad artifacts
  • namespace isolation
  • audit for high-risk storage/admin actions

The blast radius of a compromised storage node must be limited:

  • it cannot sign valid new artifacts
  • it cannot serve data outside assigned scopes
  • it cannot modify signed content without detection
  • it cannot become authoritative truth
  • nodes reject invalid signatures/hashes

14. Relationship to Runtime State

Fabric Storage is for configuration and distribution, not realtime runtime coordination.

Runtime state remains elsewhere:

  • PostgreSQL for durable lifecycle/audit/state
  • Redis for live coordination/leases/heartbeats/ephemeral routing hints
  • node-local state for local cache/runtime observations

Do not store high-rate render frames, input streams, VPN packets, or relay traffic in Fabric Storage.

15. C14 Preparation

C14 must define the peer directory and peer cache model that Fabric Storage may distribute and node-agent may store locally.

C14 must preserve:

  • storage service is distribution/cache only
  • peer directories are scoped
  • nodes do not learn full topology unless role requires it
  • routing decisions belong to Fabric Routing Engine, not Service Adapters

16. Result / Decision

Stage C13 defines Fabric Storage / Config Storage as a scoped distribution and cache service for signed Fabric Core artifacts.

Decisions:

  • Fabric Storage distributes signed artifacts but does not author them
  • PostgreSQL remains authoritative
  • artifacts are immutable by content hash
  • invalidation is version-based
  • replication is policy-driven and scope-bound
  • storage nodes may cache and serve only assigned scopes
  • storage service is not a realtime data-plane relay
  • storage service is not a general-purpose database
  • C14 must define the peer directory/cache artifacts and local runtime use

No code, migration, API, runtime, RDP, data-plane, mesh, VPN, relay, or service workload behavior is changed by C13.