2026-05-18 21:33:39 +03:00

Security And Secrets Readiness

Archived scope note: this document records an earlier RDP/direct-worker trust and secret-handling stage. It is not the current source of truth for fabric transport architecture. The active inter-node transport model is QUIC-only; see docs/architecture/DISTRIBUTED_FABRIC_NODE_PROTOCOL_PLAN.md, docs/architecture/FABRIC_FIRST_TRANSPORT_AND_STRESS_PLAN.md, and docs/architecture/SECURE_ACCESS_FABRIC_TARGET.md.

Status: P3.3 historical test-stand smoke complete for encrypted resource secrets, assignment-time resolution, and legacy RDP baseline behavior with smoke-only direct-worker trust.

This document defines the next security hardening layer around the accepted RDP MVP baseline. It does not implement mesh, VPN, server-to-client download, new protocol adapters, or another RDP rendering mode.

Current Accepted Historical RDP Baseline

RDP worker baseline: rap-rdp-worker:rdp-p1-region-order2
Backend control plane remains source of truth.
Redis remains live coordination/routing only.
Historical direct-worker WSS was the preferred realtime RDP path in this stage.
Historical backend gateway remained a fallback/debug path for this stage.
Text clipboard is policy-gated and accepted.
Client-to-server file upload and restricted RAP_Transfers visibility are accepted.

Problem

The current smoke/dev path can still seed RDP target credentials inside resource metadata. That was acceptable for proving lifecycle and RDP adapter behavior, but it must not be the production contract.

Production must not rely on plaintext target passwords, usernames, domain credentials, client secrets, tokens, or private keys stored in generic resource metadata.

Target Secret Model

Resources keep non-secret connection shape:

{
  "id": "...",
  "organization_id": "...",
  "protocol": "rdp",
  "address": "rdp.example.internal:3389",
  "secret_ref": "rap-secret://org/<org_id>/resources/<resource_id>/rdp-primary",
  "metadata": {
    "certificate_verification_mode": "strict",
    "render_quality_profile": "balanced"
  }
}

Secrets are stored separately and referenced by secret_ref. The secret payload is protocol-specific and versioned:

{
  "version": 1,
  "protocol": "rdp",
  "username": "...",
  "domain": "...",
  "password": "...",
  "rotation_version": 3
}

The reference, not the plaintext secret, is copied into session metadata and audit context.
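A minimal sketch of how the reference could be parsed and how session metadata would carry only the reference. The names `SecretRef`, `parse_secret_ref`, and `session_metadata_for` are hypothetical, and the fixed path shape `org/<org_id>/resources/<resource_id>/<name>` is an assumption inferred from the single example above.

```python
from dataclasses import dataclass

SECRET_REF_PREFIX = "rap-secret://"


@dataclass(frozen=True)
class SecretRef:
    organization_id: str
    resource_id: str
    name: str


def parse_secret_ref(ref: str) -> SecretRef:
    """Split a secret_ref into its parts; reject anything off-shape."""
    if not ref.startswith(SECRET_REF_PREFIX):
        raise ValueError("secret_ref must start with rap-secret://")
    segments = ref[len(SECRET_REF_PREFIX):].split("/")
    # Assumed shape: org/<org_id>/resources/<resource_id>/<name>
    well_formed = (
        len(segments) == 5
        and segments[0] == "org"
        and segments[2] == "resources"
        and all(segments)
    )
    if not well_formed:
        raise ValueError("secret_ref has an unexpected path shape")
    return SecretRef(segments[1], segments[3], segments[4])


def session_metadata_for(resource: dict) -> dict:
    """Copy only the reference and ids into session metadata."""
    ref = parse_secret_ref(resource["secret_ref"])
    if ref.organization_id != resource["organization_id"]:
        raise ValueError("secret_ref belongs to a different organization")
    return {"resource_id": resource["id"], "secret_ref": resource["secret_ref"]}
```

Checking the organization segment against the resource is one cheap way to catch a reference copied across tenants before it reaches the resolver.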

Runtime Secret Resolution

Production runtime should resolve secrets through a dedicated secret resolver:

Backend validates resource/org/user authorization.
Backend starts the session using resource secret_ref.
Worker receives assignment with secret_ref, not plaintext credentials.
Worker asks an authorized secret resolver for the secret using:
- organization_id
- resource_id
- worker_id
- session_id
- short-lived lease/session proof
Secret resolver returns credentials only to authorized workers for active leased sessions.
Worker keeps secret material in memory only and never logs it.
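The resolver checks in the steps above can be sketched as follows. This is an in-memory illustration, not the backend's implementation: `Lease`, `ResolveRequest`, `InMemoryResolver`, and `SecretAccessDenied` are hypothetical names, and the "short-lived lease/session proof" is modeled as a lease record looked up by session id.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


class SecretAccessDenied(Exception):
    """Raised when a worker is not entitled to the requested secret."""


@dataclass(frozen=True)
class Lease:
    organization_id: str
    resource_id: str
    worker_id: str
    session_id: str
    expires_at: float  # unix seconds


@dataclass(frozen=True)
class ResolveRequest:
    organization_id: str
    resource_id: str
    worker_id: str
    session_id: str


class InMemoryResolver:
    def __init__(self, secrets: Dict[Tuple[str, str], dict], leases: Dict[str, Lease]):
        self._secrets = secrets  # keyed by (organization_id, resource_id)
        self._leases = leases    # keyed by session_id

    def resolve(self, request: ResolveRequest, now: Optional[float] = None) -> dict:
        now = time.time() if now is None else now
        lease = self._leases.get(request.session_id)
        if lease is None or lease.expires_at <= now:
            raise SecretAccessDenied("no active lease for session")
        # Every identifier in the request must match the lease exactly.
        expected = (lease.organization_id, lease.resource_id, lease.worker_id)
        actual = (request.organization_id, request.resource_id, request.worker_id)
        if expected != actual:
            raise SecretAccessDenied("request does not match lease")
        secret = self._secrets.get((request.organization_id, request.resource_id))
        if secret is None:
            raise SecretAccessDenied("no secret for resource")
        return dict(secret)  # copy; the caller keeps it in memory only
```

Denials raise a single exception type so that a caller can emit one audit event per refusal without inspecting the reason string.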

The current P3.1 MVP uses an encrypted PostgreSQL-backed store:

resource_secrets stores ciphertext, nonce, key id, algorithm, version, safe metadata, and payload_sha256.
SECRET_ENCRYPTION_KEY_B64 or SECRET_ENCRYPTION_KEY_FILE supplies the AES-256-GCM key.
SECRET_ENCRYPTION_KEY_ID labels the active key.
The API can create/rotate a resource secret, but never returns plaintext.
Session assignment resolves the secret only after organization/resource/worker/session/lease checks.

The resolver boundary can later be backed by KMS, Vault, cloud secret managers, or node-local secure delivery without changing the resource secret_ref contract.
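A minimal sketch of loading the key material named above at startup. Several details are assumptions rather than documented behavior: that `SECRET_ENCRYPTION_KEY_B64` takes precedence over `SECRET_ENCRYPTION_KEY_FILE`, that the key file holds base64 text, and that a missing key id is an error. `ActiveKey` and `load_active_key` are hypothetical names. The sketch stops before encryption itself, which would need an AES-GCM implementation outside the Python standard library.

```python
import base64
from dataclasses import dataclass
from typing import Mapping

AES_256_KEY_BYTES = 32  # AES-256-GCM uses a 256-bit key


@dataclass(frozen=True)
class ActiveKey:
    key_id: str
    key: bytes


def load_active_key(env: Mapping[str, str]) -> ActiveKey:
    """Read the key id and key bytes from the documented variables."""
    key_id = env.get("SECRET_ENCRYPTION_KEY_ID")
    if not key_id:
        # Assumption: an unlabeled key is treated as a startup error.
        raise RuntimeError("SECRET_ENCRYPTION_KEY_ID is required")

    encoded = env.get("SECRET_ENCRYPTION_KEY_B64")
    if not encoded:
        path = env.get("SECRET_ENCRYPTION_KEY_FILE")
        if not path:
            raise RuntimeError("secret encryption key material is required")
        # Assumption: the file contains the same base64 text as the _B64 variable.
        with open(path, "r", encoding="ascii") as handle:
            encoded = handle.read().strip()

    key = base64.b64decode(encoded, validate=True)
    if len(key) != AES_256_KEY_BYTES:
        raise ValueError("AES-256-GCM key must decode to exactly 32 bytes")
    return ActiveKey(key_id=key_id, key=key)
```

In a service this would be called once at startup, for example `load_active_key(os.environ)`, so that a missing or malformed key stops the process before any session is assigned.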

Production Guard

In APP_ENV=production:

RDP/VNC/SSH resources must have secret_ref.
Plain credential-like keys are rejected in resource metadata.
Session start rejects legacy resources that still contain plaintext credential-like metadata.
Backend startup requires secret encryption key material.

Outside production, development/smoke environments may continue using plaintext metadata while the resolver path is not used, but this is explicitly not production mode.

Credential-like metadata keys include password, username, domain, token, private key, client secret, credential, credentials, secret, and common underscore/hyphen variants.
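The guard above can be sketched as a key scan over resource metadata. The matching rule here is an assumption: keys are lowercased, separators (underscore, hyphen, whitespace) are stripped, and the result is compared exactly against the listed names. Recursing into nested objects is also an assumption. The function names are hypothetical.

```python
import re
from typing import List

# Listed names with separators removed, so "private_key" and "private-key" both match.
CREDENTIAL_LIKE = frozenset({
    "password", "username", "domain", "token", "privatekey",
    "clientsecret", "credential", "credentials", "secret",
})
SECRET_BACKED_PROTOCOLS = frozenset({"rdp", "vnc", "ssh"})


def normalize_key(key: str) -> str:
    return re.sub(r"[\s_\-]+", "", key.strip().lower())


def find_credential_like_keys(metadata: dict, prefix: str = "") -> List[str]:
    """Return dotted paths of every credential-like key, in insertion order."""
    found = []
    for key, value in metadata.items():
        path = f"{prefix}{key}"
        if normalize_key(str(key)) in CREDENTIAL_LIKE:
            found.append(path)
        if isinstance(value, dict):
            found.extend(find_credential_like_keys(value, prefix=path + "."))
    return found


def production_errors(resource: dict) -> List[str]:
    """Collect the reasons a resource would be rejected in production."""
    errors = []
    if resource.get("protocol") in SECRET_BACKED_PROTOCOLS and not resource.get("secret_ref"):
        errors.append("secret_ref is required")
    leaked = find_credential_like_keys(resource.get("metadata") or {})
    if leaked:
        errors.append("credential-like metadata keys: " + ", ".join(leaked))
    return errors
```

Exact matching after normalization keeps safe keys such as `certificate_verification_mode` from tripping the guard; a substring match would be stricter but would also need an allowlist.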

Data Plane Trust

Already accepted:

backend signs data_plane_token with RS256 private key
worker validates with public key only
token is short-lived
token includes session, attachment, user, organization, worker, resource, allowed channels, expiry, and jti
worker rejects wrong worker, wrong attachment, wrong organization, wrong resource, over-broad channels, failed/terminated sessions, and jti replay
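A sketch of the worker-side claim checks listed above, assuming the RS256 signature has already been verified with the public key by a JWT library. The claim names (`worker_id`, `attachment_id`, `allowed_channels`, and so on) are hypothetical, as are `BindContext` and `ClaimsValidator`; only `jti` and the expiry concept come from the list above, with `exp` assumed as the expiry claim name.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


class TokenRejected(Exception):
    """Raised when verified claims do not match the worker's bind context."""


@dataclass(frozen=True)
class BindContext:
    worker_id: str
    organization_id: str
    resource_id: str
    session_id: str
    attachment_id: str
    session_state: str
    allowed_channels: FrozenSet[str]  # channels permitted by runtime policy


class ClaimsValidator:
    def __init__(self) -> None:
        self._seen_jti: Set[str] = set()  # replay cache; unbounded in this sketch

    def validate(self, claims: dict, ctx: BindContext, now: float) -> None:
        exp = claims.get("exp")
        if not isinstance(exp, (int, float)) or exp <= now:
            raise TokenRejected("token expired or missing exp")
        for name in ("worker_id", "organization_id", "resource_id",
                     "session_id", "attachment_id"):
            if claims.get(name) != getattr(ctx, name):
                raise TokenRejected(f"wrong {name}")
        if ctx.session_state in {"failed", "terminated"}:
            raise TokenRejected("session is not active")
        requested = set(claims.get("allowed_channels") or [])
        if not requested <= ctx.allowed_channels:
            raise TokenRejected("over-broad channels")
        jti = claims.get("jti")
        if not jti or jti in self._seen_jti:
            raise TokenRejected("missing or replayed jti")
        self._seen_jti.add(jti)  # record only after every other check passed
```

Recording the `jti` last means a token rejected for another reason does not consume its single use. A real replay cache would also expire entries once the token's expiry has passed.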

Still needed for production at that stage:

deployed certificate chain for the historical direct-worker WSS path on production nodes
pinned or platform-issued worker certificates in live production config for that historical path
no smoke-only TLS bypass in production clients
rotation process for data-plane signing keys
audit for failed token validation/bind attempts

P3.2 historical guard exists:

backend distinguished smoke_insecure, public_ca, and platform_ca direct-worker trust modes for the historical RDP path
production backend omitted smoke-only direct candidates on that path
Windows production client skipped untrusted or smoke-only direct candidates
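The P3.2 guard can be illustrated with a small candidate filter. The three trust mode names come from this document; the candidate dictionary shape and the function names are hypothetical, and the client-side rule is simplified to "skip anything marked smoke-only".

```python
from typing import Dict, List

TRUSTED_MODES = frozenset({"public_ca", "platform_ca"})


def advertised_candidates(app_env: str, trust_mode: str) -> List[Dict]:
    """Backend side: decide which data-plane candidates to advertise."""
    gateway = {"kind": "backend_gateway", "smoke_only": False}
    if trust_mode in TRUSTED_MODES:
        direct = {"kind": "direct_worker", "trust": trust_mode, "smoke_only": False}
        return [direct, gateway]  # direct path preferred, gateway as fallback
    if trust_mode == "smoke_insecure" and app_env != "production":
        direct = {"kind": "direct_worker", "trust": trust_mode, "smoke_only": True}
        return [direct, gateway]
    # Production never advertises a smoke-only direct candidate.
    return [gateway]


def production_client_candidates(candidates: List[Dict]) -> List[Dict]:
    """Client side: a production client skips smoke-only candidates."""
    return [c for c in candidates if not c.get("smoke_only")]
```

Filtering on both sides means a misconfigured backend alone cannot push a production client onto an untrusted direct path.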

P3.3 historical test-stand smoke exists:

resource_secrets migration is applied on docker-test
backend runs as APP_ENV=production with a test-only SECRET_ENCRYPTION_KEY_FILE
a secret-backed RDP resource starts a real session through assignment-time secret resolution
resources.metadata, remote_sessions.metadata, and audit_events were checked for plaintext username/password leakage
production backend with DATA_PLANE_DIRECT_WORKER_TLS_TRUST_MODE=smoke_insecure returned the historical backend gateway debug path only
development/smoke backend with the same trust mode advertises the explicit smoke-only historical direct-worker candidate
RAP_Transfers smoke passed on the secret-backed resource

Required Regression Tests

P3 regression tests must protect the following behavior:

plaintext resource credentials rejected in production
RDP production resources require secret_ref
development smoke plaintext metadata remains allowed
data-plane allowed channels follow runtime policy
direct bind rejects wrong worker
direct bind rejects wrong user
direct bind rejects wrong organization
direct bind rejects wrong resource
direct bind rejects old attachment
direct bind rejects failed/terminated states

Audit Events

Current audit coverage should remain for:

session start
attach
detach
takeover
terminate
failure

Future audit coverage should add:

secret deleted
production resource rejected because plaintext credential metadata was found

Audit entries must reference secret_ref and resource/session ids, never plaintext secret values.

P3.1 implemented audit events for:

resource_secret_rotated
resource_secret_accessed
resource_secret_access_denied
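One way to keep plaintext out of audit entries is to build them from an allowlist of context fields. The event names below are the P3.1 events listed above; the allowed field set, the entry shape, and the function name are assumptions made for illustration.

```python
from datetime import datetime, timezone

SECRET_AUDIT_EVENTS = frozenset({
    "resource_secret_rotated",
    "resource_secret_accessed",
    "resource_secret_access_denied",
})
# Only references and ids may appear in audit context (assumed field set).
ALLOWED_CONTEXT_FIELDS = frozenset({
    "secret_ref", "organization_id", "resource_id",
    "session_id", "worker_id", "reason",
})


def build_secret_audit_event(event_type: str, **context: str) -> dict:
    """Build an audit entry, refusing any field outside the allowlist."""
    if event_type not in SECRET_AUDIT_EVENTS:
        raise ValueError(f"unknown audit event type: {event_type}")
    unexpected = sorted(set(context) - ALLOWED_CONTEXT_FIELDS)
    if unexpected:
        raise ValueError("fields not allowed in audit context: " + ", ".join(unexpected))
    return {
        "event_type": event_type,
        "occurred_at": datetime.now(timezone.utc).isoformat(),
        "context": dict(context),
    }
```

An allowlist fails closed: a new field has to be added deliberately, whereas a denylist of credential-like names would let an unanticipated key through.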

Remaining Production Gaps

External KMS/Vault integration is not implemented yet.
Master-key rotation/re-encryption workflow is not implemented yet.
The worker still receives resolved credentials through the transient assignment payload; a future resolver pull/token flow should reduce exposure in Redis control queues.
The worker still depends on plaintext assignment metadata for development smoke.
Production certificate issuance/rotation and platform CA distribution for the historical direct-worker path are not complete.
The test-stand secret key is a host-local test file, not a production KMS or HSM-backed key.
Automated end-to-end policy denial coverage is still thin.
