# Security And Secrets Readiness

Archived scope note: this document records an earlier RDP/direct-worker trust
and secret-handling stage. It is not the current source of truth for fabric
transport architecture. The active inter-node transport model is QUIC-only; see
`docs/architecture/DISTRIBUTED_FABRIC_NODE_PROTOCOL_PLAN.md`,
`docs/architecture/FABRIC_FIRST_TRANSPORT_AND_STRESS_PLAN.md`, and
`docs/architecture/SECURE_ACCESS_FABRIC_TARGET.md`.

Status: P3.3 historical test-stand smoke complete for encrypted resource
secrets, assignment-time resolution, and legacy RDP baseline behavior with
smoke-only direct-worker trust.

This document defines the next security hardening layer around the accepted RDP
MVP baseline. It does not implement mesh, VPN, server-to-client download, new
protocol adapters, or another RDP rendering mode.

## Current Accepted Historical RDP Baseline

- RDP worker baseline: `rap-rdp-worker:rdp-p1-region-order2`
- Backend control plane remains source of truth.
- Redis remains live coordination/routing only.
- Historical direct-worker WSS was the preferred realtime RDP path in this
  stage.
- Historical backend gateway remained a fallback/debug path for this stage.
- Text clipboard is policy-gated and accepted.
- Client-to-server file upload and restricted `RAP_Transfers` visibility are
  accepted.

## Problem

The current smoke/dev path can still seed RDP target credentials inside
resource `metadata`. That was acceptable for proving lifecycle and RDP adapter
behavior, but it must not be the production contract.

Production must not rely on plaintext target passwords, usernames, domain
credentials, client secrets, tokens, or private keys stored in generic resource
metadata.

## Target Secret Model

Resources keep non-secret connection shape:

```json
{
  "id": "...",
  "organization_id": "...",
  "protocol": "rdp",
  "address": "rdp.example.internal:3389",
  "secret_ref": "rap-secret://org/<org_id>/resources/<resource_id>/rdp-primary",
  "metadata": {
    "certificate_verification_mode": "strict",
    "render_quality_profile": "balanced"
  }
}
```

Secrets are stored separately and referenced by `secret_ref`. The secret payload
is protocol-specific and versioned:

```json
{
  "version": 1,
  "protocol": "rdp",
  "username": "...",
  "domain": "...",
  "password": "...",
  "rotation_version": 3
}
```

The reference, not the plaintext secret, is copied into session metadata and
audit context.

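As an illustration of the `secret_ref` shape shown above, the following is a
minimal Python sketch of a reference parser. The `SecretRef` type and both
function names are hypothetical, and the strictness of the checks (exactly
three path segments after `org/`, a literal `resources` segment) is an
assumption drawn from the single example in this document, not from the
backend implementation.

```python
from dataclasses import dataclass

# Prefix taken from the example secret_ref in this document.
SECRET_REF_PREFIX = "rap-secret://org/"


@dataclass(frozen=True)
class SecretRef:
    """Hypothetical parsed form of a resource secret reference."""
    organization_id: str
    resource_id: str
    name: str


def parse_secret_ref(ref: str) -> SecretRef:
    """Parse rap-secret://org/<org_id>/resources/<resource_id>/<name>."""
    if not ref.startswith(SECRET_REF_PREFIX):
        raise ValueError("not a rap-secret org reference")
    segments = ref[len(SECRET_REF_PREFIX):].split("/")
    # Expect exactly: <org_id>, "resources", <resource_id>, <name>.
    if len(segments) != 4 or segments[1] != "resources" or not all(segments):
        raise ValueError("unexpected secret_ref path shape")
    org_id, _, resource_id, name = segments
    return SecretRef(org_id, resource_id, name)


def is_valid_secret_ref(ref: str) -> bool:
    """Boolean convenience wrapper around parse_secret_ref."""
    try:
        parse_secret_ref(ref)
    except ValueError:
        return False
    return True
```

Parsing the reference into its parts lets a caller compare the embedded
organization and resource ids against the session being started without ever
touching the secret payload.
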
## Runtime Secret Resolution

Production runtime should resolve secrets through a dedicated secret resolver:

1. Backend validates resource/org/user authorization.
2. Backend starts the session using resource `secret_ref`.
3. Worker receives assignment with `secret_ref`, not plaintext credentials.
4. Worker asks an authorized secret resolver for the secret using:
   - `organization_id`
   - `resource_id`
   - `worker_id`
   - `session_id`
   - short-lived lease/session proof
5. Secret resolver returns credentials only to authorized workers for active
   leased sessions.
6. Worker keeps secret material in memory only and never logs it.

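The authorization decision in steps 4 and 5 can be sketched as a comparison
between the worker's request and the lease the backend holds for the session.
This is a hedged illustration only: `Lease`, `ResolveRequest`,
`SecretAccessDenied`, and the `active`/`expires_at` fields are hypothetical
names, and the form of the lease proof is not specified in this document.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lease:
    """Hypothetical server-side record of a leased session."""
    organization_id: str
    resource_id: str
    worker_id: str
    session_id: str
    expires_at: float  # Unix seconds
    active: bool = True


@dataclass(frozen=True)
class ResolveRequest:
    """Identifiers a worker presents when asking for a secret."""
    organization_id: str
    resource_id: str
    worker_id: str
    session_id: str


class SecretAccessDenied(Exception):
    """Raised when a resolution request must not receive credentials."""


def authorize_resolution(request: ResolveRequest, lease: Optional[Lease],
                         now: Optional[float] = None) -> None:
    """Raise SecretAccessDenied unless the request matches an active lease."""
    now = time.time() if now is None else now
    if lease is None or not lease.active:
        raise SecretAccessDenied("no active lease")
    if now >= lease.expires_at:
        raise SecretAccessDenied("lease expired")
    for name in ("organization_id", "resource_id", "worker_id", "session_id"):
        if getattr(request, name) != getattr(lease, name):
            raise SecretAccessDenied(f"{name} mismatch")


def is_authorized(request: ResolveRequest, lease: Optional[Lease],
                  now: Optional[float] = None) -> bool:
    """Boolean convenience wrapper around authorize_resolution."""
    try:
        authorize_resolution(request, lease, now)
    except SecretAccessDenied:
        return False
    return True
```

Only after this check passes would a resolver decrypt and return the payload.
The denial reason names the mismatching field but never includes secret
material, so it is safe to place in an audit event.
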
The current P3.1 MVP uses an encrypted PostgreSQL-backed store:

- `resource_secrets` stores ciphertext, nonce, key id, algorithm, version, safe
  metadata, and `payload_sha256`.
- `SECRET_ENCRYPTION_KEY_B64` or `SECRET_ENCRYPTION_KEY_FILE` supplies the
  AES-256-GCM key.
- `SECRET_ENCRYPTION_KEY_ID` labels the active key.
- The API can create/rotate a resource secret, but never returns plaintext.
- Session assignment resolves the secret only after
  organization/resource/worker/session/lease checks.

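A minimal sketch of loading the key material from the environment variables
listed above. Several details here are assumptions rather than documented
behavior: that the inline variable takes precedence over the file, that the
key file holds base64 text rather than raw bytes, and the function name
itself. The 32-byte check follows from AES-256 requiring a 256-bit key.

```python
import base64
import binascii
import os
from pathlib import Path

AES_256_KEY_BYTES = 32  # AES-256 uses a 256-bit key


def _decode_key(encoded: str) -> bytes:
    """Decode base64 key text and enforce the AES-256 key length."""
    try:
        key = base64.b64decode(encoded.strip(), validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("secret encryption key is not valid base64") from exc
    if len(key) != AES_256_KEY_BYTES:
        raise ValueError("secret encryption key must decode to 32 bytes")
    return key


def load_secret_encryption_key(env=None):
    """Return (key_id, key_bytes) from environment-style configuration.

    Assumption: SECRET_ENCRYPTION_KEY_B64 wins when both variables are set,
    and SECRET_ENCRYPTION_KEY_FILE points at a file containing base64 text.
    """
    env = os.environ if env is None else env
    inline = env.get("SECRET_ENCRYPTION_KEY_B64")
    key_file = env.get("SECRET_ENCRYPTION_KEY_FILE")
    if inline:
        encoded = inline
    elif key_file:
        encoded = Path(key_file).read_text(encoding="ascii")
    else:
        raise RuntimeError("no secret encryption key material configured")
    return env.get("SECRET_ENCRYPTION_KEY_ID"), _decode_key(encoded)
```

Failing at load time, rather than at first encryption, matches the production
requirement that backend startup has key material available.
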
The resolver boundary can later be backed by KMS, Vault, cloud secret managers,
or node-local secure delivery without changing the resource `secret_ref`
contract.

## Production Guard

In `APP_ENV=production`:

- RDP/VNC/SSH resources must have `secret_ref`.
- Plain credential-like keys are rejected in resource `metadata`.
- Session start rejects legacy resources that still contain plaintext
  credential-like metadata.
- Backend startup requires secret encryption key material.
- Development/smoke environments may continue using plaintext metadata while
  the resolver path is not used, but this is explicitly not production mode.

Credential-like metadata keys include password, username, domain, token,
private key, client secret, credential, credentials, secret, and common
underscore/hyphen variants.

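One way to detect the credential-like keys listed above is to normalize each
metadata key before comparing it to a fixed set. The sketch below is
illustrative: the function name is hypothetical, and both the exact-match rule
after normalization and the recursion into nested objects are assumptions, not
a description of the backend's actual guard.

```python
import re

# Normalized forms of the key names listed in this document.
CREDENTIAL_LIKE_KEYS = frozenset({
    "password", "username", "domain", "token", "privatekey",
    "clientsecret", "credential", "credentials", "secret",
})


def _normalize(key: str) -> str:
    """Lowercase and drop spaces, underscores, and hyphens."""
    return re.sub(r"[\s_\-]+", "", key).lower()


def find_credential_like_keys(metadata: dict, path: str = "") -> list:
    """Return dotted paths of credential-like keys in resource metadata."""
    found = []
    for key, value in metadata.items():
        here = f"{path}.{key}" if path else key
        if _normalize(key) in CREDENTIAL_LIKE_KEYS:
            found.append(here)
        if isinstance(value, dict):
            # Assumption: nested objects are inspected as well.
            found.extend(find_credential_like_keys(value, here))
    return found
```

A production guard could reject the resource whenever the returned list is
non-empty and report only the key paths, never the values. Exact matching
keeps keys such as `certificate_verification_mode` from being flagged.
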
## Data Plane Trust

Already accepted:

- backend signs `data_plane_token` with RS256 private key
- worker validates with public key only
- token is short-lived
- token includes session, attachment, user, organization, worker, resource,
  allowed channels, expiry, and jti
- worker rejects wrong worker, wrong attachment, wrong organization, wrong
  resource, over-broad channels, failed/terminated sessions, and jti replay

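The worker-side rejection rules above can be sketched as a check over claims
that have already passed RS256 signature verification (verification itself
needs a JWT library and is out of scope here). The claim names, the
`BindContext` type, and the in-memory replay set are all hypothetical; the
document lists what the token carries but not its field names.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BindContext:
    """Hypothetical worker-side view of the session being bound."""
    worker_id: str
    attachment_id: str
    organization_id: str
    resource_id: str
    session_state: str
    permitted_channels: frozenset
    seen_jti: set = field(default_factory=set)


def bind_rejection_reason(claims: dict, ctx: BindContext,
                          now: Optional[float] = None) -> Optional[str]:
    """Return None when the bind is acceptable, else a short reason."""
    now = time.time() if now is None else now
    if claims.get("exp", 0) <= now:
        return "expired"
    expected = (
        ("worker_id", ctx.worker_id),
        ("attachment_id", ctx.attachment_id),
        ("organization_id", ctx.organization_id),
        ("resource_id", ctx.resource_id),
    )
    for claim, value in expected:
        if claims.get(claim) != value:
            return f"wrong {claim}"
    # Every channel requested by the token must be permitted for the session.
    if not set(claims.get("channels", ())) <= ctx.permitted_channels:
        return "over-broad channels"
    if ctx.session_state in {"failed", "terminated"}:
        return "session not bindable"
    jti = claims.get("jti")
    if not jti:
        return "missing jti"
    if jti in ctx.seen_jti:
        return "jti replay"
    ctx.seen_jti.add(jti)  # record only after all other checks pass
    return None
```

The replay set here grows without bound; because tokens are short-lived, a
real worker could drop each `jti` once its token's expiry has passed.
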
Still needed for production at that stage:

- deployed certificate chain for the historical direct-worker WSS path on
  production nodes
- pinned or platform-issued worker certificates in live production config for
  that historical path
- no smoke-only TLS bypass in production clients
- rotation process for data-plane signing keys
- audit for failed token validation/bind attempts

P3.2 historical guard exists:

- backend distinguished `smoke_insecure`, `public_ca`, and `platform_ca`
  direct-worker trust modes for the historical RDP path
- production backend omitted smoke-only direct candidates on that path
- Windows production client skipped untrusted or smoke-only direct candidates

P3.3 historical test-stand smoke exists:

- `resource_secrets` migration is applied on `docker-test`
- backend runs as `APP_ENV=production` with a test-only
  `SECRET_ENCRYPTION_KEY_FILE`
- a secret-backed RDP resource starts a real session through assignment-time
  secret resolution
- `resources.metadata`, `remote_sessions.metadata`, and `audit_events` were
  checked for plaintext username/password leakage
- production backend with `DATA_PLANE_DIRECT_WORKER_TLS_TRUST_MODE=smoke_insecure`
  returned the historical backend gateway debug path only
- development/smoke backend with the same trust mode advertises the explicit
  smoke-only historical direct-worker candidate
- `RAP_Transfers` smoke passed on the secret-backed resource

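The candidate-selection behavior recorded in the P3.2 and P3.3 notes can be
summarized in a small sketch. The path labels `direct_worker_wss` and
`backend_gateway`, the function name, and the ordering of the returned list
are hypothetical; only the trust-mode names and the production omission rule
come from this document.

```python
TRUST_MODES = frozenset({"smoke_insecure", "public_ca", "platform_ca"})


def advertised_paths(app_env: str, trust_mode: str) -> list:
    """Return the data-plane paths a backend would advertise (sketch)."""
    if trust_mode not in TRUST_MODES:
        raise ValueError(f"unknown trust mode: {trust_mode}")
    paths = []
    smoke_only = trust_mode == "smoke_insecure"
    # Production never advertises a smoke-only direct-worker candidate.
    if not (app_env == "production" and smoke_only):
        paths.append("direct_worker_wss")
    # Assumption: the backend gateway is always offered as a fallback.
    paths.append("backend_gateway")
    return paths
```

For example, `advertised_paths("production", "smoke_insecure")` yields only
the gateway path, while the same trust mode in a development environment also
yields the direct-worker candidate.
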
## Required Regression Tests

P3 must protect:

- plaintext resource credentials rejected in production
- RDP production resources require `secret_ref`
- development smoke plaintext metadata remains allowed
- data-plane allowed channels follow runtime policy
- direct bind rejects wrong worker
- direct bind rejects wrong user
- direct bind rejects wrong organization
- direct bind rejects wrong resource
- direct bind rejects old attachment
- direct bind rejects failed/terminated states

## Audit Events

Current audit coverage should remain for:

- session start
- attach
- detach
- takeover
- terminate
- failure

Future audit coverage should add:

- secret deleted
- production resource rejected because plaintext credential metadata was found

Audit entries must reference `secret_ref` and resource/session ids, never
plaintext secret values.

P3.1 implemented audit events for:

- `resource_secret_rotated`
- `resource_secret_accessed`
- `resource_secret_access_denied`

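A sketch of how an audit entry could be built so that it carries only
references and ids. The event type names are the three P3.1 events listed
above; the function name and the entry's field names are hypothetical. The
keyword-only signature accepts identifiers and nothing else, so there is no
parameter through which a plaintext secret could be passed.

```python
from typing import Optional

# The three audit event types P3.1 implemented, per this document.
SECRET_AUDIT_EVENTS = frozenset({
    "resource_secret_rotated",
    "resource_secret_accessed",
    "resource_secret_access_denied",
})


def build_secret_audit_event(event_type: str, *, secret_ref: str,
                             resource_id: str,
                             session_id: Optional[str] = None,
                             worker_id: Optional[str] = None) -> dict:
    """Build an audit entry containing references and ids only."""
    if event_type not in SECRET_AUDIT_EVENTS:
        raise ValueError(f"unknown secret audit event: {event_type}")
    event = {
        "type": event_type,
        "secret_ref": secret_ref,
        "resource_id": resource_id,
    }
    # Optional ids are included only when the caller supplies them.
    if session_id is not None:
        event["session_id"] = session_id
    if worker_id is not None:
        event["worker_id"] = worker_id
    return event
```

A rotation event would typically omit `session_id`, while access and denial
events would include the session and worker that made the request.
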
## Remaining Production Gaps

- External KMS/Vault integration is not implemented yet.
- Master-key rotation/re-encryption workflow is not implemented yet.
- The worker still receives resolved credentials through the transient
  assignment payload; a future resolver pull/token flow should reduce exposure
  in Redis control queues.
- Worker still depends on plaintext assignment metadata for development smoke.
- Production certificate issuance/rotation and platform CA distribution for the
  historical direct-worker path are not complete.
- The test-stand secret key is a host-local test file, not a production KMS or
  HSM-backed key.
- Automated end-to-end policy denial coverage is still thin.