# Security And Secrets Readiness

Status: P3.3 test-stand smoke complete for encrypted resource secrets,
assignment-time resolution, and production fallback behavior with smoke-only
direct worker WSS trust.

This document defines the next security hardening layer around the accepted RDP
MVP baseline. It does not implement mesh, VPN, server-to-client download, new
protocol adapters, or another RDP rendering mode.

## Current Accepted Baseline

- RDP worker baseline: `rap-rdp-worker:rdp-p1-region-order2`
- Backend control plane remains source of truth.
- Redis remains live coordination/routing only.
- Direct worker WSS is preferred for realtime RDP.
- Backend gateway remains fallback/debug.
- Text clipboard is policy-gated and accepted.
- Client-to-server file upload and restricted `RAP_Transfers` visibility are
  accepted.

## Problem

The current smoke/dev path can still seed RDP target credentials inside
resource `metadata`. That was acceptable for proving lifecycle and RDP adapter
behavior, but it must not be the production contract.

Production must not rely on plaintext target passwords, usernames, domain
credentials, client secrets, tokens, or private keys stored in generic resource
metadata.

## Target Secret Model

Resources keep non-secret connection shape:

```json
{
  "id": "...",
  "organization_id": "...",
  "protocol": "rdp",
  "address": "rdp.example.internal:3389",
  "secret_ref": "rap-secret://org/<org_id>/resources/<resource_id>/rdp-primary",
  "metadata": {
    "certificate_verification_mode": "strict",
    "render_quality_profile": "balanced"
  }
}
```

Secrets are stored separately and referenced by `secret_ref`. The secret payload
is protocol-specific and versioned:

```json
{
  "version": 1,
  "protocol": "rdp",
  "username": "...",
  "domain": "...",
  "password": "...",
  "rotation_version": 3
}
```

The reference, not the plaintext secret, is copied into session metadata and
audit context.
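
As an illustration of how a `secret_ref` could be checked against the resource that carries it, here is a minimal Python sketch. It assumes the reference always follows the shape of the example above and treats the final segment (`rdp-primary`) as a free-form secret name. `SecretRef`, `parse_secret_ref`, and `check_ref_matches_resource` are hypothetical names for this sketch, not part of the current codebase.

```python
import re
from dataclasses import dataclass

# Assumed shape, copied from the example reference in this document.
_SECRET_REF_RE = re.compile(
    r"rap-secret://org/(?P<org_id>[^/]+)/resources/(?P<resource_id>[^/]+)/(?P<name>[^/]+)"
)


@dataclass(frozen=True)
class SecretRef:
    org_id: str
    resource_id: str
    name: str


def parse_secret_ref(ref: str) -> SecretRef:
    """Split a secret_ref into its parts, or raise ValueError if malformed."""
    match = _SECRET_REF_RE.fullmatch(ref)
    if match is None:
        raise ValueError("malformed secret_ref")
    return SecretRef(**match.groupdict())


def check_ref_matches_resource(resource: dict) -> SecretRef:
    """Reject a resource whose secret_ref points at another org or resource."""
    ref = parse_secret_ref(resource["secret_ref"])
    if ref.org_id != resource["organization_id"] or ref.resource_id != resource["id"]:
        raise ValueError("secret_ref does not belong to this resource")
    return ref
```

Because only the parsed reference is returned, the result can be copied into session metadata or audit context without carrying any secret material.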

## Runtime Secret Resolution

Production runtime should resolve secrets through a dedicated secret resolver:

1. Backend validates resource/org/user authorization.
2. Backend starts the session using resource `secret_ref`.
3. Worker receives assignment with `secret_ref`, not plaintext credentials.
4. Worker asks an authorized secret resolver for the secret using:
   - `organization_id`
   - `resource_id`
   - `worker_id`
   - `session_id`
   - short-lived lease/session proof
5. Secret resolver returns credentials only to authorized workers for active
   leased sessions.
6. Worker keeps secret material in memory only and never logs it.
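
The resolver check in steps 4 and 5 can be sketched in Python as follows. This is a simplified illustration of the target flow, not the current implementation: it models the "short-lived lease/session proof" as a lookup in an in-memory lease table keyed by session id, and `Lease`, `ResolveRequest`, `SecretAccessDenied`, and `resolve_secret` are hypothetical names.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Lease:
    organization_id: str
    resource_id: str
    worker_id: str
    session_id: str
    expires_at: float  # Unix seconds


@dataclass(frozen=True)
class ResolveRequest:
    organization_id: str
    resource_id: str
    worker_id: str
    session_id: str


class SecretAccessDenied(Exception):
    """Raised without ever including secret material in the message."""


def resolve_secret(request, active_leases, secrets, now=None):
    """Return the secret payload only for a worker holding a live lease."""
    now = time.time() if now is None else now
    lease = active_leases.get(request.session_id)
    if lease is None or lease.expires_at <= now:
        raise SecretAccessDenied("no active lease for session")
    # Every identifier in the request must match the lease exactly.
    leased = (lease.organization_id, lease.resource_id, lease.worker_id)
    asked = (request.organization_id, request.resource_id, request.worker_id)
    if leased != asked:
        raise SecretAccessDenied("request does not match lease")
    try:
        return secrets[(request.organization_id, request.resource_id)]
    except KeyError:
        raise SecretAccessDenied("no secret stored for resource") from None
```

Each denial path is a natural place to emit an audit event, and none of the error messages carry the payload.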

The current P3.1 MVP uses an encrypted PostgreSQL-backed store:

- `resource_secrets` stores ciphertext, nonce, key id, algorithm, version, safe
  metadata, and `payload_sha256`.
- `SECRET_ENCRYPTION_KEY_B64` or `SECRET_ENCRYPTION_KEY_FILE` supplies the
  AES-256-GCM key.
- `SECRET_ENCRYPTION_KEY_ID` labels the active key.
- The API can create/rotate a resource secret, but never returns plaintext.
- Session assignment resolves the secret only after organization/resource/
  worker/session/lease checks.
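
A startup check for the key material might look like the sketch below. Several details are assumptions rather than documented behavior: that `SECRET_ENCRYPTION_KEY_B64` takes precedence over `SECRET_ENCRYPTION_KEY_FILE`, and that the key file holds the same base64 text rather than raw bytes. The only firm constraint used is that AES-256 requires a 32-byte key. `load_secret_encryption_key` is a hypothetical helper name.

```python
import base64
import binascii
import os
from pathlib import Path

AES_256_KEY_BYTES = 32


def load_secret_encryption_key(env=None) -> bytes:
    """Load and validate the AES-256-GCM key from the environment."""
    env = os.environ if env is None else env
    encoded = env.get("SECRET_ENCRYPTION_KEY_B64")
    if not encoded:
        key_file = env.get("SECRET_ENCRYPTION_KEY_FILE")
        if not key_file:
            raise RuntimeError("secret encryption key material is not configured")
        # Assumption: the file contains base64 text, like the env variable.
        encoded = Path(key_file).read_text(encoding="ascii").strip()
    try:
        key = base64.b64decode(encoded, validate=True)
    except (binascii.Error, ValueError) as exc:
        # The message deliberately omits the offending value.
        raise RuntimeError("secret encryption key is not valid base64") from exc
    if len(key) != AES_256_KEY_BYTES:
        raise RuntimeError("AES-256-GCM requires a 32-byte key")
    return key
```

Failing fast here keeps a misconfigured backend from starting and then failing later on the first secret access.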

The resolver boundary can later be backed by KMS, Vault, cloud secret managers,
or node-local secure delivery without changing the resource `secret_ref`
contract.

## Production Guard

In `APP_ENV=production`:

- RDP/VNC/SSH resources must have `secret_ref`.
- Plain credential-like keys are rejected in resource `metadata`.
- Session start rejects legacy resources that still contain plaintext
  credential-like metadata.
- Backend startup requires secret encryption key material.
- Development/smoke environments may continue using plaintext metadata while
  the resolver path is not used, but this is explicitly not production mode.

Credential-like metadata keys include password, username, domain, token,
private key, client secret, credential, credentials, secret, and common
underscore/hyphen variants.
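
One way to implement the key check is to normalize each metadata key before comparing it with the list above. The Python sketch below is illustrative only: it assumes exact matching after lowercasing and stripping underscores, hyphens, and spaces, and it assumes nested objects are scanned too. The real guard may match differently. `find_credential_like_keys` and `validate_resource_for_production` are hypothetical names.

```python
# Normalized forms of the credential-like keys listed in this section.
CREDENTIAL_LIKE_KEYS = frozenset({
    "password", "username", "domain", "token", "privatekey",
    "clientsecret", "credential", "credentials", "secret",
})
SECRET_BACKED_PROTOCOLS = frozenset({"rdp", "vnc", "ssh"})


def _normalize(key: str) -> str:
    """Fold case and drop separators so 'Client-Secret' equals 'client_secret'."""
    return key.lower().replace("_", "").replace("-", "").replace(" ", "")


def find_credential_like_keys(metadata: dict, prefix: str = "") -> list[str]:
    """Return dotted paths of credential-like keys, including nested objects."""
    found = []
    for key, value in metadata.items():
        path = f"{prefix}{key}"
        if _normalize(str(key)) in CREDENTIAL_LIKE_KEYS:
            found.append(path)
        if isinstance(value, dict):
            found.extend(find_credential_like_keys(value, prefix=f"{path}."))
    return found


def validate_resource_for_production(resource: dict) -> None:
    """Raise ValueError if the resource violates the production guard."""
    if resource.get("protocol") in SECRET_BACKED_PROTOCOLS and not resource.get("secret_ref"):
        raise ValueError("production resources must have secret_ref")
    offending = find_credential_like_keys(resource.get("metadata") or {})
    if offending:
        # Only key names are reported, never the values.
        raise ValueError(f"credential-like metadata keys are not allowed: {offending}")
```

Exact matching keeps benign keys such as `certificate_verification_mode` valid; a substring match would be stricter but more prone to false positives.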

## Data Plane Trust

Already accepted:

- backend signs `data_plane_token` with RS256 private key
- worker validates with public key only
- token is short-lived
- token includes session, attachment, user, organization, worker, resource,
  allowed channels, expiry, and jti
- worker rejects wrong worker, wrong attachment, wrong organization, wrong
  resource, over-broad channels, failed/terminated sessions, and jti replay
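
The worker-side rejection rules can be sketched as a claims check that runs after the RS256 signature has already been verified (signature verification is out of scope here). The claim names (`worker_id`, `attachment_id`, `channels`, and so on), the session state strings, and the in-memory `seen_jti` set are assumptions made for this illustration; the actual token layout may differ.

```python
import time
from dataclasses import dataclass, field


class BindRejected(Exception):
    pass


@dataclass
class BindContext:
    worker_id: str
    session_id: str
    attachment_id: str
    organization_id: str
    resource_id: str
    session_state: str
    policy_channels: frozenset
    seen_jti: set = field(default_factory=set)


def check_bind(claims: dict, ctx: BindContext, now=None) -> None:
    """Validate already-verified token claims against the worker's own state."""
    now = time.time() if now is None else now
    if claims["exp"] <= now:
        raise BindRejected("token expired")
    for name in ("worker_id", "session_id", "attachment_id",
                 "organization_id", "resource_id"):
        if claims[name] != getattr(ctx, name):
            raise BindRejected(f"wrong {name}")
    # The token may narrow the allowed channels but never widen them.
    if not set(claims["channels"]) <= ctx.policy_channels:
        raise BindRejected("over-broad channels")
    if ctx.session_state in {"failed", "terminated"}:
        raise BindRejected("session is not bindable")
    if claims["jti"] in ctx.seen_jti:
        raise BindRejected("jti replay")
    ctx.seen_jti.add(claims["jti"])
```

Recording the `jti` only after every other check passes means a rejected token does not consume replay-cache space.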

Production still needs:

- deployed certificate chain for direct worker WSS on production nodes
- pinned or platform-issued worker certificates in live production config
- no smoke-only TLS bypass in production clients
- rotation process for data-plane signing keys
- audit for failed token validation/bind attempts

P3.2 guard exists:

- backend distinguishes `smoke_insecure`, `public_ca`, and `platform_ca`
  direct worker WSS trust modes
- production backend omits smoke-only direct candidates
- Windows production client skips untrusted or smoke-only direct candidates
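
The backend side of the P3.2 guard can be illustrated with a small candidate builder. The three trust mode values come from this document; the candidate dictionary shape, the `kind` strings, and the function name `data_plane_candidates` are hypothetical.

```python
TRUST_MODES = frozenset({"smoke_insecure", "public_ca", "platform_ca"})


def data_plane_candidates(app_env: str, trust_mode: str,
                          direct_url: str, gateway_url: str) -> list[dict]:
    """Build the ordered candidate list advertised to a client."""
    if trust_mode not in TRUST_MODES:
        raise ValueError(f"unknown trust mode: {trust_mode}")
    candidates = []
    smoke_only = trust_mode == "smoke_insecure"
    # Production never advertises a smoke-only direct candidate.
    if not (app_env == "production" and smoke_only):
        candidates.append({
            "kind": "direct_worker_wss",
            "url": direct_url,
            "trust_mode": trust_mode,
        })
    # The backend gateway is always offered as fallback.
    candidates.append({"kind": "backend_gateway", "url": gateway_url})
    return candidates
```

Carrying `trust_mode` on the direct candidate lets a production client apply its own check and skip anything it does not trust, independent of the backend filter.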

P3.3 test-stand smoke exists:

- `resource_secrets` migration is applied on `docker-test`
- backend runs as `APP_ENV=production` with a test-only
  `SECRET_ENCRYPTION_KEY_FILE`
- a secret-backed RDP resource starts a real session through assignment-time
  secret resolution
- `resources.metadata`, `remote_sessions.metadata`, and `audit_events` were
  checked for plaintext username/password leakage
- production backend with `DATA_PLANE_DIRECT_WORKER_TLS_TRUST_MODE=smoke_insecure`
  returns backend gateway fallback only
- development/smoke backend with the same trust mode advertises the explicit
  smoke-only direct worker WSS candidate
- `RAP_Transfers` smoke passed on the secret-backed resource

## Required Regression Tests

P3 must protect:

- plaintext resource credentials rejected in production
- RDP production resources require `secret_ref`
- development smoke plaintext metadata remains allowed
- data-plane allowed channels follow runtime policy
- direct bind rejects wrong worker
- direct bind rejects wrong user
- direct bind rejects wrong organization
- direct bind rejects wrong resource
- direct bind rejects old attachment
- direct bind rejects failed/terminated states

## Audit Events

Current audit coverage should remain for:

- session start
- attach
- detach
- takeover
- terminate
- failure

Future audit coverage should add:

- secret deleted
- production resource rejected because plaintext credential metadata was found

Audit entries must reference `secret_ref` and resource/session ids, never
plaintext secret values.

P3.1 implemented audit events for:

- `resource_secret_rotated`
- `resource_secret_accessed`
- `resource_secret_access_denied`
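
A simple way to keep plaintext out of audit entries is to give the event builder no parameter that could carry a secret payload. The sketch below uses the three P3.1 event names from the list above; the field names in the returned dictionary and the function name `build_secret_audit_event` are assumptions for illustration.

```python
SECRET_AUDIT_EVENTS = frozenset({
    "resource_secret_rotated",
    "resource_secret_accessed",
    "resource_secret_access_denied",
})


def build_secret_audit_event(event_type: str, secret_ref: str, resource_id: str,
                             session_id=None, reason=None) -> dict:
    """Build an audit entry from references and ids only."""
    if event_type not in SECRET_AUDIT_EVENTS:
        raise ValueError(f"unknown secret audit event: {event_type}")
    event = {
        "event_type": event_type,
        "secret_ref": secret_ref,
        "resource_id": resource_id,
    }
    if session_id is not None:
        event["session_id"] = session_id
    if reason is not None:
        # Short machine-readable reason, e.g. for access_denied events.
        event["reason"] = reason
    return event
```

Callers pass the reference and ids they already hold, so the decrypted payload never needs to be in scope where the event is built.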

## Remaining Production Gaps

- External KMS/Vault integration is not implemented yet.
- Master-key rotation/re-encryption workflow is not implemented yet.
- The worker still receives resolved credentials through the transient
  assignment payload; a future resolver pull/token flow should reduce exposure
  in Redis control queues.
- Worker still depends on plaintext assignment metadata for development smoke.
- Production direct worker WSS certificate issuance/rotation and platform CA
  distribution are not complete.
- The test-stand secret key is a host-local test file, not a production KMS or
  HSM-backed key.
- Automated end-to-end policy denial coverage is still thin.