# Backend Foundation

Production-oriented Go backend skeleton for the remote access platform.

## Scope included

- configuration loading from environment
- HTTP server bootstrap with graceful shutdown
- PostgreSQL and Redis connectivity wiring
- migrations scaffold
- auth foundation with access/refresh tokens, hashed refresh rotation, trusted devices, and persisted auth sessions
- persistent session storage foundation for remote sessions, attachments, resource policies, and audit events
- session broker orchestration for start, attach, detach, takeover, terminate, failure, and detached-session recovery
- Redis-backed live session state, controller binding, attach tokens, heartbeat keys, worker routing, and reconnect support
- Redis-backed worker registration, lease lifecycle, heartbeat tracking, stale lease recovery, and routing queues
- worker assignment queueing and worker event ingestion for the minimal real RDP worker runtime
- websocket live plane with attach handshake, ping/pong heartbeat, state messages, takeover detection, and transport reconnect flow
- module boundaries for auth, resources, session broker, and websocket gateway
- worker registry scaffold to prepare later RDP worker integration
- per-resource certificate verification policy for RDP connections with `strict` default and explicit `ignore` override
- platform-core v2 foundations for organizations, memberships, identity sources, nodes, and node-agent control plane
- Data Plane v1 contract scaffolding for optional session response candidates/tokens, with current backend gateway behavior preserved as fallback
- production resource secret-readiness guard for rejecting plaintext credential-like metadata and requiring `secret_ref` for RDP/VNC/SSH resources in production mode
- encrypted resource secret storage/resolver MVP for production `secret_ref` usage

## Entry point

Run the API from `cmd/api`.

## Local dev

- backend: `pwsh -File scripts/smoke/run-backend.ps1`
- infra: `pwsh -File scripts/smoke/start-infra.ps1`
- migrations: `pwsh -File scripts/smoke/apply-migrations.ps1`
- worker image build: `docker build --tag rap-rdp-worker:dev --file workers/rdp-worker/Dockerfile workers/rdp-worker`
- end-to-end smoke path: see `scripts/smoke/README.md`

## Configuration

Use `configs/api.example.env` as the starting point for local environment variables.

Resource secret-readiness is controlled by `APP_ENV`:

- in `APP_ENV=production` or `APP_ENV=prod`, RDP/VNC/SSH resources must carry
  `secret_ref` and must not include plaintext credential-like fields in
  `metadata`
- in development and smoke environments, plaintext metadata remains allowed
  until the encrypted secret resolver is implemented
- the production guard is enforced both on resource create/update and on
  session start, so legacy plaintext resources cannot be started in production
  accidentally
- `SECRET_ENCRYPTION_KEY_B64` or `SECRET_ENCRYPTION_KEY_FILE` supplies the
  AES-256-GCM master key for the MVP encrypted store; production mode refuses
  to start without one
- `SECRET_ENCRYPTION_KEY_ID` labels the active key version in stored records
- `PUT /api/v1/resources/{resourceID}/secret` creates or rotates a resource
  secret and updates `resources.secret_ref`; plaintext is never returned by the
  API
- session assignment keeps PostgreSQL metadata safe: `remote_sessions.metadata`
  stores `secret_ref`, while resolved credentials are merged only into the
  transient worker assignment after session/worker/lease checks

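The production guard in the list above can be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: the `Resource` struct, `checkSecretReadiness`, and the `credentialLikeMarkers` list are hypothetical names, and the real set of credential-like metadata fields is not specified in this document.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Resource is a hypothetical, simplified view of a stored resource.
type Resource struct {
	Protocol  string            // "rdp", "vnc", "ssh", ...
	SecretRef string            // reference into the encrypted secret store
	Metadata  map[string]string // free-form resource metadata
}

// credentialLikeMarkers is an assumed list; the real guard may match differently.
var credentialLikeMarkers = []string{"password", "passphrase", "private_key", "token"}

// isProduction mirrors the two APP_ENV values named above.
func isProduction(appEnv string) bool {
	return appEnv == "production" || appEnv == "prod"
}

// checkSecretReadiness returns an error when a production RDP/VNC/SSH resource
// lacks secret_ref or carries credential-like metadata keys.
func checkSecretReadiness(appEnv string, r Resource) error {
	if !isProduction(appEnv) {
		return nil // development and smoke environments stay permissive
	}
	switch strings.ToLower(r.Protocol) {
	case "rdp", "vnc", "ssh":
	default:
		return nil // other protocols are outside this guard
	}
	if r.SecretRef == "" {
		return errors.New("secret_ref is required in production")
	}
	for key := range r.Metadata {
		lower := strings.ToLower(key)
		for _, marker := range credentialLikeMarkers {
			if strings.Contains(lower, marker) {
				return fmt.Errorf("metadata field %q looks like a plaintext credential", key)
			}
		}
	}
	return nil
}

func main() {
	legacy := Resource{Protocol: "rdp", Metadata: map[string]string{"password": "example"}}
	fmt.Println("development:", checkSecretReadiness("development", legacy))
	fmt.Println("production:", checkSecretReadiness("production", legacy))
}
```

Calling the same function from both the create/update path and the session-start path is one way to get the double enforcement described above.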
See `docs/architecture/SECURITY_SECRETS_READINESS.md` for the target
secret-reference model and remaining resolver/PKI gaps.

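For orientation, here is a minimal sketch of how a base64 master key such as `SECRET_ENCRYPTION_KEY_B64` could be loaded and used for AES-256-GCM with the Go standard library. The helper names (`loadMasterKey`, `seal`, `open`, `demoKeyB64`), the use of standard base64 encoding, and the nonce-prefixed record layout are assumptions made for illustration; the actual storage format is not described here.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/base64"
	"errors"
	"fmt"
)

// demoKeyB64 returns an all-zero 32-byte key, base64-encoded. Demo only.
func demoKeyB64() string {
	return base64.StdEncoding.EncodeToString(make([]byte, 32))
}

// loadMasterKey decodes the key and insists on the 32 bytes AES-256 requires.
func loadMasterKey(b64 string) ([]byte, error) {
	key, err := base64.StdEncoding.DecodeString(b64)
	if err != nil {
		return nil, fmt.Errorf("decode master key: %w", err)
	}
	if len(key) != 32 {
		return nil, errors.New("master key must be 32 bytes for AES-256")
	}
	return key, nil
}

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts plaintext and prepends the random nonce (assumed layout).
func seal(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open splits off the nonce and authenticates and decrypts the remainder.
func open(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ciphertext := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, nil)
}

func main() {
	key, err := loadMasterKey(demoKeyB64())
	if err != nil {
		panic(err)
	}
	sealed, err := seal(key, []byte("example-password"))
	if err != nil {
		panic(err)
	}
	plain, err := open(key, sealed)
	if err != nil {
		panic(err)
	}
	fmt.Printf("sealed %d bytes, recovered %q\n", len(sealed), plain)
}
```

A stored record would presumably also carry the `SECRET_ENCRYPTION_KEY_ID` label so that a later key rotation can tell which key sealed it.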
Data Plane v1 contract scaffolding is controlled by:

- `DATA_PLANE_TOKEN_TTL`, default `1m`
- `DATA_PLANE_TOKEN_PRIVATE_KEY_FILE`, optional path to an RSA private key PEM used to sign RS256 data-plane tokens
- `DATA_PLANE_TOKEN_PRIVATE_KEY_PEM`, optional inline RSA private key PEM; used when file path is not configured
- `DATA_PLANE_BACKEND_GATEWAY_URL`, default `/api/v1/gateway/ws`
- `DATA_PLANE_DIRECT_WORKER_WSS_URL_TEMPLATE`, optional; supports `{worker_id}` replacement
- `DATA_PLANE_DIRECT_WORKER_JSON_RUNTIME`, default `false`; advertises
  `runtime_transport=json_v1` only after the worker direct JSON bridge is
  deployed and verified
- `DATA_PLANE_DIRECT_WORKER_BINARY_RENDER`, default `false`; when the direct
  JSON runtime is enabled, advertises `render_transport=binary_v1` so DP-2
  clients can request binary render frames over direct worker WSS. Binary
  render candidates also advertise `supported_color_modes=["full_color","grayscale"]`
  and `default_color_mode="full_color"` for the DP-3A grayscale foundation.
- `DATA_PLANE_DIRECT_WORKER_TLS_TRUST_MODE`, default `smoke_insecure`; allowed
  values are `smoke_insecure`, `public_ca`, and `platform_ca`.
- `DATA_PLANE_DIRECT_WORKER_TLS_CA_REF`, optional label for the platform CA or
  trust bundle version advertised to clients.

Data-plane tokens are RS256-signed. Only the backend holds the private key;
workers receive only the matching public key for validation. If no private key
is configured, the backend omits the optional `data_plane` offer and the
backend gateway fallback remains unchanged.

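The split between a signing backend and verifying workers can be illustrated with a small standard-library sketch of RS256 (RSASSA-PKCS1-v1_5 with SHA-256) over a JWT-style compact token. The claim names `session_id` and `worker_id`, the helper names, and the throwaway demo key are assumptions; the real token contents are not specified in this document, and a production verifier would also pin the header's `alg` value.

```go
package main

import (
	"crypto"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
	"time"
)

// claims uses hypothetical claim names apart from the standard "exp".
type claims struct {
	SessionID string `json:"session_id"`
	WorkerID  string `json:"worker_id"`
	ExpiresAt int64  `json:"exp"`
}

func b64(data []byte) string { return base64.RawURLEncoding.EncodeToString(data) }

// newDemoKey generates a throwaway key; a real deployment would load the
// configured PEM instead.
func newDemoKey() *rsa.PrivateKey {
	key, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		panic(err)
	}
	return key
}

// signRS256 runs on the backend, the only holder of the private key.
func signRS256(key *rsa.PrivateKey, c claims) (string, error) {
	payload, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	signingInput := b64([]byte(`{"alg":"RS256","typ":"JWT"}`)) + "." + b64(payload)
	digest := sha256.Sum256([]byte(signingInput))
	sig, err := rsa.SignPKCS1v15(rand.Reader, key, crypto.SHA256, digest[:])
	if err != nil {
		return "", err
	}
	return signingInput + "." + b64(sig), nil
}

// verifyRS256 runs on a worker with only the public key. It checks the
// signature first and the expiry second.
func verifyRS256(pub *rsa.PublicKey, token string, nowUnix int64) (claims, error) {
	var c claims
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return c, errors.New("malformed token")
	}
	sig, err := base64.RawURLEncoding.DecodeString(parts[2])
	if err != nil {
		return c, err
	}
	digest := sha256.Sum256([]byte(parts[0] + "." + parts[1]))
	if err := rsa.VerifyPKCS1v15(pub, crypto.SHA256, digest[:], sig); err != nil {
		return c, err
	}
	payload, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return c, err
	}
	if err := json.Unmarshal(payload, &c); err != nil {
		return c, err
	}
	if nowUnix >= c.ExpiresAt {
		return c, errors.New("token expired")
	}
	return c, nil
}

func main() {
	key := newDemoKey()
	exp := time.Now().Add(time.Minute).Unix() // matches the 1m default TTL
	token, err := signRS256(key, claims{SessionID: "session-1", WorkerID: "worker-1", ExpiresAt: exp})
	if err != nil {
		panic(err)
	}
	got, err := verifyRS256(&key.PublicKey, token, time.Now().Unix())
	fmt.Println(got.SessionID, err)
}
```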
If no direct worker WSS URL template is configured, session responses still include the backend gateway fallback candidate only.

If the URL template is configured but `DATA_PLANE_DIRECT_WORKER_JSON_RUNTIME`
is `false`, the direct candidate is still present for contract visibility but is
not marked data-capable; DP-1D Windows clients will skip it and use the backend
gateway fallback.

If `DATA_PLANE_DIRECT_WORKER_BINARY_RENDER` is `false`, direct worker WSS
remains JSON/base64 for render. If it is `true`, only direct worker WSS render
is binary; backend gateway fallback remains JSON/base64.

In production, the backend does not advertise direct worker WSS when
`DATA_PLANE_DIRECT_WORKER_TLS_TRUST_MODE=smoke_insecure`; it keeps the backend
gateway fallback instead. Trusted direct candidates include `tls_trust_mode`,
`production_trusted`, `smoke_only`, and optional `tls_ca_ref` metadata. See
`docs/architecture/DIRECT_WORKER_TLS_PKI.md`.

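The candidate rules above can be read as a small decision function. The sketch below is an interpretation under stated assumptions: the `Config` and `Candidate` types, the `Kind` labels, the `DataCapable` field, the ordering of candidates, and the example worker host are all hypothetical and are not taken from the backend's actual contract types.

```go
package main

import (
	"fmt"
	"strings"
)

// Config is a hypothetical projection of the DATA_PLANE_* settings.
type Config struct {
	Production     bool
	HasSigningKey  bool   // an RSA private key is configured
	GatewayURL     string // DATA_PLANE_BACKEND_GATEWAY_URL
	DirectTemplate string // DATA_PLANE_DIRECT_WORKER_WSS_URL_TEMPLATE
	JSONRuntime    bool   // DATA_PLANE_DIRECT_WORKER_JSON_RUNTIME
	BinaryRender   bool   // DATA_PLANE_DIRECT_WORKER_BINARY_RENDER
	TLSTrustMode   string // DATA_PLANE_DIRECT_WORKER_TLS_TRUST_MODE
}

// Candidate is a hypothetical shape for one data-plane candidate.
type Candidate struct {
	Kind             string
	URL              string
	DataCapable      bool
	RuntimeTransport string
	RenderTransport  string
	TLSTrustMode     string
}

// buildCandidates returns nil when no data_plane offer should be made.
func buildCandidates(cfg Config, workerID string) []Candidate {
	if !cfg.HasSigningKey {
		return nil // no private key: omit the optional offer entirely
	}
	out := []Candidate{{Kind: "backend_gateway", URL: cfg.GatewayURL, DataCapable: true}}
	if cfg.DirectTemplate == "" {
		return out // gateway fallback only
	}
	if cfg.Production && cfg.TLSTrustMode == "smoke_insecure" {
		return out // never advertise an untrusted direct path in production
	}
	direct := Candidate{
		Kind:         "direct_worker_wss",
		URL:          strings.ReplaceAll(cfg.DirectTemplate, "{worker_id}", workerID),
		TLSTrustMode: cfg.TLSTrustMode,
	}
	if cfg.JSONRuntime {
		direct.DataCapable = true
		direct.RuntimeTransport = "json_v1"
		if cfg.BinaryRender {
			direct.RenderTransport = "binary_v1" // binary render requires the JSON runtime
		}
	}
	return append(out, direct)
}

func main() {
	cfg := Config{
		HasSigningKey:  true,
		GatewayURL:     "/api/v1/gateway/ws",
		DirectTemplate: "wss://{worker_id}.workers.example/ws",
		JSONRuntime:    true,
		TLSTrustMode:   "platform_ca",
	}
	for _, c := range buildCandidates(cfg, "worker-7") {
		fmt.Printf("%+v\n", c)
	}
}
```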
## Module layout

- `internal/platform` shared runtime, config, infra, and bootstrap concerns
- `internal/modules/auth` auth and trusted-device boundary
- `internal/modules/organization` organization model, org roles, and memberships
- `internal/modules/identitysource` local/LDAP/OIDC identity source model and future mapping foundations
- `internal/modules/resource` remote resource inventory boundary
- `internal/modules/sessionbroker` persistent session lifecycle, orchestration, audit, and Redis live-state boundary
- `internal/modules/sessiongateway` websocket attach/reconnect/takeover transport boundary
- `internal/modules/worker` worker registration, lease coordination, and control-plane routing boundary for future C++ RDP workers
- `internal/modules/node` node inventory, capabilities, enabled services, update policy, and partition state
- `internal/modules/nodeagent` node-agent registration, health, service status, and update/rollback control interface
- `pkg/contracts` cross-module contracts for sessions and worker control

## Backend responsibilities

- PostgreSQL remains the source of truth for auth sessions, devices, remote sessions, attachments, resource policies, and audit events
- Redis is used only for live routing and coordination: attach tokens, controller bindings, live session cache, worker registration, worker leases, heartbeats, and routing queues
- `worker:control:<worker_id>` carries worker assignments, `worker:queue:<session_id>` carries live control/input envelopes, and `worker:events` carries worker-reported lifecycle events back into broker processing
- Session broker owns state transitions and orchestration rules; websocket handlers call broker services instead of talking to PostgreSQL repositories directly
- Worker runtime stays behind interfaces and Redis coordination so the backend remains isolated from FreeRDP implementation details while the minimal real RDP worker plugs into the control plane
- RDP certificate verification is configured per resource through `certificate_verification_mode`
- resources are now org-scoped in PostgreSQL and remote sessions persist their owning organization without changing the proven worker/session runtime contracts
- session start/attach/takeover responses may include optional `data_plane` candidates and a short-lived signed data-plane token for DP-1 direct worker WSS migration; existing clients continue to use the current gateway path, and direct realtime use remains gated by explicit candidate metadata

## Authorization model

- `platform_admin` and `platform_recovery_admin` have global access across organizations, resources, and sessions
- in `INSTALLATION_AUTHORITY_MODE=strict`, platform-admin power is effective only
  when the user also has a valid signed row in `platform_role_grants`; changing
  `users.platform_role` in PostgreSQL alone no longer grants owner access
- first-owner bootstrap is available at
  `POST /api/v1/installation/bootstrap-owner` and requires a Product Root
  Ed25519 signature over an activation manifest in strict mode
- production (`APP_ENV=production` or `prod`) requires strict installation
  authority plus `INSTALLATION_PRODUCT_ROOT_PUBLIC_KEY_B64` or
  `INSTALLATION_PRODUCT_ROOT_PUBLIC_KEY_FILE`
- legacy/dev installs can keep database-role behavior, and insecure first-owner
  bootstrap is available only when
  `INSTALLATION_INSECURE_BOOTSTRAP_ENABLED=true`
- `org_owner` and `org_admin` can create and update resources inside their organization and can manage any remote session inside that organization
- active non-admin memberships such as `org_operator`, `org_member`, and `org_viewer` are deny-by-default for admin actions; they can only access org-scoped reads and operate on their own session flows where the session broker explicitly allows it
- session start always authorizes the actor against the resource organization before worker reservation
- attach, detach, takeover, and terminate authorize against the owning remote session organization before any state transition is written
- worker-facing events do not bypass this model for user-originated commands; internal worker failure and heartbeat paths remain broker-internal control-plane operations

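As a rough illustration of the role rules above, the following sketch models the checks as pure functions. It is deliberately simplified and rests on assumptions: the `Actor` type and function names are hypothetical, the strict-mode grant requirement is assumed to apply to both platform roles, and "own session flows" is reduced to a plain owner comparison, whereas the session broker decides this per operation.

```go
package main

import "fmt"

// Actor is a hypothetical view of an authenticated user.
type Actor struct {
	UserID        string
	PlatformRole  string            // "platform_admin", "platform_recovery_admin", or ""
	HasValidGrant bool              // valid signed row in platform_role_grants
	OrgRoles      map[string]string // organization ID -> active org role
}

// isPlatformAdmin applies the strict installation-authority rule: in strict
// mode the database role alone is not enough.
func isPlatformAdmin(a Actor, strict bool) bool {
	if a.PlatformRole != "platform_admin" && a.PlatformRole != "platform_recovery_admin" {
		return false
	}
	return !strict || a.HasValidGrant
}

// canManageResources covers resource create/update inside one organization.
func canManageResources(a Actor, strict bool, orgID string) bool {
	if isPlatformAdmin(a, strict) {
		return true
	}
	role := a.OrgRoles[orgID]
	return role == "org_owner" || role == "org_admin"
}

// canOperateSession covers attach, detach, takeover, and terminate. It is
// evaluated against the organization that owns the remote session.
func canOperateSession(a Actor, strict bool, sessionOrgID, sessionOwnerID string) bool {
	if isPlatformAdmin(a, strict) {
		return true
	}
	switch a.OrgRoles[sessionOrgID] {
	case "org_owner", "org_admin":
		return true
	case "org_operator", "org_member", "org_viewer":
		return a.UserID == sessionOwnerID // simplification of "own session flows"
	default:
		return false // no active membership: deny by default
	}
}

func main() {
	member := Actor{UserID: "u-2", OrgRoles: map[string]string{"org-a": "org_member"}}
	fmt.Println(canManageResources(member, true, "org-a"))       // admin action
	fmt.Println(canOperateSession(member, true, "org-a", "u-2")) // own session
	fmt.Println(canOperateSession(member, true, "org-a", "u-9")) // someone else's session
}
```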
## Migration safety

- `000005_platform_core_v2` bootstraps a single `default` organization and backfills existing `resources.organization_id` and `remote_sessions.organization_id` into that organization before setting `NOT NULL`
- `000006_default_org_memberships_backfill` safely restores access continuity by inserting missing active memberships for existing users into the `default` organization
- the backfill is idempotent because it only inserts rows missing under the `(organization_id, user_id)` uniqueness constraint
- platform administrators are backfilled as `org_owner` in the default organization, while other existing users are backfilled as `org_member`
- if `000005` fails before the `NOT NULL` step, PostgreSQL rolls back the transaction and leaves pre-v2 rows untouched; if `000006` is rerun, it skips already-created memberships rather than duplicating them

## Platform-Core V2 Notes

- `organizations`, `organization_memberships`, and `organization_roles` establish multi-tenant ownership and basic org-scoped authorization boundaries
- `identity_sources` and `identity_mappings` are foundation-only in this phase; full LDAP/OIDC sync and claim/group ingestion are intentionally deferred
- `nodes`, `node_capabilities`, `node_services`, `node_update_policies`, `node_partition_states`, and `node_agent_update_runs` provide the first control-plane model for node and node-agent lifecycle
- current proven RDP session lifecycle remains preserved: the session broker still orchestrates the same worker/session behavior, but it now records organization ownership via org-scoped resources
- PostgreSQL remains the source of truth for organizations, memberships, org-scoped resources, identity sources, nodes, node-agent state, and session lifecycle state

## Resource Certificate Verification

- `strict` is the default and keeps normal certificate validation enabled in the worker runtime
- `ignore` must be explicitly stored on the resource and allows that one RDP connection to skip certificate validation
- the backend passes this policy through session assignment data; it is not a global backend toggle

## Messaging Model

- HTTP errors now use a structured envelope:
  - `error.code`
  - `error.message_key`
  - `error.fallback_message`
  - `error.details`
  - `error.trace_id`
- `internal/platform/httpx` owns error normalization and trace-id generation so handlers can keep calling `WriteError(...)` without changing business logic.
- For `5xx` responses, user-facing payloads are normalized to an English generic fallback message while logs and diagnostics can still keep raw internal details elsewhere.
- For `4xx` responses, stable `code` and `message_key` are derived from the current fallback message, so clients can localize without depending on raw English text as the primary contract.

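To make the envelope shape concrete, here is a minimal sketch of how such a payload could be modeled and serialized. It does not reproduce `httpx.WriteError`, whose signature is not shown in this document; the type names, the `normalize` and `render` helpers, the example `code` and `message_key` values, the generic fallback wording, and the choice to omit empty `details` are all assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ErrorBody mirrors the five envelope fields listed above.
type ErrorBody struct {
	Code            string         `json:"code"`
	MessageKey      string         `json:"message_key"`
	FallbackMessage string         `json:"fallback_message"`
	Details         map[string]any `json:"details,omitempty"`
	TraceID         string         `json:"trace_id"`
}

// ErrorEnvelope wraps the body under the top-level "error" key.
type ErrorEnvelope struct {
	Error ErrorBody `json:"error"`
}

// genericFallback is an assumed wording for the 5xx user-facing message.
const genericFallback = "Internal server error"

// normalize hides raw internal text on 5xx responses; 4xx bodies pass through.
func normalize(status int, body ErrorBody) ErrorBody {
	if status >= 500 {
		body.FallbackMessage = genericFallback
	}
	return body
}

// render returns the JSON a client would receive for the given status.
func render(status int, body ErrorBody) (string, error) {
	raw, err := json.Marshal(ErrorEnvelope{Error: normalize(status, body)})
	return string(raw), err
}

func main() {
	invalidLogin := ErrorBody{
		Code:            "auth.invalid_credentials",        // hypothetical code
		MessageKey:      "errors.auth.invalid_credentials", // hypothetical key
		FallbackMessage: "Invalid login or password",
		TraceID:         "trace-123",
	}
	out, _ := render(401, invalidLogin)
	fmt.Println(out)

	internal := ErrorBody{Code: "internal", MessageKey: "errors.internal",
		FallbackMessage: "pq: connection refused", TraceID: "trace-456"}
	out, _ = render(500, internal)
	fmt.Println(out)
}
```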
## WebSocket Messaging

- Session gateway envelopes keep the existing `type` and `payload` contract.
- User-facing websocket events now also include `event` with:
  - `code`
  - `message_key`
  - `fallback_message`
  - `details`
  - `trace_id`
- `session.taken_over`, terminal `session.state`, `transport.closed`, and protocol-level errors now carry this structured event object.
- Existing payload semantics remain intact for compatibility with the already proven session lifecycle.

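From the consuming side, an envelope with the optional `event` object could be decoded and localized along these lines. The struct layout follows the field names listed above, but the Go type definitions, the `decode` and `userMessage` helpers, the sample `message_key` value, and the translation catalog are assumptions for illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Event carries the structured, localizable part of a user-facing message.
type Event struct {
	Code            string         `json:"code"`
	MessageKey      string         `json:"message_key"`
	FallbackMessage string         `json:"fallback_message"`
	Details         map[string]any `json:"details,omitempty"`
	TraceID         string         `json:"trace_id"`
}

// TransportEnvelope keeps the existing type/payload pair and adds event.
type TransportEnvelope struct {
	Type    string          `json:"type"`
	Payload json.RawMessage `json:"payload,omitempty"`
	Event   *Event          `json:"event,omitempty"`
}

// decode parses one websocket text frame.
func decode(raw []byte) (TransportEnvelope, error) {
	var env TransportEnvelope
	err := json.Unmarshal(raw, &env)
	return env, err
}

// userMessage prefers a localized string looked up by message_key and uses
// the English fallback_message only when no translation exists.
func userMessage(env TransportEnvelope, catalog map[string]string) string {
	if env.Event == nil {
		return "" // not a user-facing notification
	}
	if text, ok := catalog[env.Event.MessageKey]; ok {
		return text
	}
	return env.Event.FallbackMessage
}

func main() {
	raw := []byte(`{"type":"session.taken_over","payload":{},"event":{"code":"session.taken_over","message_key":"session.taken_over","fallback_message":"Session was taken over by another client","trace_id":"trace-123"}}`)
	env, err := decode(raw)
	if err != nil {
		panic(err)
	}
	german := map[string]string{"session.taken_over": "Die Sitzung wurde von einem anderen Client übernommen"}
	fmt.Println(userMessage(env, german))
	fmt.Println(userMessage(env, nil))
}
```

Envelopes without `event` decode to a nil pointer, so older message types keep working unchanged.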
## Message Rules

- Keep English as the only development language for `fallback_message`, logs, and diagnostics.
- New HTTP handlers should prefer `httpx.WriteError(...)` for user-facing failures instead of hand-building `"error": "..."` JSON.
- New websocket user-facing notifications should populate `TransportEnvelope.Event` with a stable `code` and `message_key`.
- Do not use raw human-readable English text as the primary client contract; it should only remain as fallback text.
- This messaging layer is now runtime-proven against the live Windows smoke flow for invalid-login errors, websocket takeover delivery, websocket state fallback rendering, and worker-death failure handling.

## Clipboard Policy

RDP text clipboard is controlled per resource through `resource_policies.clipboard_mode`.
Allowed values are `disabled`, `client_to_server`, `server_to_client`, and
`bidirectional`; the default is `disabled`. The legacy `clipboard_enabled`
column is retained only for compatibility and migration/backfill, while new
runtime decisions use `clipboard_mode`.

Clipboard enforcement happens in the real data path:

- `sessionbroker.ResourcePolicy.ClipboardMode` is loaded from PostgreSQL and
  embedded into the session assignment metadata sent to the worker.
- `sessiongateway.Module.handleEnvelope` blocks client-to-server clipboard
  envelopes unless the session is `active` and the policy allows that direction.
- `worker.EventProcessor` sends worker-originated clipboard text through
  `sessionbroker.Service.UpdateWorkerClipboardText`, which applies the same
  active-state and server-to-client policy checks before updating live state.
- Clipboard messages carry `sequence_id`, `origin`, and `content_hash` so
  clients and workers can avoid feedback loops across reattach/takeover paths.
- Redis stores clipboard text only as transient live state for routing to the
  active controller; PostgreSQL remains authoritative for policy/session state.

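The direction check applied in both enforcement points above can be expressed as one small predicate. This is a sketch under assumptions: `clipboardAllowed` and its string-typed parameters are hypothetical, and treating unknown mode values as `disabled` is a design choice made here, not confirmed behavior.

```go
package main

import "fmt"

// clipboardAllowed reports whether clipboard text may flow in the given
// direction ("client_to_server" or "server_to_client") for a session in the
// given state under the given clipboard_mode.
func clipboardAllowed(mode, direction, sessionState string) bool {
	if sessionState != "active" {
		return false // detached, failed, or terminated sessions never relay clipboard
	}
	switch mode {
	case "bidirectional":
		return direction == "client_to_server" || direction == "server_to_client"
	case "client_to_server", "server_to_client":
		return mode == direction
	default:
		return false // "disabled" and any unknown value
	}
}

func main() {
	for _, mode := range []string{"disabled", "client_to_server", "server_to_client", "bidirectional"} {
		fmt.Printf("%-17s c2s=%v s2c=%v\n", mode,
			clipboardAllowed(mode, "client_to_server", "active"),
			clipboardAllowed(mode, "server_to_client", "active"))
	}
}
```

Sharing one predicate between the gateway path and the worker-event path keeps the two checks from drifting apart.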
## File Upload Policy

Stage 5.1 introduces client-to-server file upload as a policy-gated RDP
feature. The authoritative policy field is
`resource_policies.file_transfer_mode`; allowed values are `disabled`,
`client_to_server`, `server_to_client`, and `bidirectional`, but only
`client_to_server` behavior is implemented in this stage. The default is
`disabled`. The legacy `file_transfer_enabled` column is retained only as a
derived compatibility flag and must not be treated as the primary policy.

Enforcement is deliberately duplicated in the real data path:

- `resource.Module` exposes `file_transfer_mode` in resource create, update,
  list, and read payloads.
- `sessionbroker.Service.StartRemoteSession` embeds `file_transfer_mode` into
  assignment metadata and requests the worker `file-transfer` capability only
  when client-to-server upload is allowed.
- `sessiongateway.Module.handleFileUploadStart` and
  `handleFileUploadChunk` require an active session, current controller,
  allowed policy mode, valid UUID `transfer_id`, safe file name, 25 MiB max
  file size, and 256 KiB max chunk size before routing chunks to the worker.
- Redis is used only to route bounded upload envelopes to the worker. The file
  itself is written by the worker to controlled worker storage; PostgreSQL
  remains authoritative for policy and session state.

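The gateway-side preconditions listed above might look roughly like the following. The limits (25 MiB, 256 KiB) come from this section; everything else is an assumption: the function names, the definition of a "safe" file name (no path separators, no NUL, not `.` or `..`, at most 255 bytes), the check order, and treating `bidirectional` as allowing upload alongside `client_to_server`.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

const (
	maxFileSize  int64 = 25 << 20  // 25 MiB
	maxChunkSize       = 256 << 10 // 256 KiB
)

// uuidPattern accepts the canonical 8-4-4-4-12 hexadecimal form.
var uuidPattern = regexp.MustCompile(`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

// uploadAllowed assumes bidirectional implies client-to-server upload.
func uploadAllowed(mode string) bool {
	return mode == "client_to_server" || mode == "bidirectional"
}

// safeFileName is an assumed definition of "safe": a bare name with no
// directory components.
func safeFileName(name string) bool {
	if name == "" || name == "." || name == ".." || len(name) > 255 {
		return false
	}
	return !strings.ContainsAny(name, "/\\\x00")
}

// validateUploadStart checks every precondition before anything is routed.
func validateUploadStart(mode, sessionState string, isController bool, transferID, fileName string, size int64) error {
	switch {
	case sessionState != "active":
		return errors.New("session is not active")
	case !isController:
		return errors.New("sender is not the current controller")
	case !uploadAllowed(mode):
		return errors.New("file_transfer_mode does not allow upload")
	case !uuidPattern.MatchString(transferID):
		return errors.New("transfer_id is not a valid UUID")
	case !safeFileName(fileName):
		return errors.New("unsafe file name")
	case size < 0 || size > maxFileSize:
		return errors.New("file size exceeds the 25 MiB limit")
	}
	return nil
}

// validateChunk bounds a single chunk before it is placed on the worker queue.
func validateChunk(chunkLen int) error {
	if chunkLen <= 0 || chunkLen > maxChunkSize {
		return errors.New("chunk size must be between 1 byte and 256 KiB")
	}
	return nil
}

func main() {
	id := "123e4567-e89b-12d3-a456-426614174000"
	fmt.Println(validateUploadStart("client_to_server", "active", true, id, "report.pdf", 1<<20))
	fmt.Println(validateUploadStart("client_to_server", "active", true, id, "../secrets.txt", 1<<20))
	fmt.Println(validateChunk(512 << 10))
}
```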
## File Download Policy

Stage 5.2 adds a runtime-proven server-to-client download path for RDP. The
policy field remains `resource_policies.file_transfer_mode`; `server_to_client`
and `bidirectional` allow download, while `disabled` and `client_to_server`
block it. The default remains `disabled`.

The v1 download model uses only the restricted `RAP_Transfers\ToClient`
drop-zone inside the existing per-session visible transfer directory. Backend
gateway accepts only `file_download.start`, `file_download.ack`, and
`file_download.cancel` from the current controller of an active session and
routes them to the worker after policy validation. Worker-origin
`file_download.*` events are stored only as transient live state for
backend-gateway fallback delivery; PostgreSQL remains authoritative for
session/resource/policy state and must not store file contents.

The direct worker WSS path is also lifecycle-gated: detach returns
`file_download.blocked`, old-controller takeover returns `session.taken_over`,
and worker failure closes the direct transport after PostgreSQL transitions the
session to `failed`.
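The gateway acceptance rule for client-originated download messages can be sketched as an allowlist plus the same state, controller, and policy gates. The function names and the order in which the checks run are assumptions; only the three message types and the mode semantics come from the text above.

```go
package main

import (
	"errors"
	"fmt"
)

// clientDownloadTypes lists the only download messages a client may send.
var clientDownloadTypes = map[string]bool{
	"file_download.start":  true,
	"file_download.ack":    true,
	"file_download.cancel": true,
}

// downloadAllowed maps file_transfer_mode to the download decision.
func downloadAllowed(mode string) bool {
	return mode == "server_to_client" || mode == "bidirectional"
}

// acceptDownloadEnvelope returns nil when the envelope may be routed to the
// worker, or an error describing the first failed gate.
func acceptDownloadEnvelope(msgType, mode, sessionState string, isController bool) error {
	switch {
	case !clientDownloadTypes[msgType]:
		return errors.New("message type is not accepted from clients")
	case sessionState != "active":
		return errors.New("session is not active")
	case !isController:
		return errors.New("sender is not the current controller")
	case !downloadAllowed(mode):
		return errors.New("file_transfer_mode does not allow download")
	}
	return nil
}

func main() {
	fmt.Println(acceptDownloadEnvelope("file_download.start", "server_to_client", "active", true))
	fmt.Println(acceptDownloadEnvelope("file_download.start", "client_to_server", "active", true))
	fmt.Println(acceptDownloadEnvelope("file_download.chunk", "bidirectional", "active", true))
}
```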