rdp-proxy/backend/README.md
2026-04-28 22:29:50 +03:00

Backend Foundation

Production-oriented Go backend skeleton for the remote access platform.

Scope included

  • configuration loading from environment
  • HTTP server bootstrap with graceful shutdown
  • PostgreSQL and Redis connectivity wiring
  • migrations scaffold
  • auth foundation with access/refresh tokens, hashed refresh rotation, trusted devices, and persisted auth sessions
  • persistent session storage foundation for remote sessions, attachments, resource policies, and audit events
  • session broker orchestration for start, attach, detach, takeover, terminate, failure, and detached-session recovery
  • Redis-backed live session state, controller binding, attach tokens, heartbeat keys, worker routing, and reconnect support
  • Redis-backed worker registration, lease lifecycle, heartbeat tracking, stale lease recovery, and routing queues
  • worker assignment queueing and worker event ingestion for the minimal real RDP worker runtime
  • websocket live plane with attach handshake, ping/pong heartbeat, state messages, takeover detection, and transport reconnect flow
  • module boundaries for auth, resources, session broker, and websocket gateway
  • worker registry scaffold to prepare later RDP worker integration
  • per-resource certificate verification policy for RDP connections with strict default and explicit ignore override
  • platform-core v2 foundations for organizations, memberships, identity sources, nodes, and node-agent control plane
  • Data Plane v1 contract scaffolding for optional session response candidates/tokens, with current backend gateway behavior preserved as fallback
  • production resource secret-readiness guard for rejecting plaintext credential-like metadata and requiring secret_ref for RDP/VNC/SSH resources in production mode
  • encrypted resource secret storage/resolver MVP for production secret_ref usage

Entry point

Run the API from cmd/api.

Local dev

  • backend: pwsh -File scripts/smoke/run-backend.ps1
  • infra: pwsh -File scripts/smoke/start-infra.ps1
  • migrations: pwsh -File scripts/smoke/apply-migrations.ps1
  • worker image build: docker build --tag rap-rdp-worker:dev --file workers/rdp-worker/Dockerfile workers/rdp-worker
  • end-to-end smoke path: scripts/smoke/README.md

Configuration

Use configs/api.example.env as the starting point for local environment variables.

Resource secret-readiness is controlled by APP_ENV:

  • in APP_ENV=production or APP_ENV=prod, RDP/VNC/SSH resources must carry secret_ref and must not include plaintext credential-like fields in metadata
  • in development and smoke environments, plaintext metadata remains allowed until the encrypted secret resolver is implemented
  • the production guard is enforced both on resource create/update and on session start, so legacy plaintext resources cannot be started in production accidentally
  • SECRET_ENCRYPTION_KEY_B64 or SECRET_ENCRYPTION_KEY_FILE supplies the AES-256-GCM master key for the MVP encrypted store; production mode refuses to start without one
  • SECRET_ENCRYPTION_KEY_ID labels the active key version in stored records
  • PUT /api/v1/resources/{resourceID}/secret creates or rotates a resource secret and updates resources.secret_ref; plaintext is never returned by the API
  • session assignment keeps PostgreSQL metadata safe: remote_sessions.metadata stores secret_ref, while resolved credentials are merged only into the transient worker assignment after session/worker/lease checks

See docs/architecture/SECURITY_SECRETS_READINESS.md for the target secret-reference model and remaining resolver/PKI gaps.

Data Plane v1 contract scaffolding is controlled by:

  • DATA_PLANE_TOKEN_TTL, default 1m
  • DATA_PLANE_TOKEN_PRIVATE_KEY_FILE, optional path to an RSA private key PEM used to sign RS256 data-plane tokens
  • DATA_PLANE_TOKEN_PRIVATE_KEY_PEM, optional inline RSA private key PEM; used when file path is not configured
  • DATA_PLANE_BACKEND_GATEWAY_URL, default /api/v1/gateway/ws
  • DATA_PLANE_DIRECT_WORKER_WSS_URL_TEMPLATE, optional; supports {worker_id} replacement
  • DATA_PLANE_DIRECT_WORKER_JSON_RUNTIME, default false; advertises runtime_transport=json_v1 only after the worker direct JSON bridge is deployed and verified
  • DATA_PLANE_DIRECT_WORKER_BINARY_RENDER, default false; when the direct JSON runtime is enabled, advertises render_transport=binary_v1 so DP-2 clients can request binary render frames over direct worker WSS. Binary render candidates also advertise supported_color_modes=["full_color","grayscale"] and default_color_mode="full_color" for the DP-3A grayscale foundation.
  • DATA_PLANE_DIRECT_WORKER_TLS_TRUST_MODE, default smoke_insecure; allowed values are smoke_insecure, public_ca, and platform_ca.
  • DATA_PLANE_DIRECT_WORKER_TLS_CA_REF, optional label for the platform CA or trust bundle version advertised to clients.

Data-plane tokens are RS256-signed. The backend must hold only the private key; workers receive only the matching public key for validation. If no private key is configured, the backend omits the optional data_plane offer and the backend gateway fallback remains unchanged.

The advertised candidates depend on the settings above:

  • If no direct worker WSS URL template is configured, session responses include only the backend gateway fallback candidate.
  • If the URL template is configured but DATA_PLANE_DIRECT_WORKER_JSON_RUNTIME is false, the direct candidate is still present for contract visibility but is not marked data-capable; DP-1D Windows clients skip it and use the backend gateway fallback.
  • If DATA_PLANE_DIRECT_WORKER_BINARY_RENDER is false, direct worker WSS remains JSON/base64 for render. If it is true, only direct worker WSS render is binary; the backend gateway fallback remains JSON/base64.
  • In production, the backend does not advertise direct worker WSS when DATA_PLANE_DIRECT_WORKER_TLS_TRUST_MODE=smoke_insecure; it keeps the backend gateway fallback instead.
  • Trusted direct candidates include tls_trust_mode, production_trusted, smoke_only, and optional tls_ca_ref metadata.

See docs/architecture/DIRECT_WORKER_TLS_PKI.md.
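
The candidate rules above reduce to a small decision function. The sketch below is illustrative only: the `Config` and `Candidate` types, the candidate kind strings, the `json_base64` label, and the ordering of candidates are assumptions, while `binary_v1`, `smoke_insecure`, and the `{worker_id}` placeholder come from the configuration described above.

```go
package main

import (
	"fmt"
	"strings"
)

// Config holds the subset of settings this sketch needs (hypothetical type).
type Config struct {
	Production         bool
	GatewayURL         string // DATA_PLANE_BACKEND_GATEWAY_URL
	DirectURLTemplate  string // DATA_PLANE_DIRECT_WORKER_WSS_URL_TEMPLATE
	DirectJSONRuntime  bool   // DATA_PLANE_DIRECT_WORKER_JSON_RUNTIME
	DirectBinaryRender bool   // DATA_PLANE_DIRECT_WORKER_BINARY_RENDER
	TLSTrustMode       string // DATA_PLANE_DIRECT_WORKER_TLS_TRUST_MODE
}

// Candidate is a hypothetical, reduced candidate shape.
type Candidate struct {
	Kind            string
	URL             string
	DataCapable     bool
	RenderTransport string
}

// BuildCandidates always returns the backend gateway fallback and adds a
// direct worker candidate only when configuration permits it.
func BuildCandidates(cfg Config, workerID string) []Candidate {
	fallback := Candidate{
		Kind:            "backend_gateway",
		URL:             cfg.GatewayURL,
		DataCapable:     true,
		RenderTransport: "json_base64",
	}
	if cfg.DirectURLTemplate == "" {
		return []Candidate{fallback}
	}
	// Production never advertises a direct path with smoke-only TLS trust.
	if cfg.Production && cfg.TLSTrustMode == "smoke_insecure" {
		return []Candidate{fallback}
	}
	direct := Candidate{
		Kind:            "direct_worker_wss",
		URL:             strings.ReplaceAll(cfg.DirectURLTemplate, "{worker_id}", workerID),
		DataCapable:     cfg.DirectJSONRuntime,
		RenderTransport: "json_base64",
	}
	// Binary render applies only on top of an enabled direct JSON runtime.
	if cfg.DirectJSONRuntime && cfg.DirectBinaryRender {
		direct.RenderTransport = "binary_v1"
	}
	return []Candidate{direct, fallback}
}

func main() {
	cfg := Config{
		GatewayURL:        "/api/v1/gateway/ws",
		DirectURLTemplate: "wss://{worker_id}.workers.example.test/ws",
		DirectJSONRuntime: true,
		TLSTrustMode:      "platform_ca",
	}
	for _, c := range BuildCandidates(cfg, "w-1") {
		fmt.Printf("%+v\n", c)
	}
}
```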

Module layout

  • internal/platform shared runtime, config, infra, and bootstrap concerns
  • internal/modules/auth auth and trusted-device boundary
  • internal/modules/organization organization model, org roles, and memberships
  • internal/modules/identitysource local/LDAP/OIDC identity source model and future mapping foundations
  • internal/modules/resource remote resource inventory boundary
  • internal/modules/sessionbroker persistent session lifecycle, orchestration, audit, and Redis live-state boundary
  • internal/modules/sessiongateway websocket attach/reconnect/takeover transport boundary
  • internal/modules/worker worker registration, lease coordination, and control-plane routing boundary for future C++ RDP workers
  • internal/modules/node node inventory, capabilities, enabled services, update policy, and partition state
  • internal/modules/nodeagent node-agent registration, health, service status, and update/rollback control interface
  • pkg/contracts cross-module contracts for sessions and worker control

Backend responsibilities

  • PostgreSQL remains the source of truth for auth sessions, devices, remote sessions, attachments, resource policies, and audit events
  • Redis is used only for live routing and coordination: attach tokens, controller bindings, live session cache, worker registration, worker leases, heartbeats, and routing queues
  • worker:control:<worker_id> carries worker assignments, worker:queue:<session_id> carries live control/input envelopes, and worker:events carries worker-reported lifecycle events back into broker processing
  • Session broker owns state transitions and orchestration rules; websocket handlers call broker services instead of talking to PostgreSQL repositories directly
  • Worker runtime stays behind interfaces and Redis coordination so the backend remains isolated from FreeRDP implementation details while the minimal real RDP worker plugs into the control plane
  • RDP certificate verification is configured per resource through certificate_verification_mode
  • resources are now org-scoped in PostgreSQL and remote sessions persist their owning organization without changing the proven worker/session runtime contracts
  • session start/attach/takeover responses may include optional data_plane candidates and a short-lived signed data-plane token for DP-1 direct worker WSS migration; existing clients continue to use the current gateway path, and direct realtime use remains gated by explicit candidate metadata
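
The Redis key layout mentioned above can be kept in one place with small helpers. The function and constant names below are hypothetical, and the sketch says nothing about which Redis data structure backs each key.

```go
package main

import "fmt"

// WorkerEventsKey carries worker-reported lifecycle events back to the broker.
const WorkerEventsKey = "worker:events"

// WorkerControlKey names the channel that carries assignments for one worker.
func WorkerControlKey(workerID string) string {
	return "worker:control:" + workerID
}

// WorkerQueueKey names the channel that carries live control/input envelopes
// for one session.
func WorkerQueueKey(sessionID string) string {
	return "worker:queue:" + sessionID
}

func main() {
	fmt.Println(WorkerControlKey("w-1"))
	fmt.Println(WorkerQueueKey("s-1"))
	fmt.Println(WorkerEventsKey)
}
```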

Authorization model

  • platform_admin and platform_recovery_admin have global access across organizations, resources, and sessions
  • in INSTALLATION_AUTHORITY_MODE=strict, platform-admin power is effective only when the user also has a valid signed row in platform_role_grants; changing users.platform_role in PostgreSQL alone no longer grants owner access
  • first-owner bootstrap is available at POST /api/v1/installation/bootstrap-owner and requires a Product Root Ed25519 signature over an activation manifest in strict mode
  • production (APP_ENV=production or prod) requires strict installation authority plus INSTALLATION_PRODUCT_ROOT_PUBLIC_KEY_B64 or INSTALLATION_PRODUCT_ROOT_PUBLIC_KEY_FILE
  • legacy/dev installs can keep database-role behavior, and insecure first-owner bootstrap is available only when INSTALLATION_INSECURE_BOOTSTRAP_ENABLED=true
  • org_owner and org_admin can create and update resources inside their organization and can manage any remote session inside that organization
  • active non-admin memberships such as org_operator, org_member, and org_viewer are deny-by-default for admin actions; they can only access org-scoped reads and operate on their own session flows where the session broker explicitly allows it
  • session start always authorizes the actor against the resource organization before worker reservation
  • attach, detach, takeover, and terminate authorize against the owning remote session organization before any state transition is written
  • worker-facing events do not bypass this model for user-originated commands; internal worker failure and heartbeat paths remain broker-internal control-plane operations
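
A simplified view of the session-management check is sketched below. The `Actor` and `RemoteSession` types and the `CanManageSession` function are assumptions for illustration. In particular, treating "own session" as sufficient for non-admin members is a simplification of the broker's actual per-flow rules.

```go
package main

import "fmt"

// Actor is a hypothetical view of the authenticated caller.
type Actor struct {
	UserID              string
	PlatformRole        string            // value of users.platform_role
	HasValidSignedGrant bool              // valid signed row in platform_role_grants
	OrgRoles            map[string]string // organization ID -> role of an active membership
}

// RemoteSession is a hypothetical, reduced session record.
type RemoteSession struct {
	OrganizationID string
	OwnerUserID    string
}

// isPlatformAdmin applies the strict installation authority rule: in strict
// mode the database role alone is not enough.
func isPlatformAdmin(a Actor, strictAuthority bool) bool {
	switch a.PlatformRole {
	case "platform_admin", "platform_recovery_admin":
		return !strictAuthority || a.HasValidSignedGrant
	}
	return false
}

// CanManageSession is deny-by-default: unknown roles and missing memberships
// fall through to false.
func CanManageSession(a Actor, s RemoteSession, strictAuthority bool) bool {
	if isPlatformAdmin(a, strictAuthority) {
		return true
	}
	switch a.OrgRoles[s.OrganizationID] {
	case "org_owner", "org_admin":
		return true
	case "org_operator", "org_member", "org_viewer":
		return a.UserID == s.OwnerUserID
	}
	return false
}

func main() {
	session := RemoteSession{OrganizationID: "org-1", OwnerUserID: "u-1"}
	admin := Actor{UserID: "u-9", PlatformRole: "platform_admin"}
	fmt.Println(CanManageSession(admin, session, false))
	fmt.Println(CanManageSession(admin, session, true))
}
```

With the values above, the second call returns false because the actor has the database role but no valid signed grant in strict mode.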

Migration safety

  • 000005_platform_core_v2 bootstraps a single default organization and backfills existing resources.organization_id and remote_sessions.organization_id into that organization before setting NOT NULL
  • 000006_default_org_memberships_backfill safely restores access continuity by inserting missing active memberships for existing users into the default organization
  • the backfill is idempotent because it only inserts rows missing under the (organization_id, user_id) uniqueness constraint
  • platform administrators are backfilled as org_owner in the default organization, while other existing users are backfilled as org_member
  • if 000005 fails before the NOT NULL step, PostgreSQL rolls back the transaction and leaves pre-v2 rows untouched; if 000006 is rerun, it skips already-created memberships rather than duplicating them

Platform-Core V2 Notes

  • organizations, organization_memberships, and organization_roles establish multi-tenant ownership and basic org-scoped authorization boundaries
  • identity_sources and identity_mappings are foundation-only in this phase; full LDAP/OIDC sync and claim/group ingestion are intentionally deferred
  • nodes, node_capabilities, node_services, node_update_policies, node_partition_states, and node_agent_update_runs provide the first control-plane model for node and node-agent lifecycle
  • current proven RDP session lifecycle remains preserved: the session broker still orchestrates the same worker/session behavior, but it now records organization ownership via org-scoped resources
  • PostgreSQL remains the source of truth for organizations, memberships, org-scoped resources, identity sources, nodes, node-agent state, and session lifecycle state

Resource Certificate Verification

  • strict is the default and keeps normal certificate validation enabled in the worker runtime
  • ignore must be explicitly stored on the resource and allows that one RDP connection to skip certificate validation
  • the backend passes this policy through session assignment data; it is not a global backend toggle
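
One way to express "strict unless explicitly stored otherwise" is a small normalizer applied before the mode is placed into assignment data. The function name is hypothetical, and rejecting unknown values rather than silently coercing them is an assumption of this sketch.

```go
package main

import "fmt"

// NormalizeCertificateVerificationMode maps a stored
// certificate_verification_mode value to the mode sent to the worker.
// An empty value falls back to the strict default.
func NormalizeCertificateVerificationMode(stored string) (string, error) {
	switch stored {
	case "":
		return "strict", nil
	case "strict", "ignore":
		return stored, nil
	default:
		return "", fmt.Errorf("unsupported certificate_verification_mode %q", stored)
	}
}

func main() {
	for _, stored := range []string{"", "ignore", "relaxed"} {
		mode, err := NormalizeCertificateVerificationMode(stored)
		fmt.Printf("%q -> %q, err=%v\n", stored, mode, err)
	}
}
```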

Messaging Model

  • HTTP errors now use a structured envelope:
    • error.code
    • error.message_key
    • error.fallback_message
    • error.details
    • error.trace_id
  • internal/platform/httpx owns error normalization and trace-id generation so handlers can keep calling WriteError(...) without changing business logic.
  • For 5xx responses, user-facing payloads are normalized to an English generic fallback message while logs and diagnostics can still keep raw internal details elsewhere.
  • For 4xx responses, stable code and message_key are derived from the current fallback message, so clients can localize without depending on raw English text as the primary contract.

WebSocket Messaging

  • Session gateway envelopes keep the existing type and payload contract.
  • User-facing websocket events now also include event with:
    • code
    • message_key
    • fallback_message
    • details
    • trace_id
  • session.taken_over, terminal session.state, transport.closed, and protocol-level errors now carry this structured event object.
  • Existing payload semantics remain intact for compatibility with the already proven session lifecycle.

Message Rules

  • Keep English as the only development language for fallback_message, logs, and diagnostics.
  • New HTTP handlers should prefer httpx.WriteError(...) for user-facing failures instead of hand-building "error": "..." JSON.
  • New websocket user-facing notifications should populate TransportEnvelope.Event with a stable code and message_key.
  • Do not use raw human-readable English text as the primary client contract; it should only remain as fallback text.
  • This messaging layer is now runtime-proven against the live Windows smoke flow for invalid-login errors, websocket takeover delivery, websocket state fallback rendering, and worker-death failure handling.

Clipboard Policy

RDP text clipboard is controlled per resource through resource_policies.clipboard_mode. Allowed values are disabled, client_to_server, server_to_client, and bidirectional; the default is disabled. The legacy clipboard_enabled column is retained only for compatibility and migration/backfill, while new runtime decisions use clipboard_mode.

Clipboard enforcement happens in the real data path:

  • sessionbroker.ResourcePolicy.ClipboardMode is loaded from PostgreSQL and embedded into the session assignment metadata sent to the worker.
  • sessiongateway.Module.handleEnvelope blocks client-to-server clipboard envelopes unless the session is active and the policy allows that direction.
  • worker.EventProcessor sends worker-originated clipboard text through sessionbroker.Service.UpdateWorkerClipboardText, which applies the same active-state and server-to-client policy checks before updating live state.
  • Clipboard messages carry sequence_id, origin, and content_hash so clients and workers can avoid feedback loops across reattach/takeover paths.
  • Redis stores clipboard text only as transient live state for routing to the active controller; PostgreSQL remains authoritative for policy/session state.
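
The direction check shared by the gateway and the broker can be sketched as a single predicate. The `Direction` type and function names are hypothetical, and mapping unknown stored values to disabled is an assumption consistent with the stated default.

```go
package main

import "fmt"

// Direction identifies which way a clipboard message travels.
type Direction int

const (
	ClientToServer Direction = iota
	ServerToClient
)

// ParseClipboardMode returns a known clipboard_mode value and treats anything
// else, including an empty string, as disabled.
func ParseClipboardMode(stored string) string {
	switch stored {
	case "client_to_server", "server_to_client", "bidirectional":
		return stored
	}
	return "disabled"
}

// ClipboardAllowed requires an active session and a policy mode that covers
// the requested direction.
func ClipboardAllowed(mode string, dir Direction, sessionActive bool) bool {
	if !sessionActive {
		return false
	}
	switch ParseClipboardMode(mode) {
	case "bidirectional":
		return true
	case "client_to_server":
		return dir == ClientToServer
	case "server_to_client":
		return dir == ServerToClient
	}
	return false
}

func main() {
	fmt.Println(ClipboardAllowed("client_to_server", ClientToServer, true))
	fmt.Println(ClipboardAllowed("client_to_server", ServerToClient, true))
	fmt.Println(ClipboardAllowed("bidirectional", ServerToClient, false))
}
```

Applying the same predicate on both the client-to-server and worker-originated paths keeps the two enforcement points from drifting apart.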

File Upload Policy

Stage 5.1 introduces client-to-server file upload as a policy-gated RDP feature. The authoritative policy field is resource_policies.file_transfer_mode; allowed values are disabled, client_to_server, server_to_client, and bidirectional, but only client_to_server behavior is implemented in this stage. The default is disabled. The legacy file_transfer_enabled column is retained only as a derived compatibility flag and must not be treated as the primary policy.

Enforcement is deliberately duplicated in the real data path:

  • resource.Module exposes file_transfer_mode in resource create, update, list, and read payloads.
  • sessionbroker.Service.StartRemoteSession embeds file_transfer_mode into assignment metadata and requests the worker file-transfer capability only when client-to-server upload is allowed.
  • sessiongateway.Module.handleFileUploadStart and handleFileUploadChunk require an active session, current controller, allowed policy mode, valid UUID transfer_id, safe file name, 25 MiB max file size, and 256 KiB max chunk size before routing chunks to the worker.
  • Redis is used only to route bounded upload envelopes to the worker. The file itself is written by the worker to controlled worker storage; PostgreSQL remains authoritative for policy and session state.
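
The gateway-side limits listed above might be checked along these lines. The 25 MiB and 256 KiB limits come from the description; the function names and the specific file-name rules (no path separators, no colon, no control characters, at most 255 bytes) are assumptions of this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

const (
	MaxFileSize  = 25 << 20  // 25 MiB
	MaxChunkSize = 256 << 10 // 256 KiB
)

// uuidPattern matches the canonical 8-4-4-4-12 hexadecimal UUID form.
var uuidPattern = regexp.MustCompile(`^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$`)

// isSafeFileName applies assumed rules for a bare file name.
func isSafeFileName(name string) bool {
	if name == "" || name == "." || name == ".." || len(name) > 255 {
		return false
	}
	if strings.ContainsAny(name, `/\:`) {
		return false
	}
	for _, r := range name {
		if r < 0x20 || r == 0x7f {
			return false
		}
	}
	return true
}

// ValidateUploadStart checks the transfer ID, file name, and declared size.
func ValidateUploadStart(transferID, fileName string, size int64) error {
	if !uuidPattern.MatchString(transferID) {
		return errors.New("transfer_id must be a UUID")
	}
	if !isSafeFileName(fileName) {
		return errors.New("unsafe file name")
	}
	if size <= 0 || size > MaxFileSize {
		return fmt.Errorf("file size must be between 1 and %d bytes", MaxFileSize)
	}
	return nil
}

// ValidateUploadChunk bounds a single chunk before it is routed to the worker.
func ValidateUploadChunk(chunk []byte) error {
	if len(chunk) == 0 || len(chunk) > MaxChunkSize {
		return fmt.Errorf("chunk size must be between 1 and %d bytes", MaxChunkSize)
	}
	return nil
}

func main() {
	id := "123e4567-e89b-12d3-a456-426614174000"
	fmt.Println(ValidateUploadStart(id, "report.pdf", 1024))
	fmt.Println(ValidateUploadStart(id, `..\secret.txt`, 1024))
	fmt.Println(ValidateUploadChunk(make([]byte, MaxChunkSize+1)))
}
```

Session-state, controller, and policy-mode checks would run before these bounds and are omitted here.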

File Download Policy

Stage 5.2 adds a runtime-proven server-to-client download path for RDP. The policy field remains resource_policies.file_transfer_mode; server_to_client and bidirectional allow download, while disabled and client_to_server block it. The default remains disabled.

The v1 download model uses only the restricted RAP_Transfers\ToClient drop-zone inside the existing per-session visible transfer directory. Backend gateway accepts only file_download.start, file_download.ack, and file_download.cancel from the current controller of an active session and routes them to the worker after policy validation. Worker-origin file_download.* events are stored only as transient live state for backend-gateway fallback delivery; PostgreSQL remains authoritative for session/resource/policy state and must not store file contents.

The direct worker WSS path is also lifecycle-gated: detach returns file_download.blocked, old-controller takeover returns session.taken_over, and worker failure closes the direct transport after PostgreSQL transitions the session to failed.