# Data Plane v1 for RDP

Archived status: this document is a historical RDP/WebSocket stage record, not the current runtime source of truth for transport architecture. The active fabric transport model is QUIC-only between nodes; see `docs/architecture/DISTRIBUTED_FABRIC_NODE_PROTOCOL_PLAN.md`, `docs/architecture/FABRIC_FIRST_TRANSPORT_AND_STRESS_PLAN.md`, and `docs/architecture/SECURE_ACCESS_FABRIC_TARGET.md`.

Status: DP-3A grayscale full-frame binary render foundation is implemented and smoke-proven on the test Docker environment as of 2026-04-25. DP-3B adaptive quality policy/selection is intentionally paused. The accepted C++ RDP Adapter baseline is the ordered-region path. RDP-Perf-6 makes direct dirty-region binary render explicit with `render.frame.full` / `render.frame.region` RAP2 message types and is build/probe/live-smoke-proven on the test Docker environment as of 2026-04-26. The current test Docker deployment for the RDP Adapter performance path is `rap-rdp-worker:rdp-perf6-dirty-region`. The Stage 5.2 core download data path remains runtime-proven for direct worker WSS and backend gateway fallback. Data-plane and RDP work are paused; the next active focus is Stage C10 Fabric Core / cluster foundation, not another data-plane feature.

This document defines the first staged data-plane evolution for the RDP MVP. It does not implement direct worker WebSocket runtime, mesh routing, VPN, QUIC, UDP, WebRTC, relay nodes, or multi-cluster behavior.

The long-term platform target is defined in `docs/architecture/SECURE_ACCESS_FABRIC_TARGET.md`. This document narrows that target to DP-1: direct client-to-worker WSS for RDP realtime traffic, with the current backend gateway retained as fallback/debug.

## 1. Current Problem

The current RDP MVP routes realtime input/render through the backend WebSocket gateway and Redis-backed coordination. This is acceptable for fallback, debugging, lifecycle proof, and early MVP validation.
It is not acceptable as the production realtime path because:

- render frames are high-rate and high-volume
- base64/JSON render payloads add CPU and payload overhead
- backend gateway can become a bottleneck under concurrent sessions
- input can compete with render/frame processing
- backend API capacity should be reserved for control-plane work
- Redis must not become a frame transport or durable render store

The current implementation remains valid as fallback while DP-1 is introduced in stages.

## 2. Target DP-1 Path

Target DP-1 path:

```text
Windows client
  -> direct WSS data-plane connection
  -> rdp-worker realtime endpoint
  -> existing RDP session runtime
  -> FreeRDP
```

Control-plane path remains:

```text
Windows client
  -> backend API
  -> auth / org / policy / session broker
  -> worker selection
  -> short-lived data-plane token issuance
```

Fallback path remains:

```text
Windows client
  -> backend WebSocket gateway
  -> current gateway/Redis/worker coordination path
```

DP-1 does not replace the session broker. It only moves realtime session traffic away from backend relay when direct worker WSS is available and authorized.

## 3. Responsibilities

### Backend as Control Plane

Backend remains responsible for:

- authentication
- organization selection and isolation
- resource authorization
- resource policy evaluation
- session lifecycle
- worker selection
- attachment ownership
- takeover semantics
- audit
- short-lived data-plane token issuing
- returning data-plane candidates
- retaining backend gateway fallback

Backend must not become the production high-rate render relay.

### Worker as Direct Realtime Endpoint

The RDP worker becomes responsible for:

- exposing an authorized direct WSS endpoint
- validating `data_plane_token`
- binding a WSS connection to an existing session runtime
- enforcing session, attachment, user, organization, and channel scope
- carrying realtime logical channels directly:
  - input
  - render
  - clipboard
  - file_upload
  - control / heartbeat
  - telemetry
- preserving existing FreeRDP runtime boundaries
- preserving policy enforcement already present in worker runtime

The worker must not create a new RDP session just because a direct WSS connection attaches. It must bind to the existing broker-created session runtime.

### Windows Client

The Windows client will eventually:

- read data-plane candidates from session start/attach responses
- prefer `direct_worker_wss` when available
- fall back to `backend_gateway` when direct worker WSS is unavailable
- keep existing lifecycle behavior unchanged
- keep backend gateway support for debug/fallback

No client behavior changes are required for DP-1A.

## 4. Backend Contract Proposal

On session start, attach, and takeover, backend should extend the response with data-plane candidates.

Example:

```json
{
  "session_id": "session-123",
  "attachment_id": "attachment-456",
  "gateway_url": "wss://backend.example.com/api/v1/gateway/ws",
  "data_plane": {
    "preferred": "direct_worker_wss",
    "token": "short-lived-data-plane-token",
    "expires_at": "2026-04-25T13:00:00Z",
    "candidates": [
      {
        "type": "direct_worker_wss",
        "url": "wss://worker-node.example.com/rap/v1/data-plane"
      },
      {
        "type": "backend_gateway",
        "url": "wss://backend.example.com/api/v1/gateway/ws"
      }
    ]
  }
}
```

Compatibility rules:

- Existing fields must remain valid.
- Existing clients may ignore `data_plane`.
- `gateway_url` remains available for fallback/debug.
- The backend must not return direct worker candidates unless the worker is live and route policy permits it.
- Token TTL must be short.
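
The compatibility rules above imply a simple client-side ordering: try the preferred candidate first and keep `gateway_url` as the last resort. The following is a minimal Python sketch of that ordering, not the Windows client's actual code. The function name `ordered_candidates` is hypothetical, and treating an expired offer as "no offer" is an assumption this document does not state. The sketch also ignores the capability gating that DP-1D later adds through candidate metadata.

```python
from datetime import datetime


def ordered_candidates(response: dict, now: datetime) -> list:
    """Return data-plane candidates in the order a client might try them.

    `now` must be timezone-aware so it can be compared with `expires_at`.
    """
    # The legacy field is always kept as the last-resort fallback.
    gateway = {"type": "backend_gateway", "url": response["gateway_url"]}

    offer = response.get("data_plane")
    if not offer:
        return [gateway]  # older backend: no offer, behave exactly as before

    # Assumption: an expired offer is treated as if no offer was returned.
    expires_at = datetime.fromisoformat(offer["expires_at"].replace("Z", "+00:00"))
    if expires_at <= now:
        return [gateway]

    candidates = list(offer.get("candidates", []))
    preferred = offer.get("preferred")
    # Stable sort: the preferred type moves to the front, the rest keep their order.
    candidates.sort(key=lambda c: c.get("type") != preferred)

    # Make sure the backend gateway is still reachable as the final option.
    if not any(c.get("type") == "backend_gateway" for c in candidates):
        candidates.append(gateway)
    return candidates
```

Called with the example response and a timezone-aware `now` before `expires_at`, the sketch yields the `direct_worker_wss` candidate first and the `backend_gateway` candidate second; a response without `data_plane` yields only the gateway.
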

### Proposed DTO Shape

Names are proposals only.

```text
SessionControlResult
  session
  attachment
  attach_token
  gateway_url
  data_plane?

DataPlaneOffer
  preferred
  token
  expires_at
  candidates[]

DataPlaneCandidate
  type
  url
  worker_id?
  node_id?
  cluster_id?
  priority?
  metadata?
```

## 5. Data Plane Token Model

`data_plane_token` must be short-lived and scoped. It is not a general API token.

Required claims:

- `session_id`
- `attachment_id`
- `user_id`
- `organization_id`
- `cluster_id` if available
- `worker_id`
- `resource_id`
- `allowed_channels`
- `expires_at`
- `nonce` / `jti`
- `issued_at`
- `issuer`
- `audience`

Allowed channel values:

- `input`
- `render`
- `clipboard`
- `file_upload`
- `control`
- `telemetry`

Validation rules:

- token must be signed by the backend with an RS256 private key
- worker must validate with the public key only and must not hold a signing secret
- token must be short-lived
- token must match the worker receiving it
- token must match an active session runtime
- token must match current attachment/controller where required
- token must not grant channels not allowed by resource policy
- token must not survive session termination
- token replay must be rejected or bounded by `jti` / nonce cache

Token refresh is not part of DP-1A. Future stages may either reissue tokens through the control plane or renew direct connections through a controlled flow.

## 6. Direct WSS Channel Model

DP-1 uses a single WSS connection with logical channels. Later stages may split transports, but DP-1 must keep the model simple and bounded.

### `control`

Reliable channel.

Used for:

- attach handshake
- heartbeat
- session state messages
- detach notification
- takeover notification
- terminate notification
- protocol errors

### `input`

Highest-priority channel.

Rules:

- input never waits behind render
- key down/up and mouse button/wheel events must be ordered
- mouse move may be coalesced to latest
- input queues must be bounded
- stale mouse move may be dropped
- click/key/wheel must not be dropped under normal operation

### `render`

Droppable/latest-frame channel.

Rules:

- stale render frames must be dropped
- latest frame wins
- render must not block input/control
- binary payloads should be used on direct data plane
- compat fallback may continue existing JSON/base64 behavior during migration

### `clipboard`

Reliable policy-gated channel.

Rules:

- existing `clipboard_mode` applies
- text-only behavior remains until richer formats are explicitly designed
- blocked behavior must remain localized in clients
- worker must enforce policy again

### `file_upload`

Reliable chunked channel.

Rules:

- existing `file_transfer_mode` applies
- bounded chunk size
- content hash
- transfer id
- no arbitrary path exposure
- file upload must not block input

### `telemetry`

Low-priority channel.

Rules:

- sampled or lossy telemetry is acceptable
- telemetry must not block user traffic
- useful metrics include input RTT, frame FPS, dropped frames, queue length, decode time, render apply time

## 7. Message Framing

DP-1 uses:

- JSON control messages for small envelopes
- binary WebSocket frames for render payloads
- no base64 for direct data-plane render frames

Backend fallback keeps the current JSON/base64 frame path for debug/fallback. Direct worker WSS uses binary render frames when the backend advertises `render_transport=binary_v1` and the client requests `render_transport=binary_v1`.

### JSON Envelope

Small control/reliable messages may use JSON:

```json
{
  "protocol_version": 1,
  "session_id": "session-123",
  "attachment_id": "attachment-456",
  "channel": "input",
  "message_type": "mouse",
  "sequence": 1024,
  "timestamp": "2026-04-25T13:00:00.000Z",
  "flags": {},
  "payload": {}
}
```

### Binary Frame Header

DP-2 uses a fixed 16-byte preamble followed by a UTF-8 JSON header and a raw binary payload:

```text
offset  size  field
0       4     magic = "RAP2"
4       2     protocol_version, little-endian uint16, currently 1
6       2     flags, little-endian uint16
8       4     header_length, little-endian uint32
12      4     payload_length, little-endian uint32
16      n     UTF-8 JSON header
16+n    m     raw render payload bytes
```

The DP-2 JSON header contains:

- `protocol_version`
- `session_id`
- `channel`, currently `render`
- `message_type`, currently `render.frame.full` or `render.frame.region` on direct worker WSS; `session.frame` remains accepted as the compatibility DP-2 binary message type
- `sequence`
- `timestamp`
- `flags`
- `payload_length`
- `frame_width`
- `frame_height`
- `frame_stride`
- `frame_format`
- optional region fields when `message_type=render.frame.region`: `region_x`, `region_y`, `region_width`, `region_height`, `region_stride`, `region_format=BGRA32`
- optional `color_mode`, currently `full_color` or `grayscale`
- optional `quality_profile`
- optional `original_frame_format`
- optional `output_frame_format`
- optional `raw_frame_bytes`
- optional `binary_direct_bytes`
- optional diagnostics: `full_frame_bytes`, `region_bytes`, `region_savings_percent`, `diff_time_ms`, `render_update_reason`, `fallback_to_full_frame_reason`
- optional `input_correlation_id`
- optional `worker_frame_captured_at`

Binary frames must include a fixed or clearly parseable header before payload.

Required header fields:

- `protocol_version`
- `session_id`
- `channel`
- `message_type`
- `sequence`
- `timestamp`
- `flags`
- `payload_length`

Render payload must not be base64 encoded on direct data plane.
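
To make the 16-byte preamble layout concrete, here is a minimal Python sketch that packs and unpacks a `RAP2` frame with the standard `struct` module. It is an illustration under stated assumptions, not the worker or client implementation: the helper names `encode_frame` and `decode_frame` are hypothetical, the sketch assumes one WebSocket binary message carries exactly one frame, and it passes the preamble `flags` value through untouched because this document does not assign flag bit positions.

```python
import json
import struct

# magic (4 bytes), protocol_version u16, flags u16, header_length u32,
# payload_length u32; "<" selects little-endian with no padding (16 bytes).
PREAMBLE = struct.Struct("<4sHHII")
MAGIC = b"RAP2"


def encode_frame(header: dict, payload: bytes, flags: int = 0, version: int = 1) -> bytes:
    """Build preamble + UTF-8 JSON header + raw payload."""
    # Keep the JSON header's payload_length consistent with the preamble.
    header = dict(header, payload_length=len(payload))
    header_bytes = json.dumps(header, separators=(",", ":")).encode("utf-8")
    preamble = PREAMBLE.pack(MAGIC, version, flags, len(header_bytes), len(payload))
    return preamble + header_bytes + payload


def decode_frame(data: bytes):
    """Return (version, flags, header, payload) or raise ValueError."""
    if len(data) < PREAMBLE.size:
        raise ValueError("truncated preamble")
    magic, version, flags, header_len, payload_len = PREAMBLE.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("bad magic")
    if version != 1:
        raise ValueError("unsupported protocol_version")
    header_end = PREAMBLE.size + header_len
    # Assumption: the message contains exactly one frame, so lengths must add up.
    if len(data) != header_end + payload_len:
        raise ValueError("length mismatch")
    header = json.loads(data[PREAMBLE.size:header_end].decode("utf-8"))
    if header.get("payload_length") != payload_len:
        raise ValueError("header and preamble disagree on payload_length")
    return version, flags, header, data[header_end:]
```

A receiver built this way can reject a malformed frame from the preamble alone, before it parses the JSON header or touches the payload bytes.
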

Suggested render message types:

- `render.frame.full`
- `render.frame.region`
- `render.cursor`
- `render.resize`
- `render.quality.changed`

Suggested flags:

- `keyframe`
- `droppable`
- `latest_only`
- `compressed`
- `interactive`
- `grayscale`

## 8. Quality Profile Foundation

DP-1A defines quality profiles only. It does not implement adaptive rendering.

Profiles:

- `emergency_grayscale`
- `low_bandwidth`
- `text_priority`
- `balanced`
- `high_quality`

Color modes:

- `full_color`
- `256_colors`
- `64_colors`
- `16_colors`
- `grayscale`

Rules:

- quality profile must affect real render behavior in later stages
- input priority remains absolute
- render quality must degrade before input latency increases
- lower profiles may reduce FPS, color depth, region size, or compression settings
- higher profiles may increase FPS and color fidelity only when queues remain healthy
- profile selection must be policy-aware and observable

## 9. Security Model

DP-1 security boundaries:

- backend authorizes session access
- backend issues short-lived data-plane token
- worker validates token before accepting direct WSS
- worker binds token to existing session runtime
- worker enforces channel permissions
- worker rejects mismatched session, attachment, organization, resource, worker, or expired token
- backend gateway fallback keeps existing auth path

Transport:

- direct worker WSS must use TLS
- future node-to-node traffic uses mTLS as defined in the Secure Access Fabric target
- DP-1 direct WSS may start with worker server TLS plus signed token validation
- P3.2 direct worker WSS trust metadata distinguishes `smoke_insecure`, `public_ca`, and `platform_ca`
- production backend must not advertise smoke-only direct candidates
- production clients must not use insecure TLS bypass and must fall back to backend gateway if direct worker trust is unavailable
- production deployments should avoid long-lived static worker secrets

Audit:

- backend audits token issuance
- backend audits session lifecycle
- worker should report direct attach/detach/failure events back to control plane
- direct data-plane traffic does not require auditing every input/render event
- high-risk events such as takeover, failed token validation, policy denial, and file transfer should be auditable

## 10. Fallback Backend Gateway Path

The current backend WebSocket gateway remains:

- fallback path
- debug path
- compatibility path for older clients
- smoke-test path while DP-1 is staged

Fallback activation cases:

- no direct worker candidate returned
- direct WSS connect fails
- token validation fails due to stale route
- worker endpoint unavailable
- policy forces backend gateway
- client version does not support direct WSS

Fallback rules:

- fallback must preserve existing lifecycle behavior
- fallback must not silently weaken policy
- fallback should be visible in logs/telemetry
- fallback should be measurable against direct path latency

## 11. Migration Stages

### Stage DP-1A: Spec Only

Create architecture/spec documentation.

No runtime behavior changes.

### Stage DP-1B: Backend Offers Data Plane Candidates

Status: completed.

Backend extends session start/attach/takeover responses with `data_plane`.

Client still uses fallback backend gateway.

Implementation status:

- backend response DTO can include optional `gateway_url` and `data_plane`
- `data_plane.token` is a short-lived signed token with session, attachment, user, organization, worker, resource, allowed-channel, expiry, and `jti` scope
- `backend_gateway` candidate is always returned when configured
- `direct_worker_wss` candidate is returned only when a direct worker WSS URL template is configured
- current clients may ignore `data_plane` safely
- no worker direct WSS runtime is implemented in this stage
- no client routing behavior changes in this stage

Verification:

- old clients still work
- responses include valid candidate shape
- token is short-lived
- token is scoped

### Stage DP-1C: Worker Direct WSS Endpoint

Status: completed.

Worker exposes direct WSS endpoint and validates `data_plane_token`.

Windows client still uses fallback backend gateway.

Implementation status:

- worker has optional `/rap/v1/data-plane` WSS endpoint
- endpoint is disabled by default and requires TLS certificate/key paths
- worker validates signed RS256 `data_plane_token` with a public key only
- worker keeps no data-plane signing secret
- worker rejects reused `jti` values with a bounded in-memory TTL cache
- token validation checks session, attachment, user, organization, worker, resource, allowed channels, expiry, audience, and `jti`
- endpoint binds only to existing `SessionRuntime`
- bind checks reject old attachment after takeover, wrong attachment, wrong worker, wrong organization, wrong resource, missing runtime, failed/terminated runtime state, and channels broader than runtime policy
- invalid token, wrong worker, expired token, replayed `jti`, and missing runtime are rejected
- `rdp-worker-dataplane-token-probe` validates token behavior in the worker image
- `rdp-worker-dataplane-bind-probe` validates attachment/state/channel bind policy without starting RDP
- backend gateway remains active fallback
- no Windows client routing change is included in this stage

Verification:

- token validation works
- runtime binding rejects missing runtime without creating a new RDP session
- replayed `jti` values are rejected
- wrong attachment and over-broad channels are rejected
- no new RDP session is created
- invalid tokens are rejected

### Stage DP-1D: Windows Client Prefers Direct WSS

Status: completed as hardened client transport selection.

Windows client uses direct worker WSS only when the candidate is explicitly marked data-capable. The current DP-1C worker endpoint validates and binds but does not yet carry production render/input traffic, so unmarked candidates fall back to the backend gateway immediately.

Implementation status:

- Windows session DTOs understand optional `data_plane` offers and candidates
- transport selection remains behind `ISessionGatewayClient`
- direct worker WSS candidates are considered only when metadata contains `runtime_transport=json_v1` or `traffic_ready=true`
- direct WSS attach attempts use a short bounded timeout and never block the UI
- failed/unavailable/not-ready direct path automatically uses backend gateway
- existing backend gateway behavior remains unchanged
- no worker runtime changes are included in DP-1D
- no binary render frames or mesh/relay/VPN behavior is included

Verification:

- Windows client build succeeds
- fallback works and remains the default runtime path for the current DP-1C endpoint
- direct candidate selection is capability-gated to avoid losing render/input
- lifecycle behavior remains stable

### Stage DP-1D.1: Worker Direct JSON Realtime Bridge

Status: runtime-proven on the test Docker environment as of 2026-04-25.

Worker direct WSS now carries the same JSON realtime envelopes already used by the backend gateway. This is intentionally a bridge stage, not the final production data-plane protocol.

Implementation status:

- worker direct WSS accepts existing JSON `input`, `control`, `clipboard`, and `file_upload` envelopes
- worker direct WSS emits existing JSON `session.state`, `session.frame`, `session.taken_over`, `clipboard.text`, and `file_upload.progress` events
- direct WSS binds only to an existing `SessionRuntime`; it never creates a new RDP runtime
- direct inbound envelopes are bounded and drained before Redis fallback input
- mouse move can be coalesced, but click, wheel, keyboard, clipboard, and file upload envelopes remain reliable within bounded queues
- direct render is latest-frame-only and droppable in the worker WSS writer
- direct inbound envelopes are tagged with token-bound session, attachment, user, organization, worker, and resource claims before they enter runtime
- runtime rejects direct envelopes whose `attachment_id` no longer matches the current active controller attachment
- takeover updates emit `session.taken_over` to the previous direct attachment while normal frame/state events continue only to the current attachment
- backend advertises direct metadata only when `DATA_PLANE_DIRECT_WORKER_JSON_RUNTIME=true`
- backend gateway fallback remains active and unchanged
- Windows client behavior remains gated by DP-1D metadata selection

Verification performed:

- backend `go test ./...` passes
- Windows client build passes with no routing behavior change required
- worker canonical Docker image builds with the direct JSON bridge
- DP-1C endpoint smoke still proves malformed token rejection, valid-token-without-runtime rejection, and `jti` replay rejection
- backend tests prove `runtime_transport=json_v1` and `traffic_ready=true` are emitted only when the explicit runtime flag is enabled
- live runtime proof was run on test Docker `192.168.200.61` with `DATA_PLANE_DIRECT_WORKER_JSON_RUNTIME=true`
- backend session start returns `direct_worker_wss` candidate metadata: `runtime_transport=json_v1` and `traffic_ready=true`
- Windows desktop smoke selected `direct_worker_wss` and connected to `wss://192.168.200.61:18443/rap/v1/data-plane`
- worker direct WSS validated the token and bound to the existing runtime
- direct WSS accepted input envelopes and applied mouse/keyboard events through FreeRDP
- direct WSS emitted JSON render/state events and the Windows client rendered a real desktop frame
- direct WSS carried text clipboard client-to-server through the existing `clipboard` envelope and worker policy/cliprdr boundary
- direct WSS carried chunked file upload through existing `file_upload.start` / `file_upload.chunk` envelopes and emitted `file_upload.progress`
- fallback was proven by advertising an unavailable direct worker URL; the Windows client timed out direct WSS and selected `backend_gateway`
- detach, reattach, takeover, `session.taken_over`, input, and render remained stable in direct and fallback smoke runs
- no new RDP runtime was created by direct WSS attach; worker logs showed one `started new runtime` for the session and later `updated assignment for existing session` on reattach/takeover

Known limitations after DP-1D.1:

- direct render still uses JSON/base64 full-frame payloads; binary render frames remain DP-2
- direct server-to-client clipboard was not re-matrixed in this DP proof because Stage 4.1 already proved FreeRDP cliprdr behavior; DP-1D.1 proved that the direct bridge carries clipboard envelopes and preserves worker enforcement
- file upload direct proof lands in the existing restricted worker visible transfer directory; broader file-transfer UX remains outside DP-1D.1
- the Windows smoke script reports `rendering=false` when compact layout hides telemetry controls, even though frame receipt/rendering is proven by logs and UIA event text

### Stage DP-1E: Latency Comparison

Status: measurement-complete on the test Docker environment as of 2026-04-25.

Compare direct path vs fallback before starting DP-2 binary render frames.

Metrics:

- input capture to worker apply
- worker frame capture to client render
- frame queue length
- dropped stale frames
- close/dispose latency
- fallback activation count

Smoke commands used:

```powershell
pwsh -ExecutionPolicy Bypass -File scripts/windows-smoke/desktop-smoke.ps1 `
  -PreferDirectDataPlane:$true `
  -AllowInsecureDirectDataPlaneTlsForSmoke:$true `
  -DirectDataPlaneConnectTimeoutMs 2500 `
  -SkipOrgSwitchAndTokenRefresh

pwsh -ExecutionPolicy Bypass -File scripts/windows-smoke/desktop-smoke.ps1 `
  -PreferDirectDataPlane:$false `
  -AllowInsecureDirectDataPlaneTlsForSmoke:$true `
  -DirectDataPlaneConnectTimeoutMs 750 `
  -SkipOrgSwitchAndTokenRefresh
```

Measured sessions:

- direct worker WSS: `59af4b37-3708-4cff-8e9d-054869946250`
- backend gateway fallback baseline: `673b7540-6276-4d73-824b-e5b2ea96182a`
- additional fallback-activation proof: direct candidate unavailable/not-ready logs on `8d89dd5c-fb14-4f70-a4e4-01ebb2a37da4` and `673b7540-6276-4d73-824b-e5b2ea96182a`

Verification summary:

- direct smoke passed login, resource list, start, input, detach, reattach, takeover, `session.taken_over`, and logout
- fallback smoke passed login, resource list, start, input, detach, reattach, takeover, `session.taken_over`, and logout
- smoke `rendering=false` is a compact-layout harness artifact; session event log contained `Desktop frame received` and client logs contained `SessionWindow rendered frame`
- cleanup probe against `/api/v1/sessions/active` returned 404 because that endpoint does not exist; the implemented list endpoint is `/api/v1/sessions?user_id=...`
- Redis worker queues were empty after the measured runs: `worker:queue:59af... = 0`, `worker:queue:673b... = 0`

Latency matrix:

| Metric | Direct worker WSS | Backend gateway fallback |
| --- | ---: | ---: |
| Client transport selection | `selected=direct_worker_wss` in desktop logs | `selected=backend_gateway` in desktop logs |
| Client capture/send to worker apply | direct smoke retained worker-side receive/apply timestamps; client capture timestamp was not retained in the compact smoke log | sampled fallback activation: about `205ms` from WPF capture to worker apply for mouse down |
| Backend gateway input hop | bypassed for direct realtime input | backend receive to route typically `<1ms` |
| Worker receive to FreeRDP apply, mouse down | `0ms` to `24ms` observed | `0ms` to `29ms` observed |
| Worker receive to FreeRDP apply, mouse up | `25ms` to `26ms` observed | `28ms` to `29ms` observed |
| Worker receive to FreeRDP apply, key down | about `25ms` observed | about `33ms` observed |
| Worker receive to FreeRDP apply, key up | about `26ms` observed | about `30ms` observed |
| Backend route to worker receive | not applicable | about `31ms` for key down, about `0ms` to `31ms` for sampled mouse/key events |
| FreeRDP apply to next captured frame | `0ms` to `40ms` observed | `0ms` to `43ms` observed |
| Worker frame capture to backend receive | backend still receives worker frame telemetry; observed same-second receive | same-second receive |
| Backend frame receive to client write | not on direct render path | sampled `486ms` and `753ms` on full-frame JSON/base64 gateway writes |
| Client render proof | `session.frame` received and frame rendered in direct smoke | `SessionWindow rendered frame seq=19 size=1280x720` |
| SessionGatewayConnection dispose | about `1ms` in sampled close traces | about `1ms` in sampled close traces |
| SessionWindow closed handler | below `1ms` in sampled close traces | below `1ms` in sampled close traces |

Queue and backpressure observations:

- direct inbound drained bounded batches before fallback Redis input
- direct mouse move coalescing was active while preserving click/key ordering
- direct outbound reported `frames_queued_per_second` matching `frames_sent_per_second`; `reliable_dropped=0`
- worker render pending remained `0` for both paths
- fallback Redis append queue length stayed bounded in sampled logs, usually `1` to `3`, and returned to `0` after the run

Render observations:

- direct render is already latest-frame-only/droppable at the worker WSS writer
- worker render rates during interaction were approximately:
  - direct: `~3.0` to `~5.7` frames/sec sent/published, pending `0`
  - fallback: `~2.0` to `~5.0` frames/sec published, pending `0`
- current frames are still JSON/base64 full-frame payloads
- measured frame payload size remains about `4,915,200` bytes per JSON/base64 frame, so DP-1D.1 improves routing but does not remove the render payload bottleneck

Fallback activation proof:

- fallback was explicitly selected when the client was configured with `PreferDirectDataPlane=false`
- fallback was also visible when direct WSS was unavailable or not runtime ready, with client logs:
  - `data_plane.transport direct_worker_wss failed; falling back to backend_gateway`
  - `data_plane.transport direct_worker_wss unavailable_or_not_runtime_ready; using backend_gateway`
  - `data_plane.transport selected=backend_gateway`

DP-1E conclusion:

- direct worker WSS removes backend/Redis from the realtime input path
- fallback backend gateway remains functional and observable
- neither path showed unbounded input queue growth during smoke
- close/dispose traces remained fast in sampled logs
- the dominant remaining bottleneck is render payload format and size, not worker input scheduling
- DP-2 should focus on binary render frames and avoiding base64/JSON render payloads on the direct data plane

### Stage DP-2: Binary Render Frames

Status: implemented and smoke-proven on the test Docker environment as of 2026-04-25.

Direct worker WSS now sends render payloads as binary WebSocket frames when the backend candidate metadata advertises `render_transport=binary_v1` and the Windows client requests that transport. Backend gateway fallback continues to use the existing JSON/base64 frame path.

Goals:

- remove base64 overhead from the direct worker WSS wire path
- reduce direct render payload size
- keep backend gateway JSON/base64 fallback intact
- keep direct render latest-frame-only and droppable
- keep input/control ahead of render

Implementation notes:

- Backend advertises binary direct render only when `DATA_PLANE_DIRECT_WORKER_BINARY_RENDER=true`.
- Direct candidate metadata includes `runtime_transport=json_v1`, `traffic_ready=true`, and `render_transport=binary_v1`.
- Worker direct WSS accepts existing JSON envelopes for control/input/clipboard/file_upload and emits binary WebSocket frames for `session.frame`.
- Windows client enables binary parsing only for direct candidates that advertise `render_transport=binary_v1` or `binary_render=true`.
- Backend gateway fallback remains unchanged and continues to deliver `session.frame` as JSON/base64.
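
The two scheduling goals above, "latest-frame-only and droppable" render and reliable traffic ahead of render, can be illustrated with a small outbound scheduler. This is a minimal Python sketch under assumptions, not the worker's writer: the class name `OutboundScheduler`, the `max_reliable` bound, and the behavior of refusing a reliable message when the queue is full are all hypothetical choices made for illustration.

```python
from collections import deque


class OutboundScheduler:
    """Reliable FIFO plus a single-slot render buffer where the latest frame wins."""

    def __init__(self, max_reliable: int = 256):
        self._reliable = deque()
        self._max_reliable = max_reliable  # hypothetical bound; queues stay finite
        self._latest_frame = None
        self.frames_dropped = 0

    def push_reliable(self, message) -> bool:
        """Queue a control/state message; report False instead of growing unbounded."""
        if len(self._reliable) >= self._max_reliable:
            return False
        self._reliable.append(message)
        return True

    def push_frame(self, frame) -> None:
        """Replace any unsent frame: a stale frame is dropped, never queued."""
        if self._latest_frame is not None:
            self.frames_dropped += 1
        self._latest_frame = frame

    def next_message(self):
        """Reliable traffic is always written before the pending render frame."""
        if self._reliable:
            return ("reliable", self._reliable.popleft())
        if self._latest_frame is not None:
            frame, self._latest_frame = self._latest_frame, None
            return ("render", frame)
        return None
```

Because render occupies one slot rather than a queue, a slow socket costs dropped frames instead of added latency for the reliable messages behind them.
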

Smoke proof:

- direct session id: `824c0057-c8a0-4366-b5c2-805597ae2d61`
- fallback session id: `28e4b198-2c27-4971-951a-7b187c11f96d`
- direct client selected `direct_worker_wss` with `render_transport=binary_v1`
- direct worker bind succeeded with `render_transport=binary_v1`
- client received binary frames with raw payload size `3,686,400` bytes
- client rendered binary frames, including frame sequences `1`, `2`, `4`, `7`, `9`, `12`, `14`, `15`, `17`, `18`, and `19`
- fallback client selected `backend_gateway`
- fallback rendered JSON/base64 frames through the existing backend gateway path

Payload comparison:

- DP-1E JSON/base64 frame payload: about `4,915,200` bytes for `1280x720` BGRA
- DP-2 direct binary frame payload: `3,686,400` bytes for the same `1280x720` BGRA frame, plus a small binary preamble and JSON header
- Direct wire payload reduction is about 25 percent compared with base64.

Latency and queue observations from smoke:

- direct click frame render sample: worker captured frame at `1777141091937`, WPF rendered it at `2026-04-25T21:18:11.6628382+03:00`, about 226 ms later
- direct key-down frame render sample: worker captured frame at `1777141093434`, WPF rendered it at `2026-04-25T21:18:13.1614990+03:00`, about 727 ms later
- direct worker render rate sample: `seen_per_second=4.953283`, `published_per_second=3.962626`, `dropped_per_second=0.990657`, `pending=0`
- direct data-plane outbound sample: `frames_queued_per_second=5.404927`, `frames_sent_per_second=5.404927`, `binary_render_bytes_per_second=19926577.806299`, `json_render_bytes_per_second=0.000000`, `reliable_dropped=0`
- fallback worker render rate sample: `seen_per_second=4.871576`, `published_per_second=3.897260`, `dropped_per_second=0.974315`, `pending=0`

Known limitations:

- DP-2.1 removed the internal base64 encode/decode hop from the direct render path. The direct worker WSS sender now receives raw captured frame bytes and writes them into `RAP2` binary frames without decoding a compatibility `session_frame`.
- The worker still builds compatibility `session_frame` events with base64 for backend gateway/live-state fallback. That compatibility conversion is intentionally isolated to the fallback boundary and is not used by the direct binary render sink.
- Backend still receives compatibility worker frame events for fallback/debug. Binary render frames are not routed through Redis or backend gateway.
- At the DP-2.1 point, dirty regions, tile encoding, adaptive quality, compression/codecs, and color-mode reduction remained later work.
- Smoke `rendering=false` remains a compact-layout harness artifact; UIA output and client logs prove `Desktop frame received` and `SessionWindow rendered frame`.

### Stage DP-2.1: Worker Raw-Frame Split

Status: implemented and smoke-proven on the test Docker environment as of 2026-04-25.

DP-2.1 keeps the DP-2 `RAP2` binary frame contract and removes the remaining worker-internal base64 encode/decode hop from the direct render path.

Implementation notes:

- FreeRDP frame capture now produces raw BGRA frame bytes for worker runtime render notifications.
- `SessionRuntime` splits render publication into two outputs:
  - direct binary render sink receives raw frame bytes
  - compatibility fallback sink builds JSON/base64 `session_frame` only for backend gateway/live-state fallback
- Worker direct WSS sends raw captured frame bytes as `RAP2` binary WebSocket frames when `render_transport=binary_v1` is active.
- Backend gateway fallback remains unchanged and still receives JSON/base64 `session.frame` compatibility events.
- Direct render remains latest-frame-only and droppable; input/control scheduling is unchanged.
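
The two-output split, and the payload sizes it produces, can be sketched in a few lines of Python. This is an illustration under assumptions rather than the `SessionRuntime` code: the class name `RenderPublisher`, the callback signatures, and the shape of the compatibility event dictionary are hypothetical; only the `frame_data` field name and the byte counts come from this document.

```python
import base64


def raw_frame_bytes(width: int, height: int, bytes_per_pixel: int = 4) -> int:
    """Size of an uncompressed frame; BGRA32 uses 4 bytes per pixel."""
    return width * height * bytes_per_pixel


def base64_length(n: int) -> int:
    """Length of padded base64 text for n input bytes."""
    return 4 * ((n + 2) // 3)


class RenderPublisher:
    """Fan one captured frame out to a raw sink and a base64 compatibility sink."""

    def __init__(self, direct_sink=None, compat_sink=None):
        self.direct_sink = direct_sink  # receives (sequence, raw bytes)
        self.compat_sink = compat_sink  # receives a JSON-ready dict

    def publish(self, sequence: int, raw: bytes) -> None:
        if self.direct_sink is not None:
            # Direct path: raw bytes are handed over with no base64 step.
            self.direct_sink(sequence, raw)
        if self.compat_sink is not None:
            # Base64 is built only at the fallback boundary.
            # The event shape here is hypothetical, for illustration only.
            self.compat_sink({
                "type": "session_frame",
                "sequence": sequence,
                "frame_data": base64.b64encode(raw).decode("ascii"),
            })
```

For a `1280x720` BGRA frame, `raw_frame_bytes(1280, 720)` is 3,686,400 and `base64_length(3686400)` is 4,915,200, which matches the payload sizes reported in the smoke logs and the roughly 25 percent wire reduction on the direct path.
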
Smoke proof:

- direct session id: `b4720057-db61-4c72-bb4c-bccfd7e30008`
- fallback session id: `65d0667b-aaef-4042-ae30-4c34d151e5aa`
- direct client selected `direct_worker_wss` with `render_transport=binary_v1`
- fallback client selected `backend_gateway`
- direct client received `binary_frame_received` payloads of `3,686,400` bytes for `1280x720` BGRA
- direct client rendered frame sequences including `1`, `2`, `4`, `7`, `9`, `13`, `14`, `15`, `16`, `18`, `19`, and `20`
- fallback client rendered JSON/base64 `session.frame` through backend gateway
- worker logs show `raw_frame_bytes=3686400`, `binary_direct_bytes=3686400`, `base64_compat_bytes=4915200`, `encode_skipped_for_direct=true`, and `fallback_compat_frame_built=true`
- worker direct outbound logs show `binary_render_bytes_per_second` non-zero and `json_render_bytes_per_second=0.000000`

Payload and conversion proof:

- direct raw frame payload remains `3,686,400` bytes plus the small `RAP2` preamble/header
- fallback compatibility payload remains about `4,915,200` base64 bytes for the same frame
- direct render no longer decodes `frame_data` from compatibility base64 before binary send
- base64 is still generated for fallback/debug because the backend gateway path intentionally remains JSON/base64

Known limitations:

- DP-2.1 is an internal worker render plumbing cleanup only.
- Full-frame BGRA payloads are still heavy.
- At the DP-2.1 point, dirty regions, tiles, adaptive quality, compression/codecs, and color-mode reduction remained future work.
- Backend gateway fallback remains JSON/base64 by design.

### Stage DP-3A: Grayscale Full-Frame Binary Render

Status: implemented and smoke-proven on the test Docker environment as of 2026-04-25.

DP-3A adds the first conservative quality foundation for the direct binary render path. DP-3A itself did not implement tiles, compression, codecs, or adaptive profile switching. Dirty-region direct binary rendering is handled by the later RDP Adapter RDP-Perf-6 path.
Contract changes:

- backend direct binary render candidates advertise `render_transport=binary_v1`
- backend direct binary render candidates advertise `supported_color_modes=["full_color","grayscale"]`
- backend direct binary render candidates advertise `default_color_mode="full_color"`
- Windows client requests `full_color` by default
- Windows smoke can request `grayscale` through `-DirectDataPlaneColorMode grayscale`
- `RAP2` binary frame headers carry `color_mode`, `quality_profile`, `original_frame_format`, `output_frame_format`, `raw_frame_bytes`, and `binary_direct_bytes`

Implementation notes:

- `full_color` direct render sends the existing raw BGRA frame unchanged.
- `grayscale` direct render converts BGRA bytes in the worker direct binary sink before WSS send.
- The grayscale path preserves BGRA32 output format so the Windows presenter can reuse the existing render path.
- Backend gateway fallback remains JSON/base64 and is not affected by direct grayscale mode.
- Direct render remains latest-frame-only and droppable.
- Input/control scheduling is unchanged and remains higher priority than render.
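A grayscale conversion that keeps the BGRA32 output format can be sketched as follows. This is an illustration under assumptions, not the worker's implementation: the function name is hypothetical, and the integer BT.601 luma weights are an assumed choice because this document does not state which weights the worker uses. The sketch shows why the byte count does not change: each pixel still occupies four bytes, with the luma value written to the B, G, and R channels and alpha carried through.

```python
def bgra_to_grayscale_bgra(frame: bytes) -> bytes:
    """Convert BGRA32 pixels to gray while keeping the BGRA32 layout.

    Assumption: integer BT.601 luma weights (0.299 R, 0.587 G, 0.114 B).
    """
    if len(frame) % 4 != 0:
        raise ValueError("BGRA32 frame length must be a multiple of 4")
    out = bytearray(len(frame))
    for i in range(0, len(frame), 4):
        b, g, r, a = frame[i], frame[i + 1], frame[i + 2], frame[i + 3]
        # Rounded integer luma in the range 0..255.
        y = (299 * r + 587 * g + 114 * b + 500) // 1000
        out[i] = out[i + 1] = out[i + 2] = y  # same value in B, G, and R
        out[i + 3] = a                        # alpha is preserved
    return bytes(out)


if __name__ == "__main__":
    red_pixel = bytes([0, 0, 255, 255])  # B, G, R, A
    print(list(bgra_to_grayscale_bgra(red_pixel)))  # [76, 76, 76, 255]
```

A per-pixel Python loop is far too slow for live `1280x720` frames; the sketch only demonstrates the byte layout, which is consistent with the limitation that `grayscale` reduces color fidelity but not wire size.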
Smoke proof:

- direct full-color session id: `74a0e5c6-02e0-487f-a1a1-c2850a13881c`
- direct grayscale session id: `3b616bd7-1179-4ec5-879f-7cd270f92a0a`
- fallback backend-gateway session id: `e5724cac-7f09-4931-9ad9-156a3f33d0b1`
- direct full-color client selected `direct_worker_wss` with `render_transport=binary_v1`, `requested_color_mode=full_color`, and `applied_color_mode=full_color`
- direct grayscale client selected `direct_worker_wss` with `render_transport=binary_v1`, `requested_color_mode=grayscale`, and `applied_color_mode=grayscale`
- fallback smoke selected `backend_gateway` and continued to render JSON/base64 `session.frame` events
- direct full-color frames rendered with `color_mode=full_color` and `bytes=3686400`
- direct grayscale frames rendered with `color_mode=grayscale` and `bytes=3686400`
- worker logs show `grayscale_conversion_applied=false` for full color and `grayscale_conversion_applied=true` for grayscale
- worker logs show `raw_frame_bytes_before=3686400`, `raw_frame_bytes_after=3686400`, and `binary_direct_bytes=3686400`
- worker grayscale conversion time was observed around `1-2 ms` per sampled `1280x720` BGRA frame
- worker direct outbound logs show binary render traffic and `json_render_bytes_per_second=0.000000` on the direct binary path

Verification commands:

```powershell
pwsh -ExecutionPolicy Bypass -File scripts/windows-smoke/desktop-smoke.ps1 -PreferDirectDataPlane:$true -AllowInsecureDirectDataPlaneTlsForSmoke:$true -DirectDataPlaneConnectTimeoutMs 2500 -DirectDataPlaneColorMode full_color -SkipOrgSwitchAndTokenRefresh
pwsh -ExecutionPolicy Bypass -File scripts/windows-smoke/desktop-smoke.ps1 -PreferDirectDataPlane:$true -AllowInsecureDirectDataPlaneTlsForSmoke:$true -DirectDataPlaneConnectTimeoutMs 2500 -DirectDataPlaneColorMode grayscale -SkipOrgSwitchAndTokenRefresh
pwsh -ExecutionPolicy Bypass -File scripts/windows-smoke/desktop-smoke.ps1 -PreferDirectDataPlane:$false -AllowInsecureDirectDataPlaneTlsForSmoke:$true -DirectDataPlaneConnectTimeoutMs 2500 -DirectDataPlaneColorMode grayscale -SkipOrgSwitchAndTokenRefresh
```

Known limitations after DP-3A:

- `grayscale` currently reduces color fidelity but not wire byte size because the output format remains BGRA32.
- `256_colors`, `64_colors`, `16_colors`, and palette modes are not implemented.
- Tiles, compression/codecs, and adaptive profile switching remain future work.
- Backend gateway fallback remains JSON/base64 by design.
- Smoke `rendering=false` remains a compact-layout harness artifact in some runs; client logs prove `Desktop frame received` and `SessionWindow rendered frame`.

### RDP-Perf-6: Direct Dirty-Region Binary Render Contract

Status: implemented and build/probe/live-smoke-proven on the test Docker environment as of 2026-04-26 using `P3.3 Secret RDP Resource`, direct worker WSS, and `rap-rdp-worker:rdp-perf6-dirty-region`.

RDP-Perf-6 keeps the existing `RAP2` binary WebSocket transport and adds explicit direct render message types:

- `render.frame.full`
- `render.frame.region`

Compatibility:

- Windows client direct transport still accepts compat binary `message_type=session.frame`.
- Inside the Windows application pipeline, direct binary frames are normalized back into the existing `session.frame` envelope so UI, lifecycle, input, clipboard, and file transfer behavior remain unchanged.
- Backend gateway fallback remains JSON/base64 and is not removed.
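Because a `render.frame.region` message carries only a rectangular patch, the receiver has to copy it into a retained full-size framebuffer before presenting. The Python sketch below illustrates that copy and the savings arithmetic under stated assumptions: the function names are hypothetical, the parameters reuse the dirty-region metadata field names from this section, and tightly packed BGRA32 (4 bytes per pixel) is assumed. It is not the Windows presenter's actual code.

```python
BYTES_PER_PIXEL = 4  # BGRA32


def patch_region(framebuffer: bytearray, frame_stride: int,
                 region_x: int, region_y: int,
                 region_width: int, region_height: int,
                 region_stride: int, payload: bytes) -> None:
    """Copy a BGRA32 region payload into a retained framebuffer, row by row."""
    row_bytes = region_width * BYTES_PER_PIXEL
    if region_width <= 0 or region_height <= 0 or region_x < 0 or region_y < 0:
        raise ValueError("invalid region geometry")
    if region_stride < row_bytes:
        raise ValueError("region_stride is smaller than one region row")
    if len(payload) < region_stride * (region_height - 1) + row_bytes:
        raise ValueError("payload is shorter than the declared region")
    if region_x * BYTES_PER_PIXEL + row_bytes > frame_stride:
        raise ValueError("region exceeds framebuffer width")
    if (region_y + region_height) * frame_stride > len(framebuffer):
        raise ValueError("region exceeds framebuffer height")
    for row in range(region_height):
        src = row * region_stride
        dst = (region_y + row) * frame_stride + region_x * BYTES_PER_PIXEL
        framebuffer[dst:dst + row_bytes] = payload[src:src + row_bytes]


def region_savings_percent(full_frame_bytes: int, region_bytes: int) -> float:
    """Payload reduction of a region frame relative to a full frame."""
    return round(100.0 * (1.0 - region_bytes / full_frame_bytes), 2)


if __name__ == "__main__":
    full = 1280 * 720 * BYTES_PER_PIXEL
    regions = [(64, 64), (1280, 128), (640, 64)]
    print([region_savings_percent(full, w * h * BYTES_PER_PIXEL) for w, h in regions])
    # [99.56, 82.22, 95.56]
```

The savings values printed by the demo agree with the region examples recorded in the observed runtime proof for this stage. A rejected patch is a natural trigger for the full-frame repair path, although how the real client handles that case is not specified here.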
Dirty-region frame metadata:

- `frame_width`, `frame_height`, `frame_stride`, `frame_format`
- `desktop_width`, `desktop_height`
- `region_x`, `region_y`, `region_width`, `region_height`
- `region_stride`, `region_format=BGRA32`
- `payload_length`
- `input_correlation_id` and `worker_frame_captured_at` when available

Diagnostics added for payload and latency analysis:

- `full_frame_sent`
- `region_frame_sent`
- `full_frame_bytes`
- `region_bytes`
- `region_savings_percent`
- `diff_time_ms`
- `render_update_reason`
- `fallback_to_full_frame_reason`

Implementation notes:

- Worker direct WSS emits `render.frame.full` for baseline/recovery frames and `render.frame.region` for dirty-region patches.
- Worker direct render logs include payload savings and diff/capture timing.
- Windows direct transport accepts the explicit render message types.
- Windows `DesktopFramePresenter` maintains a session framebuffer and patches BGRA32 region payloads into it before presenting the updated surface.
- Full-frame fallback remains available for first frame, attach/reattach, resize, region-loss repair, and debug/fallback paths.

Observed runtime proof:

- Direct transport selected `direct_worker_wss` with `render_transport=binary_v1`.
- Baseline frame used `render.frame.full`, `1280x720`, `3,686,400` bytes.
- Dirty-region examples used `render.frame.region`: `64x64` = `16,384` bytes (`99.56%` savings), `1280x128` = `655,360` bytes (`82.22%` savings), and `640x64` = `163,840` bytes (`95.56%` savings).
- Direct-only binary region frames logged `fallback_compat_frame_built=false` while backend gateway fallback compatibility remained available separately.
- Input, detach, reattach, takeover, and takeover event handling remained smoke-proven in the same run.

### Stage DP-3B: Adaptive Quality

Implement quality profiles and adaptive render behavior.

Goals:

- lower latency under load
- bounded queues
- real profile behavior
- color mode adaptation

## 12. Risks

### Token Leakage

Risk:

- direct worker token could be reused.

Mitigation:

- short TTL
- `jti` / nonce
- worker-scoped audience
- attachment/session binding
- TLS

### Worker Endpoint Exposure

Risk:

- worker direct endpoint becomes an attack surface.

Mitigation:

- token validation before bind
- rate limits
- TLS
- no unauthenticated session enumeration
- minimal endpoint surface

### Policy Drift

Risk:

- backend and worker disagree on allowed channels.

Mitigation:

- token claims include allowed channels
- worker receives policy snapshot in assignment
- worker enforces policy again
- policy changes trigger session update or reconnect where required

### Fallback Masking Production Problems

Risk:

- clients silently fall back and hide direct data-plane failure.

Mitigation:

- log fallback reason
- expose telemetry
- smoke tests verify both direct and fallback paths

### Render Still Too Heavy

Risk:

- direct WSS improves routing but full-frame render remains expensive.

Mitigation:

- DP-2 binary frames
- DP-3 adaptive quality
- dirty regions / tiles
- latest-frame-only semantics

### File Upload Starving Input

Risk:

- reliable file chunks can fill send queues.

Mitigation:

- channel priority
- bounded file queues
- chunk pacing
- input preemption

## 13. Future Verification Plan

Future DP-1 implementation must prove:

- backend gateway fallback still works
- direct worker WSS connects
- token validation works
- invalid/expired/wrong-worker tokens are rejected
- direct WSS binds to existing session runtime
- direct WSS does not recreate remote RDP session
- input works over direct WSS
- rendering works over direct WSS
- clipboard still works
- file upload still works
- fallback activates if direct worker path is unavailable
- input latency improves compared with fallback
- render backlog does not grow
- stale render frames are dropped
- close/dispose is immediate
- org/session/attachment/channel scope is enforced

## 14. Next Implementation Prompt

Data-plane and RDP work are paused by product decision. DP-3B, Stage 5.2 remaining RDP desktop proof, and further RDP performance work must not start without a new explicit RDP/data-plane stage prompt.

The next active project work is Stage C10 in the lower Secure Access Fabric foundation:

```text
Proceed with Stage C10 only.

Goal:
Consolidate Fabric Core architecture and prepare scoped cluster configuration distribution design.

Scope:
- define signed scoped cluster snapshot model
- define node-local state boundaries
- define peer directory/cache boundaries
- define Fabric Storage / Config Storage role
- define source-of-truth vs distribution/cache boundaries
- define multi-cluster isolation boundaries
- define future implementation stages C11-C18

Do NOT:
- implement mesh runtime
- implement VPN
- implement RDP work
- implement service workloads
- change backend/runtime code
```

## 15. Non-Goals

DP-1 does not implement:

- full mesh
- VPN
- QUIC
- UDP transport
- WebRTC
- relay nodes
- multi-cluster routing
- adaptive quality beyond DP-3A grayscale full-frame foundation
- binary render frames for fallback backend gateway
- adaptive profile switching beyond DP-3A and dirty regions
- removal of current backend WebSocket gateway
- RDP MVP rewrite