rdp-proxy/docs/architecture/RDP_SERVICE_CPP_PERFORMANCE_TARGET.md
2026-05-18 21:33:39 +03:00

RDP Service C++ Performance Target

Paused/archival note: this document is an RDP performance track record, not the current source of truth for node-to-node transport. Fabric transport is now QUIC-only between nodes; use docs/architecture/DISTRIBUTED_FABRIC_NODE_PROTOCOL_PLAN.md, docs/architecture/FABRIC_FIRST_TRANSPORT_AND_STRESS_PLAN.md, and docs/architecture/SECURE_ACCESS_FABRIC_TARGET.md for the active transport model.

Status

This is the paused RDP service performance direction. The implementation name is RDP Adapter: a concrete Service Adapter that translates Microsoft RDP into the platform session/data-plane protocol. The common adapter contract is defined in docs/architecture/SERVICE_ADAPTER_PROTOCOL.md; the RDP-specific runtime plan is defined in docs/architecture/RDP_ADAPTER_RUNTIME.md.

Current implementation status:

  • RDP-A1 / RDP-Perf-1 is build-proven.
  • RDP-A2 adapter boundary is live-smoke-proven on the test Docker environment as of 2026-04-26: runtime code now goes through RdpAdapterRuntime.
  • RDP-Perf-2 runtime instrumentation is build-proven and live-smoke-proven on the test Docker environment as of 2026-04-26.
  • RDP-Perf-3 region-first BGRA fallback is build-proven and live-smoke-proven on the test Docker environment as of 2026-04-26.
  • RDP-Perf-4 gated RDPGFX foundation is build-proven and default-path smoke-proven on the test Docker environment as of 2026-04-26. The current live RDP target resets the connection when RDPGFX is advertised, so RDPGFX remains disabled by default.
  • RDP-A4 CursorAdapter is build-proven and live-smoke-proven on the test Docker environment as of 2026-04-26. Cursor events now flow as latest-only adapter-origin cursor.update events over direct worker WSS and remain compatible with backend gateway fallback.
  • RDP-Perf-5A GDI repaint cadence hardening is build-proven and smoke-proven on the test Docker environment as of 2026-04-26. Region/interactive frames now publish on a 33 ms cadence, per-iteration lease renewal in the hot loop was replaced by a 5-second renewal interval, and backend gateway fallback remains compatible.
  • RDP-Perf-6 dirty-region direct binary contract is build/probe/live-smoke-proven on the test Docker environment as of 2026-04-26. Direct RAP2 render frames now distinguish full frames from dirty-region patches and carry payload savings diagnostics; observed runtime dirty regions reduced payloads from the 3,686,400 byte full frame to examples such as 16,384, 163,840, and 655,360 bytes.
  • Current accepted baseline is rap-rdp-worker:rdp-p1-region-order2: dirty-region delivery is preserved in order through SessionRuntime, worker direct WSS, Windows transport, and WPF presenter queues. Manual visual smoke accepted idle repaint, Start menu/hover, keyboard, mouse, and session close.
  • Remaining visual limitation is quality/performance rather than correctness: window drag behaves like older/slow-link RDP clients by moving a frame, and repaint after release is usable but not polished.
  • FreeRDP remains the internal substrate behind the adapter boundary until region-first/event-driven replacement paths are live-proven.
  • RDP performance work is paused by product decision. When RDP work explicitly resumes, the next RDP step should continue from the stable GDI region-first path unless an RDPGFX-compatible target is added for gated testing.

The C++ worker remains the primary RDP runtime. The goal is not to rewrite the worker in another language. The goal is to replace the slow parts of the RDP service internals while preserving the proven backend/session/cluster/data-plane contracts.

The C# RDP service skeleton is superseded as a runtime direction and must not be used for implementation unless explicitly re-approved later.

Current Problem

The current MVP proved the hard lifecycle behavior:

  • connect
  • active state
  • detach without killing the remote session
  • reattach
  • takeover
  • terminate
  • clipboard text
  • file upload to worker storage
  • direct worker WSS data-plane

However, the render/input experience is not acceptable.

Root cause:

  • the worker uses FreeRDP successfully for the RDP connection
  • but the production render path still behaves like framebuffer capture
  • the worker copies large BGRA buffers and publishes them as RAP frames
  • input is fast enough in parts of the path, but visual feedback depends on slow snapshot/frame delivery

On a LAN faster than 1 Gbit/s this should not be slow. The bottleneck is the RDP service render algorithm, not the network.

Non-Negotiable Boundaries

Do not change:

  • backend control plane
  • organization/session lifecycle
  • PostgreSQL source of truth
  • Redis live coordination model
  • worker leases and assignment contracts
  • data_plane_token contracts
  • direct worker WSS transport
  • backend gateway fallback
  • clipboard/file-transfer policy semantics

Only the RDP service adapter internals may change.

Target Design

Keep the worker in C++.

Use C++ to own the RDP service internals:

  • input adapter
  • graphics adapter
  • cursor adapter
  • virtual channel adapters
  • quality/adaptive controller
  • render sink to existing RAP data-plane

FreeRDP may remain temporarily as a connection/security/channel substrate, but the target production render path must not be FreeRDP GDI framebuffer snapshots. If a FreeRDP layer blocks access to the needed RDP graphics primitives, replace that narrow layer with project-owned C++ code rather than rewriting the full service in another language.

High-Performance RDP Model

Fast RDP clients do not repeatedly send full desktop images. They use protocol updates:

  • dirty rectangles
  • surface commands
  • cursor updates
  • bitmap/cache updates
  • RDPGFX dynamic virtual channel
  • RemoteFX Progressive / ClearCodec / H.264 AVC420 / AVC444 / HEVC where negotiated
  • adaptive graphics and quality selection

New Internal Layers

```mermaid
flowchart LR
    Target["Windows RDP Server"]
    RdpCore["C++ RDP Core / FreeRDP Substrate"]
    Graphics["Graphics Adapter"]
    Input["Input Adapter"]
    Channels["Virtual Channel Adapters"]
    DataPlane["Existing Direct Worker WSS"]
    Client["RAP Windows Client"]

    Target <--> RdpCore
    RdpCore --> Graphics
    Input --> RdpCore
    RdpCore <--> Channels
    Graphics --> DataPlane
    Channels --> DataPlane
    DataPlane <--> Client
```

Graphics Adapter

The graphics adapter converts RDP graphics primitives into RAP render updates.

Supported update classes:

  • frame_full_bgra only for baseline/debug/fallback
  • region_bgra for dirty regions
  • surface_create
  • surface_delete
  • surface_map
  • surface_bits
  • encoded_frame
  • cursor_update

Rules:

  • full-frame BGRA is fallback, not the target production path
  • direct render remains binary
  • backend gateway fallback may keep JSON/base64 compatibility
  • stale render updates are droppable
  • input never waits behind render

Input Adapter

Input stays separate from render.

Rules:

  • keyboard down/up is reliable and ordered
  • mouse button down/up and wheel are reliable and ordered
  • mouse move is latest-only/coalesced
  • button down must include or be preceded by pointer position
  • no RAP focus message may consume the first remote click
  • input must not trigger full-frame capture loops
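
One way to satisfy the ordering and coalescing rules above is a queue that overwrites only a trailing mouse move. The sketch below is an assumption about how such a queue could look; `InputQueue`, `InputEvent`, and `InputKind` are hypothetical names, not the adapter's real types.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <optional>

enum class InputKind { KeyDown, KeyUp, MouseDown, MouseUp, MouseWheel, MouseMove };

struct InputEvent {
    InputKind kind;
    int x = 0;               // pointer position carried by every mouse event
    int y = 0;
    std::uint32_t code = 0;  // key code, button id, or wheel delta
};

class InputQueue {
public:
    // Reliable events are appended in order. A mouse move replaces the
    // previous event only if that event is also a mouse move, so moves are
    // latest-only without being reordered across key or button events.
    void push(const InputEvent& ev) {
        if (ev.kind == InputKind::MouseMove && !events_.empty() &&
            events_.back().kind == InputKind::MouseMove) {
            events_.back() = ev;
            return;
        }
        events_.push_back(ev);
    }

    // Removes and returns the oldest event, or nullopt when empty.
    std::optional<InputEvent> pop() {
        if (events_.empty()) return std::nullopt;
        InputEvent ev = events_.front();
        events_.pop_front();
        return ev;
    }

    std::size_t size() const { return events_.size(); }

private:
    std::deque<InputEvent> events_;
};
```

Because button events carry `x` and `y`, a button down in this sketch always includes the pointer position, even when the preceding moves were coalesced.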

Virtual Channel Adapters

Clipboard, file, and drive redirection remain isolated:

  • clipboard stays text-only until explicitly expanded
  • restricted drive mapping remains policy-bound
  • file upload/download policies stay enforced in the real data path

Weak Network Strategy

Weak-channel performance must degrade render before input.

Priority order:

  1. input
  2. control
  3. clipboard
  4. render key updates
  5. file transfer
  6. telemetry
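
The priority order above can be illustrated with a strict-priority outbox. This is a minimal sketch under stated assumptions: `PriorityOutbox` and `TrafficClass` are hypothetical names, payloads are plain strings for brevity, and strict priority is only one possible policy (a production scheduler might add fairness so that lower classes are not starved indefinitely).

```cpp
#include <array>
#include <cstddef>
#include <deque>
#include <optional>
#include <string>
#include <utility>

// Traffic classes in the priority order listed above (lower value wins).
enum class TrafficClass : std::size_t {
    Input = 0, Control, Clipboard, Render, FileTransfer, Telemetry, Count
};

class PriorityOutbox {
public:
    void enqueue(TrafficClass cls, std::string payload) {
        queues_[static_cast<std::size_t>(cls)].push_back(std::move(payload));
    }

    // Returns the oldest message of the highest-priority non-empty class.
    std::optional<std::pair<TrafficClass, std::string>> next() {
        for (std::size_t i = 0; i < queues_.size(); ++i) {
            if (!queues_[i].empty()) {
                std::string payload = std::move(queues_[i].front());
                queues_[i].pop_front();
                return std::make_pair(static_cast<TrafficClass>(i),
                                      std::move(payload));
            }
        }
        return std::nullopt;
    }

private:
    std::array<std::deque<std::string>,
               static_cast<std::size_t>(TrafficClass::Count)> queues_;
};
```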

Render adaptation:

  • drop stale render updates
  • prefer dirty regions over full frames
  • reduce FPS before increasing input latency
  • reduce color mode where useful
  • use text-priority mode for office/admin workloads
  • use encoded/compressed graphics payloads where negotiated
  • never let file transfer or VPN-like bulk traffic starve RDP input/control

Quality profiles:

  • emergency_grayscale
  • low_bandwidth
  • text_priority
  • balanced
  • high_quality

Color modes:

  • full color
  • 256 colors
  • 64 colors
  • 16 colors
  • grayscale
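
As a rough illustration of color-mode reduction on BGRA pixels, the sketch below converts a pixel to grayscale and truncates a channel to a fixed bit depth. Both helpers are assumptions: this document does not specify the conversion formulas, the BT.601 luma weights are a common choice rather than a project decision, and treating "64 colors" as 2 bits per channel (4 × 4 × 4) is only one possible interpretation.

```cpp
#include <cstdint>

// Grayscale value for one BGRA pixel using integer BT.601 luma weights
// (29 + 150 + 77 = 256, so the result stays within 0..255).
constexpr std::uint8_t to_gray(std::uint8_t b, std::uint8_t g, std::uint8_t r) {
    return static_cast<std::uint8_t>((29u * b + 150u * g + 77u * r) >> 8);
}

// Keeps only the top `bits` bits of a channel; `bits` must be in 1..8.
// With bits = 2 on each of B, G, and R this yields at most 64 distinct colors.
constexpr std::uint8_t quantize_channel(std::uint8_t value, unsigned bits) {
    const unsigned shift = 8u - bits;
    return static_cast<std::uint8_t>((value >> shift) << shift);
}
```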

Migration Stages

RDP-A1 / RDP-Perf-1: Boundary And Audit

Create C++ graphics/input adapter boundaries and document the replacement path. Do not change runtime behavior yet.

Deliver:

  • common Service Adapter channel contract
  • RDP Adapter runtime plan
  • graphics_adapter interface
  • render update model
  • compile-safe probe
  • docs update

Status: completed and build-proven.

RDP-Perf-2: Runtime Instrumentation And Source Selection

Measure existing FreeRDP update callbacks separately from frame publishing.

Deliver:

  • update callback rate
  • dirty region dimensions
  • framebuffer copy time
  • binary send time
  • client render time
  • first-click trace without RAP focus interference

Status: completed and live-smoke-proven on the test Docker environment as of 2026-04-26.

Smoke command:

```powershell
pwsh -ExecutionPolicy Bypass -File scripts/windows-smoke/desktop-smoke.ps1 `
  -PreferDirectDataPlane:$true `
  -AllowInsecureDirectDataPlaneTlsForSmoke:$true `
  -DirectDataPlaneConnectTimeoutMs 2500 `
  -DirectDataPlaneColorMode full_color `
  -SkipOrgSwitchAndTokenRefresh
```

Smoke evidence:

  • worker image: rap-rdp-worker:rdp-perf2-instrumented
  • session id: 1328b0dd-c5f9-4b15-b2ca-6d196ead5823
  • direct data plane selected by the Windows client
  • login/resource/start/input/detach/attach/takeover/taken_over/logout passed
  • one RDP runtime was created for the session
  • artifacts:
    • artifacts/rdp-perf2-worker-final.log
    • artifacts/rdp-perf2-client-final.log
    • artifacts/rdp-perf2-report.md

Measured callback sources:

| Source | Count / behavior |
| --- | --- |
| BeginPaint | observed |
| EndPaint | observed |
| BitmapUpdate | observed and produced dirty region information |
| RefreshRect | not observed in smoke |
| SurfaceBits | not observed in smoke |
| SurfaceFrameMarker | not observed in smoke |
| SurfaceFrameBits | not observed in smoke |
| pointer callbacks | not observed in smoke |

Measured conclusions:

  • The RDP server/FreeRDP path does emit server-origin graphics callbacks in stable GDI mode.
  • Idle or server-origin screen changes can be detected without relying on local mouse/keyboard activity.
  • Full framebuffer copy time is not the main bottleneck in the measured smoke run.
  • The current render path duplicates work by capturing around both BitmapUpdate and EndPaint.
  • EndPaint should become a flush/safety marker rather than a second normal capture producer.
  • RDP-Perf-3 should make BitmapUpdate dirty regions the default normal render path and reserve full frames for connect/resize/attach/recovery.

RDP-Perf-3: Region-First BGRA Fallback

Use true dirty regions as the default fallback path.

Deliver:

  • no full-frame copy for small dirty updates
  • baseline full frame only on connect/resize/attach
  • region payloads only for normal UI changes

Status: completed and live-smoke-proven on the test Docker environment as of 2026-04-26.

Smoke evidence:

  • worker image: rap-rdp-worker:rdp-perf3-region-first
  • direct smoke session id: abc11233-34c4-45a6-a55b-0571a09332a1
  • fallback smoke session id: ee756839-6a82-49d4-9619-54acf69e1efd
  • direct worker WSS selected and backend gateway fallback separately verified
  • login/resource/start/input/detach/attach/takeover/taken_over/logout passed in both direct and fallback smoke
  • direct session cleanup state: terminated
  • fallback session cleanup state: terminated
  • report: artifacts/rdp-perf3-report.md

Measured direct-path results:

| Metric | Result |
| --- | --- |
| new RDP runtime count | 1 |
| direct data-plane binds | 6 |
| worker input apply events | 6 |
| deferred BitmapUpdate callbacks | 104 |
| bitmap_update_flush captures | 104 |
| region flush captures | 93 |
| full flush captures | 11 |
| periodic duplicate changes | 0 |
| client rendered region frames | 19 |
| client skipped region frames | 0 |

Implementation notes:

  • BitmapUpdate is now deferred during a paint cycle.
  • EndPaint flushes the accumulated BitmapUpdate dirty region once.
  • EndPaint no longer performs a second normal change-detect capture when a bitmap update was already flushed.
  • The periodic change detector snapshot is synchronized after callback-driven frame capture, avoiding rediscovery of the same changed pixels.
  • Direct binary frame metadata now preserves full desktop dimensions separately from region payload dimensions, so the Windows client can patch regions into its framebuffer.
  • Backend gateway fallback remains compatible with the existing JSON/base64 path.
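
The defer-then-flush behavior described in these notes can be sketched as a bounding-rectangle accumulator. This is a simplified illustration, not the worker's code: `DirtyRegionAccumulator`, `Rect`, and the method names are hypothetical, and the real FreeRDP callbacks are not called here.

```cpp
#include <algorithm>
#include <optional>

// Pixel rectangle; right and bottom are exclusive.
struct Rect {
    int left;
    int top;
    int right;
    int bottom;
};

class DirtyRegionAccumulator {
public:
    // Called for each BitmapUpdate inside a paint cycle: nothing is captured
    // yet, the rectangle is only merged into the pending bounding box.
    void on_bitmap_update(const Rect& r) {
        if (r.right <= r.left || r.bottom <= r.top) return;  // ignore empty
        if (!pending_) {
            pending_ = r;
            return;
        }
        pending_->left   = std::min(pending_->left, r.left);
        pending_->top    = std::min(pending_->top, r.top);
        pending_->right  = std::max(pending_->right, r.right);
        pending_->bottom = std::max(pending_->bottom, r.bottom);
    }

    // Called at EndPaint: returns the accumulated region exactly once and
    // resets, so a second flush in the same cycle produces no capture.
    std::optional<Rect> flush_on_end_paint() {
        std::optional<Rect> out = pending_;
        pending_.reset();
        return out;
    }

private:
    std::optional<Rect> pending_;
};
```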

RDP-Perf-4: RDPGFX Channel Foundation

Capture and parse RDPGFX surface updates where available.

Deliver:

  • surface lifecycle
  • surface bits updates
  • cursor updates
  • fallback to region BGRA when RDPGFX unavailable

Status: build-proven and default-path smoke-proven on the test Docker environment as of 2026-04-26.

Implementation:

  • RDPGFX stays disabled by default.
  • RDP_WORKER_RDPGFX_ENABLED=true is the only gated runtime switch.
  • Worker diagnostics now log RDPGFX configuration, channel subscription, channel connection, GDI graphics pipeline initialization, fallback reasons, and normalized FreeRDP surface update callbacks.
  • Callback summaries include RDPGFX counters.
  • The default classic GDI region-first path remains the active safe path.
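
A gate like `RDP_WORKER_RDPGFX_ENABLED` can be read with a small helper. The sketch below assumes that only the literal value `true` enables RDPGFX, since that is the only value shown in this document; the function names are hypothetical and the worker's actual parsing may differ.

```cpp
#include <cstdlib>
#include <string_view>

// Pure helper so the rule can be checked without touching the environment.
// Assumption: only the exact string "true" enables the gate.
inline bool parse_rdpgfx_enabled(const char* raw) {
    return raw != nullptr && std::string_view(raw) == "true";
}

// Reads the gate from the process environment; unset means disabled,
// which matches the disabled-by-default behavior described above.
inline bool rdpgfx_enabled_from_env() {
    return parse_rdpgfx_enabled(std::getenv("RDP_WORKER_RDPGFX_ENABLED"));
}
```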

Default smoke evidence:

  • worker image: rap-rdp-worker:rdp-perf4-rdpgfx-gated
  • final default smoke session id: 30e80d99-e3b5-428b-aa18-fea65b8db499
  • direct worker WSS selected
  • login/resource/start/input/detach/attach/takeover/taken_over/logout passed
  • session cleanup state: terminated
  • worker log: rdp.gfx config requested=false mode=classic_gdi_region_first
  • worker log: rdp.perf callback_summary ... rdpgfx_requested=false ... frame_capture_region=...

Gated RDPGFX target compatibility result:

  • gated session id: aa69f606-9217-4579-b438-b7d3ec5e01d0
  • environment: RDP_WORKER_RDPGFX_ENABLED=true
  • result: failed on the current live RDP target
  • observed: BIO_read returned a system error 104: Connection reset by peer
  • observed: freerdp_post_connect failed
  • no rdp.gfx channel_connected or surface callbacks were observed before reset
  • conclusion: the current target must use the default GDI region-first path

Report: artifacts/rdp-perf4-report.md

RDP-Perf-5: Encoded Graphics Payloads

Support encoded graphics payloads over RAP direct data-plane.

Deliver:

  • binary encoded payload message type
  • client decode strategy
  • fallback to region BGRA

RDP-A4: CursorAdapter

Move cursor handling into the RDP Adapter boundary and keep cursor events independent from display frame cadence.

Status: completed and live-smoke-proven on the test Docker environment as of 2026-04-26.

Implementation:

  • CursorAdapter normalizes FreeRDP pointer callbacks into cursor position, visibility, shape, cache, and mask metadata.
  • FreeRDP pointer callbacks are installed and restored inside the RDP runtime hook boundary.
  • Original FreeRDP pointer callbacks are invoked before platform normalization, preserving FreeRDP internal state.
  • session_cursor_updated worker events are mapped to platform cursor.update envelopes.
  • Direct worker WSS treats cursor as latest-only/droppable and schedules it separately from binary render frames.
  • Backend gateway fallback remains compatible with the same session_cursor_updated event payload.
  • Windows client accepts cursor.update through the existing render payload bridge without changing UI layout.
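
The latest-only/droppable treatment of cursor updates can be illustrated with a single-slot mailbox. This is a sketch under assumptions: `LatestOnlySlot` and `CursorUpdate` are hypothetical names, and the real `cursor.update` payload carries more fields (shape, cache, and mask metadata) than shown here.

```cpp
#include <mutex>
#include <optional>
#include <utility>

// Minimal stand-in for a cursor update payload (illustrative fields only).
struct CursorUpdate {
    int x;
    int y;
    bool visible;
};

template <typename T>
class LatestOnlySlot {
public:
    // Stores the newest value. Returns true if an unsent value was
    // overwritten, which a caller could count as a dropped update.
    bool publish(T value) {
        std::lock_guard<std::mutex> lock(mutex_);
        const bool dropped = slot_.has_value();
        slot_ = std::move(value);
        return dropped;
    }

    // Takes the pending value, if any, leaving the slot empty.
    std::optional<T> take() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::optional<T> out = std::move(slot_);
        slot_.reset();
        return out;
    }

private:
    std::mutex mutex_;
    std::optional<T> slot_;
};
```

Because the slot holds at most one value, a slow writer never builds a cursor backlog: the sender always sees the most recent position.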

Smoke evidence:

  • worker image: rap-rdp-worker:rdp-a4-cursor-adapter
  • direct smoke session id: 549806aa-c9db-48a9-917e-cf817cf236b5
  • fallback smoke session id: dee3a856-bee1-4eba-9c10-f62edaf56547
  • direct worker WSS selected in direct smoke
  • backend gateway selected in fallback smoke
  • login/resource/start/input/detach/attach/takeover/taken_over/logout passed in both direct and fallback smoke
  • direct session cleanup state: terminated
  • fallback session cleanup state: terminated
  • worker log: cursor.adapter hooks installed pointer_callbacks=true
  • worker log: adapter_event channel=cursor type=cursor.update origin=adapter
  • worker log: rdp.perf callback_summary ... cursor_updates_enqueued=...
  • client log: SessionWindowViewModel.HandleEnvelopeAsync ... cursor.update
  • report: artifacts/rdp-a4-cursor-adapter-report.md

Known limitation:

  • Cursor event separation does not by itself fix delayed hover/menu repaint. The next safe step is a GDI repaint cadence and server-origin update audit on the stable region-first path.

RDP-Perf-5A: GDI Repaint Cadence And Hover Feedback Hardening

Fix the first proven stable-path repaint cadence bottlenecks without changing backend, session lifecycle, data-plane contracts, clipboard/file transfer, or UI layout.

Status: build-proven and smoke-proven on the test Docker environment as of 2026-04-26.

Implementation:

  • FreeRDP event pump performs a bounded immediate drain after a signaled handle check so already-queued server events are not delayed by the next wait cycle.
  • Periodic no-change detection logging is rate-limited to avoid hot-loop log pressure while the remote screen is idle.
  • Worker session runtime renews the worker lease every 5 seconds instead of performing Redis lease I/O on every render/input loop iteration.
  • Region and interactive render notifications use a 33 ms publish interval.
  • Full-frame fallback remains at 100 ms.
  • Direct worker WSS binary writer uses the same 33 ms interval for region/interactive frames.
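
The interval-based behavior above (33 ms for region/interactive frames, 100 ms for full-frame fallback, 5 seconds for lease renewal) can be sketched with one small time gate. `PublishGate` is a hypothetical name and the worker's actual timing code may be structured differently; the caller passes the current time so the logic stays testable.

```cpp
#include <chrono>

class PublishGate {
public:
    using Clock = std::chrono::steady_clock;

    explicit PublishGate(std::chrono::milliseconds interval)
        : interval_(interval) {}

    // Returns true and records `now` when at least one interval has elapsed
    // since the last accepted call; the first call is always accepted.
    bool try_acquire(Clock::time_point now) {
        if (has_last_ && now - last_ < interval_) return false;
        last_ = now;
        has_last_ = true;
        return true;
    }

private:
    std::chrono::milliseconds interval_;
    Clock::time_point last_{};
    bool has_last_ = false;
};
```

In this sketch a render loop would hold one gate per cadence, for example `PublishGate region_gate{std::chrono::milliseconds(33)}` and `PublishGate lease_gate{std::chrono::seconds(5)}`, and call `try_acquire(PublishGate::Clock::now())` on each iteration instead of doing the work unconditionally.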

Smoke evidence:

  • worker image: rap-rdp-worker:rdp-perf5a-repaint-cadence
  • direct smoke session id: 0cca4974-2a82-48dc-a0f6-1036ea8e98f0
  • fallback smoke session id: 16deb09e-1c44-4e9d-8448-93b42ac66ed0
  • direct worker WSS selected in direct smoke
  • backend gateway selected in fallback smoke
  • login/resource/start/input/detach/attach/takeover/taken_over/logout passed in both direct and fallback smoke
  • direct session cleanup state: terminated
  • fallback session cleanup state: terminated
  • report: artifacts/rdp-perf5a-report.md

Measured direct-path results:

| Metric | Result |
| --- | --- |
| client rendered frames observed | 65 |
| client binary frames observed | 66 |
| direct region publishes at 33 ms | 54 |
| direct outbound FPS max | 9.705640 |
| render seen FPS max | 26.386542 |
| render published FPS max | 9.459327 |
| direct backpressure frame drops | 0 |
| render pending max | 0 |

Measured conclusion:

  • Region/interactive frames now leave the worker promptly when server-origin changes arrive.
  • The direct smoke did not show queued FreeRDP event-handle bursts after the new immediate drain path: event_pump_drained_checks=0.
  • The current live target still emits idle/server-origin region changes at roughly 1 FPS in observed stable GDI mode.
  • Manual UX validation is still required before hover/menu responsiveness can be claimed as accepted by a human operator.

RDP-Perf-6: Dirty-Region Direct Binary Render Contract

Replace full-frame-only direct binary render updates with explicit dirty-region direct binary render updates while preserving full-frame fallback.

Deliver:

  • direct RAP2 message_type=render.frame.full
  • direct RAP2 message_type=render.frame.region
  • one bounding-rectangle dirty-region BGRA payload for normal UI changes
  • full-frame fallback for first frame, attach/reattach, resize, recovery, invalid region state, and debug/fallback mode
  • worker diagnostics for full_frame_sent, region_frame_sent, region_bytes, full_frame_bytes, region_savings_percent, diff_time_ms, render_update_reason, and fallback_to_full_frame_reason
  • Windows direct receiver support for explicit full/region message types
  • Windows framebuffer-backed region patching
  • backend gateway JSON/base64 fallback unchanged
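
The choice between the two message types can be sketched as a single decision function. Only `render.frame.full`, `render.frame.region`, and the list of fallback situations come from this document; the struct names, the function name, and the reason strings are hypothetical placeholders for whatever the worker actually logs as `fallback_to_full_frame_reason`.

```cpp
#include <string_view>

struct Region {
    int x;
    int y;
    int width;
    int height;
};

struct FrameState {
    bool first_frame = false;
    bool attach_pending = false;  // attach or reattach
    bool resized = false;
    bool recovering = false;
    bool force_full = false;      // debug/fallback mode
};

struct FrameDecision {
    std::string_view message_type;
    std::string_view fallback_reason;  // empty for region frames
};

// A region is usable only if it is non-empty and inside the desktop.
inline bool region_is_valid(const Region& r, int desktop_w, int desktop_h) {
    return r.width > 0 && r.height > 0 && r.x >= 0 && r.y >= 0 &&
           r.x + r.width <= desktop_w && r.y + r.height <= desktop_h;
}

inline FrameDecision choose_frame(const FrameState& s, const Region& dirty,
                                  int desktop_w, int desktop_h) {
    constexpr std::string_view kFull = "render.frame.full";
    constexpr std::string_view kRegion = "render.frame.region";
    if (s.first_frame)    return {kFull, "first_frame"};
    if (s.attach_pending) return {kFull, "attach"};
    if (s.resized)        return {kFull, "resize"};
    if (s.recovering)     return {kFull, "recovery"};
    if (s.force_full)     return {kFull, "debug_fallback"};
    if (!region_is_valid(dirty, desktop_w, desktop_h)) {
        return {kFull, "invalid_region"};
    }
    return {kRegion, ""};
}
```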

Status: implemented and build/probe/live-smoke-proven on the test Docker environment as of 2026-04-26 using the current RDP target.

Build/probe evidence:

  • worker image build: rap-rdp-worker:rdp-perf6-dirty-region
  • Windows client build: PASS
  • worker graphics adapter probe: PASS
  • worker direct data-plane bind valid probe: PASS
  • worker service adapter protocol probe: PASS
  • direct worker WSS smoke: PASS
  • backend gateway fallback smoke: PASS

Implementation notes:

  • The current classic GDI region-first display path remains the source of dirty-region payloads.
  • The direct worker WSS sender no longer labels all binary render payloads as session.frame; it uses render.frame.full and render.frame.region.
  • The Windows transport still normalizes direct render frames into the existing application-level session.frame pipeline, so session lifecycle, input, clipboard, and file-transfer behavior are unchanged.
  • The Windows presenter keeps a session framebuffer and applies region patches into it before presenting the updated surface.
  • Backend gateway fallback remains JSON/base64 and is not used as the production high-rate render relay.
  • Runtime payload examples: full baseline 3,686,400 bytes; dirty regions 16,384, 163,840, 327,680, 655,360, and 737,280 bytes.
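
Region patching and the savings diagnostic can be sketched as follows. The `Framebuffer` struct and both function names are hypothetical, and the layout (tightly packed BGRA, 4 bytes per pixel, no row padding) is an assumption. The full baseline of 3,686,400 bytes is consistent with a 1280 × 720 desktop at 4 bytes per pixel (1280 × 720 × 4 = 3,686,400), although this document does not state the resolution.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Tightly packed BGRA framebuffer (assumed layout: 4 bytes per pixel).
struct Framebuffer {
    int width;
    int height;
    std::vector<std::uint8_t> bgra;
};

// Copies a w x h BGRA region into the framebuffer at (x, y), row by row.
// Returns false without modifying anything if the region does not fit.
inline bool patch_region(Framebuffer& fb, int x, int y, int w, int h,
                         const std::vector<std::uint8_t>& region) {
    if (w <= 0 || h <= 0 || x < 0 || y < 0) return false;
    if (x + w > fb.width || y + h > fb.height) return false;
    const std::size_t row_bytes = static_cast<std::size_t>(w) * 4;
    if (region.size() != row_bytes * static_cast<std::size_t>(h)) return false;
    if (fb.bgra.size() !=
        static_cast<std::size_t>(fb.width) * fb.height * 4) return false;
    for (int row = 0; row < h; ++row) {
        const std::size_t dst =
            (static_cast<std::size_t>(y + row) * fb.width + x) * 4;
        const std::size_t src = static_cast<std::size_t>(row) * row_bytes;
        std::memcpy(&fb.bgra[dst], &region[src], row_bytes);
    }
    return true;
}

// Percentage of bytes saved by sending a region instead of a full frame.
inline double region_savings_percent(std::size_t region_bytes,
                                     std::size_t full_frame_bytes) {
    if (full_frame_bytes == 0) return 0.0;
    return 100.0 * (1.0 - static_cast<double>(region_bytes) /
                              static_cast<double>(full_frame_bytes));
}
```

With the payload sizes listed above, a 16,384 byte region saves about 99.6% against the 3,686,400 byte baseline, and a 655,360 byte region saves about 82.2%.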

RDP-Perf-7: Adaptive Quality Controller

Add channel-aware adaptive render quality.

Deliver:

  • latency-aware profile switching
  • bandwidth-aware profile switching
  • latest-only render backpressure
  • stable input under load
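
Since this stage is not implemented, the following is only a rough sketch of what latency-aware and bandwidth-aware profile switching could look like. Every threshold is a hypothetical placeholder, the ordering of profiles from most to least degraded is assumed from the quality profile list, and `ChannelSample` and `select_profile` are illustrative names.

```cpp
// Quality profiles named in this document, ordered here from most degraded
// to least degraded (the ordering is an assumption).
enum class QualityProfile {
    EmergencyGrayscale,
    LowBandwidth,
    TextPriority,
    Balanced,
    HighQuality,
};

// One measurement of the client channel (hypothetical shape).
struct ChannelSample {
    double rtt_ms;
    double bandwidth_kbps;
};

// Picks the best profile whose limits the sample satisfies.
// All thresholds below are placeholders, not measured or agreed values.
inline QualityProfile select_profile(const ChannelSample& s) {
    if (s.rtt_ms > 400.0 || s.bandwidth_kbps < 256.0) {
        return QualityProfile::EmergencyGrayscale;
    }
    if (s.rtt_ms > 200.0 || s.bandwidth_kbps < 1000.0) {
        return QualityProfile::LowBandwidth;
    }
    if (s.rtt_ms > 100.0 || s.bandwidth_kbps < 5000.0) {
        return QualityProfile::TextPriority;
    }
    if (s.rtt_ms > 40.0 || s.bandwidth_kbps < 20000.0) {
        return QualityProfile::Balanced;
    }
    return QualityProfile::HighQuality;
}
```

A real controller would likely add hysteresis so that the profile does not oscillate when a sample sits near a threshold.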

Acceptance Targets

LAN targets:

  • first frame: under 2 seconds after successful RDP login
  • click to visible response: under 150 ms for common UI
  • keypress to visible response: under 150 ms for text input
  • pointer hover response: under 100 ms where the target emits hover changes
  • one click activates remote buttons correctly
  • no unbounded frame/input queues

Weak-channel targets:

  • input remains usable even when render quality degrades
  • render drops stale updates instead of building backlog
  • file transfer never starves interactive input

RDP Performance Work Paused

RDP performance work is paused. Next active work is Fabric Core / cluster foundation.

RDP-Perf-6 remains accepted and smoke-proven. Future RDP roadmap items such as RDP-Perf-7, adaptive quality, encoded payloads, additional RDPGFX testing, tiles, codecs, or further renderer optimization must not start without a new explicit RDP-stage prompt.

The preserved RDP baseline remains:

  • C++ RDP Adapter runtime
  • direct worker WSS
  • backend gateway fallback
  • dirty-region direct binary render from RDP-Perf-6
  • proven session lifecycle
  • existing clipboard and file-transfer semantics