RDP Service C++ Performance Target
Status
This is the paused RDP service performance direction. The implementation name is RDP Adapter: a concrete Service Adapter that translates Microsoft RDP into the platform session/data-plane protocol. The common adapter contract is defined in docs/architecture/SERVICE_ADAPTER_PROTOCOL.md; the RDP-specific runtime plan is defined in docs/architecture/RDP_ADAPTER_RUNTIME.md.
Current implementation status:
- RDP-A1 / RDP-Perf-1 is build-proven.
- RDP-A2 adapter boundary is live-smoke-proven on the test Docker environment as of 2026-04-26: runtime code now goes through `RdpAdapterRuntime`.
- RDP-Perf-2 runtime instrumentation is build-proven and live-smoke-proven on the test Docker environment as of 2026-04-26.
- RDP-Perf-3 region-first BGRA fallback is build-proven and live-smoke-proven on the test Docker environment as of 2026-04-26.
- RDP-Perf-4 gated RDPGFX foundation is build-proven and default-path smoke-proven on the test Docker environment as of 2026-04-26. The current live RDP target resets the connection when RDPGFX is advertised, so RDPGFX remains disabled by default.
- RDP-A4 CursorAdapter is build-proven and live-smoke-proven on the test Docker environment as of 2026-04-26. Cursor events now flow as latest-only adapter-origin `cursor.update` events over direct worker WSS and remain compatible with backend gateway fallback.
- RDP-Perf-5A GDI repaint cadence hardening is build-proven and smoke-proven on the test Docker environment as of 2026-04-26. Region/interactive frames now publish on a 33 ms cadence, hot-loop lease renewal was removed, and backend gateway fallback remains compatible.
- RDP-Perf-6 dirty-region direct binary contract is build/probe/live-smoke-proven on the test Docker environment as of 2026-04-26. Direct `RAP2` render frames now distinguish full frames from dirty-region patches and carry payload savings diagnostics; observed runtime dirty regions reduced payloads from the `3,686,400` byte full frame to examples such as `16,384`, `163,840`, and `655,360` bytes.
- Current accepted baseline is `rap-rdp-worker:rdp-p1-region-order2`: dirty-region delivery is preserved in order through `SessionRuntime`, worker direct WSS, Windows transport, and WPF presenter queues. Manual visual smoke accepted idle repaint, Start menu/hover, keyboard, mouse, and session close.
- Remaining visual limitation is quality/performance rather than correctness: window drag behaves like older/slow-link RDP clients by moving a frame, and repaint after release is usable but not polished.
- FreeRDP remains the internal substrate behind the adapter boundary until region-first/event-driven replacement paths are live-proven.
- RDP performance work is paused by product decision. When RDP work explicitly resumes, the next RDP step should continue from the stable GDI region-first path unless an RDPGFX-compatible target is added for gated testing.
The C++ worker remains the primary RDP runtime. The goal is not to rewrite the worker in another language. The goal is to replace the slow parts of the RDP service internals while preserving the proven backend/session/cluster/data-plane contracts.
The C# RDP service skeleton is superseded as a runtime direction and must not be used for implementation unless explicitly re-approved later.
Current Problem
The current MVP proved the hard lifecycle behavior:
- connect
- active state
- detach without killing the remote session
- reattach
- takeover
- terminate
- clipboard text
- file upload to worker storage
- direct worker WSS data-plane
However, the render/input experience is not acceptable.
Root cause:
- the worker uses FreeRDP successfully for the RDP connection
- but the production render path still behaves like framebuffer capture
- the worker copies large BGRA buffers and publishes them as RAP frames
- input is fast enough in parts of the path, but visual feedback depends on slow snapshot/frame delivery
On a >1 Gbit LAN this should not be slow. The bottleneck is the RDP service render algorithm, not the network.
Non-Negotiable Boundaries
Do not change:
- backend control plane
- organization/session lifecycle
- PostgreSQL source of truth
- Redis live coordination model
- worker leases and assignment contracts
- data_plane_token contracts
- direct worker WSS transport
- backend gateway fallback
- clipboard/file-transfer policy semantics
Only the RDP service adapter internals may change.
Target Design
Keep the worker in C++.
Use C++ to own the RDP service internals:
- input adapter
- graphics adapter
- cursor adapter
- virtual channel adapters
- quality/adaptive controller
- render sink to existing RAP data-plane
FreeRDP may remain temporarily as a connection/security/channel substrate, but the target production render path must not be FreeRDP GDI framebuffer snapshots. If a FreeRDP layer blocks access to the needed RDP graphics primitives, replace that narrow layer with project-owned C++ code rather than rewriting the full service in another language.
High-Performance RDP Model
Fast RDP clients do not repeatedly send full desktop images. They use protocol updates:
- dirty rectangles
- surface commands
- cursor updates
- bitmap/cache updates
- RDPGFX dynamic virtual channel
- RemoteFX Progressive / ClearCodec / H.264 AVC420 / AVC444 / HEVC where negotiated
- adaptive graphics and quality selection
References:
- https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-rdpegfx/da5c75f9-cd99-450c-98c4-014a496942b0
- https://learn.microsoft.com/en-us/azure/virtual-desktop/graphics-encoding
- https://freerdp-freerdp.mintlify.app/concepts/codecs
New Internal Layers
flowchart LR
Target["Windows RDP Server"]
RdpCore["C++ RDP Core / FreeRDP Substrate"]
Graphics["Graphics Adapter"]
Input["Input Adapter"]
Channels["Virtual Channel Adapters"]
DataPlane["Existing Direct Worker WSS"]
Client["RAP Windows Client"]
Target <--> RdpCore
RdpCore --> Graphics
Input --> RdpCore
RdpCore <--> Channels
Graphics --> DataPlane
Channels --> DataPlane
DataPlane <--> Client
Graphics Adapter
The graphics adapter converts RDP graphics primitives into RAP render updates.
Supported update classes:
- `frame_full_bgra` only for baseline/debug/fallback
- `region_bgra` for dirty regions
- `surface_create`
- `surface_delete`
- `surface_map`
- `surface_bits`
- `encoded_frame`
- `cursor_update`
Rules:
- full-frame BGRA is fallback, not the target production path
- direct render remains binary
- backend gateway fallback may keep JSON/base64 compatibility
- stale render updates are droppable
- input never waits behind render
Input Adapter
Input stays separate from render.
Rules:
- keyboard down/up is reliable and ordered
- mouse button down/up and wheel are reliable and ordered
- mouse move is latest-only/coalesced
- button down must include or be preceded by pointer position
- no RAP focus message may consume the first remote click
- input must not trigger full-frame capture loops
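The ordering rules above can be sketched as a small queue in which key, button, and wheel events are kept in arrival order while consecutive mouse moves collapse into the latest position. This is a minimal illustration, not the worker's implementation: `InputKind`, `InputEvent`, and `InputQueue` are hypothetical names, and the sketch is single-threaded.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <optional>

// Hypothetical event model; the real worker types may differ.
enum class InputKind { KeyDown, KeyUp, ButtonDown, ButtonUp, Wheel, MouseMove };

struct InputEvent {
    InputKind kind;
    int32_t x = 0;      // pointer position (carried by mouse events)
    int32_t y = 0;
    uint32_t code = 0;  // key code, button id, or wheel delta
};

class InputQueue {
public:
    void push(const InputEvent& ev) {
        // Latest-only: a move that directly follows another move replaces it.
        // Moves are never merged across a reliable event, so ordering relative
        // to key/button/wheel events is preserved.
        if (ev.kind == InputKind::MouseMove && !queue_.empty() &&
            queue_.back().kind == InputKind::MouseMove) {
            queue_.back() = ev;
            return;
        }
        queue_.push_back(ev);
    }

    std::optional<InputEvent> pop() {
        if (queue_.empty()) return std::nullopt;
        InputEvent ev = queue_.front();
        queue_.pop_front();
        return ev;
    }

    std::size_t size() const { return queue_.size(); }

private:
    std::deque<InputEvent> queue_;
};
```

Because a `ButtonDown` event carries its own `x`/`y`, the remote side always has a pointer position for the click even when earlier moves were coalesced away.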
Virtual Channel Adapters
Clipboard/file/drive redirection remain isolated:
- clipboard stays text-only until explicitly expanded
- restricted drive mapping remains policy-bound
- file upload/download policies stay enforced in the real data path
Weak Network Strategy
Weak-channel performance must degrade render before input.
Priority order:
- input
- control
- clipboard
- render key updates
- file transfer
- telemetry
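One way to express this priority order is a heap keyed on traffic class, with an arrival sequence number to keep messages of the same class in FIFO order. The sketch below is an assumption-laden illustration: `TrafficClass`, `Outbound`, and `OutboundScheduler` are hypothetical names, and strict priority is used only for brevity (a production scheduler would probably also need rate limits so lower classes are not starved indefinitely).

```cpp
#include <cstdint>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Lower value = higher priority, mirroring the order listed above.
enum class TrafficClass : uint8_t {
    Input = 0, Control = 1, Clipboard = 2, RenderKey = 3, FileTransfer = 4, Telemetry = 5
};

struct Outbound {
    TrafficClass cls;
    uint64_t seq;         // arrival order, keeps FIFO inside one class
    std::string payload;
};

// Comparator for std::priority_queue: returns true when `a` should be sent after `b`.
struct SendsLater {
    bool operator()(const Outbound& a, const Outbound& b) const {
        if (a.cls != b.cls) return a.cls > b.cls;  // lower-priority class waits
        return a.seq > b.seq;                      // newer message waits
    }
};

class OutboundScheduler {
public:
    void enqueue(TrafficClass cls, std::string payload) {
        heap_.push(Outbound{cls, next_seq_++, std::move(payload)});
    }

    bool empty() const { return heap_.empty(); }

    // Precondition: !empty().
    Outbound next() {
        Outbound top = heap_.top();
        heap_.pop();
        return top;
    }

private:
    std::priority_queue<Outbound, std::vector<Outbound>, SendsLater> heap_;
    uint64_t next_seq_ = 0;
};
```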
Render adaptation:
- drop stale render updates
- prefer dirty regions over full frames
- reduce FPS before increasing input latency
- reduce color mode where useful
- use text-priority mode for office/admin workloads
- use encoded/compressed graphics payloads where negotiated
- never let file transfer or VPN-like bulk traffic starve RDP input/control
Quality profiles:
- `emergency_grayscale`
- `low_bandwidth`
- `text_priority`
- `balanced`
- `high_quality`
Color modes:
- full color
- 256 colors
- 64 colors
- 16 colors
- grayscale
Migration Stages
RDP-A1 / RDP-Perf-1: Boundary And Audit
Create C++ graphics/input adapter boundaries and document the replacement path. Do not change runtime behavior yet.
Deliver:
- common `Service Adapter` channel contract
- RDP Adapter runtime plan
- `graphics_adapter` interface
- render update model
- compile-safe probe
- docs update
Status: completed and build-proven.
RDP-Perf-2: Runtime Instrumentation And Source Selection
Measure existing FreeRDP update callbacks separately from frame publishing.
Deliver:
- update callback rate
- dirty region dimensions
- framebuffer copy time
- binary send time
- client render time
- first-click trace without RAP focus interference
Status: completed and live-smoke-proven on the test Docker environment as of 2026-04-26.
Smoke command:
pwsh -ExecutionPolicy Bypass -File scripts/windows-smoke/desktop-smoke.ps1 `
-PreferDirectDataPlane:$true `
-AllowInsecureDirectDataPlaneTlsForSmoke:$true `
-DirectDataPlaneConnectTimeoutMs 2500 `
-DirectDataPlaneColorMode full_color `
-SkipOrgSwitchAndTokenRefresh
Smoke evidence:
- worker image: `rap-rdp-worker:rdp-perf2-instrumented`
- session id: `1328b0dd-c5f9-4b15-b2ca-6d196ead5823`
- direct data plane selected by the Windows client
- login/resource/start/input/detach/attach/takeover/taken_over/logout passed
- one RDP runtime was created for the session
- artifacts:
  - `artifacts/rdp-perf2-worker-final.log`
  - `artifacts/rdp-perf2-client-final.log`
  - `artifacts/rdp-perf2-report.md`
Measured callback sources:
| Source | Count / behavior |
|---|---|
| `BeginPaint` | observed |
| `EndPaint` | observed |
| `BitmapUpdate` | observed and produced dirty region information |
| `RefreshRect` | not observed in smoke |
| `SurfaceBits` | not observed in smoke |
| `SurfaceFrameMarker` | not observed in smoke |
| `SurfaceFrameBits` | not observed in smoke |
| pointer callbacks | not observed in smoke |
Measured conclusions:
- The RDP server/FreeRDP path does emit server-origin graphics callbacks in stable GDI mode.
- Idle or server-origin screen changes can be detected without relying on local mouse/keyboard activity.
- Full framebuffer copy time is not the main bottleneck in the measured smoke run.
- The current render path duplicates work by capturing around both `BitmapUpdate` and `EndPaint`.
- `EndPaint` should become a flush/safety marker rather than a second normal capture producer.
- RDP-Perf-3 should make `BitmapUpdate` dirty regions the default normal render path and reserve full frames for connect/resize/attach/recovery.
RDP-Perf-3: Region-First BGRA Fallback
Use true dirty regions as the default fallback path.
Deliver:
- no full-frame copy for small dirty updates
- baseline full frame only on connect/resize/attach
- region payloads only for normal UI changes
Status: completed and live-smoke-proven on the test Docker environment as of 2026-04-26.
Smoke evidence:
- worker image: `rap-rdp-worker:rdp-perf3-region-first`
- direct smoke session id: `abc11233-34c4-45a6-a55b-0571a09332a1`
- fallback smoke session id: `ee756839-6a82-49d4-9619-54acf69e1efd`
- direct worker WSS selected and backend gateway fallback separately verified
- login/resource/start/input/detach/attach/takeover/taken_over/logout passed in both direct and fallback smoke
- direct session cleanup state: `terminated`
- fallback session cleanup state: `terminated`
- report: `artifacts/rdp-perf3-report.md`
Measured direct-path results:
| Metric | Result |
|---|---|
| new RDP runtime count | 1 |
| direct data-plane binds | 6 |
| worker input apply events | 6 |
| deferred `BitmapUpdate` callbacks | 104 |
| `bitmap_update_flush` captures | 104 |
| region flush captures | 93 |
| full flush captures | 11 |
| periodic duplicate changes | 0 |
| client rendered region frames | 19 |
| client skipped region frames | 0 |
Implementation notes:
- `BitmapUpdate` is now deferred during a paint cycle.
- `EndPaint` flushes the accumulated `BitmapUpdate` dirty region once.
- `EndPaint` no longer performs a second normal change-detect capture when a bitmap update was already flushed.
- The periodic change detector snapshot is synchronized after callback-driven frame capture, avoiding rediscovery of the same changed pixels.
- Direct binary frame metadata now preserves full desktop dimensions separately from region payload dimensions, so the Windows client can patch regions into its framebuffer.
- Backend gateway fallback remains compatible with the existing JSON/base64 path.
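The defer-then-flush behavior described in these notes can be sketched as an accumulator that unions dirty rectangles during a paint cycle and hands back a single bounding rectangle at the end. This is a simplified illustration under assumptions: `Rect` and `DirtyAccumulator` are hypothetical names, the rectangles use half-open pixel coordinates, and the FreeRDP callback wiring is omitted entirely.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>

// Hypothetical rectangle type: [left, right) x [top, bottom) in pixels.
struct Rect {
    int32_t left, top, right, bottom;
};

class DirtyAccumulator {
public:
    // Called for each BitmapUpdate rectangle while a paint cycle is open.
    void add(const Rect& r) {
        if (r.right <= r.left || r.bottom <= r.top) return;  // ignore empty rects
        if (!pending_) {
            pending_ = r;
            return;
        }
        pending_->left = std::min(pending_->left, r.left);
        pending_->top = std::min(pending_->top, r.top);
        pending_->right = std::max(pending_->right, r.right);
        pending_->bottom = std::max(pending_->bottom, r.bottom);
    }

    // Called once at EndPaint: returns the bounding rectangle (if any) and resets,
    // so a paint cycle without bitmap updates produces no region capture.
    std::optional<Rect> flush() {
        std::optional<Rect> out = pending_;
        pending_.reset();
        return out;
    }

private:
    std::optional<Rect> pending_;
};
```

A single bounding rectangle can cover more pixels than strictly changed when updates are far apart; the trade-off is one capture and one payload per paint cycle.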
RDP-Perf-4: RDPGFX Channel Foundation
Capture and parse RDPGFX surface updates where available.
Deliver:
- surface lifecycle
- surface bits updates
- cursor updates
- fallback to region BGRA when RDPGFX unavailable
Status: build-proven and default-path smoke-proven on the test Docker environment as of 2026-04-26.
Implementation:
- RDPGFX stays disabled by default.
- `RDP_WORKER_RDPGFX_ENABLED=true` is the only gated runtime switch.
- Worker diagnostics now log RDPGFX configuration, channel subscription, channel connection, GDI graphics pipeline initialization, fallback reasons, and normalized FreeRDP surface update callbacks.
- Callback summaries include RDPGFX counters.
- The default classic GDI region-first path remains the active safe path.
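A default-off gate of this kind could be read as in the sketch below. Only the variable name `RDP_WORKER_RDPGFX_ENABLED` and the mode string `classic_gdi_region_first` come from this document; the function names, the `rdpgfx_gated` label, and the choice to treat every value other than the exact string `true` as disabled are assumptions made for illustration.

```cpp
#include <cstdlib>
#include <string>

// Returns true only when the value is exactly "true".
// Treating every other value (including unset) as disabled is an assumption.
bool rdpgfx_enabled_from(const char* value) {
    return value != nullptr && std::string(value) == "true";
}

// Reads the gate from the process environment.
bool rdpgfx_enabled() {
    return rdpgfx_enabled_from(std::getenv("RDP_WORKER_RDPGFX_ENABLED"));
}

// "rdpgfx_gated" is a hypothetical label; "classic_gdi_region_first" matches
// the mode string that appears in the worker log.
const char* render_mode(bool rdpgfx_requested) {
    return rdpgfx_requested ? "rdpgfx_gated" : "classic_gdi_region_first";
}
```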
Default smoke evidence:
- worker image: `rap-rdp-worker:rdp-perf4-rdpgfx-gated`
- final default smoke session id: `30e80d99-e3b5-428b-aa18-fea65b8db499`
- direct worker WSS selected
- login/resource/start/input/detach/attach/takeover/taken_over/logout passed
- session cleanup state: `terminated`
- worker log: `rdp.gfx config requested=false mode=classic_gdi_region_first`
- worker log: `rdp.perf callback_summary ... rdpgfx_requested=false ... frame_capture_region=...`
Gated RDPGFX target compatibility result:
- gated session id: `aa69f606-9217-4579-b438-b7d3ec5e01d0`
- environment: `RDP_WORKER_RDPGFX_ENABLED=true`
- result: failed on the current live RDP target
- observed: `BIO_read returned a system error 104: Connection reset by peer`
- observed: `freerdp_post_connect failed`
- no `rdp.gfx channel_connected` or surface callbacks were observed before reset
- conclusion: the current target must use the default GDI region-first path
Report: artifacts/rdp-perf4-report.md
RDP-Perf-5: Encoded Graphics Payloads
Support encoded graphics payloads over RAP direct data-plane.
Deliver:
- binary encoded payload message type
- client decode strategy
- fallback to region BGRA
RDP-A4: CursorAdapter
Move cursor handling into the RDP Adapter boundary and keep cursor events independent from display frame cadence.
Status: completed and live-smoke-proven on the test Docker environment as of 2026-04-26.
Implementation:
- `CursorAdapter` normalizes FreeRDP pointer callbacks into cursor position, visibility, shape, cache, and mask metadata.
- FreeRDP pointer callbacks are installed and restored inside the RDP runtime hook boundary.
- Original FreeRDP pointer callbacks are invoked before platform normalization, preserving FreeRDP internal state.
- `session_cursor_updated` worker events are mapped to platform `cursor.update` envelopes.
- Direct worker WSS treats cursor as latest-only/droppable and schedules it separately from binary render frames.
- Backend gateway fallback remains compatible with the same `session_cursor_updated` event payload.
- Windows client accepts `cursor.update` through the existing render payload bridge without changing UI layout.
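The latest-only/droppable treatment of cursor updates can be illustrated with a single-slot mailbox: the producer overwrites whatever has not been sent yet, and the sender takes the newest value. The sketch is not the worker's code. `CursorUpdate` and `LatestCursorSlot` are hypothetical names, and the payload is reduced to position and visibility, whereas the real metadata also covers shape, cache, and mask.

```cpp
#include <cstdint>
#include <mutex>
#include <optional>

// Reduced, hypothetical cursor payload.
struct CursorUpdate {
    int32_t x = 0;
    int32_t y = 0;
    bool visible = true;
};

class LatestCursorSlot {
public:
    // Producer side: overwrite any update that has not been sent yet.
    void publish(const CursorUpdate& u) {
        std::lock_guard<std::mutex> lock(mu_);
        if (slot_) ++dropped_;  // the previous unsent update is superseded
        slot_ = u;
    }

    // Sender side: take the newest update, leaving the slot empty.
    std::optional<CursorUpdate> take() {
        std::lock_guard<std::mutex> lock(mu_);
        std::optional<CursorUpdate> out = slot_;
        slot_.reset();
        return out;
    }

    uint64_t dropped() const {
        std::lock_guard<std::mutex> lock(mu_);
        return dropped_;
    }

private:
    mutable std::mutex mu_;
    std::optional<CursorUpdate> slot_;
    uint64_t dropped_ = 0;
};
```

Because the slot never holds more than one entry, cursor traffic cannot build a backlog behind binary render frames.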
Smoke evidence:
- worker image: `rap-rdp-worker:rdp-a4-cursor-adapter`
- direct smoke session id: `549806aa-c9db-48a9-917e-cf817cf236b5`
- fallback smoke session id: `dee3a856-bee1-4eba-9c10-f62edaf56547`
- direct worker WSS selected in direct smoke
- backend gateway selected in fallback smoke
- login/resource/start/input/detach/attach/takeover/taken_over/logout passed in both direct and fallback smoke
- direct session cleanup state: `terminated`
- fallback session cleanup state: `terminated`
- worker log: `cursor.adapter hooks installed pointer_callbacks=true`
- worker log: `adapter_event channel=cursor type=cursor.update origin=adapter`
- worker log: `rdp.perf callback_summary ... cursor_updates_enqueued=...`
- client log: `SessionWindowViewModel.HandleEnvelopeAsync ... cursor.update`
- report: `artifacts/rdp-a4-cursor-adapter-report.md`
Known limitation:
- Cursor event separation does not by itself fix delayed hover/menu repaint. The next safe step is a GDI repaint cadence and server-origin update audit on the stable region-first path.
RDP-Perf-5A: GDI Repaint Cadence And Hover Feedback Hardening
Fix the first proven stable-path repaint cadence bottlenecks without changing backend, session lifecycle, data-plane contracts, clipboard/file transfer, or UI layout.
Status: build-proven and smoke-proven on the test Docker environment as of 2026-04-26.
Implementation:
- FreeRDP event pump performs a bounded immediate drain after a signaled handle check so already-queued server events are not delayed by the next wait cycle.
- Periodic no-change detection logging is rate-limited to avoid hot-loop log pressure while the remote screen is idle.
- Worker session runtime renews the worker lease every 5 seconds instead of performing Redis lease I/O on every render/input loop iteration.
- Region and interactive render notifications use a 33 ms publish interval.
- Full-frame fallback remains at 100 ms.
- Direct worker WSS binary writer uses the same 33 ms interval for region/interactive frames.
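The two publish intervals can be sketched as a small rate gate that takes the current time as a parameter, which keeps it testable without sleeping. The 33 ms and 100 ms values come from the list above; `FrameKind`, `PublishCadence`, and the choice to share one "last publish" timestamp across frame kinds are assumptions of this sketch.

```cpp
#include <chrono>

enum class FrameKind { Region, Interactive, Full };

class PublishCadence {
public:
    using Clock = std::chrono::steady_clock;

    // 33 ms for region/interactive frames, 100 ms for full-frame fallback.
    static constexpr std::chrono::milliseconds interval_for(FrameKind kind) {
        return kind == FrameKind::Full ? std::chrono::milliseconds(100)
                                       : std::chrono::milliseconds(33);
    }

    // Returns true when a frame of this kind may be published at `now`,
    // and records the publish time when it does.
    bool try_publish(FrameKind kind, Clock::time_point now) {
        if (has_last_ && now - last_ < interval_for(kind)) return false;
        last_ = now;
        has_last_ = true;
        return true;
    }

private:
    Clock::time_point last_{};
    bool has_last_ = false;
};
```

A 33 ms interval caps region publishing at roughly 30 frames per second, and 100 ms caps full frames at 10 per second.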
Smoke evidence:
- worker image: `rap-rdp-worker:rdp-perf5a-repaint-cadence`
- direct smoke session id: `0cca4974-2a82-48dc-a0f6-1036ea8e98f0`
- fallback smoke session id: `16deb09e-1c44-4e9d-8448-93b42ac66ed0`
- direct worker WSS selected in direct smoke
- backend gateway selected in fallback smoke
- login/resource/start/input/detach/attach/takeover/taken_over/logout passed in both direct and fallback smoke
- direct session cleanup state: `terminated`
- fallback session cleanup state: `terminated`
- report: `artifacts/rdp-perf5a-report.md`
Measured direct-path results:
| Metric | Result |
|---|---|
| client rendered frames observed | 65 |
| client binary frames observed | 66 |
| direct region publishes at 33 ms | 54 |
| direct outbound FPS max | 9.705640 |
| render seen FPS max | 26.386542 |
| render published FPS max | 9.459327 |
| direct backpressure frame drops | 0 |
| render pending max | 0 |
Measured conclusion:
- Region/interactive frames now leave the worker promptly when server-origin changes arrive.
- The direct smoke did not show queued FreeRDP event-handle bursts after the new immediate drain path: `event_pump_drained_checks=0`.
- The current live target still emits idle/server-origin region changes at roughly 1 FPS in observed stable GDI mode.
- Manual UX validation is still required before claiming hover/menu responsiveness accepted by a human operator.
RDP-Perf-6: Dirty-Region Direct Binary Render Contract
Replace full-frame-only direct binary render updates with explicit dirty-region direct binary render updates while preserving full-frame fallback.
Deliver:
- direct `RAP2` `message_type=render.frame.full`
- direct `RAP2` `message_type=render.frame.region`
- one bounding-rectangle dirty-region BGRA payload for normal UI changes
- full-frame fallback for first frame, attach/reattach, resize, recovery, invalid region state, and debug/fallback mode
- worker diagnostics for `full_frame_sent`, `region_frame_sent`, `region_bytes`, `full_frame_bytes`, `region_savings_percent`, `diff_time_ms`, `render_update_reason`, and `fallback_to_full_frame_reason`
- Windows direct receiver support for explicit full/region message types
- Windows framebuffer-backed region patching
- backend gateway JSON/base64 fallback unchanged
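A diagnostic such as `region_savings_percent` can plausibly be derived from `region_bytes` and `full_frame_bytes` as shown below. The exact formula the worker uses is not stated in this document, so this is an assumed definition, and both function names are hypothetical. For scale, a tightly packed BGRA frame of 1280x720 pixels is 3,686,400 bytes and a 64x64 region is 16,384 bytes; the actual desktop and region dimensions in the smoke runs are not stated here.

```cpp
#include <cstdint>

// BGRA payload size for a width x height rectangle (4 bytes per pixel).
uint64_t bgra_bytes(uint32_t width, uint32_t height) {
    return static_cast<uint64_t>(width) * height * 4u;
}

// Percentage of bytes saved by sending a region instead of a full frame.
// Returns 0 when the full-frame size is unknown or the region is not smaller.
double region_savings_percent(uint64_t region_bytes, uint64_t full_frame_bytes) {
    if (full_frame_bytes == 0 || region_bytes >= full_frame_bytes) return 0.0;
    return 100.0 * (1.0 - static_cast<double>(region_bytes) /
                              static_cast<double>(full_frame_bytes));
}
```

Under this definition, a 16,384 byte region against a 3,686,400 byte baseline saves a little over 99.5 percent of the payload.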
Status: implemented and build/probe/live-smoke-proven on the test Docker environment as of 2026-04-26 using the current RDP target.
Build/probe evidence:
- worker image build: `rap-rdp-worker:rdp-perf6-dirty-region`
- Windows client build: PASS
- worker graphics adapter probe: PASS
- worker direct data-plane bind valid probe: PASS
- worker service adapter protocol probe: PASS
- direct worker WSS smoke: PASS
- backend gateway fallback smoke: PASS
Implementation notes:
- The current classic GDI region-first display path remains the source of dirty-region payloads.
- The direct worker WSS sender no longer labels all binary render payloads as `session.frame`; it uses `render.frame.full` and `render.frame.region`.
- The Windows transport still normalizes direct render frames into the existing application-level `session.frame` pipeline, so session lifecycle, input, clipboard, and file-transfer behavior are unchanged.
- The Windows presenter keeps a session framebuffer and applies region patches into it before presenting the updated surface.
- Backend gateway fallback remains JSON/base64 and is not used as the production high-rate render relay.
- Runtime payload examples: full baseline `3,686,400` bytes; dirty regions `16,384`, `163,840`, `327,680`, `655,360`, and `737,280` bytes.
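Framebuffer-backed region patching amounts to a row-by-row copy of the region payload into the retained full-size buffer. The sketch below shows the idea in C++ for consistency with the worker; it is not the Windows presenter's code. `Framebuffer` and `apply_region` are hypothetical names, and the tightly packed BGRA layout (no row padding) is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct Framebuffer {
    uint32_t width = 0;
    uint32_t height = 0;
    std::vector<uint8_t> bgra;  // width * height * 4 bytes, tightly packed rows
};

// Copies a tightly packed BGRA region into the framebuffer at (x, y).
// Returns false and changes nothing if the region does not fit or a buffer
// size does not match, so the caller can fall back to a full frame.
bool apply_region(Framebuffer& fb, uint32_t x, uint32_t y,
                  uint32_t w, uint32_t h, const std::vector<uint8_t>& payload) {
    if (w == 0 || h == 0) return false;
    if (x > fb.width || w > fb.width - x) return false;
    if (y > fb.height || h > fb.height - y) return false;
    if (payload.size() != static_cast<std::size_t>(w) * h * 4) return false;
    if (fb.bgra.size() != static_cast<std::size_t>(fb.width) * fb.height * 4) return false;

    const std::size_t dst_stride = static_cast<std::size_t>(fb.width) * 4;
    const std::size_t src_stride = static_cast<std::size_t>(w) * 4;
    for (uint32_t row = 0; row < h; ++row) {
        uint8_t* dst = fb.bgra.data() +
                       (static_cast<std::size_t>(y) + row) * dst_stride +
                       static_cast<std::size_t>(x) * 4;
        const uint8_t* src = payload.data() + row * src_stride;
        std::memcpy(dst, src, src_stride);
    }
    return true;
}
```

The bounds checks correspond to the "invalid region state" fallback case: a patch that does not fit the retained framebuffer is rejected rather than partially applied.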
RDP-Perf-7: Adaptive Quality Controller
Add channel-aware adaptive render quality.
Deliver:
- latency-aware profile switching
- bandwidth-aware profile switching
- latest-only render backpressure
- stable input under load
Acceptance Targets
LAN targets:
- first frame: under 2 seconds after successful RDP login
- click to visible response: under 150 ms for common UI
- keypress to visible response: under 150 ms for text input
- pointer hover response: under 100 ms where the target emits hover changes
- one click activates remote buttons correctly
- no unbounded frame/input queues
Weak-channel targets:
- input remains usable even when render quality degrades
- render drops stale updates instead of building backlog
- file transfer never starves interactive input
RDP Performance Work Paused
RDP performance work is paused. Next active work is Fabric Core / cluster foundation.
RDP-Perf-6 remains accepted and smoke-proven. Future RDP roadmap items such as RDP-Perf-7, adaptive quality, encoded payloads, additional RDPGFX testing, tiles, codecs, or further renderer optimization must not start without a new explicit RDP-stage prompt.
The preserved RDP baseline remains:
- C++ RDP Adapter runtime
- direct worker WSS
- backend gateway fallback
- dirty-region direct binary render from RDP-Perf-6
- proven session lifecycle
- existing clipboard and file-transfer semantics