Project Audit And Next-Step Plan
Date: 2026-04-26
Status: documentation/audit only. No runtime behavior is changed by this document.
1. Executive Summary
The project is no longer just an RDP proxy. The correct target is a Secure Access Fabric platform with a control plane, direct realtime data plane, service adapters, tenant isolation, and future node/mesh/VPN capabilities.
The implementation is considerably more advanced than several operational documents describe. The most important current risk is therefore not only code quality but source-of-truth drift: old prompts and READMEs can send the next stage in the wrong direction.
The RDP MVP has proven the hard lifecycle assumptions:
- real RDP connection through the worker works
- active/detach/reattach/takeover/terminate flows are proven
- takeover does not recreate the remote session
- worker-death/orphan-active-session recovery is proven
- Windows client can render and control a real remote desktop
- direct worker WSS data plane is implemented and used
- binary render frames are implemented on direct data plane
- backend gateway JSON/base64 path remains available as fallback/debug
- ordered dirty-region delivery is accepted as the current RDP baseline
- text clipboard is implemented and accepted
- client-to-server file upload to worker-controlled storage is accepted
- restricted drive visibility is runtime-proven: uploaded files are visible and openable inside the remote Windows session through `RAP_Transfers`
The RDP adapter lesson is clear: "make it simple first and patch later" is dangerous for realtime protocols. Full-frame polling, implicit refresh after input, and backend/Redis realtime relaying worked for proof, but they caused the exact class of latency and correctness issues we later had to unwind. From this point forward, each service adapter must be specified as an event-driven adapter before implementation.
Recommended immediate priority:
- Freeze and document the current working baseline.
- Synchronize stale project docs with the real state.
- Preserve the accepted RDP visual correctness/stability baseline.
- Preserve the accepted Stage 5.1.1 restricted drive visibility behavior.
- Add automated regression gates so manual discoveries become repeatable tests.
2. Audit Method
This audit used the current filesystem state in:
\\192.168.220.200\mst\codex\rdp-proxy
Important environment note:
- the directory is not currently a Git checkout (`git status` reports that no `.git` repository exists), so this audit cannot use commit history
- the canonical test Docker host is `docker-test` / `192.168.200.61`
- the live test stack currently contains `rap_backend_smoke`, `rap_worker_smoke`, `rap_postgres`, and `rap_redis`
Commands run during this audit:
go test ./...
dotnet build .\clients\windows\RemoteAccessPlatform.Windows.slnx
docker -H ssh://docker-test run --rm rap-rdp-worker:rdp-region-repair rdp-worker-graphics-adapter-probe
docker -H ssh://docker-test run --rm rap-rdp-worker:rdp-region-repair rdp-worker-cursor-adapter-probe
docker -H ssh://docker-test run --rm rap-rdp-worker:rdp-region-repair rdp-worker-service-adapter-protocol-probe
docker -H ssh://docker-test run --rm rap-rdp-worker:rdp-region-repair rdp-worker-dataplane-bind-probe --scenario valid
Results:
- backend tests: PASS
- Windows client build: PASS, 0 warnings, 0 errors
- worker graphics adapter probe: PASS
- worker cursor adapter probe: PASS
- worker service adapter protocol probe: PASS
- worker data-plane bind valid probe: PASS
Coverage warning:
- most backend modules still report `[no test files]`
- much of the current confidence comes from smoke/manual proofs and logs
- this is not enough for production readiness
3. Planned Direction
The authoritative long-term direction is:
- `CODEX_CONTEXT.md`
- `docs/architecture/SECURE_ACCESS_FABRIC_TARGET.md`
- `docs/architecture/DATA_PLANE_V1.md`
- `docs/architecture/SERVICE_ADAPTER_PROTOCOL.md`
- `docs/architecture/RDP_ADAPTER_RUNTIME.md`
- `docs/architecture/RDP_SERVICE_CPP_PERFORMANCE_TARGET.md`
The target platform model is:
Access Client
-> Ingress / Data Plane
-> Secure Fabric / Routing
-> Service Adapter at egress edge
-> Target service
For RDP specifically:
Access Client
<-> platform session/data-plane protocol
RDP Adapter
<-> FreeRDP / project-owned RDP internals
RDP Server
This naming should be kept consistent:
- Access Client: native Windows/iOS/Android/Linux client that speaks the platform protocol
- Control Plane: backend API, auth, orgs, policy, session lifecycle, audit
- Data Plane: realtime session traffic channels
- Service Adapter: protocol translator for RDP/VNC/SSH/video/etc
- RDP Adapter: current C++ RDP service adapter
- Entry/Ingress Node: accepts client connections into the fabric
- Egress/Service Node: reaches target resources and hosts adapters
- Node Agent: native host identity, update, health, and service supervisor
4. What Is Implemented
Backend
Implemented:
- Go backend foundation
- PostgreSQL source-of-truth storage
- Redis live coordination/routing
- auth foundation
- refresh token rotation
- devices/trusted devices
- org-scoped resources and sessions
- platform-core v2 foundation
- identity source foundation
- node/node-agent control-plane foundation
- session broker orchestration
- worker coordination and stale worker monitoring
- structured localization-ready messages
- resource certificate verification policy
- clipboard policy
- file-transfer policy
- data-plane token/candidate generation
- backend gateway fallback
Key files:
- `backend/internal/modules/sessionbroker/service.go`
- `backend/internal/modules/sessionbroker/orchestration.go`
- `backend/internal/modules/sessionbroker/state_machine.go`
- `backend/internal/modules/sessionbroker/dataplane.go`
- `backend/internal/modules/sessiongateway/module.go`
- `backend/internal/modules/worker/monitor.go`
- `backend/internal/modules/resource/module.go`
- `backend/internal/modules/auth/service.go`
- `backend/internal/platform/httpx/message.go`
- `backend/migrations/000005_platform_core_v2.up.sql`
- `backend/migrations/000007_clipboard_policy_mode.up.sql`
- `backend/migrations/000008_file_transfer_policy_mode.up.sql`
Known backend gaps:
- automated test coverage is thin outside `sessionbroker`
- P3/P3.1 resource secret-readiness and encrypted resolver MVP exists; production mode rejects plaintext credential metadata and requires `secret_ref` for RDP/VNC/SSH resources
- external KMS/Vault integration and master-key rotation are not implemented yet
- admin/control UI for safe resource/policy management is not the current focus
- node-agent runtime is not implemented; only control-plane foundation exists
- identity source sync runtime is not implemented
Windows Client
Implemented:
- WPF client skeleton and build
- auth/login/refresh/logout foundation
- organization selection
- resource list
- active sessions
- session window
- direct data-plane selection with fallback
- binary render receive path
- input capture/forwarding
- cursor/render display
- localization-ready resource layer
- text clipboard UI/path
- file upload UI/path
- failed-session refresh after gateway close
Key files:
- `clients/windows/src/RemoteAccessPlatform.Windows.App/SessionWindow.xaml`
- `clients/windows/src/RemoteAccessPlatform.Windows.Application/ViewModels/SessionWindowViewModel.cs`
- `clients/windows/src/RemoteAccessPlatform.Windows.Transport/SessionGatewayClient.cs`
- `clients/windows/src/RemoteAccessPlatform.Windows.App/Input/SessionInputMapper.cs`
- `clients/windows/src/RemoteAccessPlatform.Windows.Application/Localization/Strings.cs`
- `clients/windows/src/RemoteAccessPlatform.Windows.Application/Resources/Strings.resx`
Known client gaps:
- final UX polish is not complete
- automated client regression tests are missing
- manual RDP UX remains the acceptance authority for now
- some README limitations are stale and understate what exists
RDP Worker / RDP Adapter
Implemented:
- standalone C++ worker service
- FreeRDP integration behind worker boundary
- worker registration/assignment/lease lifecycle
- direct worker WSS endpoint
- RS256 data-plane token validation
- direct bind policy and current attachment validation
- JSON control/input/clipboard/file-upload envelopes
- binary RAP2 render frames for direct path
- backend gateway JSON/base64 fallback
- region-first BGRA render path
- direct attach baseline full-frame repair
- region-loss full-frame repair throttle
- cursor adapter boundary
- text clipboard through FreeRDP `cliprdr`
- client-to-server file upload
- restricted visible transfer directory
- restricted FreeRDP drive redirection groundwork
Key files:
- `workers/rdp-worker/src/main.cpp`
- `workers/rdp-worker/src/runtime/session_runtime.cpp`
- `workers/rdp-worker/include/rdp_worker/runtime/session_runtime.hpp`
- `workers/rdp-worker/src/adapter/rdp_adapter_runtime.cpp`
- `workers/rdp-worker/src/freerdp/rdp_runtime.cpp`
- `workers/rdp-worker/src/dataplane/direct_wss_server.cpp`
- `workers/rdp-worker/src/runtime/direct_bind_policy.cpp`
- `workers/rdp-worker/include/rdp_worker/adapter/service_adapter_protocol.hpp`
Current live/smoke images:
rap-backend-smoke:stage5-2-download
rap-rdp-worker:stage5-2-download
Known worker/RDP gaps:
- drag/release repaint is usable but not polished; drag behaves like an older RDP client on a weak link by moving a frame rather than continuously repainting the full window
- RDPGFX is gated and disabled by default because the current live target resets the connection when RDPGFX is advertised
- encoded graphics/codecs/tiles are not production-accepted yet
- file download core data path is runtime-proven through direct worker WSS and backend gateway fallback, and lifecycle blocking is runtime-proven for detach, old-controller takeover, and worker failure. Stage 5.2 is not fully runtime-accepted until Windows desktop UI download is proven
- FreeRDP is still the substrate; replacing it is not justified until the adapter boundary proves which pieces are actually insufficient
5. Plan vs Fact Matrix
| Area | Planned | Current fact | Status |
|---|---|---|---|
| Backend foundation | Go, config, HTTP, PostgreSQL, Redis | Implemented and builds | Done |
| Auth | access/refresh flow, sessions, devices | Implemented | Done |
| Session lifecycle | start/attach/detach/takeover/terminate/fail/recover | Live-proven earlier and preserved | Done, protect |
| Multi-tenancy | organizations and org-scoped resources/sessions | Implemented | Done, needs more tests |
| Authorization | platform/admin/member boundaries | Implemented foundation | Needs broader tests |
| Worker coordination | registration, lease, stale recovery | Implemented and live-proven | Done, protect |
| Windows client MVP | native WPF client | Implemented and builds | Done |
| Localization messaging | structured backend/client messaging | Implemented and runtime-proven earlier | Done, protect |
| Direct data plane | client-to-worker WSS | Implemented | Done |
| Binary render | direct binary render, fallback JSON/base64 | Implemented | Done |
| RDP adapter event model | event-driven adapter boundary | Implemented and P1 accepted | Done, protect |
| RDP render quality | grayscale foundation | Implemented | Partial |
| RDPGFX/encoded graphics | future performance path | gated only, not accepted | Not production |
| Clipboard | text-only, policy-gated | Accepted | Done |
| File upload | client-to-server to worker storage | Accepted | Done |
| File visibility in RDP | restricted drive redirection | Accepted via `RAP_Transfers` | Done, protect |
| File download | server-to-client | Core and lifecycle runtime-proven, desktop UI proof pending | Prove UI next |
| Mesh/VPN/multi-cluster runtime | target architecture only | Not implemented | Correctly deferred |
| Node-agent runtime/updater | target/foundation only | Not implemented | Future |
| Identity sync runtime | LDAP/OIDC sync | Not implemented | Future |
6. Important Source-Of-Truth Drift
At the start of this audit these files were stale or partly stale:
- `README.md` still points to old compat-era docs and says not to start with UI, while the Windows client already exists
- `docs/codex/CURRENT_STATUS.md` says WebSocket takeover proof is still a gap, even though that proof was later closed
- `docs/codex/NEXT_STEP_PROMPT.md` previously pointed to platform-core v2 as the next step, although platform-core v2 already exists
- `clients/windows/README.md` still says it intentionally stops short of final viewer rendering, but the client now renders the remote desktop
- `workers/rdp-worker/README.md` documented recent RDP stages, but previously did not clearly mark the current accepted image and latest manual acceptance
- `docs/architecture/DATA_PLANE_V1.md` previously had a stale "Next Implementation Prompt"; it now points to Stage 5.2 live runtime proof
- `docs/architecture/RDP_ADAPTER_RUNTIME.md` and `docs/architecture/RDP_SERVICE_CPP_PERFORMANCE_TARGET.md` still mark manual UX acceptance as pending before the latest fixes
This was the P0 risk addressed by the baseline-freeze documentation pass. Future stages must keep these files current after every accepted runtime change so a future Codex/session cannot follow an old prompt and reintroduce already-rejected architecture.
7. Lessons From The RDP Adapter Work
The RDP work exposed several project-level rules:
- Realtime protocol features must be designed as channel semantics first. Input, display, cursor, clipboard, file transfer, and telemetry cannot share one undifferentiated queue.
- Backend/Redis must not be the production realtime path. It is correct as fallback/debug/control-plane glue, not for high-rate render.
- Full-frame rendering is not the normal production model. It is needed for baseline, attach, resize, recovery, and fallback repair.
- Dirty regions cannot be blindly latest-only without a repair strategy. Dropping a region update may leave visible artifacts; the current `region_loss_repair` full-frame repair is a pragmatic safety net.
- Server-origin events must drive display updates. Remote changes must not depend on local mouse/keyboard events.
- Input must be independent from render. A key or click must never wait behind a frame, upload chunk, clipboard message, or lease renewal.
- FreeRDP is not the problem by default. The earlier problem was how we pumped events, scheduled frames, relayed payloads, and treated screen updates. The correct direction is an adapter boundary around FreeRDP first, not a full rewrite before we can prove the replacement.
- Manual UX proof is essential. Automated input can pass while real user input feels wrong.
- Every "temporary" shortcut needs an explicit expiration condition. If it does not have one, it becomes architecture.
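The rule that input must never wait behind bulk traffic can be illustrated with a small Go sketch. It is not the project's actual scheduler: the `Message` envelope, the channel names, and `nextOutbound` are hypothetical, and the only point shown is that a sender checks the input queue on its own before considering anything else.

```go
package main

import "fmt"

// Message is a hypothetical outbound envelope; the platform's real envelope
// format is not shown in this document.
type Message struct {
	Channel string
	Payload string
}

// nextOutbound picks the next message to send without blocking.
// Pending input always wins over bulk traffic (upload chunks, clipboard).
// It assumes neither channel is ever closed.
func nextOutbound(input, bulk <-chan Message) (Message, bool) {
	// First pass: look at input only, so it can never queue behind bulk data.
	select {
	case msg := <-input:
		return msg, true
	default:
	}
	// Second pass: nothing urgent was pending, so take whatever is ready.
	select {
	case msg := <-input:
		return msg, true
	case msg := <-bulk:
		return msg, true
	default:
		return Message{}, false
	}
}

func main() {
	input := make(chan Message, 8)
	bulk := make(chan Message, 8)

	// Bulk traffic is queued first, input arrives afterwards.
	bulk <- Message{Channel: "file_upload", Payload: "chunk-1"}
	bulk <- Message{Channel: "file_upload", Payload: "chunk-2"}
	input <- Message{Channel: "input", Payload: "key-down"}

	for {
		msg, ok := nextOutbound(input, bulk)
		if !ok {
			break
		}
		fmt.Println(msg.Channel, msg.Payload)
	}
}
```

Separate queues per channel keep the priority decision explicit. With a single shared queue the key press in this example would be sent only after both upload chunks.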
8. What We May Have Missed
These are not immediate bugs, but they should be addressed early because they shape the product:
- RDP server compatibility matrix: Windows Server versions, NLA modes, GDI vs RDPGFX behavior, color depth, TLS/cert behavior, domain login variants
- weak-channel simulation: latency, jitter, loss, constrained bandwidth
- high-concurrency session model: many users, many workers, CPU/network limits
- deterministic smoke reports: every accepted stage should leave reproducible artifacts and commands
- secret management: credentials must move out of plain resource metadata
- production PKI: direct worker WSS currently uses smoke-friendly TLS handling on the client side
- authorization tests: cross-org denial paths need automated coverage
- resource policy test matrix: clipboard/file/cert/session policies
- file transfer threat model: filename normalization, symlink escape, overwrite behavior, quotas, cleanup, audit
- observability: per-channel latency, frame drops, input latency, worker event pump health, adapter callback counters
- client UI state machine tests: close/dispose, failed state, reconnect, takeover, detach, old attachment blocking
- upgrade/rollback story: node-agent target exists, runtime is not implemented
- deployment topology: container host networking vs Docker bridge/NAT for realtime workloads
- service adapter conformance suite: RDP now has a pattern that VNC/SSH/video should follow
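As a starting point for the file transfer threat model, the sketch below shows one way to reject traversal names, symlinks, non-regular files, and oversized files inside a per-session directory. It is an assumption-laden illustration, not the worker's implementation (which is C++): `checkTransferFile`, its parameters, and the size limit are hypothetical. A hardened version would also open the file and re-check the open handle to close the gap between check and use.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

var errRejected = errors.New("transfer rejected")

// checkTransferFile validates a bare file name inside the per-session
// directory and returns the full path if the file may be transferred.
func checkTransferFile(sessionDir, name string, maxBytes int64) (string, error) {
	// Only bare names are accepted: no separators, no dot entries, no NUL.
	if name == "" || name == "." || name == ".." ||
		strings.ContainsAny(name, "/\\\x00") {
		return "", fmt.Errorf("%w: invalid name %q", errRejected, name)
	}
	full := filepath.Join(sessionDir, name)
	// Lstat does not follow symlinks, so a link is reported as a link.
	info, err := os.Lstat(full)
	if err != nil {
		return "", fmt.Errorf("%w: %v", errRejected, err)
	}
	if !info.Mode().IsRegular() {
		return "", fmt.Errorf("%w: not a regular file", errRejected)
	}
	if info.Size() > maxBytes {
		return "", fmt.Errorf("%w: file too large", errRejected)
	}
	return full, nil
}

func main() {
	dir, err := os.MkdirTemp("", "rap-transfer-")
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer os.RemoveAll(dir)

	if err := os.WriteFile(filepath.Join(dir, "report.txt"), []byte("hello"), 0o600); err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	// Symlink creation can fail on some platforms; the demo ignores that.
	_ = os.Symlink("/etc/hostname", filepath.Join(dir, "link.txt"))

	for _, name := range []string{"report.txt", "../report.txt", "link.txt", ""} {
		_, err := checkTransferFile(dir, name, 1<<20)
		fmt.Printf("%q -> rejected=%v\n", name, err != nil)
	}
}
```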
9. Architectural Decisions To Freeze Now
These decisions should be treated as current project rules:
- PostgreSQL is source of truth.
- Redis is live coordination/routing only.
- Backend is control plane, not production render relay.
- Direct data plane is preferred for realtime RDP traffic.
- Backend gateway remains fallback/debug until direct path is fully mature.
- Service adapters translate external protocols to platform channels.
- RDP Adapter remains C++ and FreeRDP-backed for now.
- FreeRDP details must not leak into backend or Access Client business logic.
- Access Client speaks platform protocol, not RDP.
- Mesh/VPN/multi-cluster/node-agent runtime remain future staged work.
- RDP must be stabilized before adding VNC/SSH/VPN/product expansion.
- No new feature should start while source-of-truth docs are stale.
10. Recommended Next Stages
P0. Truth And Baseline Freeze
Goal: make the current working system impossible to misunderstand.
Do:
- update root `README.md`
- update `docs/codex/CURRENT_STATUS.md`
- update `docs/codex/NEXT_STEP_PROMPT.md`
- update `clients/windows/README.md`
- update `workers/rdp-worker/README.md`
- update `docs/architecture/DATA_PLANE_V1.md` next prompt
- update `docs/architecture/RDP_ADAPTER_RUNTIME.md` with latest baseline/region repair status
- document current test Docker image/tag and startup commands
- preserve the accepted RDP worker baseline
- create one "current smoke matrix" document
Do not:
- add features
- start DP-3B
- start server-to-client download
- start mesh/VPN/node-agent runtime
Acceptance:
- a new engineer/Codex can read the docs and know the actual next step
- no doc points to archived v1 or already-completed stages as next work
P1. RDP Visual Correctness Hardening
Goal: eliminate remaining small artifacts without returning to slow full-frame rendering.
Do:
- add explicit region sequence/gap diagnostics
- prove when artifacts happen: region drop, stale region ordering, missed server callback, client application bug, or repair interval issue
- verify client applies region frames to the correct bitmap area and stride
- keep baseline full frame on attach
- keep full repair only on loss/recovery, not as normal render loop
- collect before/after screenshots/logs
Do not:
- enable RDPGFX globally
- add compression/tiles/codecs before correctness is stable
- change backend/session lifecycle
Acceptance:
- remote idle updates repaint without local input
- Start menu/task manager/window movement leave no persistent artifacts
- input and close behavior remain usable
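The region sequence/gap diagnostics listed under "Do" could look roughly like the following Go sketch. It assumes every region frame carries a monotonically increasing sequence number, which this document does not confirm for RAP2 frames; `RegionTracker`, `Observe`, and the verdict names are hypothetical. The tracker separates a forward gap (a region was dropped, so a repair is warranted) from a stale or duplicate region (an ordering problem).

```go
package main

import "fmt"

// Verdict tells the caller what to do with an incoming region frame.
type Verdict string

const (
	Apply          Verdict = "apply"
	ApplyAndRepair Verdict = "apply+repair" // a gap was seen, request a full-frame repair
	DropStale      Verdict = "drop-stale"   // older than or equal to what is already applied
)

// RegionTracker keeps the last applied sequence number and gap counters.
type RegionTracker struct {
	started bool
	last    uint64
	Gaps    int    // number of forward gaps observed
	Missing uint64 // total sequence numbers skipped
	Stale   int    // out-of-order or duplicate regions
}

// Observe classifies one region sequence number and updates the counters.
func (t *RegionTracker) Observe(seq uint64) Verdict {
	if !t.started {
		// The first frame after attach defines the starting point.
		t.started = true
		t.last = seq
		return Apply
	}
	switch {
	case seq == t.last+1:
		t.last = seq
		return Apply
	case seq > t.last+1:
		t.Gaps++
		t.Missing += seq - t.last - 1
		t.last = seq
		return ApplyAndRepair
	default:
		t.Stale++
		return DropStale
	}
}

func main() {
	var tracker RegionTracker
	for _, seq := range []uint64{1, 2, 5, 4, 6} {
		fmt.Printf("seq=%d verdict=%s\n", seq, tracker.Observe(seq))
	}
	fmt.Printf("gaps=%d missing=%d stale=%d\n", tracker.Gaps, tracker.Missing, tracker.Stale)
}
```

Logging these counters per session would help tell a region drop from stale ordering when an artifact is reported. It does not cover missed server callbacks or client-side bitmap bugs.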
P2. Stage 5.1.1 Restricted Drive Visibility Proof
Status: accepted as runtime-proven on the test Docker stand.
Goal: keep the upload visibility path protected while the RDP Adapter continues to be hardened.
Do:
- run live smoke with current RDP adapter baseline
- upload file from Windows client
- verify file appears in `\\tsclient\RAP_Transfers`
- open text and binary files inside the remote Windows session
- prove disabled policy blocks upload
- prove takeover/detach/failure block old or invalid upload
- verify directory cleanup on terminate
Do not:
- implement download
- expose arbitrary worker filesystem
- implement shared folders or SMB/WebDAV
Accepted proof:
- uploaded file is visible and openable inside remote Windows
- only per-session visible directory is exposed
- worker logs show `RAP_Transfers` configured as the only redirected drive
- termination cleans the per-session transfer directory
P3. Security And Secrets Readiness
Status: P3.1 MVP complete; production TLS/PKI remains P3.2.
Goal: remove proof-stage security shortcuts before broad usage.
Completed:
- documented secret-reference model in `docs/architecture/SECURITY_SECRETS_READINESS.md`
- production mode rejects plaintext credential-like resource metadata
- production RDP/VNC/SSH resources require `secret_ref`
- session start rejects compat plaintext resources in production mode
- data-plane allowed-channel policy test exists
- worker direct-bind denial probes cover wrong worker/user/org/resource, wrong attachment, over-broad channels, and failed/terminated states
- encrypted PostgreSQL-backed `resource_secrets` store exists
- resource secret create/rotate endpoint updates `resources.secret_ref` without returning plaintext
- session assignment resolves `secret_ref` after organization/resource/session/worker/lease checks and does not mutate `remote_sessions.metadata` with plaintext
- secret access/access-denied/rotation audit events exist
- direct worker WSS TLS trust metadata/guard exists; production backend omits smoke-only direct candidates and production Windows client skips untrusted direct candidates
Still required after P3.2:
- deploy production direct-worker certificates/platform CA trust
- add external KMS/Vault or stronger key-management integration
- add master-key rotation/re-encryption workflow
- consider future worker pull/token resolver flow to avoid resolved credentials in Redis assignment payloads
Do not:
- build full enterprise KMS prematurely
- weaken certificate or token model for convenience
Acceptance:
- production mode cannot create/start resources with plaintext credential metadata
- cross-org, old-attachment, wrong worker/resource/org, and terminal-session denial paths are covered by focused tests/probes
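The first acceptance criterion can be expressed as a small validation function. The Go sketch below is only an illustration of the rule as stated in this section: the `Resource` struct, the `credentialLikeKeys` deny-list, and `validateForProduction` are hypothetical and do not describe the backend's actual types or matching logic.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Resource is a hypothetical, simplified view of a resource record.
type Resource struct {
	Protocol  string
	SecretRef string
	Metadata  map[string]string
}

// credentialLikeKeys is an assumed deny-list of substrings; the real
// backend's rule for "credential-like" metadata is not shown in this audit.
var credentialLikeKeys = []string{"password", "passwd", "secret", "private_key", "token"}

// validateForProduction rejects plaintext credential-like metadata and
// requires a secret reference for RDP/VNC/SSH resources.
func validateForProduction(r Resource) error {
	for key := range r.Metadata {
		lower := strings.ToLower(key)
		for _, marker := range credentialLikeKeys {
			if strings.Contains(lower, marker) {
				return fmt.Errorf("metadata key %q looks like a plaintext credential", key)
			}
		}
	}
	switch strings.ToLower(r.Protocol) {
	case "rdp", "vnc", "ssh":
		if r.SecretRef == "" {
			return errors.New("secret_ref is required for this protocol in production mode")
		}
	}
	return nil
}

func main() {
	candidates := []Resource{
		{Protocol: "rdp", SecretRef: "example-ref"},
		{Protocol: "rdp", SecretRef: "example-ref", Metadata: map[string]string{"Password": "hunter2"}},
		{Protocol: "ssh"},
	}
	for i, r := range candidates {
		fmt.Printf("resource %d: err=%v\n", i, validateForProduction(r))
	}
}
```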
P4. Automated Regression Suite
Goal: convert the painful manual discoveries into repeatable gates.
Do:
- add backend unit/integration tests for org scope, session state, data-plane token, stale worker, clipboard/file policies
- add worker probes for render sequencing, direct baseline, region repair, adapter event routing
- add Windows transport/viewmodel tests for fallback, close/dispose, failed state, frame latest-only, localization resolution
- make smoke scripts emit machine-readable PASS/FAIL reports
- pin each accepted image/build artifact
Acceptance:
- a regression in input, render, worker-death, takeover, clipboard, or upload fails a repeatable test before manual smoke
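A machine-readable smoke report can be very small. The Go sketch below shows one possible shape; the JSON field names and `buildReport` are assumptions, not an existing project format, and the result entries in `main` are example data that reuse probe and image names from this audit. The process exit code mirrors the overall status so a CI gate can consume it directly.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// ProbeResult is one line of a hypothetical smoke report.
type ProbeResult struct {
	Name   string `json:"name"`
	Status string `json:"status"` // "PASS" or "FAIL"
	Detail string `json:"detail,omitempty"`
}

// SmokeReport pins the tested image and aggregates the probe results.
type SmokeReport struct {
	Image   string        `json:"image"`
	Status  string        `json:"status"`
	Results []ProbeResult `json:"results"`
}

// buildReport marks the run as FAIL if any probe failed or nothing ran.
func buildReport(image string, results []ProbeResult) SmokeReport {
	status := "PASS"
	if len(results) == 0 {
		status = "FAIL" // an empty run must not count as a pass
	}
	for _, r := range results {
		if r.Status != "PASS" {
			status = "FAIL"
		}
	}
	return SmokeReport{Image: image, Status: status, Results: results}
}

func main() {
	// Example data only; a real script would fill this from probe output.
	report := buildReport("rap-rdp-worker:stage5-2-download", []ProbeResult{
		{Name: "rdp-worker-graphics-adapter-probe", Status: "PASS"},
		{Name: "rdp-worker-cursor-adapter-probe", Status: "PASS"},
	})
	out, err := json.MarshalIndent(report, "", "  ")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println(string(out))
	if report.Status != "PASS" {
		os.Exit(1)
	}
}
```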
P5. RDP Performance Next Layer
Goal: improve speed on weak channels after correctness is stable.
Candidate paths:
- RDPGFX on compatible target only
- encoded graphics payloads
- dirty-region compression
- tile/region framing
- adaptive quality profiles
- palette/grayscale/low-bandwidth modes
- per-channel QoS and backpressure telemetry
Do not:
- replace stable region-first path without fallback
- ship a graphics mode that only works on one target
Acceptance:
- direct full-color baseline remains available
- each new graphics mode has compatibility detection and fallback
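The "compatibility detection and fallback" requirement amounts to a preference list that always ends in the stable baseline. A minimal Go sketch, assuming hypothetical mode names and a `supported` map that some detection step has already filled:

```go
package main

import "fmt"

// GraphicsMode names are hypothetical labels for this sketch.
type GraphicsMode string

const (
	ModeRegionBGRA GraphicsMode = "region_bgra" // stable region-first baseline
	ModeRDPGFX     GraphicsMode = "rdpgfx"
	ModeCompressed GraphicsMode = "region_compressed"
)

// selectMode returns the first preferred mode the target supports and
// falls back to the region-first baseline when none of them is usable.
func selectMode(preferred []GraphicsMode, supported map[GraphicsMode]bool) GraphicsMode {
	for _, mode := range preferred {
		if supported[mode] {
			return mode
		}
	}
	return ModeRegionBGRA
}

func main() {
	preferred := []GraphicsMode{ModeRDPGFX, ModeCompressed}

	// A target that resets the connection when RDPGFX is advertised
	// would be recorded as not supporting it.
	legacyTarget := map[GraphicsMode]bool{ModeRDPGFX: false}
	modernTarget := map[GraphicsMode]bool{ModeRDPGFX: true}

	fmt.Println("legacy target:", selectMode(preferred, legacyTarget))
	fmt.Println("modern target:", selectMode(preferred, modernTarget))
}
```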
P6. Product Completion For RDP
Only after P0-P5 gates are stable:
- manual desktop acceptance for server-to-client file download from `RAP_Transfers\ToClient`
- richer file transfer UX
- final RDP UX polish
- policy management UI
- operational runbooks
- release readiness checklist
P7. Platform Expansion
Only after RDP is stable:
- VNC Adapter
- SSH Adapter
- node-agent runtime/updater
- entry/relay nodes
- mesh routing
- VPN/IP tunnel mode
- Linux/iOS/Android clients
11. Proposed Immediate Next Prompt
Use this as the next implementation prompt if we continue immediately:
Proceed with Stage 5.2 remaining desktop UI proof only - RDP server-to-client
file download.
Goal:
Finish acceptance of safe, policy-aware download from the remote RDP session to
the Windows Access Client UI using the restricted RAP_Transfers\ToClient drop
zone.
Strict rules:
- do not implement arbitrary remote path download
- do not implement remote filesystem browser
- do not implement recursive folder transfer
- do not implement SMB/WebDAV/Windows agent
- do not expose any worker path outside the per-session visible directory
- do not change RDP rendering/input/clipboard behavior
- do not remove backend gateway fallback
- do not implement binary file chunk frames yet
- do not start DP-3B, mesh, VPN, node-agent runtime, or new adapters
Scope:
1. Keep the current Stage 5.2 backend/worker deployment on docker-test.
2. Prove Windows desktop UI download for text and binary files placed in
RAP_Transfers\ToClient.
3. Prove rendering, input, clipboard, upload, lifecycle, and fallback do not
regress.
Acceptance:
- disabled and client_to_server modes block download
- server_to_client and bidirectional modes allow download
- text and binary files download with matching hashes
- traversal/symlink/non-regular/too-large files are blocked
- rendering, input, clipboard, upload, lifecycle, and fallback do not regress
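Outside the prompt text, the four policy modes named in its acceptance list can be summarized as a direction check. The Go sketch below is a reading of those mode names, not the backend's implementation: `transferAllowed` and the `Direction` type are hypothetical, and treating `client_to_server` as upload-only is inferred from the name. Unknown modes fail closed.

```go
package main

import "fmt"

// Direction of a file transfer relative to the Access Client.
type Direction int

const (
	Upload   Direction = iota // client to server
	Download                  // server to client
)

// transferAllowed reports whether a policy mode permits a direction.
// "disabled" and any unknown mode deny everything.
func transferAllowed(mode string, d Direction) bool {
	switch mode {
	case "bidirectional":
		return true
	case "client_to_server":
		return d == Upload
	case "server_to_client":
		return d == Download
	default:
		return false
	}
}

func main() {
	modes := []string{"disabled", "client_to_server", "server_to_client", "bidirectional"}
	for _, mode := range modes {
		fmt.Printf("%-17s upload=%v download=%v\n",
			mode, transferAllowed(mode, Upload), transferAllowed(mode, Download))
	}
}
```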
12. Bottom Line
The project direction is sound, but the process must now become stricter:
- design channel semantics first
- implement through adapter boundaries
- prove with live/manual smoke and automated gates
- update source-of-truth docs before starting the next major stage
- reject "temporary" shortcuts unless they have a documented removal condition
The RDP Adapter experience was expensive, but useful. It showed exactly where the architecture must be disciplined before adding SSH, VNC, VPN, mobile clients, or mesh runtime.