# Project Audit And Next-Step Plan

Date: 2026-04-26

Status: documentation/audit only. No runtime behavior is changed by this document.

## 1. Executive Summary

The project is no longer just an RDP proxy. The correct target is a Secure Access Fabric platform with a control plane, direct realtime data plane, service adapters, tenant isolation, and future node/mesh/VPN capabilities.

The implementation has reached a much more advanced state than several operational documents describe. The most important current risk is therefore not only code quality. It is source-of-truth drift: old prompts and READMEs can send the next stage in the wrong direction.

The RDP MVP has proven the hard lifecycle assumptions:

- real RDP connection through the worker works
- active/detach/reattach/takeover/terminate flows are proven
- takeover does not recreate the remote session
- worker-death/orphan-active-session recovery is proven
- Windows client can render and control a real remote desktop
- direct worker WSS data plane is implemented and used
- binary render frames are implemented on direct data plane
- backend gateway JSON/base64 path remains available as fallback/debug
- ordered dirty-region delivery is accepted as the current RDP baseline
- text clipboard is implemented and accepted
- client-to-server file upload to worker-controlled storage is accepted
- restricted drive visibility is runtime-proven: uploaded files are visible and openable inside the remote Windows session through `RAP_Transfers`

The RDP adapter lesson is clear: "make it simple first and patch later" is dangerous for realtime protocols. Full-frame polling, implicit refresh after input, and backend/Redis realtime relaying worked for proof, but they caused the exact class of latency and correctness issues we later had to unwind. From this point forward, each service adapter must be specified as an event-driven adapter before implementation.

Recommended immediate priority:

1. Freeze and document the current working baseline.
2. Synchronize stale project docs with the real state.
3. Preserve the accepted RDP visual correctness/stability baseline.
4. Preserve the accepted Stage 5.1.1 restricted drive visibility behavior.
5. Add automated regression gates so manual discoveries become repeatable tests.

## 2. Audit Method

This audit used the current filesystem state in:

```text
\\192.168.220.200\mst\codex\rdp-proxy
```

Important environment note:

- the directory is not currently a Git checkout (`git status` reports that no `.git` repository exists), so this audit cannot use commit history
- the canonical test Docker host is `docker-test` / `192.168.200.61`
- the live test stack currently contains `rap_backend_smoke`, `rap_worker_smoke`, `rap_postgres`, and `rap_redis`

Commands run during this audit:

```powershell
go test ./...
dotnet build .\clients\windows\RemoteAccessPlatform.Windows.slnx
docker -H ssh://docker-test run --rm rap-rdp-worker:rdp-region-repair rdp-worker-graphics-adapter-probe
docker -H ssh://docker-test run --rm rap-rdp-worker:rdp-region-repair rdp-worker-cursor-adapter-probe
docker -H ssh://docker-test run --rm rap-rdp-worker:rdp-region-repair rdp-worker-service-adapter-protocol-probe
docker -H ssh://docker-test run --rm rap-rdp-worker:rdp-region-repair rdp-worker-dataplane-bind-probe --scenario valid
```

Results:

- backend tests: PASS
- Windows client build: PASS, 0 warnings, 0 errors
- worker graphics adapter probe: PASS
- worker cursor adapter probe: PASS
- worker service adapter protocol probe: PASS
- worker data-plane bind valid probe: PASS

Coverage warning:

- most backend modules still report `[no test files]`
- much of the current confidence comes from smoke/manual proofs and logs
- this is not enough for production readiness

## 3. Planned Direction

The authoritative long-term direction is:

- `CODEX_CONTEXT.md`
- `docs/architecture/SECURE_ACCESS_FABRIC_TARGET.md`
- `docs/architecture/DATA_PLANE_V1.md`
- `docs/architecture/SERVICE_ADAPTER_PROTOCOL.md`
- `docs/architecture/RDP_ADAPTER_RUNTIME.md`
- `docs/architecture/RDP_SERVICE_CPP_PERFORMANCE_TARGET.md`

The target platform model is:

```text
Access Client
  -> Ingress / Data Plane
  -> Secure Fabric / Routing
  -> Service Adapter at egress edge
  -> Target service
```

For RDP specifically:

```text
Access Client
  <-> platform session/data-plane protocol
RDP Adapter
  <-> FreeRDP / project-owned RDP internals
RDP Server
```

This naming should be kept consistent:

- Access Client: native Windows/iOS/Android/Linux client that speaks the platform protocol
- Control Plane: backend API, auth, orgs, policy, session lifecycle, audit
- Data Plane: realtime session traffic channels
- Service Adapter: protocol translator for RDP/VNC/SSH/video/etc
- RDP Adapter: current C++ RDP service adapter
- Entry/Ingress Node: accepts client connections into the fabric
- Egress/Service Node: reaches target resources and hosts adapters
- Node Agent: native host identity, update, health, and service supervisor

## 4. What Is Implemented

### Backend

Implemented:

- Go backend foundation
- PostgreSQL source-of-truth storage
- Redis live coordination/routing
- auth foundation
- refresh token rotation
- devices/trusted devices
- org-scoped resources and sessions
- platform-core v2 foundation
- identity source foundation
- node/node-agent control-plane foundation
- session broker orchestration
- worker coordination and stale worker monitoring
- structured localization-ready messages
- resource certificate verification policy
- clipboard policy
- file-transfer policy
- data-plane token/candidate generation
- backend gateway fallback

Key files:

- `backend/internal/modules/sessionbroker/service.go`
- `backend/internal/modules/sessionbroker/orchestration.go`
- `backend/internal/modules/sessionbroker/state_machine.go`
- `backend/internal/modules/sessionbroker/dataplane.go`
- `backend/internal/modules/sessiongateway/module.go`
- `backend/internal/modules/worker/monitor.go`
- `backend/internal/modules/resource/module.go`
- `backend/internal/modules/auth/service.go`
- `backend/internal/platform/httpx/message.go`
- `backend/migrations/000005_platform_core_v2.up.sql`
- `backend/migrations/000007_clipboard_policy_mode.up.sql`
- `backend/migrations/000008_file_transfer_policy_mode.up.sql`

Known backend gaps:

- automated test coverage is thin outside `sessionbroker`
- P3/P3.1 resource secret-readiness and encrypted resolver MVP exists; production mode rejects plaintext credential metadata and requires `secret_ref` for RDP/VNC/SSH resources
- external KMS/Vault integration and master-key rotation are not implemented yet
- admin/control UI for safe resource/policy management is not the current focus
- node-agent runtime is not implemented; only control-plane foundation exists
- identity source sync runtime is not implemented

### Windows Client

Implemented:

- WPF client skeleton and build
- auth/login/refresh/logout foundation
- organization selection
- resource list
- active sessions
- session
window
- direct data-plane selection with fallback
- binary render receive path
- input capture/forwarding
- cursor/render display
- localization-ready resource layer
- text clipboard UI/path
- file upload UI/path
- failed-session refresh after gateway close

Key files:

- `clients/windows/src/RemoteAccessPlatform.Windows.App/SessionWindow.xaml`
- `clients/windows/src/RemoteAccessPlatform.Windows.Application/ViewModels/SessionWindowViewModel.cs`
- `clients/windows/src/RemoteAccessPlatform.Windows.Transport/SessionGatewayClient.cs`
- `clients/windows/src/RemoteAccessPlatform.Windows.App/Input/SessionInputMapper.cs`
- `clients/windows/src/RemoteAccessPlatform.Windows.Application/Localization/Strings.cs`
- `clients/windows/src/RemoteAccessPlatform.Windows.Application/Resources/Strings.resx`

Known client gaps:

- final UX polish is not complete
- automated client regression tests are missing
- manual RDP UX remains the acceptance authority for now
- some README limitations are stale and understate what exists

### RDP Worker / RDP Adapter

Implemented:

- standalone C++ worker service
- FreeRDP integration behind worker boundary
- worker registration/assignment/lease lifecycle
- direct worker WSS endpoint
- RS256 data-plane token validation
- direct bind policy and current attachment validation
- JSON control/input/clipboard/file-upload envelopes
- binary RAP2 render frames for direct path
- backend gateway JSON/base64 fallback
- region-first BGRA render path
- direct attach baseline full-frame repair
- region-loss full-frame repair throttle
- cursor adapter boundary
- text clipboard through FreeRDP `cliprdr`
- client-to-server file upload
- restricted visible transfer directory
- restricted FreeRDP drive redirection groundwork

Key files:

- `workers/rdp-worker/src/main.cpp`
- `workers/rdp-worker/src/runtime/session_runtime.cpp`
- `workers/rdp-worker/include/rdp_worker/runtime/session_runtime.hpp`
- `workers/rdp-worker/src/adapter/rdp_adapter_runtime.cpp`
- `workers/rdp-worker/src/freerdp/rdp_runtime.cpp`
- `workers/rdp-worker/src/dataplane/direct_wss_server.cpp`
- `workers/rdp-worker/src/runtime/direct_bind_policy.cpp`
- `workers/rdp-worker/include/rdp_worker/adapter/service_adapter_protocol.hpp`

Current live/smoke images:

```text
rap-backend-smoke:stage5-2-download
rap-rdp-worker:stage5-2-download
```

Known worker/RDP gaps:

- drag/release repaint is usable but not polished; drag behaves like an older RDP client on a weak link by moving a frame rather than continuously repainting the full window
- RDPGFX is gated and disabled by default because the current live target resets the connection when RDPGFX is advertised
- encoded graphics/codecs/tiles are not production-accepted yet
- file download core data path is runtime-proven through direct worker WSS and backend gateway fallback, and lifecycle blocking is runtime-proven for detach, old-controller takeover, and worker failure. Stage 5.2 is not fully runtime-accepted until Windows desktop UI download is proven
- FreeRDP is still the substrate; replacing it is not justified until the adapter boundary proves which pieces are actually insufficient

## 5. Plan vs Fact Matrix

| Area | Planned | Current fact | Status |
| --- | --- | --- | --- |
| Backend foundation | Go, config, HTTP, PostgreSQL, Redis | Implemented and builds | Done |
| Auth | access/refresh flow, sessions, devices | Implemented | Done |
| Session lifecycle | start/attach/detach/takeover/terminate/fail/recover | Live-proven earlier and preserved | Done, protect |
| Multi-tenancy | organizations and org-scoped resources/sessions | Implemented | Done, needs more tests |
| Authorization | platform/admin/member boundaries | Implemented foundation | Needs broader tests |
| Worker coordination | registration, lease, stale recovery | Implemented and live-proven | Done, protect |
| Windows client MVP | native WPF client | Implemented and builds | Done |
| Localization messaging | structured backend/client messaging | Implemented and runtime-proven earlier | Done, protect |
| Direct data plane | client-to-worker WSS | Implemented | Done |
| Binary render | direct binary render, fallback JSON/base64 | Implemented | Done |
| RDP adapter event model | event-driven adapter boundary | Implemented and P1 accepted | Done, protect |
| RDP render quality | grayscale foundation | Implemented | Partial |
| RDPGFX/encoded graphics | future performance path | gated only, not accepted | Not production |
| Clipboard | text-only, policy-gated | Accepted | Done |
| File upload | client-to-server to worker storage | Accepted | Done |
| File visibility in RDP | restricted drive redirection | Accepted via `RAP_Transfers` | Done, protect |
| File download | server-to-client | Core and lifecycle runtime-proven, desktop UI proof pending | Prove UI next |
| Mesh/VPN/multi-cluster runtime | target architecture only | Not implemented | Correctly deferred |
| Node-agent runtime/updater | target/foundation only | Not implemented | Future |
| Identity sync runtime | LDAP/OIDC sync | Not implemented | Future |

## 6. Important Source-Of-Truth Drift

At the start of this audit these files were stale or partly stale:

- `README.md` still points to old compat-era docs and says not to start with UI, while the Windows client already exists
- `docs/codex/CURRENT_STATUS.md` says WebSocket takeover proof is still a gap, even though that proof was later closed
- `docs/codex/NEXT_STEP_PROMPT.md` previously pointed to platform-core v2 as the next step, although platform-core v2 already exists
- `clients/windows/README.md` still says it intentionally stops short of final viewer rendering, but the client now renders the remote desktop
- `workers/rdp-worker/README.md` documented recent RDP stages, but previously did not clearly mark the current accepted image and latest manual acceptance
- `docs/architecture/DATA_PLANE_V1.md` previously had a stale "Next Implementation Prompt"; it now points to Stage 5.2 live runtime proof
- `docs/architecture/RDP_ADAPTER_RUNTIME.md` and `docs/architecture/RDP_SERVICE_CPP_PERFORMANCE_TARGET.md` still mark manual UX acceptance as pending before the latest fixes

This was the P0 risk addressed by the baseline-freeze documentation pass. Future stages must keep these files current after every accepted runtime change so a future Codex/session cannot follow an old prompt and reintroduce already-rejected architecture.

## 7. Lessons From The RDP Adapter Work

The RDP work exposed several project-level rules:

1. Realtime protocol features must be designed as channel semantics first. Input, display, cursor, clipboard, file transfer, and telemetry cannot share one undifferentiated queue.
2. Backend/Redis must not be the production realtime path. It is correct as fallback/debug/control-plane glue, not for high-rate render.
3. Full-frame rendering is not the normal production model. It is needed for baseline, attach, resize, recovery, and fallback repair.
4. Dirty regions cannot be blindly latest-only without a repair strategy.
Dropping a region update may leave visible artifacts; the current `region_loss_repair` full-frame repair is a pragmatic safety net.
5. Server-origin events must drive display updates. Remote changes must not depend on local mouse/keyboard events.
6. Input must be independent from render. A key or click must never wait behind a frame, upload chunk, clipboard message, or lease renewal.
7. FreeRDP is not the problem by default. The earlier problem was how we pumped events, scheduled frames, relayed payloads, and treated screen updates. The correct direction is an adapter boundary around FreeRDP first, not a full rewrite before we can prove the replacement.
8. Manual UX proof is essential. Automated input can pass while real user input feels wrong.
9. Every "temporary" shortcut needs an explicit expiration condition. If it does not have one, it becomes architecture.

## 8. What We May Have Missed

These are not immediate bugs, but they should be addressed early because they shape the product:

- RDP server compatibility matrix: Windows Server versions, NLA modes, GDI vs RDPGFX behavior, color depth, TLS/cert behavior, domain login variants
- weak-channel simulation: latency, jitter, loss, constrained bandwidth
- high-concurrency session model: many users, many workers, CPU/network limits
- deterministic smoke reports: every accepted stage should leave reproducible artifacts and commands
- secret management: credentials must move out of plain resource metadata
- production PKI: direct worker WSS currently uses smoke-friendly TLS handling on the client side
- authorization tests: cross-org denial paths need automated coverage
- resource policy test matrix: clipboard/file/cert/session policies
- file transfer threat model: filename normalization, symlink escape, overwrite behavior, quotas, cleanup, audit
- observability: per-channel latency, frame drops, input latency, worker event pump health, adapter callback counters
- client UI state machine tests: close/dispose, failed state, reconnect, takeover, detach, old attachment blocking
- upgrade/rollback story: node-agent target exists, runtime is not implemented
- deployment topology: container host networking vs Docker bridge/NAT for realtime workloads
- service adapter conformance suite: RDP now has a pattern that VNC/SSH/video should follow

## 9. Architectural Decisions To Freeze Now

These decisions should be treated as current project rules:

1. PostgreSQL is source of truth.
2. Redis is live coordination/routing only.
3. Backend is control plane, not production render relay.
4. Direct data plane is preferred for realtime RDP traffic.
5. Backend gateway remains fallback/debug until direct path is fully mature.
6. Service adapters translate external protocols to platform channels.
7. RDP Adapter remains C++ and FreeRDP-backed for now.
8. FreeRDP details must not leak into backend or Access Client business logic.
9. Access Client speaks platform protocol, not RDP.
10. Mesh/VPN/multi-cluster/node-agent runtime remain future staged work.
11. RDP must be stabilized before adding VNC/SSH/VPN/product expansion.
12. No new feature should start while source-of-truth docs are stale.

## 10. Recommended Next Stages

### P0. Truth And Baseline Freeze

Goal: make the current working system impossible to misunderstand.

Do:

- update root `README.md`
- update `docs/codex/CURRENT_STATUS.md`
- update `docs/codex/NEXT_STEP_PROMPT.md`
- update `clients/windows/README.md`
- update `workers/rdp-worker/README.md`
- update `docs/architecture/DATA_PLANE_V1.md` next prompt
- update `docs/architecture/RDP_ADAPTER_RUNTIME.md` with latest baseline/region repair status
- document current test Docker image/tag and startup commands
- preserve the accepted RDP worker baseline
- create one "current smoke matrix" document

Do not:

- add features
- start DP-3B
- start server-to-client download
- start mesh/VPN/node-agent runtime

Acceptance:

- a new engineer/Codex can read the docs and know the actual next step
- no doc points to archived v1 or already-completed stages as next work

### P1. RDP Visual Correctness Hardening

Goal: eliminate remaining small artifacts without returning to slow full-frame rendering.

Do:

- add explicit region sequence/gap diagnostics
- prove when artifacts happen: region drop, stale region ordering, missed server callback, client application bug, or repair interval issue
- verify client applies region frames to the correct bitmap area and stride
- keep baseline full frame on attach
- keep full repair only on loss/recovery, not as normal render loop
- collect before/after screenshots/logs

Do not:

- enable RDPGFX globally
- add compression/tiles/codecs before correctness is stable
- change backend/session lifecycle

Acceptance:

- remote idle updates repaint without local input
- Start menu/task manager/window movement leave no persistent artifacts
- input and close behavior remain usable

### P2. Stage 5.1.1 Restricted Drive Visibility Proof

Status: accepted as runtime-proven on the test Docker stand.

Goal: keep the upload visibility path protected while the RDP Adapter continues to be hardened.

Do:

- run live smoke with current RDP adapter baseline
- upload file from Windows client
- verify file appears in `\\tsclient\RAP_Transfers`
- open text and binary files inside the remote Windows session
- prove disabled policy blocks upload
- prove takeover/detach/failure block old or invalid upload
- verify directory cleanup on terminate

Do not:

- implement download
- expose arbitrary worker filesystem
- implement shared folders or SMB/WebDAV

Accepted proof:

- uploaded file is visible and openable inside remote Windows
- only per-session visible directory is exposed
- worker logs show `RAP_Transfers` configured as the only redirected drive
- termination cleans the per-session transfer directory

### P3. Security And Secrets Readiness

Status: P3.1 MVP complete; production TLS/PKI remains P3.2.

Goal: remove proof-stage security shortcuts before broad usage.

Completed:

- documented secret-reference model in `docs/architecture/SECURITY_SECRETS_READINESS.md`
- production mode rejects plaintext credential-like resource metadata
- production RDP/VNC/SSH resources require `secret_ref`
- session start rejects compat plaintext resources in production mode
- data-plane allowed-channel policy test exists
- worker direct-bind denial probes cover wrong worker/user/org/resource, wrong attachment, over-broad channels, and failed/terminated states
- encrypted PostgreSQL-backed `resource_secrets` store exists
- resource secret create/rotate endpoint updates `resources.secret_ref` without returning plaintext
- session assignment resolves `secret_ref` after organization/resource/session/worker/lease checks and does not mutate `remote_sessions.metadata` with plaintext
- secret access/access-denied/rotation audit events exist
- direct worker WSS TLS trust metadata/guard exists; production backend omits smoke-only direct candidates and production Windows client skips untrusted direct candidates

Still required after P3.2:

- deploy production direct-worker certificates/platform CA trust
- add external KMS/Vault or stronger key-management integration
- add master-key rotation/re-encryption workflow
- consider future worker pull/token resolver flow to avoid resolved credentials in Redis assignment payloads

Do not:

- build full enterprise KMS prematurely
- weaken certificate or token model for convenience

Acceptance:

- production mode cannot create/start resources with plaintext credential metadata
- cross-org, old-attachment, wrong worker/resource/org, and terminal-session denial paths are covered by focused tests/probes

### P4. Automated Regression Suite

Goal: convert the painful manual discoveries into repeatable gates.

Do:

- add backend unit/integration tests for org scope, session state, data-plane token, stale worker, clipboard/file policies
- add worker probes for render sequencing, direct baseline, region repair, adapter event routing
- add Windows transport/viewmodel tests for fallback, close/dispose, failed state, frame latest-only, localization resolution
- make smoke scripts emit machine-readable PASS/FAIL reports
- pin each accepted image/build artifact

Acceptance:

- a regression in input, render, worker-death, takeover, clipboard, or upload fails a repeatable test before manual smoke

### P5. RDP Performance Next Layer

Goal: improve speed on weak channels after correctness is stable.

Candidate paths:

- RDPGFX on compatible target only
- encoded graphics payloads
- dirty-region compression
- tile/region framing
- adaptive quality profiles
- palette/grayscale/low-bandwidth modes
- per-channel QoS and backpressure telemetry

Do not:

- replace stable region-first path without fallback
- ship a graphics mode that only works on one target

Acceptance:

- direct full-color baseline remains available
- each new graphics mode has compatibility detection and fallback

### P6. Product Completion For RDP

Only after P0-P5 gates are stable:

- manual desktop acceptance for server-to-client file download from `RAP_Transfers\ToClient`
- richer file transfer UX
- final RDP UX polish
- policy management UI
- operational runbooks
- release readiness checklist

### P7. Platform Expansion

Only after RDP is stable:

- VNC Adapter
- SSH Adapter
- node-agent runtime/updater
- entry/relay nodes
- mesh routing
- VPN/IP tunnel mode
- Linux/iOS/Android clients

## 11. Proposed Immediate Next Prompt

Use this as the next implementation prompt if we continue immediately:

```text
Proceed with Stage 5.2 remaining desktop UI proof only - RDP server-to-client file download.

Goal:
Finish acceptance of safe, policy-aware download from the remote RDP session to the Windows Access Client UI using the restricted RAP_Transfers\ToClient drop zone.

Strict rules:
- do not implement arbitrary remote path download
- do not implement remote filesystem browser
- do not implement recursive folder transfer
- do not implement SMB/WebDAV/Windows agent
- do not expose any worker path outside the per-session visible directory
- do not change RDP rendering/input/clipboard behavior
- do not remove backend gateway fallback
- do not implement binary file chunk frames yet
- do not start DP-3B, mesh, VPN, node-agent runtime, or new adapters

Scope:
1. Keep the current Stage 5.2 backend/worker deployment on docker-test.
2. Prove Windows desktop UI download for text and binary files placed in RAP_Transfers\ToClient.
3. Prove rendering, input, clipboard, upload, lifecycle, and fallback do not regress.

Acceptance:
- disabled and client_to_server modes block download
- server_to_client and bidirectional modes allow download
- text and binary files download with matching hashes
- traversal/symlink/non-regular/too-large files are blocked
- rendering, input, clipboard, upload, lifecycle, and fallback do not regress
```

## 12. Bottom Line

The project direction is sound, but the process must now become stricter:

- design channel semantics first
- implement through adapter boundaries
- prove with live/manual smoke and automated gates
- update source-of-truth docs before starting the next major stage
- reject "temporary" shortcuts unless they have a documented removal condition

The RDP Adapter experience was expensive, but useful. It showed exactly where the architecture must be disciplined before adding SSH, VNC, VPN, mobile clients, or mesh runtime.
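
As one concrete illustration of the automated gates called for above, the P1 item "add explicit region sequence/gap diagnostics" could be sketched roughly as follows. This is a hypothetical Go sketch, not code from the repository: the `GapDetector` type, its `Observe` method, and the assumption that every region frame carries a `uint64` sequence number starting at 1 and increasing by 1 are all introduced here for illustration only.

```go
package main

import "fmt"

// Verdict classifies one observed region-frame sequence number.
type Verdict int

const (
	InOrder Verdict = iota // exactly the next expected sequence
	Gap                    // one or more frames were skipped before this one
	Stale                  // duplicate, or older than the last accepted frame
)

// String returns a log-friendly name for the verdict.
func (v Verdict) String() string {
	switch v {
	case InOrder:
		return "in_order"
	case Gap:
		return "gap"
	default:
		return "stale"
	}
}

// GapDetector tracks the last accepted sequence number for one render channel.
// Hypothetical helper: assumes sequences start at 1 and grow by 1 per frame.
type GapDetector struct {
	last    uint64 // last accepted sequence; 0 means nothing accepted yet
	Missing uint64 // total number of skipped frames seen so far
	Stales  uint64 // total number of stale or duplicate frames seen so far
}

// Observe records seq and reports how it relates to the last accepted frame.
func (d *GapDetector) Observe(seq uint64) Verdict {
	switch {
	case seq <= d.last:
		// Older or repeated frame: count it, but do not move backwards.
		d.Stales++
		return Stale
	case seq == d.last+1:
		d.last = seq
		return InOrder
	default:
		// Frames between last+1 and seq-1 never arrived.
		d.Missing += seq - d.last - 1
		d.last = seq
		return Gap
	}
}

func main() {
	var d GapDetector
	for _, seq := range []uint64{1, 2, 5, 4, 6} {
		fmt.Printf("seq=%d verdict=%s\n", seq, d.Observe(seq))
	}
	fmt.Printf("missing=%d stale=%d\n", d.Missing, d.Stales)
}
```

In a real probe, a `Gap` verdict would be the natural place to request a `region_loss_repair` full-frame repair, while a rising `Stales` counter would point at ordering problems rather than loss. Both counters could also feed the machine-readable PASS/FAIL smoke reports proposed in P4.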