Initial project snapshot

2026-04-28 22:29:50 +03:00
commit 8ba0561f4f
365 changed files with 91832 additions and 0 deletions
@@ -0,0 +1,221 @@
# Current Baseline Matrix
Date: 2026-04-26
Purpose: single operational snapshot of the current project baseline. This file
is not a target architecture document. It describes what is currently proven,
what is merely implemented, and what remains unproven.
## Environment
Canonical test environment:
```text
Docker host: 192.168.200.61
SSH alias: docker-test
Docker endpoint: ssh://docker-test
Docker context: test-ubuntu
Backend API: http://192.168.200.61:8080/api/v1
Backend gateway: ws://192.168.200.61:8080/api/v1/gateway/ws
```
Current live/smoke containers:
| Container | Image | Role |
| --- | --- | --- |
| `rap_backend_smoke` | `rap-backend-smoke:stage5-2-download` | backend control plane |
| `rap_worker_smoke` | `rap-rdp-worker:stage5-2-download` | accepted RDP Adapter worker baseline plus runtime-proven Stage 5.2 core download path |
| `rap_postgres` | `postgres:16` | source-of-truth database |
| `rap_redis` | `redis:7` | live coordination/routing |
Current Windows client endpoints:
```json
{
"api_base_url": "http://192.168.200.61:8080/api/v1",
"gateway_websocket_url": "ws://192.168.200.61:8080/api/v1/gateway/ws",
"prefer_direct_data_plane": true,
"direct_data_plane_connect_timeout_ms": 2500,
"direct_data_plane_color_mode": "full_color",
"direct_data_plane_platform_ca_bundle": "artifacts/p3-5-platform-ca.crt",
"environment": "production",
"allow_insecure_direct_data_plane_tls_for_smoke": false
}
```
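The trust-related fields in this configuration can be sanity-checked mechanically. The following is a minimal sketch in Python, used only for brevity (the client itself is WPF/C#). `check_client_config` is a hypothetical helper, and both rules it enforces are assumptions inferred from the P3.5 result, not documented client behavior.

```python
import json

# Field names come from the client configuration above. The rules are
# assumptions for illustration, not the client's actual validation logic.
def check_client_config(raw: str) -> list:
    cfg = json.loads(raw)
    problems = []
    if cfg.get("environment") == "production":
        # Assumed rule 1: no insecure TLS bypass in production.
        if cfg.get("allow_insecure_direct_data_plane_tls_for_smoke"):
            problems.append("insecure TLS bypass enabled in production")
        # Assumed rule 2: preferring the direct path requires a platform CA bundle.
        if cfg.get("prefer_direct_data_plane") and not cfg.get(
            "direct_data_plane_platform_ca_bundle"
        ):
            problems.append("direct data plane preferred without a platform CA bundle")
    return problems


EXAMPLE_CONFIG = json.dumps({
    "api_base_url": "http://192.168.200.61:8080/api/v1",
    "gateway_websocket_url": "ws://192.168.200.61:8080/api/v1/gateway/ws",
    "prefer_direct_data_plane": True,
    "direct_data_plane_platform_ca_bundle": "artifacts/p3-5-platform-ca.crt",
    "environment": "production",
    "allow_insecure_direct_data_plane_tls_for_smoke": False,
})

if __name__ == "__main__":
    print(check_client_config(EXAMPLE_CONFIG))
```

A check of this kind could run as a pre-smoke gate so that a smoke-only TLS bypass never reaches a production-labelled client configuration.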
## Build And Probe Snapshot
Commands run during P0:
```powershell
# Backend unit tests
go test ./...
# Windows client build
dotnet build .\clients\windows\RemoteAccessPlatform.Windows.slnx
# Worker probes, run on the test Docker host against the previous baseline image
docker -H ssh://docker-test run --rm rap-rdp-worker:rdp-region-repair rdp-worker-graphics-adapter-probe
docker -H ssh://docker-test run --rm rap-rdp-worker:rdp-region-repair rdp-worker-cursor-adapter-probe
docker -H ssh://docker-test run --rm rap-rdp-worker:rdp-region-repair rdp-worker-service-adapter-protocol-probe
docker -H ssh://docker-test run --rm rap-rdp-worker:rdp-region-repair rdp-worker-dataplane-bind-probe --scenario valid
```
Additional accepted P1 baseline checks:
```powershell
go test ./...
dotnet build .\clients\windows\RemoteAccessPlatform.Windows.slnx
docker -H ssh://docker-test build --tag rap-rdp-worker:rdp-p1-region-order2 --file workers/rdp-worker/Dockerfile workers/rdp-worker
docker -H ssh://docker-test run --rm rap-rdp-worker:rdp-p1-region-order2 rdp-worker-graphics-adapter-probe
docker -H ssh://docker-test run --rm rap-rdp-worker:rdp-p1-region-order2 rdp-worker-cursor-adapter-probe
docker -H ssh://docker-test run --rm rap-rdp-worker:rdp-p1-region-order2 rdp-worker-service-adapter-protocol-probe
docker -H ssh://docker-test run --rm rap-rdp-worker:rdp-p1-region-order2 rdp-worker-dataplane-bind-probe --scenario valid
```
Results:
| Check | Result | Notes |
| --- | --- | --- |
| Backend `go test ./...` | PASS | Most packages still have no test files |
| Windows solution build | PASS | 0 warnings, 0 errors |
| Worker graphics adapter probe | PASS | `graphics_adapter_probe ok` |
| Worker cursor adapter probe | PASS | `cursor_adapter_probe ok` |
| Worker service adapter protocol probe | PASS | channel model prints successfully |
| Worker direct bind valid probe | PASS | `PASS scenario=valid` |
| P1 worker image build | PASS | `rap-rdp-worker:rdp-p1-region-order2` |
| P1 worker probes | PASS | graphics, cursor, protocol, direct bind |
| P1 smoke-worker deployment | PASS | `rap_worker_smoke` online on test Docker |
| P3 backend secret guard tests | PASS | production plaintext metadata rejected; dev/smoke allowed |
| P3 data-plane policy test | PASS | allowed channels follow clipboard/file-transfer policy |
| P3 worker bind denial probes | PASS | wrong worker/user/org/resource/attachment/channels/state rejected |
| P3.3 production secret smoke | PASS | secret-backed RDP resource starts real session on test stand |
| P3.3 production fallback smoke | PASS | production backend omits smoke-only direct WSS candidate |
| P3.3 dev/smoke direct candidate | PASS | direct candidate is `smoke_only=true`, not production trusted |
| P3.4 production WSS trust design | PASS | platform CA, certificate lifecycle, app-local trust, smoke plan documented |
| P3.5 app-local platform CA smoke | PASS | direct worker WSS selected without insecure TLS bypass; unknown CA and smoke-only production fallback proved |
| P3.6 stale worker event idempotency | PASS | backend restart survives stale Redis worker events; terminal PostgreSQL sessions stay terminal |
| Stage 5.2 file download build | PASS | backend/worker/client build |
| Stage 5.2 core download runtime | PASS | text and binary downloads match size and hash over direct worker WSS and backend gateway; policy blocks download in `disabled` and `client_to_server` modes |
| Stage 5.2 download lifecycle blocking | PASS | detach blocks, old-controller takeover returns `session.taken_over`, worker failure marks session `failed` and closes direct WS |
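The file-transfer policy rows above reduce to a small mode-to-direction mapping. The sketch below is illustrative Python, not the Go backend code. The four mode names appear in this project's Stage 5.2 acceptance notes; `ALLOWED_DIRECTIONS`, `is_transfer_allowed`, and the `upload`/`download` labels are hypothetical names.

```python
# Mode names follow the Stage 5.2 acceptance notes; everything else here
# (function name, direction labels) is a hypothetical illustration.
ALLOWED_DIRECTIONS = {
    "disabled": frozenset(),
    "client_to_server": frozenset({"upload"}),
    "server_to_client": frozenset({"download"}),
    "bidirectional": frozenset({"upload", "download"}),
}


def is_transfer_allowed(mode: str, direction: str) -> bool:
    """Return True when the policy mode permits the requested direction."""
    # Unknown modes fail closed: nothing is allowed.
    return direction in ALLOWED_DIRECTIONS.get(mode, frozenset())


if __name__ == "__main__":
    for mode in ALLOWED_DIRECTIONS:
        print(mode, "download allowed:", is_transfer_allowed(mode, "download"))
```

Failing closed on unknown modes keeps a typo or a newly added mode from silently opening a transfer direction.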
Important limitations:
- this snapshot does not replace a live manual RDP smoke pass
- the repository directory used for this audit is not currently a Git checkout,
so commit-level provenance is unavailable here
## Feature Matrix
| Area | Status | Current proof level | Next action |
| --- | --- | --- | --- |
| Backend foundation | Implemented | build/test PASS | expand automated tests |
| Auth/refresh/devices | Implemented | previous runtime proof | add regression tests |
| Organization scope | Implemented | previous hardening pass | add cross-org tests |
| Session lifecycle | Implemented | live-proven | protect from regression |
| Worker registration/leases | Implemented | live-proven | protect from regression |
| Worker-death recovery | Implemented | live-proven | add automated smoke |
| Structured messaging/localization | Implemented | runtime-proven | protect from regression |
| Direct worker WSS | Implemented | live-proven | preserve |
| Backend gateway fallback | Implemented | smoke-proven | preserve |
| Binary direct render | Implemented | smoke-proven | preserve |
| RDP region-first render | Implemented | live/manual usable | harden artifacts |
| Direct attach baseline | Implemented | current baseline | preserve |
| Region-loss repair | Implemented | current baseline | diagnose remaining artifacts |
| Ordered region delivery | Implemented | manual visual smoke accepted | protect |
| RDPGFX | Gated only | default (RDPGFX-disabled) path smoke-proven | keep disabled |
| Keyboard/mouse input | Implemented | manually usable | protect |
| Cursor updates | Implemented | probe/smoke-proven | protect |
| Text clipboard | Implemented | accepted | protect |
| File upload | Implemented | accepted to worker storage | protect |
| Restricted drive visibility | Implemented | runtime-proven via `RAP_Transfers` | protect |
| File download | Implemented | core data path and lifecycle blocking runtime-proven; desktop UI proof pending | prove remaining UI next |
| Resource secret readiness | Guard implemented | backend tests PASS | protect |
| Encrypted secret resolver | MVP implemented | live smoke PASS on test stand | harden KMS/rotation later |
| Direct worker WSS TLS/PKI guard | Guard implemented | production platform CA smoke PASS | preserve |
| Stale worker event restart safety | Implemented | runtime smoke PASS | protect |
| Node-agent runtime | Not implemented | control-plane foundation only | future |
| Mesh/VPN/runtime | Not implemented | target architecture only | future |
| SSH/VNC adapters | Not implemented | none | future after RDP |
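The "Stale worker event restart safety" row depends on one rule: a session that is already terminal in PostgreSQL must not be revived by a late or replayed worker event. A minimal Python sketch of that rule follows. The `failed` and `terminated` state names appear in this project's notes; `apply_worker_event`, the `active` state name, and the event shape are assumptions.

```python
# Terminal states named in the project notes. The function and the
# simplified "event carries the next state" shape are hypothetical.
TERMINAL_STATES = frozenset({"failed", "terminated"})


def apply_worker_event(current_state: str, event_state: str) -> str:
    """Return the state to persist after a possibly stale worker event."""
    if current_state in TERMINAL_STATES:
        # Stale or replayed event: terminal sessions stay terminal.
        return current_state
    return event_state


if __name__ == "__main__":
    print(apply_worker_event("terminated", "active"))
    print(apply_worker_event("active", "failed"))
```

Because the function depends only on the stored state, applying the same stale event any number of times yields the same result, which is the idempotency the P3.6 smoke exercised.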
## RDP Baseline
Current accepted RDP worker image:
```text
rap-rdp-worker:rdp-p1-region-order2
```
Previous accepted baseline image:
```text
rap-rdp-worker:rdp-region-repair
```
Current RDP render model:
- classic FreeRDP/GDI region-first BGRA path
- direct worker WSS binary `RAP2` frames
- backend gateway JSON/base64 fallback
- full frame on connect/attach/baseline/recovery/fallback repair
- dirty region updates as normal display path
- cursor as independent latest-only channel
- input highest priority
- clipboard and file upload reliable/policy-gated
Current RDP known limitation:
- window drag moves only an outline frame, as an older RDP client on a slow
link would; repaint after releasing a moved window is usable but not yet
polished
Current accepted P1 behavior:
- dirty-region updates are preserved in-order through `SessionRuntime`, worker
direct WSS, Windows transport, and WPF presenter queues
- full frames still supersede pending region queues
- worker direct region queue overflow requests throttled full-frame repair
- client logs region sequence gaps and regions received before a baseline
- manual visual smoke accepted idle repaint, Start menu/hover, drag usability,
keyboard, mouse, and session close
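The queue behavior described above (ordered regions, full frames superseding pending regions, throttled repair on overflow) can be sketched as follows. This is an illustrative Python model, not the C++ worker code: `RegionQueue` and its fields are hypothetical, and dropping queued regions on overflow is an assumption about how the repair is triggered.

```python
from collections import deque


class RegionQueue:
    """Ordered region queue with throttled full-frame repair on overflow."""

    def __init__(self, capacity: int, min_repair_interval_s: float):
        self.capacity = capacity
        self.min_repair_interval_s = min_repair_interval_s
        self.pending = deque()        # (kind, sequence) in delivery order
        self.awaiting_repair = False  # True after overflow, until a full frame
        self.last_repair_at = None

    def _repair_due(self, now: float) -> bool:
        # Throttle: at most one repair request per interval.
        if (self.last_repair_at is not None
                and now - self.last_repair_at < self.min_repair_interval_s):
            return False
        self.last_repair_at = now
        return True

    def push_region(self, seq: int, now: float) -> bool:
        """Queue a region; return True when a repair should be requested now."""
        if not self.awaiting_repair and len(self.pending) < self.capacity:
            self.pending.append(("region", seq))
            return False
        # Overflow (assumed behavior): drop queued regions and wait for repair.
        self.pending.clear()
        self.awaiting_repair = True
        return self._repair_due(now)

    def push_full_frame(self, seq: int) -> None:
        # A full frame supersedes every pending region.
        self.pending.clear()
        self.pending.append(("full", seq))
        self.awaiting_repair = False
```

The explicit `now` argument keeps the throttle deterministic, which makes this kind of logic straightforward to cover with a worker probe.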
Current RDP non-goals:
- no DP-3B adaptive quality yet
- no compression/codecs/tiles yet
- no RDPGFX default enable
- no full Stage 5.2 desktop UI acceptance yet
- no UI redesign
- no backend/session lifecycle rewrite
## Documentation Truth Status
Updated during P0:
- `README.md`
- `README_START_HERE.md`
- `docs/codex/CURRENT_STATUS.md`
- `docs/codex/NEXT_STEP_PROMPT.md`
- `clients/windows/README.md`
- `workers/rdp-worker/README.md`
- `docs/architecture/DATA_PLANE_V1.md`
- `docs/architecture/RDP_ADAPTER_RUNTIME.md`
- `docs/architecture/RDP_SERVICE_CPP_PERFORMANCE_TARGET.md`
- `docs/architecture/RDP_FILE_DOWNLOAD_STAGE_5_2.md`
- `docs/audits/CURRENT_BASELINE_MATRIX.md`
Current authoritative audit:
- `docs/audits/PROJECT_AUDIT_2026-04-26.md`
Legacy warning:
- `docs/_legacy_v1` is historical reference only and must not be used for
implementation decisions
## Correct Next Step
Proceed with Stage 5.2 remaining live runtime proof - Server-to-Client File
Download:
- keep `rap-backend-smoke:stage5-2-download` and
`rap-rdp-worker:stage5-2-download` deployed on `docker-test`
- prove Windows desktop UI download for files placed in `RAP_Transfers\ToClient`
- prove no regressions in rendering, input, clipboard, upload, reconnect, or
takeover
- keep backend gateway fallback active
- do not start arbitrary remote path download, SMB/WebDAV, Windows agent,
binary file chunk frames, DP-3B, mesh/VPN, node-agent runtime, or new adapters
@@ -0,0 +1,662 @@
# Project Audit And Next-Step Plan
Date: 2026-04-26
Status: documentation/audit only. No runtime behavior is changed by this
document.
## 1. Executive Summary
The project is no longer just an RDP proxy. The correct target is a Secure
Access Fabric platform with a control plane, direct realtime data plane,
service adapters, tenant isolation, and future node/mesh/VPN capabilities.
The implementation is considerably more advanced than several operational
documents describe. The most important current risk is therefore not only code
quality but source-of-truth drift: old prompts and READMEs can send the next
stage in the wrong direction.
The RDP MVP has proven the hard lifecycle assumptions:
- real RDP connection through the worker works
- active/detach/reattach/takeover/terminate flows are proven
- takeover does not recreate the remote session
- worker-death/orphan-active-session recovery is proven
- Windows client can render and control a real remote desktop
- direct worker WSS data plane is implemented and used
- binary render frames are implemented on direct data plane
- backend gateway JSON/base64 path remains available as fallback/debug
- ordered dirty-region delivery is accepted as the current RDP baseline
- text clipboard is implemented and accepted
- client-to-server file upload to worker-controlled storage is accepted
- restricted drive visibility is runtime-proven: uploaded files are visible and
openable inside the remote Windows session through `RAP_Transfers`
The RDP adapter lesson is clear: "make it simple first and patch later" is
dangerous for realtime protocols. Full-frame polling, implicit refresh after
input, and backend/Redis realtime relaying worked for proof, but they caused
the exact class of latency and correctness issues we later had to unwind. From
this point forward, each service adapter must be specified as an event-driven
adapter before implementation.
Recommended immediate priorities:
1. Freeze and document the current working baseline.
2. Synchronize stale project docs with the real state.
3. Preserve the accepted RDP visual correctness/stability baseline.
4. Preserve the accepted Stage 5.1.1 restricted drive visibility behavior.
5. Add automated regression gates so manual discoveries become repeatable tests.
## 2. Audit Method
This audit used the current filesystem state in:
```text
\\192.168.220.200\mst\codex\rdp-proxy
```
Important environment note:
- the directory is not currently a Git checkout (`git status` reports that no
`.git` repository exists), so this audit cannot use commit history
- the canonical test Docker host is `docker-test` / `192.168.200.61`
- the live test stack currently contains `rap_backend_smoke`, `rap_worker_smoke`,
`rap_postgres`, and `rap_redis`
Commands run during this audit:
```powershell
go test ./...
dotnet build .\clients\windows\RemoteAccessPlatform.Windows.slnx
docker -H ssh://docker-test run --rm rap-rdp-worker:rdp-region-repair rdp-worker-graphics-adapter-probe
docker -H ssh://docker-test run --rm rap-rdp-worker:rdp-region-repair rdp-worker-cursor-adapter-probe
docker -H ssh://docker-test run --rm rap-rdp-worker:rdp-region-repair rdp-worker-service-adapter-protocol-probe
docker -H ssh://docker-test run --rm rap-rdp-worker:rdp-region-repair rdp-worker-dataplane-bind-probe --scenario valid
```
Results:
- backend tests: PASS
- Windows client build: PASS, 0 warnings, 0 errors
- worker graphics adapter probe: PASS
- worker cursor adapter probe: PASS
- worker service adapter protocol probe: PASS
- worker data-plane bind valid probe: PASS
Coverage warning:
- most backend modules still report `[no test files]`
- much of the current confidence comes from smoke/manual proofs and logs
- this is not enough for production readiness
## 3. Planned Direction
The authoritative long-term direction is defined in:
- `CODEX_CONTEXT.md`
- `docs/architecture/SECURE_ACCESS_FABRIC_TARGET.md`
- `docs/architecture/DATA_PLANE_V1.md`
- `docs/architecture/SERVICE_ADAPTER_PROTOCOL.md`
- `docs/architecture/RDP_ADAPTER_RUNTIME.md`
- `docs/architecture/RDP_SERVICE_CPP_PERFORMANCE_TARGET.md`
The target platform model is:
```text
Access Client
-> Ingress / Data Plane
-> Secure Fabric / Routing
-> Service Adapter at egress edge
-> Target service
```
For RDP specifically:
```text
Access Client
<-> platform session/data-plane protocol
RDP Adapter
<-> FreeRDP / project-owned RDP internals
RDP Server
```
This naming should be kept consistent:
- Access Client: native Windows/iOS/Android/Linux client that speaks the
platform protocol
- Control Plane: backend API, auth, orgs, policy, session lifecycle, audit
- Data Plane: realtime session traffic channels
- Service Adapter: protocol translator for RDP/VNC/SSH/video/etc
- RDP Adapter: current C++ RDP service adapter
- Entry/Ingress Node: accepts client connections into the fabric
- Egress/Service Node: reaches target resources and hosts adapters
- Node Agent: native host identity, update, health, and service supervisor
## 4. What Is Implemented
### Backend
Implemented:
- Go backend foundation
- PostgreSQL source-of-truth storage
- Redis live coordination/routing
- auth foundation
- refresh token rotation
- devices/trusted devices
- org-scoped resources and sessions
- platform-core v2 foundation
- identity source foundation
- node/node-agent control-plane foundation
- session broker orchestration
- worker coordination and stale worker monitoring
- structured localization-ready messages
- resource certificate verification policy
- clipboard policy
- file-transfer policy
- data-plane token/candidate generation
- backend gateway fallback
Key files:
- `backend/internal/modules/sessionbroker/service.go`
- `backend/internal/modules/sessionbroker/orchestration.go`
- `backend/internal/modules/sessionbroker/state_machine.go`
- `backend/internal/modules/sessionbroker/dataplane.go`
- `backend/internal/modules/sessiongateway/module.go`
- `backend/internal/modules/worker/monitor.go`
- `backend/internal/modules/resource/module.go`
- `backend/internal/modules/auth/service.go`
- `backend/internal/platform/httpx/message.go`
- `backend/migrations/000005_platform_core_v2.up.sql`
- `backend/migrations/000007_clipboard_policy_mode.up.sql`
- `backend/migrations/000008_file_transfer_policy_mode.up.sql`
Known backend gaps:
- automated test coverage is thin outside `sessionbroker`
- P3/P3.1 resource secret-readiness and encrypted resolver MVP exists;
production mode rejects plaintext credential metadata and requires
`secret_ref` for RDP/VNC/SSH resources
- external KMS/Vault integration and master-key rotation are not implemented
yet
- admin/control UI for safe resource/policy management is not the current focus
- node-agent runtime is not implemented; only control-plane foundation exists
- identity source sync runtime is not implemented
### Windows Client
Implemented:
- WPF client skeleton and build
- auth/login/refresh/logout foundation
- organization selection
- resource list
- active sessions
- session window
- direct data-plane selection with fallback
- binary render receive path
- input capture/forwarding
- cursor/render display
- localization-ready resource layer
- text clipboard UI/path
- file upload UI/path
- failed-session refresh after gateway close
Key files:
- `clients/windows/src/RemoteAccessPlatform.Windows.App/SessionWindow.xaml`
- `clients/windows/src/RemoteAccessPlatform.Windows.Application/ViewModels/SessionWindowViewModel.cs`
- `clients/windows/src/RemoteAccessPlatform.Windows.Transport/SessionGatewayClient.cs`
- `clients/windows/src/RemoteAccessPlatform.Windows.App/Input/SessionInputMapper.cs`
- `clients/windows/src/RemoteAccessPlatform.Windows.Application/Localization/Strings.cs`
- `clients/windows/src/RemoteAccessPlatform.Windows.Application/Resources/Strings.resx`
Known client gaps:
- final UX polish is not complete
- automated client regression tests are missing
- manual RDP UX remains the acceptance authority for now
- some README limitations are stale and understate what exists
### RDP Worker / RDP Adapter
Implemented:
- standalone C++ worker service
- FreeRDP integration behind worker boundary
- worker registration/assignment/lease lifecycle
- direct worker WSS endpoint
- RS256 data-plane token validation
- direct bind policy and current attachment validation
- JSON control/input/clipboard/file-upload envelopes
- binary RAP2 render frames for direct path
- backend gateway JSON/base64 fallback
- region-first BGRA render path
- direct attach baseline full-frame repair
- region-loss full-frame repair throttle
- cursor adapter boundary
- text clipboard through FreeRDP `cliprdr`
- client-to-server file upload
- restricted visible transfer directory
- restricted FreeRDP drive redirection groundwork
Key files:
- `workers/rdp-worker/src/main.cpp`
- `workers/rdp-worker/src/runtime/session_runtime.cpp`
- `workers/rdp-worker/include/rdp_worker/runtime/session_runtime.hpp`
- `workers/rdp-worker/src/adapter/rdp_adapter_runtime.cpp`
- `workers/rdp-worker/src/freerdp/rdp_runtime.cpp`
- `workers/rdp-worker/src/dataplane/direct_wss_server.cpp`
- `workers/rdp-worker/src/runtime/direct_bind_policy.cpp`
- `workers/rdp-worker/include/rdp_worker/adapter/service_adapter_protocol.hpp`
Current live/smoke images:
```text
rap-backend-smoke:stage5-2-download
rap-rdp-worker:stage5-2-download
```
Known worker/RDP gaps:
- drag/release repaint is usable but not polished; drag behaves like an older
RDP client on a weak link by moving a frame rather than continuously
repainting the full window
- RDPGFX is gated and disabled by default because the current live target resets
the connection when RDPGFX is advertised
- encoded graphics/codecs/tiles are not production-accepted yet
- file download core data path is runtime-proven through direct worker WSS and
backend gateway fallback, and lifecycle blocking is runtime-proven for
detach, old-controller takeover, and worker failure. Stage 5.2 is not fully
runtime-accepted until Windows desktop UI download is proven
- FreeRDP is still the substrate; replacing it is not justified until the
adapter boundary proves which pieces are actually insufficient
## 5. Plan vs Fact Matrix
| Area | Planned | Current fact | Status |
| --- | --- | --- | --- |
| Backend foundation | Go, config, HTTP, PostgreSQL, Redis | Implemented and builds | Done |
| Auth | access/refresh flow, sessions, devices | Implemented | Done |
| Session lifecycle | start/attach/detach/takeover/terminate/fail/recover | Live-proven earlier and preserved | Done, protect |
| Multi-tenancy | organizations and org-scoped resources/sessions | Implemented | Done, needs more tests |
| Authorization | platform/admin/member boundaries | Implemented foundation | Needs broader tests |
| Worker coordination | registration, lease, stale recovery | Implemented and live-proven | Done, protect |
| Windows client MVP | native WPF client | Implemented and builds | Done |
| Localization messaging | structured backend/client messaging | Implemented and runtime-proven earlier | Done, protect |
| Direct data plane | client-to-worker WSS | Implemented | Done |
| Binary render | direct binary render, fallback JSON/base64 | Implemented | Done |
| RDP adapter event model | event-driven adapter boundary | Implemented and P1 accepted | Done, protect |
| RDP render quality | grayscale foundation | Implemented | Partial |
| RDPGFX/encoded graphics | future performance path | gated only, not accepted | Not production |
| Clipboard | text-only, policy-gated | Accepted | Done |
| File upload | client-to-server to worker storage | Accepted | Done |
| File visibility in RDP | restricted drive redirection | Accepted via `RAP_Transfers` | Done, protect |
| File download | server-to-client | Core and lifecycle runtime-proven, desktop UI proof pending | Prove UI next |
| Mesh/VPN/multi-cluster runtime | target architecture only | Not implemented | Correctly deferred |
| Node-agent runtime/updater | target/foundation only | Not implemented | Future |
| Identity sync runtime | LDAP/OIDC sync | Not implemented | Future |
## 6. Important Source-Of-Truth Drift
At the start of this audit these files were stale or partly stale:
- `README.md` still pointed to old legacy docs and said not to start with UI,
although the Windows client already existed
- `docs/codex/CURRENT_STATUS.md` said WebSocket takeover proof was still a gap,
even though that proof had since been closed
- `docs/codex/NEXT_STEP_PROMPT.md` pointed to platform-core v2 as the next
step, although platform-core v2 already existed
- `clients/windows/README.md` still said it intentionally stopped short of final
viewer rendering, although the client already rendered the remote desktop
- `workers/rdp-worker/README.md` documented recent RDP stages but did not
clearly mark the current accepted image and latest manual acceptance
- `docs/architecture/DATA_PLANE_V1.md` had a stale "Next Implementation
Prompt"; it now points to Stage 5.2 live runtime proof
- `docs/architecture/RDP_ADAPTER_RUNTIME.md` and
`docs/architecture/RDP_SERVICE_CPP_PERFORMANCE_TARGET.md` still marked manual
UX acceptance as pending, a status that predated the latest fixes
This was the P0 risk addressed by the baseline-freeze documentation pass. Future
stages must keep these files current after every accepted runtime change so a
future Codex/session cannot follow an old prompt and reintroduce
already-rejected architecture.
## 7. Lessons From The RDP Adapter Work
The RDP work exposed several project-level rules:
1. Realtime protocol features must be designed as channel semantics first.
Input, display, cursor, clipboard, file transfer, and telemetry cannot share
one undifferentiated queue.
2. Backend/Redis must not be the production realtime path. It is correct as
fallback/debug/control-plane glue, not for high-rate render.
3. Full-frame rendering is not the normal production model. It is needed for
baseline, attach, resize, recovery, and fallback repair.
4. Dirty regions cannot be blindly latest-only without a repair strategy.
Dropping a region update may leave visible artifacts; the current
`region_loss_repair` full-frame repair is a pragmatic safety net.
5. Server-origin events must drive display updates. Remote changes must not
depend on local mouse/keyboard events.
6. Input must be independent from render. A key or click must never wait behind
a frame, upload chunk, clipboard message, or lease renewal.
7. FreeRDP is not the problem by default. The earlier problem was how we pumped
events, scheduled frames, relayed payloads, and treated screen updates. The
correct direction is an adapter boundary around FreeRDP first, not a full
rewrite before we can prove the replacement.
8. Manual UX proof is essential. Automated input can pass while real user input
feels wrong.
9. Every "temporary" shortcut needs an explicit expiration condition. If it does
not have one, it becomes architecture.
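Rules 1 and 6 can be illustrated with per-channel queues and a strict priority order. The sketch below is a simplified Python model, not the worker or client implementation. Input first and cursor latest-only follow the rules above; the relative order of the remaining channels and all names (`ChannelScheduler`, `PRIORITY`) are assumptions.

```python
from collections import deque

# Input first and cursor latest-only follow the rules above. The order of
# the remaining channels is an assumption made for this illustration.
PRIORITY = ("input", "cursor", "display", "clipboard", "file_transfer")
LATEST_ONLY = frozenset({"cursor"})


class ChannelScheduler:
    """One queue per channel, drained in strict priority order."""

    def __init__(self):
        self.queues = {name: deque() for name in PRIORITY}

    def enqueue(self, channel: str, message) -> None:
        queue = self.queues[channel]
        if channel in LATEST_ONLY:
            # Only the newest cursor update matters; drop older ones.
            queue.clear()
        queue.append(message)

    def next_message(self):
        """Return (channel, message) for the next send, or None if idle."""
        for name in PRIORITY:
            if self.queues[name]:
                return name, self.queues[name].popleft()
        return None
```

Strict priority can starve low-priority channels under sustained load, so a real scheduler would likely add fairness or rate limits; the sketch only demonstrates that a key press never waits behind a frame or an upload chunk.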
## 8. What We May Have Missed
These are not immediate bugs, but they should be addressed early because they
shape the product:
- RDP server compatibility matrix: Windows Server versions, NLA modes, GDI vs
RDPGFX behavior, color depth, TLS/cert behavior, domain login variants
- weak-channel simulation: latency, jitter, loss, constrained bandwidth
- high-concurrency session model: many users, many workers, CPU/network limits
- deterministic smoke reports: every accepted stage should leave reproducible
artifacts and commands
- secret management: credentials must move out of plain resource metadata
- production PKI: direct worker WSS currently uses smoke-friendly TLS handling
on the client side
- authorization tests: cross-org denial paths need automated coverage
- resource policy test matrix: clipboard/file/cert/session policies
- file transfer threat model: filename normalization, symlink escape, overwrite
behavior, quotas, cleanup, audit
- observability: per-channel latency, frame drops, input latency, worker event
pump health, adapter callback counters
- client UI state machine tests: close/dispose, failed state, reconnect,
takeover, detach, old attachment blocking
- upgrade/rollback story: node-agent target exists, runtime is not implemented
- deployment topology: container host networking vs Docker bridge/NAT for
realtime workloads
- service adapter conformance suite: RDP now has a pattern that VNC/SSH/video
should follow
## 9. Architectural Decisions To Freeze Now
These decisions should be treated as current project rules:
1. PostgreSQL is source of truth.
2. Redis is live coordination/routing only.
3. Backend is control plane, not production render relay.
4. Direct data plane is preferred for realtime RDP traffic.
5. Backend gateway remains fallback/debug until direct path is fully mature.
6. Service adapters translate external protocols to platform channels.
7. RDP Adapter remains C++ and FreeRDP-backed for now.
8. FreeRDP details must not leak into backend or Access Client business logic.
9. Access Client speaks platform protocol, not RDP.
10. Mesh/VPN/multi-cluster/node-agent runtime remain future staged work.
11. RDP must be stabilized before adding VNC/SSH/VPN/product expansion.
12. No new feature should start while source-of-truth docs are stale.
## 10. Recommended Next Stages
### P0. Truth And Baseline Freeze
Goal: make the current working system impossible to misunderstand.
Do:
- update root `README.md`
- update `docs/codex/CURRENT_STATUS.md`
- update `docs/codex/NEXT_STEP_PROMPT.md`
- update `clients/windows/README.md`
- update `workers/rdp-worker/README.md`
- update `docs/architecture/DATA_PLANE_V1.md` next prompt
- update `docs/architecture/RDP_ADAPTER_RUNTIME.md` with latest baseline/region
repair status
- document current test Docker image/tag and startup commands
- preserve the accepted RDP worker baseline
- create one "current smoke matrix" document
Do not:
- add features
- start DP-3B
- start server-to-client download
- start mesh/VPN/node-agent runtime
Acceptance:
- a new engineer/Codex can read the docs and know the actual next step
- no doc points to legacy v1 or already-completed stages as next work
### P1. RDP Visual Correctness Hardening
Goal: eliminate remaining small artifacts without returning to slow full-frame
rendering.
Do:
- add explicit region sequence/gap diagnostics
- prove when artifacts happen: region drop, stale region ordering, missed server
callback, client application bug, or repair interval issue
- verify client applies region frames to the correct bitmap area and stride
- keep baseline full frame on attach
- keep full repair only on loss/recovery, not as normal render loop
- collect before/after screenshots/logs
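The "correct bitmap area and stride" check above is the step most prone to off-by-one and padding errors. A minimal Python sketch of a region blit into a BGRA framebuffer follows. It is an illustration only: `apply_region` and its parameters are hypothetical and do not describe the WPF presenter's actual API.

```python
BYTES_PER_PIXEL = 4  # BGRA


def apply_region(framebuffer: bytearray, fb_stride: int, x: int, y: int,
                 width: int, height: int, region: bytes,
                 region_stride: int) -> None:
    """Copy a width x height region into the framebuffer at pixel (x, y)."""
    row_bytes = width * BYTES_PER_PIXEL
    fb_height = len(framebuffer) // fb_stride
    if region_stride < row_bytes:
        raise ValueError("region stride smaller than one row")
    if (x < 0 or y < 0 or width < 0 or height < 0
            or (x + width) * BYTES_PER_PIXEL > fb_stride
            or y + height > fb_height):
        raise ValueError("region outside framebuffer")
    if height > 0 and len(region) < (height - 1) * region_stride + row_bytes:
        raise ValueError("region payload too short")
    for row in range(height):
        # Source rows advance by the region's stride, destination rows by the
        # framebuffer's stride; the two are usually different.
        src = row * region_stride
        dst = (y + row) * fb_stride + x * BYTES_PER_PIXEL
        framebuffer[dst:dst + row_bytes] = region[src:src + row_bytes]
```

Keeping the source and destination strides as separate parameters makes the most common artifact source, copying rows with the wrong pitch, visible in the function signature.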
Do not:
- enable RDPGFX globally
- add compression/tiles/codecs before correctness is stable
- change backend/session lifecycle
Acceptance:
- remote idle updates repaint without local input
- Start menu/task manager/window movement leave no persistent artifacts
- input and close behavior remain usable
### P2. Stage 5.1.1 Restricted Drive Visibility Proof
Status: accepted as runtime-proven on the test Docker stand.
Goal: keep the upload visibility path protected while the RDP Adapter continues
to be hardened.
Do:
- run live smoke with current RDP adapter baseline
- upload file from Windows client
- verify file appears in `\\tsclient\RAP_Transfers`
- open text and binary files inside the remote Windows session
- prove disabled policy blocks upload
- prove takeover/detach/failure block old or invalid upload
- verify directory cleanup on terminate
Do not:
- implement download
- expose arbitrary worker filesystem
- implement shared folders or SMB/WebDAV
Accepted proof:
- uploaded file is visible and openable inside remote Windows
- only per-session visible directory is exposed
- worker logs show `RAP_Transfers` configured as the only redirected drive
- termination cleans the per-session transfer directory
### P3. Security And Secrets Readiness
Status: P3.1 MVP complete; production TLS/PKI remains P3.2.
Goal: remove proof-stage security shortcuts before broad usage.
Completed:
- documented secret-reference model in
`docs/architecture/SECURITY_SECRETS_READINESS.md`
- production mode rejects plaintext credential-like resource metadata
- production RDP/VNC/SSH resources require `secret_ref`
- session start rejects legacy plaintext resources in production mode
- data-plane allowed-channel policy test exists
- worker direct-bind denial probes cover wrong worker/user/org/resource,
wrong attachment, over-broad channels, and failed/terminated states
- encrypted PostgreSQL-backed `resource_secrets` store exists
- resource secret create/rotate endpoint updates `resources.secret_ref` without
returning plaintext
- session assignment resolves `secret_ref` after organization/resource/session/
worker/lease checks and does not mutate `remote_sessions.metadata` with
plaintext
- secret access/access-denied/rotation audit events exist
- direct worker WSS TLS trust metadata/guard exists; production backend omits
smoke-only direct candidates and production Windows client skips untrusted
direct candidates
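The direct-bind denial cases listed above can be sketched as a single comparison between token claims and the current session record. This Python model is illustrative only: `BindClaims`, `SessionView`, the field names, and the reason strings are hypothetical, and token signature verification (RS256 in the worker) is deliberately out of scope.

```python
from dataclasses import dataclass
from typing import Optional

TERMINAL_STATES = frozenset({"failed", "terminated"})


@dataclass(frozen=True)
class BindClaims:
    """Claims carried by an already signature-verified data-plane token."""
    worker_id: str
    user_id: str
    org_id: str
    resource_id: str
    attachment_id: str
    channels: frozenset


@dataclass(frozen=True)
class SessionView:
    """What the worker currently knows about the session."""
    worker_id: str
    user_id: str
    org_id: str
    resource_id: str
    current_attachment_id: str
    allowed_channels: frozenset
    state: str


def bind_denial_reason(claims: BindClaims,
                       session: SessionView) -> Optional[str]:
    """Return a denial reason, or None when the bind may proceed."""
    if session.state in TERMINAL_STATES:
        return "session_terminal"
    for name in ("worker_id", "user_id", "org_id", "resource_id"):
        if getattr(claims, name) != getattr(session, name):
            return "wrong_" + name
    if claims.attachment_id != session.current_attachment_id:
        return "wrong_attachment"
    if not claims.channels <= session.allowed_channels:
        return "channels_too_broad"
    return None
```

Returning a reason string instead of a boolean lets a probe assert on the specific denial path rather than on a generic rejection.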
Still required after P3.2:
- deploy production direct-worker certificates/platform CA trust
- add external KMS/Vault or stronger key-management integration
- add master-key rotation/re-encryption workflow
- consider future worker pull/token resolver flow to avoid resolved credentials
in Redis assignment payloads
Do not:
- build full enterprise KMS prematurely
- weaken certificate or token model for convenience
Acceptance:
- production mode cannot create/start resources with plaintext credential
metadata
- cross-org, old-attachment, wrong worker/resource/org, and terminal-session
denial paths are covered by focused tests/probes
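The first acceptance criterion can be expressed as a small guard. The sketch below is illustrative Python, not the Go backend guard. The `production` mode and the `secret_ref` requirement for RDP/VNC/SSH come from this section; the list of credential-like keys and the name `validate_resource` are assumptions.

```python
# Assumed set of credential-like metadata keys; the real guard's list is
# not documented here.
CREDENTIAL_LIKE_KEYS = frozenset({"password", "passwd", "secret", "private_key"})
SECRET_REQUIRED_KINDS = frozenset({"rdp", "vnc", "ssh"})


def validate_resource(environment: str, kind: str, metadata: dict,
                      secret_ref: str) -> list:
    """Return guard violations; an empty list means the resource is accepted."""
    if environment != "production":
        # Dev/smoke modes stay permissive, as in the P3 guard tests.
        return []
    errors = []
    plaintext = sorted(k for k in metadata if k.lower() in CREDENTIAL_LIKE_KEYS)
    if plaintext:
        errors.append("plaintext credential metadata: " + ", ".join(plaintext))
    if kind.lower() in SECRET_REQUIRED_KINDS and not secret_ref:
        errors.append("secret_ref is required")
    return errors


if __name__ == "__main__":
    print(validate_resource("production", "rdp", {"password": "x"}, ""))
```

Running the same guard at both resource creation and session start covers legacy rows that were stored before the guard existed.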
### P4. Automated Regression Suite
Goal: convert the painful manual discoveries into repeatable gates.
Do:
- add backend unit/integration tests for org scope, session state, data-plane
token, stale worker, clipboard/file policies
- add worker probes for render sequencing, direct baseline, region repair,
adapter event routing
- add Windows transport/viewmodel tests for fallback, close/dispose, failed
state, frame latest-only, localization resolution
- make smoke scripts emit machine-readable PASS/FAIL reports
- pin each accepted image/build artifact
Acceptance:
- a regression in input, render, worker-death, takeover, clipboard, or upload
fails a repeatable test before manual smoke
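A machine-readable smoke report can be very small. The following Python sketch shows one possible shape. The JSON field names and `build_report` are hypothetical, chosen only to illustrate pinning an image tag next to per-check PASS/FAIL results.

```python
import json


def build_report(image: str, checks: dict) -> str:
    """Serialize smoke results as JSON; an empty run counts as FAIL."""
    results = [
        {"check": name, "result": "PASS" if ok else "FAIL"}
        for name, ok in sorted(checks.items())
    ]
    overall = "PASS" if checks and all(checks.values()) else "FAIL"
    return json.dumps(
        {"image": image, "overall": overall, "results": results}, indent=2
    )


if __name__ == "__main__":
    print(build_report(
        "rap-rdp-worker:rdp-p1-region-order2",
        {"graphics_adapter_probe": True, "cursor_adapter_probe": True},
    ))
```

Treating an empty check set as FAIL prevents a smoke script that silently ran nothing from being recorded as a pass.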
### P5. RDP Performance Next Layer
Goal: improve speed on weak channels after correctness is stable.
Candidate paths:
- RDPGFX on compatible target only
- encoded graphics payloads
- dirty-region compression
- tile/region framing
- adaptive quality profiles
- palette/grayscale/low-bandwidth modes
- per-channel QoS and backpressure telemetry
Do not:
- replace stable region-first path without fallback
- ship a graphics mode that only works on one target
Acceptance:
- direct full-color baseline remains available
- each new graphics mode has compatibility detection and fallback
### P6. Product Completion For RDP
Only after P0-P5 gates are stable:
- manual desktop acceptance for server-to-client file download from
`RAP_Transfers\ToClient`
- richer file transfer UX
- final RDP UX polish
- policy management UI
- operational runbooks
- release readiness checklist
### P7. Platform Expansion
Only after RDP is stable:
- VNC Adapter
- SSH Adapter
- node-agent runtime/updater
- entry/relay nodes
- mesh routing
- VPN/IP tunnel mode
- Linux/iOS/Android clients
## 11. Proposed Immediate Next Prompt
Use this as the next implementation prompt if we continue immediately:
```text
Proceed with Stage 5.2 remaining desktop UI proof only - RDP server-to-client
file download.
Goal:
Finish acceptance of safe, policy-aware download from the remote RDP session to
the Windows Access Client UI using the restricted RAP_Transfers\ToClient drop
zone.
Strict rules:
- do not implement arbitrary remote path download
- do not implement remote filesystem browser
- do not implement recursive folder transfer
- do not implement SMB/WebDAV/Windows agent
- do not expose any worker path outside the per-session visible directory
- do not change RDP rendering/input/clipboard behavior
- do not remove backend gateway fallback
- do not implement binary file chunk frames yet
- do not start DP-3B, mesh, VPN, node-agent runtime, or new adapters
Scope:
1. Keep the current Stage 5.2 backend/worker deployment on docker-test.
2. Prove Windows desktop UI download for text and binary files placed in
RAP_Transfers\ToClient.
3. Prove rendering, input, clipboard, upload, lifecycle, and fallback do not
regress.
Acceptance:
- disabled and client_to_server modes block download
- server_to_client and bidirectional modes allow download
- text and binary files download with matching hashes
- traversal/symlink/non-regular/too-large files are blocked
- rendering, input, clipboard, upload, lifecycle, and fallback do not regress
```
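The traversal, symlink, non-regular, size, and hash criteria in the prompt above can be sketched as a candidate check on the worker side. This is an illustrative Python model, not the C++ worker code: `resolve_download`, `sha256_of`, and the size limit are hypothetical, and the use of SHA-256 is an assumption because the source only mentions matching hashes.

```python
import hashlib
import os
import stat
from pathlib import Path

MAX_DOWNLOAD_BYTES = 64 * 1024 * 1024  # assumed limit, not a project value


def resolve_download(drop_zone: Path, requested_name: str,
                     max_bytes: int = MAX_DOWNLOAD_BYTES) -> Path:
    """Return the file to send, or raise PermissionError if it is blocked."""
    # Accept a plain file name only: no separators, no dot segments.
    if (requested_name in ("", ".", "..")
            or "/" in requested_name or "\\" in requested_name):
        raise PermissionError("path traversal rejected")
    candidate = drop_zone / requested_name
    info = os.lstat(candidate)  # lstat does not follow symlinks
    if stat.S_ISLNK(info.st_mode):
        raise PermissionError("symlink rejected")
    if not stat.S_ISREG(info.st_mode):
        raise PermissionError("non-regular file rejected")
    if info.st_size > max_bytes:
        raise PermissionError("file too large")
    return candidate


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The check and the later read are separate steps, so a production implementation would also need to guard against the file being swapped in between, for example by opening the file once and validating the open handle.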
## 12. Bottom Line
The project direction is sound, but the process must now become stricter:
- design channel semantics first
- implement through adapter boundaries
- prove with live/manual smoke and automated gates
- update source-of-truth docs before starting the next major stage
- reject "temporary" shortcuts unless they have a documented removal condition
The RDP Adapter experience was expensive, but useful. It showed exactly where
the architecture must be disciplined before adding SSH, VNC, VPN, mobile
clients, or mesh runtime.