Initial project snapshot
@@ -0,0 +1,221 @@
# Current Baseline Matrix

Date: 2026-04-26

Purpose: single operational snapshot of the current project baseline. This file
is not a target architecture document. It describes what is currently proven,
what is merely implemented, and what remains unproven.

## Environment

Canonical test environment:

```text
Docker host: 192.168.200.61
SSH alias: docker-test
Docker endpoint: ssh://docker-test
Docker context: test-ubuntu
Backend API: http://192.168.200.61:8080/api/v1
Backend gateway: ws://192.168.200.61:8080/api/v1/gateway/ws
```

Current live/smoke containers:

| Container | Image | Role |
| --- | --- | --- |
| `rap_backend_smoke` | `rap-backend-smoke:stage5-2-download` | backend control plane |
| `rap_worker_smoke` | `rap-rdp-worker:stage5-2-download` | accepted RDP Adapter worker baseline plus runtime-proven Stage 5.2 core download path |
| `rap_postgres` | `postgres:16` | source-of-truth database |
| `rap_redis` | `redis:7` | live coordination/routing |

Current Windows client endpoints:

```json
{
  "api_base_url": "http://192.168.200.61:8080/api/v1",
  "gateway_websocket_url": "ws://192.168.200.61:8080/api/v1/gateway/ws",
  "prefer_direct_data_plane": true,
  "direct_data_plane_connect_timeout_ms": 2500,
  "direct_data_plane_color_mode": "full_color",
  "direct_data_plane_platform_ca_bundle": "artifacts/p3-5-platform-ca.crt",
  "environment": "production",
  "allow_insecure_direct_data_plane_tls_for_smoke": false
}
```
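
The settings above imply a few invariants, most importantly that a `production` environment never enables the smoke-only insecure TLS bypass. The following is a minimal Python sketch of such a check, not code from the Windows client. The helper name `check_client_config` and the exact rules (positive timeout, WebSocket URL scheme) are assumptions made for illustration; only the key names and values come from the snapshot.

```python
import json

def check_client_config(raw: str) -> list:
    """Return a list of problems found in a client endpoint config (hypothetical helper)."""
    cfg = json.loads(raw)
    problems = []
    # Rule taken from the snapshot: production must not use the insecure TLS bypass.
    if cfg.get("environment") == "production" and cfg.get(
        "allow_insecure_direct_data_plane_tls_for_smoke"
    ):
        problems.append("insecure TLS bypass enabled in production")
    # Assumed rule: the gateway URL must use a WebSocket scheme.
    if not str(cfg.get("gateway_websocket_url", "")).startswith(("ws://", "wss://")):
        problems.append("gateway_websocket_url is not a WebSocket URL")
    # Assumed rule: the direct connect timeout is a positive integer of milliseconds.
    timeout = cfg.get("direct_data_plane_connect_timeout_ms")
    if type(timeout) is not int or timeout <= 0:
        problems.append("direct_data_plane_connect_timeout_ms must be a positive integer")
    return problems

# The snapshot values from this document.
SNAPSHOT = """
{
  "api_base_url": "http://192.168.200.61:8080/api/v1",
  "gateway_websocket_url": "ws://192.168.200.61:8080/api/v1/gateway/ws",
  "prefer_direct_data_plane": true,
  "direct_data_plane_connect_timeout_ms": 2500,
  "direct_data_plane_color_mode": "full_color",
  "direct_data_plane_platform_ca_bundle": "artifacts/p3-5-platform-ca.crt",
  "environment": "production",
  "allow_insecure_direct_data_plane_tls_for_smoke": false
}
"""

if __name__ == "__main__":
    print(check_client_config(SNAPSHOT))
```

A check of this kind could run as part of a smoke script before the client is launched, so a smoke-only setting cannot silently reach a production profile.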

## Build And Probe Snapshot

Commands run during P0:

```powershell
go test ./...
dotnet build .\clients\windows\RemoteAccessPlatform.Windows.slnx
docker -H ssh://docker-test run --rm rap-rdp-worker:rdp-region-repair rdp-worker-graphics-adapter-probe
docker -H ssh://docker-test run --rm rap-rdp-worker:rdp-region-repair rdp-worker-cursor-adapter-probe
docker -H ssh://docker-test run --rm rap-rdp-worker:rdp-region-repair rdp-worker-service-adapter-protocol-probe
docker -H ssh://docker-test run --rm rap-rdp-worker:rdp-region-repair rdp-worker-dataplane-bind-probe --scenario valid
```

Additional accepted P1 baseline checks:

```powershell
go test ./...
dotnet build .\clients\windows\RemoteAccessPlatform.Windows.slnx
docker -H ssh://docker-test build --tag rap-rdp-worker:rdp-p1-region-order2 --file workers/rdp-worker/Dockerfile workers/rdp-worker
docker -H ssh://docker-test run --rm rap-rdp-worker:rdp-p1-region-order2 rdp-worker-graphics-adapter-probe
docker -H ssh://docker-test run --rm rap-rdp-worker:rdp-p1-region-order2 rdp-worker-cursor-adapter-probe
docker -H ssh://docker-test run --rm rap-rdp-worker:rdp-p1-region-order2 rdp-worker-service-adapter-protocol-probe
docker -H ssh://docker-test run --rm rap-rdp-worker:rdp-p1-region-order2 rdp-worker-dataplane-bind-probe --scenario valid
```

Results:

| Check | Result | Notes |
| --- | --- | --- |
| Backend `go test ./...` | PASS | Most packages still have no test files |
| Windows solution build | PASS | 0 warnings, 0 errors |
| Worker graphics adapter probe | PASS | `graphics_adapter_probe ok` |
| Worker cursor adapter probe | PASS | `cursor_adapter_probe ok` |
| Worker service adapter protocol probe | PASS | channel model prints successfully |
| Worker direct bind valid probe | PASS | `PASS scenario=valid` |
| P1 worker image build | PASS | `rap-rdp-worker:rdp-p1-region-order2` |
| P1 worker probes | PASS | graphics, cursor, protocol, direct bind |
| P1 smoke-worker deployment | PASS | `rap_worker_smoke` online on test Docker |
| P3 backend secret guard tests | PASS | production plaintext metadata rejected; dev/smoke allowed |
| P3 data-plane policy test | PASS | allowed channels follow clipboard/file-transfer policy |
| P3 worker bind denial probes | PASS | wrong worker/user/org/resource/attachment/channels/state rejected |
| P3.3 production secret smoke | PASS | secret-backed RDP resource starts real session on test stand |
| P3.3 production fallback smoke | PASS | production backend omits smoke-only direct WSS candidate |
| P3.3 dev/smoke direct candidate | PASS | direct candidate is `smoke_only=true`, not production trusted |
| P3.4 production WSS trust design | PASS | platform CA, certificate lifecycle, app-local trust, smoke plan documented |
| P3.5 app-local platform CA smoke | PASS | direct worker WSS selected without insecure TLS bypass; unknown CA and smoke-only production fallback proved |
| P3.6 stale worker event idempotency | PASS | backend restart survives stale Redis worker events; terminal PostgreSQL sessions stay terminal |
| Stage 5.2 file download build | PASS | backend/worker/client build |
| Stage 5.2 core download runtime | PASS | direct worker WSS and backend gateway text/binary size/hash; policy block for disabled/client_to_server |
| Stage 5.2 download lifecycle blocking | PASS | detach blocks, old-controller takeover returns `session.taken_over`, worker failure marks session `failed` and closes direct WS |
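
The P3.6 row states that stale worker events must not reopen sessions that are already terminal in PostgreSQL. A minimal Python sketch of that idempotency rule follows. It is not the backend's Go state machine: the state names other than `failed`, the event names, and the helper `apply_worker_event` are all assumptions chosen for illustration.

```python
# Assumed terminal state names; the source document only shows `failed` explicitly.
TERMINAL_STATES = {"terminated", "failed"}

# Assumed mapping from a worker event type to the session state it requests.
EVENT_TARGET = {
    "session_started": "active",
    "session_detached": "detached",
    "session_failed": "failed",
    "session_terminated": "terminated",
}

def apply_worker_event(stored_state: str, event_type: str) -> str:
    """Return the state to persist after a possibly stale or repeated worker event."""
    if stored_state in TERMINAL_STATES:
        # Terminal rows are never reopened, whatever the event says.
        return stored_state
    # Unknown event types leave the stored state unchanged.
    return EVENT_TARGET.get(event_type, stored_state)

if __name__ == "__main__":
    # A stale "started" event replayed after a restart does not revive a failed session.
    print(apply_worker_event("failed", "session_started"))
```

Because the function depends only on the stored state and the event, replaying the same event any number of times yields the same result, which is the property a restart-safety smoke would want to assert.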

Important limitation:

- this snapshot does not replace a live manual RDP smoke pass
- the repository directory used for this audit is not currently a Git checkout,
  so commit-level provenance is unavailable here

## Feature Matrix

| Area | Status | Current proof level | Next action |
| --- | --- | --- | --- |
| Backend foundation | Implemented | build/test PASS | expand automated tests |
| Auth/refresh/devices | Implemented | previous runtime proof | add regression tests |
| Organization scope | Implemented | previous hardening pass | add cross-org tests |
| Session lifecycle | Implemented | live-proven | protect from regression |
| Worker registration/leases | Implemented | live-proven | protect from regression |
| Worker-death recovery | Implemented | live-proven | add automated smoke |
| Structured messaging/localization | Implemented | runtime-proven | protect from regression |
| Direct worker WSS | Implemented | live-proven | preserve |
| Backend gateway fallback | Implemented | smoke-proven | preserve |
| Binary direct render | Implemented | smoke-proven | preserve |
| RDP region-first render | Implemented | live/manual usable | harden artifacts |
| Direct attach baseline | Implemented | current baseline | preserve |
| Region-loss repair | Implemented | current baseline | diagnose remaining artifacts |
| Ordered region delivery | Implemented | manual visual smoke accepted | protect |
| RDPGFX | Gated only | default path smoke-proven | keep disabled |
| Keyboard/mouse input | Implemented | manually usable | protect |
| Cursor updates | Implemented | probe/smoke-proven | protect |
| Text clipboard | Implemented | accepted | protect |
| File upload | Implemented | accepted to worker storage | protect |
| Restricted drive visibility | Implemented | runtime-proven via `RAP_Transfers` | protect |
| File download | Implemented | core data path and lifecycle blocking runtime-proven; desktop UI proof pending | prove remaining UI next |
| Resource secret readiness | Guard implemented | backend tests PASS | protect |
| Encrypted secret resolver | MVP implemented | live smoke PASS on test stand | harden KMS/rotation later |
| Direct worker WSS TLS/PKI guard | Guard implemented | production platform CA smoke PASS | preserve |
| Stale worker event restart safety | Implemented | runtime smoke PASS | protect |
| Node-agent runtime | Not implemented | control-plane foundation only | future |
| Mesh/VPN/runtime | Not implemented | target architecture only | future |
| SSH/VNC adapters | Not implemented | none | future after RDP |

## RDP Baseline

Current accepted RDP worker image:

```text
rap-rdp-worker:rdp-p1-region-order2
```

Previous accepted baseline image:

```text
rap-rdp-worker:rdp-region-repair
```

Current RDP render model:

- classic FreeRDP/GDI region-first BGRA path
- direct worker WSS binary `RAP2` frames
- backend gateway JSON/base64 fallback
- full frame on connect/attach/baseline/recovery/fallback repair
- dirty region updates as normal display path
- cursor as independent latest-only channel
- input highest priority
- clipboard and file upload reliable/policy-gated

Current RDP known limitation:

- window drag uses old-client/slow-link style frame-only movement; repaint after
  releasing a moved window is usable but not yet polished

Current accepted P1 behavior:

- dirty-region updates are preserved in-order through `SessionRuntime`, worker
  direct WSS, Windows transport, and WPF presenter queues
- full frames still supersede pending region queues
- worker direct region queue overflow requests throttled full-frame repair
- client logs region sequence gaps and regions received before a baseline
- manual visual smoke accepted idle repaint, Start menu/hover, drag usability,
  keyboard, mouse, and session close
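
The queue rules in the P1 list can be illustrated with a small Python model. This is a sketch under stated assumptions, not the worker's C++ queue: the class `RegionQueue`, the `max_regions` limit, the choice to clear the queue on overflow, and the helper `find_sequence_gaps` are all hypothetical, and the repair throttle itself is not modelled.

```python
from collections import deque

class RegionQueue:
    """Bounded FIFO of dirty-region updates where a full frame supersedes pending regions."""

    def __init__(self, max_regions: int = 64):
        self.max_regions = max_regions
        self.pending = deque()
        self.repair_requested = False

    def push_region(self, seq: int, rect: tuple) -> None:
        if len(self.pending) >= self.max_regions:
            # Overflow (assumed behavior): drop stale entries and request a full-frame repair.
            self.pending.clear()
            self.repair_requested = True
            return
        # Regions are appended in arrival order, so delivery order is preserved.
        self.pending.append(("region", seq, rect))

    def push_full_frame(self, seq: int) -> None:
        # A full frame makes every queued region obsolete.
        self.pending.clear()
        self.pending.append(("full", seq, None))
        self.repair_requested = False

    def pop(self):
        return self.pending.popleft() if self.pending else None

def find_sequence_gaps(seqs):
    """Return (expected, received) pairs wherever the sequence number skipped ahead."""
    gaps = []
    prev = None
    for seq in seqs:
        if prev is not None and seq != prev + 1:
            gaps.append((prev + 1, seq))
        prev = seq
    return gaps

if __name__ == "__main__":
    print(find_sequence_gaps([1, 2, 5, 6, 9]))
```

The gap helper corresponds to the client-side diagnostics: it only reports where numbers were skipped and leaves the repair decision to the caller.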

Current RDP non-goals:

- no DP-3B adaptive quality yet
- no compression/codecs/tiles yet
- no RDPGFX default enable
- no full Stage 5.2 desktop UI acceptance yet
- no UI redesign
- no backend/session lifecycle rewrite

## Documentation Truth Status

Updated during P0:

- `README.md`
- `README_START_HERE.md`
- `docs/codex/CURRENT_STATUS.md`
- `docs/codex/NEXT_STEP_PROMPT.md`
- `clients/windows/README.md`
- `workers/rdp-worker/README.md`
- `docs/architecture/DATA_PLANE_V1.md`
- `docs/architecture/RDP_ADAPTER_RUNTIME.md`
- `docs/architecture/RDP_SERVICE_CPP_PERFORMANCE_TARGET.md`
- `docs/architecture/RDP_FILE_DOWNLOAD_STAGE_5_2.md`
- `docs/audits/CURRENT_BASELINE_MATRIX.md`

Current authoritative audit:

- `docs/audits/PROJECT_AUDIT_2026-04-26.md`

Legacy warning:

- `docs/_legacy_v1` is historical reference only and must not be used for
  implementation decisions

## Correct Next Step

Proceed with the remaining Stage 5.2 live runtime proof (Server-to-Client File
Download):

- keep `rap-backend-smoke:stage5-2-download` and
  `rap-rdp-worker:stage5-2-download` deployed on `docker-test`
- prove Windows desktop UI download for files placed in `RAP_Transfers\ToClient`
- prove that rendering, input, clipboard, upload, reconnect, and takeover have
  not regressed
- keep backend gateway fallback active
- do not start arbitrary remote path download, SMB/WebDAV, Windows agent,
  binary file chunk frames, DP-3B, mesh/VPN, node-agent runtime, or new adapters

@@ -0,0 +1,662 @@

# Project Audit And Next-Step Plan

Date: 2026-04-26

Status: documentation/audit only. No runtime behavior is changed by this
document.

## 1. Executive Summary

The project is no longer just an RDP proxy. The correct target is a Secure
Access Fabric platform with a control plane, direct realtime data plane,
service adapters, tenant isolation, and future node/mesh/VPN capabilities.

The implementation has reached a much more advanced state than several
operational documents describe. The most important current risk is therefore
not only code quality. It is source-of-truth drift: old prompts and READMEs can
send the next stage in the wrong direction.

The RDP MVP has proven the hard lifecycle assumptions:

- real RDP connection through the worker works
- active/detach/reattach/takeover/terminate flows are proven
- takeover does not recreate the remote session
- worker-death/orphan-active-session recovery is proven
- Windows client can render and control a real remote desktop
- direct worker WSS data plane is implemented and used
- binary render frames are implemented on direct data plane
- backend gateway JSON/base64 path remains available as fallback/debug
- ordered dirty-region delivery is accepted as the current RDP baseline
- text clipboard is implemented and accepted
- client-to-server file upload to worker-controlled storage is accepted
- restricted drive visibility is runtime-proven: uploaded files are visible and
  openable inside the remote Windows session through `RAP_Transfers`

The RDP adapter lesson is clear: "make it simple first and patch later" is
dangerous for realtime protocols. Full-frame polling, implicit refresh after
input, and backend/Redis realtime relaying worked for proof, but they caused
the exact class of latency and correctness issues we later had to unwind. From
this point forward, each service adapter must be specified as an event-driven
adapter before implementation.

Recommended immediate priority:

1. Freeze and document the current working baseline.
2. Synchronize stale project docs with the real state.
3. Preserve the accepted RDP visual correctness/stability baseline.
4. Preserve the accepted Stage 5.1.1 restricted drive visibility behavior.
5. Add automated regression gates so manual discoveries become repeatable tests.

## 2. Audit Method

This audit used the current filesystem state in:

```text
\\192.168.220.200\mst\codex\rdp-proxy
```

Important environment note:

- the directory is not currently a Git checkout (`git status` reports that no
  `.git` repository exists), so this audit cannot use commit history
- the canonical test Docker host is `docker-test` / `192.168.200.61`
- the live test stack currently contains `rap_backend_smoke`, `rap_worker_smoke`,
  `rap_postgres`, and `rap_redis`

Commands run during this audit:

```powershell
go test ./...
dotnet build .\clients\windows\RemoteAccessPlatform.Windows.slnx
docker -H ssh://docker-test run --rm rap-rdp-worker:rdp-region-repair rdp-worker-graphics-adapter-probe
docker -H ssh://docker-test run --rm rap-rdp-worker:rdp-region-repair rdp-worker-cursor-adapter-probe
docker -H ssh://docker-test run --rm rap-rdp-worker:rdp-region-repair rdp-worker-service-adapter-protocol-probe
docker -H ssh://docker-test run --rm rap-rdp-worker:rdp-region-repair rdp-worker-dataplane-bind-probe --scenario valid
```

Results:

- backend tests: PASS
- Windows client build: PASS, 0 warnings, 0 errors
- worker graphics adapter probe: PASS
- worker cursor adapter probe: PASS
- worker service adapter protocol probe: PASS
- worker data-plane bind valid probe: PASS

Coverage warning:

- most backend modules still report `[no test files]`
- much of the current confidence comes from smoke/manual proofs and logs
- this is not enough for production readiness

## 3. Planned Direction

The authoritative long-term direction is:

- `CODEX_CONTEXT.md`
- `docs/architecture/SECURE_ACCESS_FABRIC_TARGET.md`
- `docs/architecture/DATA_PLANE_V1.md`
- `docs/architecture/SERVICE_ADAPTER_PROTOCOL.md`
- `docs/architecture/RDP_ADAPTER_RUNTIME.md`
- `docs/architecture/RDP_SERVICE_CPP_PERFORMANCE_TARGET.md`

The target platform model is:

```text
Access Client
  -> Ingress / Data Plane
  -> Secure Fabric / Routing
  -> Service Adapter at egress edge
  -> Target service
```

For RDP specifically:

```text
Access Client
  <-> platform session/data-plane protocol
RDP Adapter
  <-> FreeRDP / project-owned RDP internals
RDP Server
```

This naming should be kept consistent:

- Access Client: native Windows/iOS/Android/Linux client that speaks the
  platform protocol
- Control Plane: backend API, auth, orgs, policy, session lifecycle, audit
- Data Plane: realtime session traffic channels
- Service Adapter: protocol translator for RDP/VNC/SSH/video/etc
- RDP Adapter: current C++ RDP service adapter
- Entry/Ingress Node: accepts client connections into the fabric
- Egress/Service Node: reaches target resources and hosts adapters
- Node Agent: native host identity, update, health, and service supervisor

## 4. What Is Implemented

### Backend

Implemented:

- Go backend foundation
- PostgreSQL source-of-truth storage
- Redis live coordination/routing
- auth foundation
- refresh token rotation
- devices/trusted devices
- org-scoped resources and sessions
- platform-core v2 foundation
- identity source foundation
- node/node-agent control-plane foundation
- session broker orchestration
- worker coordination and stale worker monitoring
- structured localization-ready messages
- resource certificate verification policy
- clipboard policy
- file-transfer policy
- data-plane token/candidate generation
- backend gateway fallback

Key files:

- `backend/internal/modules/sessionbroker/service.go`
- `backend/internal/modules/sessionbroker/orchestration.go`
- `backend/internal/modules/sessionbroker/state_machine.go`
- `backend/internal/modules/sessionbroker/dataplane.go`
- `backend/internal/modules/sessiongateway/module.go`
- `backend/internal/modules/worker/monitor.go`
- `backend/internal/modules/resource/module.go`
- `backend/internal/modules/auth/service.go`
- `backend/internal/platform/httpx/message.go`
- `backend/migrations/000005_platform_core_v2.up.sql`
- `backend/migrations/000007_clipboard_policy_mode.up.sql`
- `backend/migrations/000008_file_transfer_policy_mode.up.sql`

Known backend gaps:

- automated test coverage is thin outside `sessionbroker`
- P3/P3.1 resource secret-readiness and encrypted resolver MVP exists;
  production mode rejects plaintext credential metadata and requires
  `secret_ref` for RDP/VNC/SSH resources
- external KMS/Vault integration and master-key rotation are not implemented
  yet
- admin/control UI for safe resource/policy management is not the current focus
- node-agent runtime is not implemented; only control-plane foundation exists
- identity source sync runtime is not implemented
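
The secret-readiness rule described in the list above (production rejects plaintext credential metadata and requires `secret_ref` for RDP/VNC/SSH resources, while dev/smoke is allowed) can be sketched as follows. This is an illustrative Python model, not the Go guard in the backend. The function name `validate_resource_secrets` and the plaintext key names in `PLAINTEXT_KEYS` are assumptions; only `secret_ref` and the protocol list come from this document.

```python
# Assumed names of metadata keys that would carry plaintext credentials.
PLAINTEXT_KEYS = {"password", "private_key"}
# Protocols the document says require a secret reference in production.
SECRET_PROTOCOLS = {"rdp", "vnc", "ssh"}

def validate_resource_secrets(environment: str, protocol: str, metadata: dict) -> list:
    """Return a list of violations; an empty list means the resource is acceptable."""
    if environment != "production" or protocol not in SECRET_PROTOCOLS:
        # Dev/smoke environments and other protocols are not guarded in this sketch.
        return []
    errors = []
    leaked = sorted(PLAINTEXT_KEYS.intersection(metadata))
    if leaked:
        errors.append("plaintext credential metadata: " + ", ".join(leaked))
    if not metadata.get("secret_ref"):
        errors.append("secret_ref is required")
    return errors

if __name__ == "__main__":
    print(validate_resource_secrets("production", "rdp", {"password": "example"}))
```

Returning every violation rather than stopping at the first keeps the error message useful when a resource both leaks a credential and lacks a reference.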

### Windows Client

Implemented:

- WPF client skeleton and build
- auth/login/refresh/logout foundation
- organization selection
- resource list
- active sessions
- session window
- direct data-plane selection with fallback
- binary render receive path
- input capture/forwarding
- cursor/render display
- localization-ready resource layer
- text clipboard UI/path
- file upload UI/path
- failed-session refresh after gateway close
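
The "direct data-plane selection with fallback" item can be pictured with a short Python sketch. It is not the WPF client's C# logic. The candidate dictionary shape (`kind`, `url`, `smoke_only`), the kind names `direct_wss` and `backend_gateway`, and the function `choose_data_plane` are assumptions for illustration; the idea that a `smoke_only` direct candidate is not trusted in production comes from the P3.3 results in the baseline matrix. Connection attempts and the connect timeout are not modelled.

```python
def choose_data_plane(candidates, environment, prefer_direct=True):
    """Pick a data-plane candidate, falling back to the backend gateway."""
    # Smoke-only candidates are not trusted in production.
    usable = [
        c for c in candidates
        if not (environment == "production" and c.get("smoke_only"))
    ]
    if prefer_direct:
        for candidate in usable:
            if candidate["kind"] == "direct_wss":
                return candidate
    for candidate in usable:
        if candidate["kind"] == "backend_gateway":
            return candidate
    return None

if __name__ == "__main__":
    candidates = [
        # Hypothetical direct worker URL; the real endpoint is not given in this document.
        {"kind": "direct_wss", "url": "wss://worker.example.invalid/dp", "smoke_only": True},
        {"kind": "backend_gateway", "url": "ws://192.168.200.61:8080/api/v1/gateway/ws"},
    ]
    print(choose_data_plane(candidates, "production")["kind"])
```

In a real client the direct attempt can also fail at connect time, in which case the same fallback branch would be taken after the timeout.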

Key files:

- `clients/windows/src/RemoteAccessPlatform.Windows.App/SessionWindow.xaml`
- `clients/windows/src/RemoteAccessPlatform.Windows.Application/ViewModels/SessionWindowViewModel.cs`
- `clients/windows/src/RemoteAccessPlatform.Windows.Transport/SessionGatewayClient.cs`
- `clients/windows/src/RemoteAccessPlatform.Windows.App/Input/SessionInputMapper.cs`
- `clients/windows/src/RemoteAccessPlatform.Windows.Application/Localization/Strings.cs`
- `clients/windows/src/RemoteAccessPlatform.Windows.Application/Resources/Strings.resx`

Known client gaps:

- final UX polish is not complete
- automated client regression tests are missing
- manual RDP UX remains the acceptance authority for now
- some README limitations are stale and understate what exists

### RDP Worker / RDP Adapter

Implemented:

- standalone C++ worker service
- FreeRDP integration behind worker boundary
- worker registration/assignment/lease lifecycle
- direct worker WSS endpoint
- RS256 data-plane token validation
- direct bind policy and current attachment validation
- JSON control/input/clipboard/file-upload envelopes
- binary RAP2 render frames for direct path
- backend gateway JSON/base64 fallback
- region-first BGRA render path
- direct attach baseline full-frame repair
- region-loss full-frame repair throttle
- cursor adapter boundary
- text clipboard through FreeRDP `cliprdr`
- client-to-server file upload
- restricted visible transfer directory
- restricted FreeRDP drive redirection groundwork
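
The "direct bind policy and current attachment validation" item, together with the bind denial probes listed in the baseline matrix (wrong worker, user, org, resource, attachment, channels, or state rejected), can be sketched in Python as below. This is an illustrative model and not the code in `direct_bind_policy.cpp`: the claim and session field names, the channel names, the `active` state name, the policy mode string `disabled`, and both function names are assumptions. Token signature validation is out of scope here.

```python
# Assumed identity fields that must match between token claims and the session record.
BIND_FIELDS = ("worker_id", "user_id", "org_id", "resource_id", "attachment_id")

def allowed_channels(clipboard_enabled: bool, file_transfer_mode: str) -> set:
    """Derive the channel set from resource policy (assumed channel names)."""
    channels = {"control", "input", "display", "cursor"}
    if clipboard_enabled:
        channels.add("clipboard")
    if file_transfer_mode != "disabled":
        channels.add("file_transfer")
    return channels

def check_direct_bind(claims: dict, session: dict) -> str:
    """Return "ok", or the name of the first check that failed."""
    for field in BIND_FIELDS:
        if claims.get(field) != session.get(field):
            return field
    if session.get("state") != "active":
        return "state"
    # Every requested channel must be permitted by policy.
    if not set(claims.get("channels", [])) <= set(session.get("allowed_channels", [])):
        return "channels"
    return "ok"

if __name__ == "__main__":
    session = {
        "worker_id": "w1", "user_id": "u1", "org_id": "o1", "resource_id": "r1",
        "attachment_id": "a1", "state": "active",
        "allowed_channels": allowed_channels(True, "disabled"),
    }
    claims = {
        "worker_id": "w1", "user_id": "u1", "org_id": "o1", "resource_id": "r1",
        "attachment_id": "a1", "channels": ["input", "display", "clipboard"],
    }
    print(check_direct_bind(claims, session))
```

Returning the name of the failed check, rather than a bare boolean, makes each denial scenario individually assertable in a probe.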

Key files:

- `workers/rdp-worker/src/main.cpp`
- `workers/rdp-worker/src/runtime/session_runtime.cpp`
- `workers/rdp-worker/include/rdp_worker/runtime/session_runtime.hpp`
- `workers/rdp-worker/src/adapter/rdp_adapter_runtime.cpp`
- `workers/rdp-worker/src/freerdp/rdp_runtime.cpp`
- `workers/rdp-worker/src/dataplane/direct_wss_server.cpp`
- `workers/rdp-worker/src/runtime/direct_bind_policy.cpp`
- `workers/rdp-worker/include/rdp_worker/adapter/service_adapter_protocol.hpp`

Current live/smoke images:

```text
rap-backend-smoke:stage5-2-download
rap-rdp-worker:stage5-2-download
```

Known worker/RDP gaps:

- drag/release repaint is usable but not polished; drag behaves like an older
  RDP client on a weak link by moving a frame rather than continuously
  repainting the full window
- RDPGFX is gated and disabled by default because the current live target resets
  the connection when RDPGFX is advertised
- encoded graphics/codecs/tiles are not production-accepted yet
- file download core data path is runtime-proven through direct worker WSS and
  backend gateway fallback, and lifecycle blocking is runtime-proven for
  detach, old-controller takeover, and worker failure. Stage 5.2 is not fully
  runtime-accepted until Windows desktop UI download is proven
- FreeRDP is still the substrate; replacing it is not justified until the
  adapter boundary proves which pieces are actually insufficient

## 5. Plan vs Fact Matrix

| Area | Planned | Current fact | Status |
| --- | --- | --- | --- |
| Backend foundation | Go, config, HTTP, PostgreSQL, Redis | Implemented and builds | Done |
| Auth | access/refresh flow, sessions, devices | Implemented | Done |
| Session lifecycle | start/attach/detach/takeover/terminate/fail/recover | Live-proven earlier and preserved | Done, protect |
| Multi-tenancy | organizations and org-scoped resources/sessions | Implemented | Done, needs more tests |
| Authorization | platform/admin/member boundaries | Implemented foundation | Needs broader tests |
| Worker coordination | registration, lease, stale recovery | Implemented and live-proven | Done, protect |
| Windows client MVP | native WPF client | Implemented and builds | Done |
| Localization messaging | structured backend/client messaging | Implemented and runtime-proven earlier | Done, protect |
| Direct data plane | client-to-worker WSS | Implemented | Done |
| Binary render | direct binary render, fallback JSON/base64 | Implemented | Done |
| RDP adapter event model | event-driven adapter boundary | Implemented and P1 accepted | Done, protect |
| RDP render quality | grayscale foundation | Implemented | Partial |
| RDPGFX/encoded graphics | future performance path | gated only, not accepted | Not production |
| Clipboard | text-only, policy-gated | Accepted | Done |
| File upload | client-to-server to worker storage | Accepted | Done |
| File visibility in RDP | restricted drive redirection | Accepted via `RAP_Transfers` | Done, protect |
| File download | server-to-client | Core and lifecycle runtime-proven, desktop UI proof pending | Prove UI next |
| Mesh/VPN/multi-cluster runtime | target architecture only | Not implemented | Correctly deferred |
| Node-agent runtime/updater | target/foundation only | Not implemented | Future |
| Identity sync runtime | LDAP/OIDC sync | Not implemented | Future |

## 6. Important Source-Of-Truth Drift

At the start of this audit these files were stale or partly stale:

- `README.md` still points to old legacy docs and says not to start with UI,
  while the Windows client already exists
- `docs/codex/CURRENT_STATUS.md` says WebSocket takeover proof is still a gap,
  even though that proof was later closed
- `docs/codex/NEXT_STEP_PROMPT.md` previously pointed to platform-core v2 as
  the next step, although platform-core v2 already exists
- `clients/windows/README.md` still says it intentionally stops short of final
  viewer rendering, but the client now renders the remote desktop
- `workers/rdp-worker/README.md` documented recent RDP stages, but previously
  did not clearly mark the current accepted image and latest manual acceptance
- `docs/architecture/DATA_PLANE_V1.md` previously had a stale "Next
  Implementation Prompt"; it now points to Stage 5.2 live runtime proof
- `docs/architecture/RDP_ADAPTER_RUNTIME.md` and
  `docs/architecture/RDP_SERVICE_CPP_PERFORMANCE_TARGET.md` still mark manual UX
  acceptance as pending before the latest fixes

This was the P0 risk addressed by the baseline-freeze documentation pass. Future
stages must keep these files current after every accepted runtime change so a
future Codex/session cannot follow an old prompt and reintroduce
already-rejected architecture.

## 7. Lessons From The RDP Adapter Work

The RDP work exposed several project-level rules:

1. Realtime protocol features must be designed as channel semantics first.
   Input, display, cursor, clipboard, file transfer, and telemetry cannot share
   one undifferentiated queue.

2. Backend/Redis must not be the production realtime path. It is correct as
   fallback/debug/control-plane glue, not for high-rate render.

3. Full-frame rendering is not the normal production model. It is needed for
   baseline, attach, resize, recovery, and fallback repair.

4. Dirty regions cannot be blindly latest-only without a repair strategy.
   Dropping a region update may leave visible artifacts; the current
   `region_loss_repair` full-frame repair is a pragmatic safety net.

5. Server-origin events must drive display updates. Remote changes must not
   depend on local mouse/keyboard events.

6. Input must be independent from render. A key or click must never wait behind
   a frame, upload chunk, clipboard message, or lease renewal.

7. FreeRDP is not the problem by default. The earlier problem was how we pumped
   events, scheduled frames, relayed payloads, and treated screen updates. The
   correct direction is an adapter boundary around FreeRDP first, not a full
   rewrite before we can prove the replacement.

8. Manual UX proof is essential. Automated input can pass while real user input
   feels wrong.

9. Every "temporary" shortcut needs an explicit expiration condition. If it does
   not have one, it becomes architecture.
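
Rules 1 and 6 above (per-channel semantics, input never waiting behind render or transfer traffic) can be sketched with a small Python scheduler. This is a model under stated assumptions and not the worker's implementation: the priority numbers in `PRIORITY`, the channel names, and the class `ChannelScheduler` are hypothetical. The cursor is treated as a latest-only slot, matching the render model described in the baseline matrix.

```python
import heapq
import itertools

# Assumed priority order: lower number is sent first.
PRIORITY = {"input": 0, "control": 1, "cursor": 2, "display": 3,
            "clipboard": 4, "file_transfer": 5}
# Channels where only the newest payload matters.
LATEST_ONLY = {"cursor"}

class ChannelScheduler:
    """Strict-priority scheduler with latest-only slots for selected channels."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps FIFO order within a channel
        self._latest = {}

    def enqueue(self, channel: str, payload) -> None:
        if channel in LATEST_ONLY:
            # Overwrite any unsent payload: stale cursor positions are useless.
            self._latest[channel] = payload
            return
        heapq.heappush(self._heap, (PRIORITY[channel], next(self._order), channel, payload))

    def next_message(self):
        best_latest = min(self._latest, key=PRIORITY.get, default=None)
        if self._heap and (best_latest is None or self._heap[0][0] < PRIORITY[best_latest]):
            _, _, channel, payload = heapq.heappop(self._heap)
            return channel, payload
        if best_latest is not None:
            return best_latest, self._latest.pop(best_latest)
        return None

if __name__ == "__main__":
    s = ChannelScheduler()
    s.enqueue("display", "frame-1")
    s.enqueue("cursor", "pos-1")
    s.enqueue("cursor", "pos-2")
    s.enqueue("input", "key-down")
    print(s.next_message())
```

Strict priority can starve low-priority channels under sustained load, so a production design would likely add fairness or rate limits; that is deliberately left out of this sketch.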

## 8. What We May Have Missed

These are not immediate bugs, but they should be addressed early because they
shape the product:

- RDP server compatibility matrix: Windows Server versions, NLA modes, GDI vs
  RDPGFX behavior, color depth, TLS/cert behavior, domain login variants
- weak-channel simulation: latency, jitter, loss, constrained bandwidth
- high-concurrency session model: many users, many workers, CPU/network limits
- deterministic smoke reports: every accepted stage should leave reproducible
  artifacts and commands
- secret management: credentials must move out of plain resource metadata
- production PKI: direct worker WSS currently uses smoke-friendly TLS handling
  on the client side
- authorization tests: cross-org denial paths need automated coverage
- resource policy test matrix: clipboard/file/cert/session policies
- file transfer threat model: filename normalization, symlink escape, overwrite
  behavior, quotas, cleanup, audit
- observability: per-channel latency, frame drops, input latency, worker event
  pump health, adapter callback counters
- client UI state machine tests: close/dispose, failed state, reconnect,
  takeover, detach, old attachment blocking
- upgrade/rollback story: node-agent target exists, runtime is not implemented
- deployment topology: container host networking vs Docker bridge/NAT for
  realtime workloads
- service adapter conformance suite: RDP now has a pattern that VNC/SSH/video
  should follow
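
For the file transfer threat model item above, filename normalization and symlink escape are the most mechanical parts. A minimal Python sketch of one possible check follows. It is not the worker's C++ code, the function name `resolve_transfer_path` is hypothetical, and it covers only path containment; overwrite behavior, quotas, cleanup, and audit are not addressed.

```python
import os

def resolve_transfer_path(root: str, client_name: str) -> str:
    """Map a client-supplied file name to a path inside `root`, or raise ValueError."""
    # Keep only the final path component, treating both separators as delimiters.
    name = client_name.replace("\\", "/").split("/")[-1].strip()
    if name in ("", ".", "..") or "\x00" in name:
        raise ValueError("invalid file name")
    # realpath resolves symlinks, so a link pointing outside the root is detected below.
    real_root = os.path.realpath(root)
    candidate = os.path.realpath(os.path.join(real_root, name))
    if os.path.commonpath([real_root, candidate]) != real_root:
        raise ValueError("path escapes transfer root")
    return candidate

if __name__ == "__main__":
    print(resolve_transfer_path(".", "..\\..\\report.txt"))
```

Stripping directory components first means traversal attempts such as `..\..\report.txt` collapse to a plain file name, and the `realpath` comparison is the second line of defense against a pre-existing symlink inside the transfer directory.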

## 9. Architectural Decisions To Freeze Now

These decisions should be treated as current project rules:

1. PostgreSQL is source of truth.
2. Redis is live coordination/routing only.
3. Backend is control plane, not production render relay.
4. Direct data plane is preferred for realtime RDP traffic.
5. Backend gateway remains fallback/debug until direct path is fully mature.
6. Service adapters translate external protocols to platform channels.
7. RDP Adapter remains C++ and FreeRDP-backed for now.
8. FreeRDP details must not leak into backend or Access Client business logic.
9. Access Client speaks platform protocol, not RDP.
10. Mesh/VPN/multi-cluster/node-agent runtime remain future staged work.
11. RDP must be stabilized before adding VNC/SSH/VPN/product expansion.
12. No new feature should start while source-of-truth docs are stale.

## 10. Recommended Next Stages

### P0. Truth And Baseline Freeze

Goal: make the current working system impossible to misunderstand.

Do:

- update root `README.md`
- update `docs/codex/CURRENT_STATUS.md`
- update `docs/codex/NEXT_STEP_PROMPT.md`
- update `clients/windows/README.md`
- update `workers/rdp-worker/README.md`
- update `docs/architecture/DATA_PLANE_V1.md` next prompt
- update `docs/architecture/RDP_ADAPTER_RUNTIME.md` with latest baseline/region
  repair status
- document current test Docker image/tag and startup commands
- preserve the accepted RDP worker baseline
- create one "current smoke matrix" document

Do not:

- add features
- start DP-3B
- start server-to-client download
- start mesh/VPN/node-agent runtime

Acceptance:

- a new engineer/Codex can read the docs and know the actual next step
- no doc points to legacy v1 or already-completed stages as next work

### P1. RDP Visual Correctness Hardening

Goal: eliminate remaining small artifacts without returning to slow full-frame
rendering.

Do:

- add explicit region sequence/gap diagnostics
- prove when artifacts happen: region drop, stale region ordering, missed server
  callback, client application bug, or repair interval issue
- verify client applies region frames to the correct bitmap area and stride
- keep baseline full frame on attach
- keep full repair only on loss/recovery, not as normal render loop
- collect before/after screenshots/logs

Do not:

- enable RDPGFX globally
- add compression/tiles/codecs before correctness is stable
- change backend/session lifecycle

Acceptance:

- remote idle updates repaint without local input
- Start menu/task manager/window movement leave no persistent artifacts
- input and close behavior remain usable

### P2. Stage 5.1.1 Restricted Drive Visibility Proof

Status: accepted as runtime-proven on the test Docker stand.

Goal: keep the upload visibility path protected while the RDP Adapter continues
to be hardened.

Do:

- run live smoke with current RDP adapter baseline
- upload file from Windows client
- verify file appears in `\\tsclient\RAP_Transfers`
- open text and binary files inside the remote Windows session
- prove disabled policy blocks upload
- prove takeover/detach/failure block old or invalid upload
- verify directory cleanup on terminate

Do not:

- implement download
- expose arbitrary worker filesystem
- implement shared folders or SMB/WebDAV

Accepted proof:

- uploaded file is visible and openable inside remote Windows
- only per-session visible directory is exposed
- worker logs show `RAP_Transfers` configured as the only redirected drive
- termination cleans the per-session transfer directory

### P3. Security And Secrets Readiness

Status: P3.1 MVP complete; production TLS/PKI remains P3.2.

Goal: remove proof-stage security shortcuts before broad usage.

Completed:

- documented secret-reference model in
  `docs/architecture/SECURITY_SECRETS_READINESS.md`
- production mode rejects plaintext credential-like resource metadata
- production RDP/VNC/SSH resources require `secret_ref`
- session start rejects legacy plaintext resources in production mode
- data-plane allowed-channel policy test exists
- worker direct-bind denial probes cover wrong worker/user/org/resource,
  wrong attachment, over-broad channels, and failed/terminated states
- encrypted PostgreSQL-backed `resource_secrets` store exists
- resource secret create/rotate endpoint updates `resources.secret_ref` without
  returning plaintext
- session assignment resolves `secret_ref` after organization/resource/session/
  worker/lease checks and does not mutate `remote_sessions.metadata` with
  plaintext
- secret access/access-denied/rotation audit events exist
- direct worker WSS TLS trust metadata/guard exists; production backend omits
  smoke-only direct candidates and production Windows client skips untrusted
  direct candidates

Still required after P3.2:

- deploy production direct-worker certificates/platform CA trust
- add external KMS/Vault or stronger key-management integration
- add master-key rotation/re-encryption workflow
- consider future worker pull/token resolver flow to avoid resolved credentials
  in Redis assignment payloads

Do not:

- build full enterprise KMS prematurely
- weaken certificate or token model for convenience

Acceptance:

- production mode cannot create/start resources with plaintext credential
  metadata
- cross-org, old-attachment, wrong worker/resource/org, and terminal-session
  denial paths are covered by focused tests/probes

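
The production-mode rule in P3 (resources require `secret_ref`, plaintext credential-like metadata is rejected) can be sketched as a small Go validator. This is an illustration under assumptions, not the backend's implementation: the `Resource` struct, the `credentialLikeMarkers` list, and `ValidateForProduction` are hypothetical, and the real definition of "credential-like" lives in `docs/architecture/SECURITY_SECRETS_READINESS.md`.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Resource is a simplified, hypothetical view of a stored resource row.
type Resource struct {
	Protocol  string            // e.g. "rdp", "vnc", "ssh"
	SecretRef string            // reference into the secret store
	Metadata  map[string]string // free-form, must stay credential-free
}

var (
	ErrMissingSecretRef    = errors.New("production resource requires secret_ref")
	ErrPlaintextCredential = errors.New("plaintext credential-like metadata")
)

// credentialLikeMarkers is an assumed deny list for metadata key names.
var credentialLikeMarkers = []string{"password", "passwd", "private_key", "token"}

// ValidateForProduction rejects a resource that has no secret reference
// or whose metadata keys look like they carry a credential.
func ValidateForProduction(r Resource) error {
	if strings.TrimSpace(r.SecretRef) == "" {
		return ErrMissingSecretRef
	}
	for key := range r.Metadata {
		lower := strings.ToLower(key)
		for _, marker := range credentialLikeMarkers {
			if strings.Contains(lower, marker) {
				return fmt.Errorf("%w: key %q", ErrPlaintextCredential, key)
			}
		}
	}
	return nil
}

func main() {
	resources := []Resource{
		{Protocol: "rdp", SecretRef: "secret-ref-example", Metadata: map[string]string{"host": "10.0.0.5"}},
		{Protocol: "rdp", Metadata: map[string]string{"host": "10.0.0.5"}},
		{Protocol: "ssh", SecretRef: "secret-ref-example", Metadata: map[string]string{"Password": "hunter2"}},
	}
	for i, r := range resources {
		fmt.Printf("resource %d: %v\n", i, ValidateForProduction(r))
	}
}
```

Running the same check at both resource creation and session start matches the two completed items above: a legacy plaintext row that predates the rule is still stopped when someone tries to start a session with it.
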
### P4. Automated Regression Suite

Goal: convert the painful manual discoveries into repeatable gates.

Do:

- add backend unit/integration tests for org scope, session state, data-plane
  token, stale worker, clipboard/file policies
- add worker probes for render sequencing, direct baseline, region repair,
  adapter event routing
- add Windows transport/viewmodel tests for fallback, close/dispose, failed
  state, frame latest-only, localization resolution
- make smoke scripts emit machine-readable PASS/FAIL reports
- pin each accepted image/build artifact

Acceptance:

- a regression in input, render, worker-death, takeover, clipboard, or upload
  fails a repeatable test before manual smoke

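
One possible shape for the machine-readable PASS/FAIL report from P4, as a hedged Go sketch. The JSON field names, `CheckResult`, `SmokeReport`, and `BuildReport` are assumptions made for illustration; the check names in `main` are placeholders, not recorded results. The image tag is the one listed for the worker container earlier in this document.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// CheckResult is one named smoke check and its outcome.
type CheckResult struct {
	Name   string `json:"name"`
	Passed bool   `json:"passed"`
	Detail string `json:"detail,omitempty"`
}

// SmokeReport ties the overall status to the pinned image under test.
type SmokeReport struct {
	Image  string        `json:"image"`
	Status string        `json:"status"` // "PASS" or "FAIL"
	Checks []CheckResult `json:"checks"`
}

// BuildReport derives the overall status. An empty check list is a FAIL,
// because a run that proved nothing should not open the gate.
func BuildReport(image string, checks []CheckResult) SmokeReport {
	status := "PASS"
	if len(checks) == 0 {
		status = "FAIL"
	}
	for _, c := range checks {
		if !c.Passed {
			status = "FAIL"
			break
		}
	}
	return SmokeReport{Image: image, Status: status, Checks: checks}
}

func main() {
	// Placeholder checks for illustration only.
	report := BuildReport("rap-rdp-worker:stage5-2-download", []CheckResult{
		{Name: "direct_baseline_frame", Passed: true},
		{Name: "upload_blocked_when_disabled", Passed: true},
	})

	enc := json.NewEncoder(os.Stdout)
	enc.SetIndent("", "  ")
	if err := enc.Encode(report); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	if report.Status != "PASS" {
		os.Exit(1) // non-zero exit lets CI treat the report as a gate
	}
}
```

Recording the image tag inside the report supports the "pin each accepted image/build artifact" item: a PASS is only meaningful for the exact artifact it was produced against.
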
### P5. RDP Performance Next Layer

Goal: improve speed on weak channels after correctness is stable.

Candidate paths:

- RDPGFX on compatible target only
- encoded graphics payloads
- dirty-region compression
- tile/region framing
- adaptive quality profiles
- palette/grayscale/low-bandwidth modes
- per-channel QoS and backpressure telemetry

Do not:

- replace stable region-first path without fallback
- ship a graphics mode that only works on one target

Acceptance:

- direct full-color baseline remains available
- each new graphics mode has compatibility detection and fallback

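
The P5 acceptance rule (every new graphics mode needs compatibility detection and a fallback to the full-color baseline) can be expressed as a small selection function. This Go sketch is illustrative only: `GraphicsMode`, `TargetCapabilities`, `SelectMode`, and the `"rdpgfx"` mode string are hypothetical, and how capabilities are actually detected on a target is outside its scope.

```go
package main

import "fmt"

// GraphicsMode names a render path. Only the baseline is assumed stable.
type GraphicsMode string

const (
	ModeFullColor GraphicsMode = "full_color" // stable region-first baseline
	ModeRDPGFX    GraphicsMode = "rdpgfx"     // candidate, target-dependent
)

// TargetCapabilities is what detection learned about the remote target.
type TargetCapabilities struct {
	SupportsRDPGFX bool
}

// SelectMode returns the mode to use and whether it had to fall back.
// Unknown or unsupported requests always land on the baseline.
func SelectMode(requested GraphicsMode, caps TargetCapabilities) (GraphicsMode, bool) {
	switch requested {
	case ModeFullColor:
		return ModeFullColor, false
	case ModeRDPGFX:
		if caps.SupportsRDPGFX {
			return ModeRDPGFX, false
		}
		return ModeFullColor, true
	default:
		return ModeFullColor, true
	}
}

func main() {
	cases := []struct {
		requested GraphicsMode
		caps      TargetCapabilities
	}{
		{ModeRDPGFX, TargetCapabilities{SupportsRDPGFX: true}},
		{ModeRDPGFX, TargetCapabilities{SupportsRDPGFX: false}},
		{GraphicsMode("experimental"), TargetCapabilities{}},
	}
	for _, c := range cases {
		mode, fellBack := SelectMode(c.requested, c.caps)
		fmt.Printf("requested=%s selected=%s fallback=%v\n", c.requested, mode, fellBack)
	}
}
```

Returning the fallback flag separately makes it easy to count fallbacks in telemetry, so a mode that silently degrades on most targets is visible before it ships.
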
### P6. Product Completion For RDP

Only after P0-P5 gates are stable:

- manual desktop acceptance for server-to-client file download from
  `RAP_Transfers\ToClient`
- richer file transfer UX
- final RDP UX polish
- policy management UI
- operational runbooks
- release readiness checklist

### P7. Platform Expansion

Only after RDP is stable:

- VNC Adapter
- SSH Adapter
- node-agent runtime/updater
- entry/relay nodes
- mesh routing
- VPN/IP tunnel mode
- Linux/iOS/Android clients

## 11. Proposed Immediate Next Prompt

Use this as the next implementation prompt if we continue immediately:

```text
Proceed with Stage 5.2 remaining desktop UI proof only - RDP server-to-client
file download.

Goal:
Finish acceptance of safe, policy-aware download from the remote RDP session to
the Windows Access Client UI using the restricted RAP_Transfers\ToClient drop
zone.

Strict rules:
- do not implement arbitrary remote path download
- do not implement remote filesystem browser
- do not implement recursive folder transfer
- do not implement SMB/WebDAV/Windows agent
- do not expose any worker path outside the per-session visible directory
- do not change RDP rendering/input/clipboard behavior
- do not remove backend gateway fallback
- do not implement binary file chunk frames yet
- do not start DP-3B, mesh, VPN, node-agent runtime, or new adapters

Scope:
1. Keep the current Stage 5.2 backend/worker deployment on docker-test.
2. Prove Windows desktop UI download for text and binary files placed in
   RAP_Transfers\ToClient.
3. Prove rendering, input, clipboard, upload, lifecycle, and fallback do not
   regress.

Acceptance:
- disabled and client_to_server modes block download
- server_to_client and bidirectional modes allow download
- text and binary files download with matching hashes
- traversal/symlink/non-regular/too-large files are blocked
- rendering, input, clipboard, upload, lifecycle, and fallback do not regress
```

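
The first two acceptance lines of the prompt above amount to a direction gate, and the "too-large" case to a size gate. A minimal Go sketch, assuming the four mode strings are exactly the ones named in the prompt; the type and function names (`TransferMode`, `AllowsDownload`, `CheckDownload`) and the size-limit parameter are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// TransferMode mirrors the four file transfer policy modes named above.
type TransferMode string

const (
	ModeDisabled       TransferMode = "disabled"
	ModeClientToServer TransferMode = "client_to_server"
	ModeServerToClient TransferMode = "server_to_client"
	ModeBidirectional  TransferMode = "bidirectional"
)

var (
	ErrDownloadBlocked = errors.New("download blocked by policy")
	ErrTooLarge        = errors.New("file exceeds download size limit")
)

// AllowsDownload is deny-by-default: unknown modes behave like disabled.
func AllowsDownload(mode TransferMode) bool {
	switch mode {
	case ModeServerToClient, ModeBidirectional:
		return true
	default:
		return false
	}
}

// CheckDownload applies the policy gate first, then the size limit.
func CheckDownload(mode TransferMode, sizeBytes, maxBytes int64) error {
	if !AllowsDownload(mode) {
		return ErrDownloadBlocked
	}
	if sizeBytes < 0 || sizeBytes > maxBytes {
		return ErrTooLarge
	}
	return nil
}

func main() {
	const maxBytes = 10 << 20 // assumed 10 MiB limit, for illustration only
	modes := []TransferMode{ModeDisabled, ModeClientToServer, ModeServerToClient, ModeBidirectional}
	for _, m := range modes {
		fmt.Printf("%-17s small=%v large=%v\n", m,
			CheckDownload(m, 1024, maxBytes),
			CheckDownload(m, maxBytes+1, maxBytes))
	}
}
```

Checking policy before size means a blocked mode never reveals whether a file exists or how large it is.
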
## 12. Bottom Line

The project direction is sound, but the process must now become stricter:

- design channel semantics first
- implement through adapter boundaries
- prove with live/manual smoke and automated gates
- update source-of-truth docs before starting the next major stage
- reject "temporary" shortcuts unless they have a documented removal condition

The RDP Adapter experience was expensive, but useful. It showed exactly where
the architecture must be disciplined before adding SSH, VNC, VPN, mobile
clients, or mesh runtime.