Refactor RDP proxy handling and update related tests

2026-05-17 20:38:35 +03:00
parent 8e9402580f
commit d551e57fd5
172 changed files with 22117 additions and 2509 deletions
@@ -1,13 +1,13 @@
# Web Ingress and Admin UI Model
Status: target architecture clarification. Documentation only.
Status: target architecture and implementation contract.
This document defines how HTTP/HTTPS web entry, Admin UI, dynamic page
composition, and cluster configuration responsibilities are separated in the
Secure Access Fabric.
It does not implement code, APIs, UI pages, mesh runtime, VPN runtime, or RDP
changes.
The fabric node-to-node transport remains QUIC-only. HTTP/HTTPS is allowed only
as an external client-facing service edge.
## Purpose
@@ -16,33 +16,41 @@ The platform needs a clear distinction between:
- Web Service as the HTTP/HTTPS entry layer
- Control Plane as the owner of cluster configuration and policy
- Admin UI as a safe, scoped user interface over Control Plane APIs
- Fabric Transport as the internal QUIC-only node-to-node substrate
The Web layer must never become the owner of cluster state, policy, topology,
secrets, node identity, or routing authority.
## Layer Ownership
### Web Service / Web Ingress
### Public HTTPS Ingress
Web Service is an edge service.
Public HTTPS Ingress is an edge service. It may run on a public Internet node,
including a small/slow node intended only to accept browser traffic and pass it
into the fabric.
Suggested role names:
Role names:
- `web-ingress`
- `admin-web-entry`
- `admin-web-shell`
- `public-ingress`
- `admin-ingress`
Responsibilities:
- accept HTTP/HTTPS
- listen on TCP `80` only for ACME challenges, health checks, and HTTPS
redirects
- listen on TCP `443` for browser/API HTTPS
- terminate TLS or sit behind the approved TLS terminator
- serve Admin UI shell/static assets
- proxy browser/API traffic to Control API
- serve only approved static UI shells and safe public metadata
- validate SNI/Host, request size, rate limits, and edge policy
- map the request to an allowed platform, cluster, organization, or user portal
scope
- forward accepted traffic into the fabric through an authorized fabric service
channel
- apply edge controls such as headers, rate limits, request size limits, and
future WAF rules
- expose only approved public/admin endpoints
Web Service must not:
Public HTTPS Ingress must not:
- own cluster configuration
- directly mutate PostgreSQL
@@ -51,6 +59,39 @@ Web Service must not:
- store node identity or certificates as source of truth
- expose internal mesh topology to browser clients
- execute cluster decisions locally
- hold platform/global admin authority keys
- infer authorization from the fact that it accepted TCP `443`
- become a general relay for arbitrary HTTP inside the fabric
The node that accepts HTTPS is not the node that automatically owns or executes
admin logic. It is only a service edge.
### Fabric Transport
Fabric Transport is the internal node-to-node layer.
Rules:
- node-to-node traffic uses QUIC only
- no HTTP fallback between fabric nodes
- STUN/ICE/rendezvous/relay are fabric transport mechanisms, not browser/API
protocols
- any service traffic accepted on `443` is converted into a scoped fabric
service channel before it crosses the mesh
- direct links, relay links, and route-health observations must remain separate
in diagnostics
- a fabric route proves reachability, not administrative authority
If a public ingress receives a request for an admin surface, the request flow is:
```text
Browser HTTPS
-> public/admin ingress on 443
-> tenant/cluster/platform scope selection
-> signed fabric service channel over QUIC
-> authorized admin/runtime service node
-> Control Plane authorization and policy
```
### Control Plane
@@ -77,9 +118,23 @@ only.
Cluster configuration is changed only through Control Plane services and APIs.
The Web layer is a presentation and ingress layer over those APIs.
### Admin UI
### Admin UI Runtime
Admin UI is a client application served through Web Ingress.
Admin UI Runtime is the service that serves and executes the admin surface. It
may run on any node explicitly assigned the matching runtime role.
Role names:
- `global-admin-runtime`
- `cluster-admin-runtime`
- `organization-portal-runtime`
- `user-portal-runtime`
- `identity-runtime`
- `policy-authority`
- `audit-sink`
Admin UI is a client application served through Public HTTPS Ingress or Admin UI
Runtime according to deployment policy.
It renders safe Control Plane projections and submits user actions to Control
Plane APIs.
@@ -95,7 +150,7 @@ Admin UI must not:
viewer
- contain executable cluster logic
## Admin Endpoint Placement
## Admin Endpoint Placement And Trust
Admin UI endpoint placement is explicit and must not be inferred from storage.
@@ -110,6 +165,8 @@ Scopes:
- Organization Admin Panel: tenant-safe projection for one organization. It
must expose only allowed resources, service endpoints, sessions, policies,
and safe status.
- User Portal: personal/account scope. It must expose only the authenticated
user's resources, sessions, devices, and profile actions.
Rules:
@@ -118,19 +175,29 @@ Rules:
- Storage nodes distribute/cache scoped configuration and snapshots only.
- Admin/web ingress is a separate service role and requires explicit Control
Plane assignment.
- Public Internet ingress is not enough to run a global panel.
- `global-admin-runtime`, `policy-authority`, and `audit-sink` may run only on
platform-owner trusted nodes.
- `cluster-admin-runtime` may run only on nodes authorized for that cluster.
- `organization-portal-runtime` and `user-portal-runtime` may run on broader
infrastructure, but they receive only scoped projections.
- Cluster-local admin endpoints require valid TLS/cert policy, signed scoped
snapshots, current node health, and sufficient role coverage.
- Platform Owner Console remains the owner-level view even when cluster-local
admin endpoints exist.
- Organization Admin Panel must never expose intermediate mesh topology,
storage shards, peer caches, route caches, or unrelated cluster data.
- A request entering through an organization-bound ingress must be rejected if it
asks for another organization, another cluster outside its contract, global
topology, or platform-owner data.
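The runtime placement rules above can be read as a small admission check. The following is a minimal sketch, not the platform's implementation: the `Node` type, its fields, and `may_host_role` are hypothetical names introduced only for illustration, and denying unknown roles by default is an assumption.

```python
from dataclasses import dataclass

# Roles the rules above restrict to platform-owner trusted nodes.
PLATFORM_TRUSTED_ONLY = {"global-admin-runtime", "policy-authority", "audit-sink"}
# Roles that may run on broader infrastructure (they receive scoped projections only).
BROAD_PLACEMENT = {"organization-portal-runtime", "user-portal-runtime"}


@dataclass(frozen=True)
class Node:
    """Hypothetical node descriptor; field names are illustrative assumptions."""
    node_id: str
    platform_trusted: bool = False
    authorized_clusters: frozenset = frozenset()


def may_host_role(node, role, cluster_id=None):
    """Return True if the placement rules allow `role` on `node`."""
    if role in PLATFORM_TRUSTED_ONLY:
        return node.platform_trusted
    if role == "cluster-admin-runtime":
        # Allowed only on nodes authorized for that specific cluster.
        return cluster_id is not None and cluster_id in node.authorized_clusters
    if role in BROAD_PLACEMENT:
        return True
    # Assumption: anything not named in the rules is denied by default.
    return False


edge = Node("edge-1")
core = Node("core-1", platform_trusted=True, authorized_clusters=frozenset({"c1"}))
print(may_host_role(edge, "global-admin-runtime"))         # False
print(may_host_role(core, "cluster-admin-runtime", "c1"))  # True
```

The check deliberately says nothing about reachability: a public ingress node that can route to a runtime still fails it unless the node itself is trusted or cluster-authorized.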
## Request Flow
```text
Admin Browser
-> Web Ingress / Admin Web Shell
-> Control API
-> Public/Admin HTTPS Ingress
-> Fabric Service Channel over QUIC
-> Admin UI Runtime / Control API
-> PostgreSQL source of truth
-> signed scoped snapshots / config distribution
-> rap-node-agent
@@ -266,6 +333,18 @@ Organization admin must not see:
- secrets
- unrelated cluster internals
Ingress-bound projections:
- A platform-owner ingress may expose platform navigation only after platform
authorization, MFA/step-up, and policy checks.
- A cluster-bound ingress may expose only that cluster's admin surface and
cluster-scoped safe diagnostics.
- An organization-bound ingress may expose only the organization projection and
organization-safe service endpoints.
- A user portal ingress may expose only the user's personal/account projection.
- Host/SNI alone is not authorization; it only selects the maximum possible
projection before server-side authorization narrows it further.
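The last rule can be illustrated with a short sketch in which the ingress binding acts only as a ceiling and server-side authorization chooses the projection underneath it. The linear ordering of scopes, the `effective_scope` function, and the shape of `authorized_scopes` are assumptions made for this example; the document does not define them.

```python
# Assumed ordering from narrowest to widest projection.
SCOPE_RANK = {"user": 0, "organization": 1, "cluster": 2, "platform": 3}


def effective_scope(ingress_max_scope, authorized_scopes):
    """Pick the widest scope that is both under the ingress ceiling and
    explicitly granted by server-side authorization.

    Returns None when authorization grants nothing within the ceiling,
    which means the request must be rejected.
    """
    ceiling = SCOPE_RANK[ingress_max_scope]
    allowed = [
        scope
        for scope in authorized_scopes
        if scope in SCOPE_RANK and SCOPE_RANK[scope] <= ceiling
    ]
    if not allowed:
        return None
    return max(allowed, key=lambda scope: SCOPE_RANK[scope])


# An organization-bound ingress never yields more than the organization projection.
print(effective_scope("organization", {"platform", "organization"}))  # organization
# Arriving on a platform-owner ingress grants nothing by itself.
print(effective_scope("platform", set()))  # None
```

In this sketch a wider grant does not imply a narrower one: a caller authorized only for `platform` gets no projection through a user portal ingress.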
## Service Adapter UI Extensions
Service adapters may need configuration UI.
@@ -361,22 +440,258 @@ High-risk actions include:
## Deployment Model
### Current Test Entry
The current shared Docker test stand exposes the Platform Owner Control Panel at
`http://docker-test.cin.su:18080/` (`http://192.168.200.61:18080/`). This is a
temporary lab HTTP edge served by `rap_web_admin` from
`/tmp/rap-web-admin/html` on `test-docker`.
This entry is not the production authority model. It is allowed only for the
shared test stand while the HTTPS admin-ingress runtime is being completed. The
target production entry is:
```text
Browser HTTPS on 443
-> node with explicit admin-ingress/public-ingress role
-> signed web-ingress envelope
-> QUIC fabric service channel
-> authorized admin/portal runtime node
-> Control API projection/authorization
```
The browser-facing ingress may be a small public node, but it must not become
the management authority. Platform/global admin runtime remains limited to
platform-owner trusted nodes. Cluster, organization, and user panels receive
only their scoped projections.
The legacy Fabric map with separate `inputs`, `cluster nodes`, and `egress
zones` is retired for the transport-layer view. The Fabric panel must show
actual direct/fresh QUIC neighbor links, one-way/passive direction, stale/problem
state, relay/route-health annotations, and web-ingress runtime readiness. It
must not render old entry/egress zone columns as if they were transport
topology.
Possible deployment modes:
- Web Ingress and Control API in the same deployment for small/test installs
- Public/Admin HTTPS Ingress and Control API in the same deployment for
small/test installs
- Web Ingress separated from Control API for production
- multiple Web Ingress nodes for regional/admin access
- Web Ingress behind Caddy/Nginx/enterprise ingress
- Admin UI shell served from Web Ingress while APIs remain on Control API
- Internet ingress on a low-capacity node that forwards scoped channels to a
trusted admin runtime elsewhere in the fabric
- global admin runtime only on platform-owner controlled nodes
- cluster admin runtime on cluster-authorized nodes
- organization/user portal runtime on tenant-safe nodes with scoped data
Even when deployed together, ownership remains separate:
- Web Ingress is entry/presentation
- Public/Admin HTTPS Ingress is entry/presentation
- Fabric Transport is QUIC-only service-channel delivery
- Control API is authorization/domain logic
- PostgreSQL is source of truth
- Fabric Storage/Config Storage is scoped distribution/cache
- node-agent consumes scoped desired state
## Required Roles
The platform recognizes these web/admin placement roles:
| Role | Scope | Purpose |
| --- | --- | --- |
| `public-ingress` | cluster or organization | Listen on 80/443, terminate/validate HTTPS, forward scoped service channels. |
| `admin-ingress` | platform or cluster | HTTPS edge for admin surfaces. It does not own authority. |
| `global-admin-runtime` | platform trusted nodes only | Platform-owner console/runtime. |
| `cluster-admin-runtime` | cluster | Cluster admin console/runtime for one cluster. |
| `organization-portal-runtime` | organization | Tenant-safe organization administration. |
| `user-portal-runtime` | user/organization | Personal account/resource portal. |
| `identity-runtime` | platform/cluster | Authentication, session, MFA, step-up, and token issuance. |
| `policy-authority` | platform trusted nodes only | Authorization/policy decisions and signed claims. |
| `audit-sink` | platform trusted nodes only | Durable mutation/security audit ingestion. |
Legacy `entry-node` remains a generic client ingress/service edge role for
non-admin product services. It must not imply admin authority.
## Fabric Service Classes
Admin and portal traffic uses explicit fabric service classes. This prevents
admin traffic from being disguised as VPN/RDP/file/video traffic and gives the
routing layer clear QoS, role, and audit semantics.
| Service class | Required runtime roles | Projection |
| --- | --- | --- |
| `platform_admin` | `admin-ingress`, `global-admin-runtime`, `identity-runtime`, `policy-authority`, `audit-sink` | Platform-owner console. |
| `cluster_admin` | `admin-ingress`, `cluster-admin-runtime`, `identity-runtime`, `policy-authority`, `audit-sink` | One cluster. |
| `organization_portal` | `public-ingress`, `organization-portal-runtime`, `identity-runtime`, `policy-authority`, `audit-sink` | One organization. |
| `user_portal` | `public-ingress`, `user-portal-runtime`, `identity-runtime`, `policy-authority`, `audit-sink` | One authenticated user/account scope. |
Default channels for these classes are `control`, `interactive`, and
`reliable`. They are latency-sensitive control-plane/service traffic, not bulk
data transfer.
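One way to use the service class table is as a role coverage check before a class is treated as servable. The sketch below copies the required roles from the table; the `missing_roles` helper and the idea of passing in the set of roles currently assigned in the fabric are illustrative assumptions.

```python
# Shared by every class in the table above.
COMMON_ROLES = {"identity-runtime", "policy-authority", "audit-sink"}

# Required runtime roles per service class, as listed in the table.
REQUIRED_ROLES = {
    "platform_admin": {"admin-ingress", "global-admin-runtime"} | COMMON_ROLES,
    "cluster_admin": {"admin-ingress", "cluster-admin-runtime"} | COMMON_ROLES,
    "organization_portal": {"public-ingress", "organization-portal-runtime"} | COMMON_ROLES,
    "user_portal": {"public-ingress", "user-portal-runtime"} | COMMON_ROLES,
}


def missing_roles(service_class, assigned_roles):
    """Return the sorted list of required roles not yet covered.

    Raises KeyError for a service class that is not in the table.
    """
    return sorted(REQUIRED_ROLES[service_class] - set(assigned_roles))


assigned = {"public-ingress", "user-portal-runtime", "identity-runtime"}
print(missing_roles("user_portal", assigned))
# ['audit-sink', 'policy-authority']
```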
## Desired Workload Contract
Ingress nodes are configured through normal node desired workloads. The first
runtime stage is a contract probe: node-agent validates the policy and reports a
workload status, but it does not open `80`/`443` until the real ingress runtime
stage is enabled.
Example platform/cluster admin ingress workload:
```json
{
"service_type": "admin-ingress",
"desired_state": "enabled",
"runtime_mode": "native",
"config": {
"listen_http_port": 80,
"listen_https_port": 443,
"tls_mode": "terminate",
"scope": "platform",
"service_classes": ["platform_admin", "cluster_admin"]
}
}
```
Example organization/user public ingress workload:
```json
{
"service_type": "public-ingress",
"desired_state": "enabled",
"runtime_mode": "native",
"config": {
"listen_http_port": 80,
"listen_https_port": 443,
"tls_mode": "terminate",
"scope": "organization",
"service_classes": ["organization_portal", "user_portal"]
}
}
```
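A contract probe over workloads like the two examples above might be sketched as follows. This is an illustration under stated assumptions: `probe_problems` is a hypothetical helper, the problem strings are invented, and rejecting an empty `service_classes` list is an assumption. The document itself specifies only that invalid service classes or non-80/443 ports are reported as `degraded`.

```python
import json

KNOWN_SERVICE_CLASSES = {
    "platform_admin",
    "cluster_admin",
    "organization_portal",
    "user_portal",
}
# The only listener ports the contract accepts.
EXPECTED_PORTS = {"listen_http_port": 80, "listen_https_port": 443}


def probe_problems(workload):
    """Validate an ingress workload without opening any port.

    Returns a list of human-readable problems; a non-empty list would map to
    a `degraded` workload status.
    """
    config = workload.get("config") or {}
    problems = []
    for key, expected in EXPECTED_PORTS.items():
        if config.get(key) != expected:
            problems.append(f"{key} must be {expected}")
    classes = config.get("service_classes") or []
    if not classes:
        # Assumption: an ingress with no service classes is not useful.
        problems.append("service_classes must not be empty")
    for name in classes:
        if name not in KNOWN_SERVICE_CLASSES:
            problems.append(f"unknown service class: {name}")
    return problems


workload = json.loads("""
{
  "service_type": "public-ingress",
  "desired_state": "enabled",
  "runtime_mode": "native",
  "config": {
    "listen_http_port": 80,
    "listen_https_port": 443,
    "tls_mode": "terminate",
    "scope": "organization",
    "service_classes": ["organization_portal", "user_portal"]
  }
}
""")
print(probe_problems(workload))  # []
```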
Contract-probe status requirements:
- `fabric_transport` is `quic_only`
- `http_between_fabric_nodes` is `false`
- `authority_service` is `false`
- `fabric_service_channel_required` is `true`
- `ports_opened_by_stub` is `false`
- invalid service classes or non-80/443 ports report `degraded`
- real listener startup requires both the workload config setting
`real_listener_enabled=true` and the node-agent process gate
`RAP_WEB_INGRESS_RUNTIME_ENABLED=true`
- without the process gate, a real-listener request reports
`web_ingress_real_listener_gate_disabled`
- the first handler stage returns schema
`rap.web_ingress.runtime_response.v1`; it redirects HTTP to HTTPS, exposes
health, validates service class/scope, and blocks payload forwarding with
`fabric_service_channel_binding_not_implemented` until the QUIC service
channel binding is implemented
- node-agent owns a web-ingress listener lifecycle manager. When the real
listener gate is enabled, it starts the HTTP redirect listener and starts
HTTPS only when `tls_cert_file` and `tls_key_file` are present in workload
config. Without TLS files the listener status is `partial` and service
payload remains blocked.
- The HTTPS handler has a `FabricBinder` boundary. Valid requests become
`rap.web_ingress.fabric_request.v1` records with method, path, query, host,
derived scope, service class, safe headers, bounded body, and observed
timestamp. Runtime derives fabric scope from service class
(`platform_admin` -> `platform`, `cluster_admin` -> `cluster`,
`organization_portal` -> `organization`, `user_portal` -> `user`) before
signing/forwarding the request.
Dangerous browser headers such as `Authorization`, `Cookie`, `Set-Cookie`,
and service-channel tokens are not forwarded as ordinary proxy headers.
The binder must convert the request into a signed/scoped fabric service
channel envelope; if no binder is present, ingress returns
`fabric_service_channel_binding_not_implemented`.
- The first concrete binder emits
`rap.web_ingress.fabric_service_channel_envelope.v1`. The envelope contains
the safe request projection, base64-encoded body, scope, service class,
observed timestamp, and envelope timestamp. It is serialized as canonical JSON
for signing, then passed to an `EnvelopeSigner` and `EnvelopeSender`.
`EnvelopeSigner` owns node/service-channel signature policy. `EnvelopeSender`
owns delivery into the QUIC fabric service channel and route selection. This
keeps HTTP edge handling separated from mesh internals while making the
security boundary explicit and testable.
- The initial signer implementation is Ed25519 over the canonical envelope
bytes. The signer can derive `key_id` from the public key fingerprint or use
an explicitly configured key id. Production deployment must bind this key to
the node identity/service-channel authority policy before enabling real
browser traffic.
- The initial mesh sender adapter can submit the signed envelope through the
existing reliable fabric channel runtime using `control` traffic class and a
configured route set to an admin/portal runtime node or pool. At this stage it
returns a delivery-accepted response with route/channel metrics. Full
request/response admin API streaming remains a later runtime step and must
stay on the same QUIC fabric channel model.
- The fabric channel runtime now also has a request/response path for web
ingress: it opens a QUIC stream, sends the signed envelope as `FrameData`, and
waits for a `FrameData` response on the same stream and sequence. Route
failures or response timeouts use the same latency-aware reroute path as
reliable delivery. Runtime HTTP responses use
`rap.web_ingress.fabric_runtime_response.v1` with status code, safe headers,
and `body`/`body_b64`. If a runtime response is not in that schema, ingress
reports delivery-accepted metrics instead of treating arbitrary payload as an
HTTP response.
- The QUIC fabric server reserves `WebIngressForwardQUICStreamID` for web ingress
request/response forwarding. The server invokes a web-ingress forward handler
with the signed envelope payload and returns a wrapper containing either
runtime payload or an error on the same stream/sequence.
- Admin/portal runtime nodes have a signed-envelope receiver contract. The
receiver verifies `rap.web_ingress.signed_fabric_service_channel_envelope.v1`,
Ed25519 signature, trusted key id, scope, service class, and timestamp skew
before calling the local runtime handler. The local handler returns
`rap.web_ingress.fabric_runtime_response.v1`; unsafe response headers are
filtered before the payload is returned to the ingress edge.
- Node-agent exposes explicit runtime key policy inputs while the final signed
config-snapshot distribution is being wired:
`RAP_WEB_INGRESS_SIGNING_PRIVATE_KEY`,
`RAP_WEB_INGRESS_SIGNING_KEY_ID`, and
`RAP_WEB_INGRESS_TRUSTED_KEYS_JSON`. Trusted keys JSON may be either
`{"key_id":"public_key_b64"}` or an array of
`{"key_id":"...","public_key":"..."}` objects. Without trusted keys the
web-ingress receiver handler is not installed. Runtime receiver placement can
be narrowed with `RAP_WEB_INGRESS_RUNTIME_SERVICE_CLASSES`, a comma-separated
allow-list of `platform_admin`, `cluster_admin`, `organization_portal`, and
`user_portal`; this is a temporary explicit node-local policy until signed
role snapshots drive receiver placement.
- Heartbeat metadata includes `web_ingress_runtime_receiver_report` when QUIC
fabric or web-ingress key policy is configured. The report exposes the
signed-envelope schema, QUIC stream id, trusted key count, receiver
service-class allow-list, handler installation state, status/reason
(`ready`, `degraded`, or `blocked`), and QUIC endpoint readiness so the
fabric panel can show whether a node can currently receive admin/portal
runtime traffic and, if not, why.
- QUIC listener/reverse-transport handler configuration is sensitive to the
web-ingress trusted key policy and runtime service-class allow-list. If either
policy changes, node-agent restarts or refreshes the QUIC fabric handler
binding so stale key trust or stale receiver placement is not kept in memory.
- The first local admin runtime dispatcher is intentionally read-only. It
handles `/healthz`, `/readyz`, and `*/ui-manifest` requests after signed
envelope verification. It returns `rap.web_ingress.admin_runtime_response.v1`
with a safe `rap.web_ingress.ui_manifest.v1` projection that lists sections
and read-only actions for the requested service class. It rejects invalid
`scope`/`service_class` pairs before using either the local fallback or the
Control API projection client. Mutations return
`control_api_mutation_binding_not_implemented`; unknown read projections
return `control_api_projection_binding_not_implemented` until the dispatcher
is wired to the real Control API authorization/projection layer.
- The dispatcher now has a `ControlAPIProjectionClient` boundary. When bound,
read-only GET/HEAD requests are sent to the Control API projection endpoint
and returned as `rap.web_ingress.control_api_projection_response.v1`.
The backend exposes the first read-only projection endpoint at
`/api/v1/clusters/{cluster_id}/nodes/{node_id}/admin-runtime/projection`.
It returns safe manifest/projection payloads, marks audit as required, and
rejects mutation methods and invalid `scope`/`service_class` combinations.
Requests must use schema
`rap.web_ingress.control_api_projection_request.v1`; the agent accepts responses
only with schema `rap.web_ingress.control_api_projection_response.v1`.
This is the first Control API binding slice; it is not yet a full
authorization/session/audit implementation.
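The binder steps described in this list (scope derivation from service class, removal of dangerous browser headers, and a canonical serialization ready for signing) can be sketched as below. This is not the node-agent code. The JSON field names, the `MAX_BODY_BYTES` limit, and the choice of sorted keys with compact separators as the canonical form are all assumptions; only the schema name, the scope mapping, and the blocked header names come from the document. Signing and delivery are left to the `EnvelopeSigner` and `EnvelopeSender` boundaries and are not shown.

```python
import base64
import json

ENVELOPE_SCHEMA = "rap.web_ingress.fabric_service_channel_envelope.v1"

# Scope is derived from the service class, never taken from the browser.
SCOPE_BY_SERVICE_CLASS = {
    "platform_admin": "platform",
    "cluster_admin": "cluster",
    "organization_portal": "organization",
    "user_portal": "user",
}
# Header names from the document; service-channel token header names are not
# given there, so they are not listed here.
BLOCKED_HEADERS = {"authorization", "cookie", "set-cookie"}
MAX_BODY_BYTES = 1 << 20  # hypothetical bound for the "bounded body"


def safe_headers(headers):
    """Drop headers that must not cross the mesh as ordinary proxy headers."""
    return {k: v for k, v in headers.items() if k.lower() not in BLOCKED_HEADERS}


def build_envelope(method, path, query, host, service_class, headers, body,
                   observed_at, envelope_at):
    """Return canonical JSON bytes for the envelope, ready to be signed."""
    if service_class not in SCOPE_BY_SERVICE_CLASS:
        raise ValueError(f"unknown service class: {service_class}")
    if len(body) > MAX_BODY_BYTES:
        raise ValueError("request body exceeds the configured bound")
    envelope = {
        "schema": ENVELOPE_SCHEMA,
        "scope": SCOPE_BY_SERVICE_CLASS[service_class],
        "service_class": service_class,
        "request": {
            "method": method,
            "path": path,
            "query": query,
            "host": host,
            "headers": safe_headers(headers),
        },
        "body_b64": base64.b64encode(body).decode("ascii"),
        "observed_at": observed_at,
        "envelope_at": envelope_at,
    }
    # Assumed canonical form: sorted keys, no insignificant whitespace.
    return json.dumps(envelope, sort_keys=True, separators=(",", ":")).encode("utf-8")
```

Because the output is deterministic for the same inputs, a signer and a verifier that rebuild the bytes the same way would agree on what was signed.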
## Future Stages
Suggested staged work:
@@ -417,8 +732,9 @@ This document does not authorize:
## Result / Decision
WEB is an ingress and presentation layer, not a cluster configuration owner.
Cluster configuration belongs to the Control Plane and is persisted in
PostgreSQL. Dynamic admin pages are allowed only as safe, scoped,
Fabric remains QUIC-only internally; HTTP/HTTPS exists only at the external
client edge. Cluster configuration belongs to the Control Plane and is persisted
in PostgreSQL. Dynamic admin pages are allowed only as safe, scoped,
schema-driven projections over Control Plane APIs. They must not embed secrets,
internal topology, peer caches, route caches, or arbitrary executable code.