# RAP host-agent monitor

`rap-host-agent monitor-loop` is the local watchdog that runs near a node host.
It complements the update loop:

- starts watched Docker containers when they are stopped;
- restarts watched containers when Docker health is `unhealthy`;
- restarts containers stuck in `restarting` longer than the stale threshold;
- rate-limits repeated remediation with a restart cooldown;
- watches disk pressure and runs safe cleanup when the cleanup threshold is reached;
- removes old `/tmp/rap-*` and `/tmp/go-build*` build directories;
- writes an optional JSON status file;
- reports monitor status to the control plane through the node update-status channel.
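
The container rules in the list above can be read as a small decision table. The following is a minimal sketch of that table, not the agent's actual implementation: the function name `decide_action`, the action names (`start`, `restart`, `wait`, `none`), and the 300-second stale threshold default are assumptions made for illustration. The restart cooldown is deliberately left out.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a container's Docker state to a remediation action.
# Arguments:
#   $1 state   - Docker state, e.g. running, exited, restarting
#   $2 health  - Docker health status, or "none" when no healthcheck is defined
#   $3 stuck_s - seconds the container has been observed in "restarting"
#   $4 stale_s - stale threshold in seconds (300 is an assumed value)
decide_action() {
  local state="$1" health="$2" stuck_s="${3:-0}" stale_s="${4:-300}"
  case "$state" in
    exited|created)
      # Stopped container: start it.
      echo start ;;
    restarting)
      # Only intervene once the container has been restarting for too long.
      if [ "$stuck_s" -gt "$stale_s" ]; then echo restart; else echo wait; fi ;;
    running)
      # Running but failing its healthcheck: restart it.
      if [ "$health" = "unhealthy" ]; then echo restart; else echo none; fi ;;
    *)
      echo none ;;
  esac
}

# Example of obtaining the two inputs for one watched container:
#   docker inspect --format \
#     '{{.State.Status}} {{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' \
#     rap_test_backend
```

Keeping the decision separate from the `docker start` / `docker restart` calls makes the rules easy to check without touching a live container.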

Example:

```bash
rap-host-agent monitor-loop \
  --fabric-registry-records-json '<signed_registry_records_json>' \
  --cluster-authority-public-key '<cluster_authority_public_key>' \
  --cluster-id cfc0743d-d960-49fb-9de8-96e063d5e4aa \
  --node-id 108a0d66-d65e-4dea-b9a8-135366bf7dba \
  --current-version 0.2.261-vpnfarm \
  --interval-seconds 60 \
  --disk-warn-percent 80 \
  --disk-cleanup-percent 85 \
  --disk-critical-percent 95 \
  --status-file /tmp/rap-web-admin/html/downloads/ops/host-monitor-status.json \
  --watch-container rap_test_postgres \
  --watch-container rap_test_redis \
  --watch-container rap_test_backend
```
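
To make the three disk thresholds in the example concrete, here is a minimal sketch of how a usage percentage could be classified against them. It is an illustration only: the function names `disk_level` and `root_disk_used_percent`, the level names, and the choice of "greater than or equal" for every threshold are assumptions, and the agent may evaluate them differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify disk usage against the three thresholds.
# Arguments: $1 used percent, then warn, cleanup, and critical percentages
# (defaults mirror the values passed in the example command).
disk_level() {
  local used="$1" warn="${2:-80}" cleanup="${3:-85}" critical="${4:-95}"
  if   [ "$used" -ge "$critical" ]; then echo critical
  elif [ "$used" -ge "$cleanup" ];  then echo cleanup
  elif [ "$used" -ge "$warn" ];     then echo warn
  else echo ok
  fi
}

# Current usage of the root filesystem as an integer percentage,
# taken from the "Capacity" column of POSIX-format df output.
root_disk_used_percent() {
  df -P / | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
```

For instance, `disk_level "$(root_disk_used_percent)"` prints one of `ok`, `warn`, `cleanup`, or `critical` for the root filesystem.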

On the shared test Docker host, the current public status file is:

`http://docker-test.cin.su:18080/downloads/ops/host-monitor-status.json`
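
The status file can be retrieved with any HTTP client. The sketch below wraps `curl` in a small function; the name `fetch_monitor_status` and the 10-second timeout are assumptions, and no field names are parsed because the file's schema is not described here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: download the monitor status JSON and print it to stdout.
# The default URL is the shared test host named above; pass another URL as $1.
fetch_monitor_status() {
  local url="${1:-http://docker-test.cin.su:18080/downloads/ops/host-monitor-status.json}"
  local body
  # -f: fail on HTTP errors, -sS: silent but still report errors,
  # --max-time: overall timeout in seconds (assumed value).
  body="$(curl -fsS --max-time 10 "$url")" || return 1
  # Treat an empty response as a failure.
  [ -n "$body" ] || return 1
  printf '%s\n' "$body"
}
```

If `jq` is installed, `fetch_monitor_status | jq .` pretty-prints the result.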