docs(npu): update service maps and runbooks
@@ -19,6 +19,7 @@ swarm/
|
||||
│ └── vm/ # VM provisioning role (local)
|
||||
├── openclaw/ # Live mirror of guest ~/.openclaw/
|
||||
├── docker-compose.yaml # LiteLLM + supporting services
|
||||
├── docs/ # Swarm/agentmon/n8n infrastructure docs + diagrams
|
||||
├── litellm-config.yaml # LiteLLM static config
|
||||
├── litellm-init-credentials.sh # Register API keys into LiteLLM DB
|
||||
├── litellm-init-models.sh # Register models into LiteLLM DB (idempotent)
|
||||
@@ -29,6 +30,15 @@ swarm/
|
||||
└── README.md # This file
|
||||
```
|
||||
|
||||
## Current swarm/service architecture
|
||||
|
||||
For the current host-side AI/search/voice automation stack, n8n watchdogs, and agentmon monitoring layer, see:
|
||||
|
||||
- [`docs/swarm-infrastructure.md`](docs/swarm-infrastructure.md) — operational overview and quick checks
|
||||
- [`docs/swarm-infrastructure.html`](docs/swarm-infrastructure.html) — dark SVG architecture diagram
|
||||
- [`docs/diagram-maintenance.md`](docs/diagram-maintenance.md) — diagram upkeep conventions
|
||||
- OpenVINO NPU services and prototypes are documented in `swarm-common/obsidian-vault/will/will-shared-zap/Runbooks/OpenVINO NPU Services Runbook.md` and the component READMEs under `openvino-*-npu*/`. Live baseline ports are RAG `:18810`, Whisper NPU `:18816`, and embeddings `:18817`. Sidecar ports `:18818`, `:18819`, `:18820`, and the optional doc/image triage port `:18829` are approved prototypes only and are not part of live Atlas/Hermes routing.
|
||||
|
||||
## VM: zap
|
||||
|
||||
| Property | Value |
|
||||
|
||||
@@ -0,0 +1,66 @@
|
||||
# Diagram maintenance
|
||||
|
||||
Keep infrastructure diagrams current as first-class documentation, not as one-off screenshots.
|
||||
|
||||
## Current diagrams
|
||||
|
||||
- [`swarm-infrastructure.html`](./swarm-infrastructure.html) — full Atlas/Hermes + n8n + agentmon + local AI/search/voice topology.
|
||||
|
||||
## When to update an existing diagram
|
||||
|
||||
Update the relevant diagram in the same change set when you change any of these:
|
||||
|
||||
- service topology, ports, or container names
|
||||
- monitoring or alerting paths
|
||||
- n8n workflow architecture
|
||||
- Hermes/Atlas routing or gateway responsibilities
|
||||
- local AI/search/voice endpoints
|
||||
- OpenVINO NPU live/prototype status, ports, or safety gates (`:18810`, `:18816`, `:18817`, `:18818`, `:18819`, `:18820`, optional `:18829`)
|
||||
- Obsidian/RAG data flow
|
||||
- OpenClaw/VM operational mode
|
||||
- ownership/source-of-truth paths for a component
|
||||
|
||||
## When to create a new diagram
|
||||
|
||||
Create a new focused diagram when the existing overview would become too dense. Good candidates:
|
||||
|
||||
- n8n workflow family or alerting-only diagram
|
||||
- agentmon internals: collectors → NATS → processor → Postgres → query/UI
|
||||
- Obsidian/RAG automation pipeline
|
||||
- local AI routing: Hermes/LiteLLM/llama.cpp/Ollama/provider boundaries
|
||||
- OpenVINO NPU assistant sidecars, with live baseline and approved/not-live prototype lanes separated
|
||||
- messaging/channel routing: Telegram/Discord/email → Hermes/n8n/alerts
|
||||
- disaster recovery / backup topology
|
||||
|
||||
## Style rules
|
||||
|
||||
- Prefer standalone `.html` files with inline SVG so they render offline in any browser.
|
||||
- Keep the source file committed alongside the docs; do not rely on generated screenshots as the only artifact.
|
||||
- Link diagrams from the nearest README or operational doc.
|
||||
- Keep labels operational: service name, port, responsibility, and data direction.
|
||||
- Avoid secrets, credential names that imply secret values, private tokens, raw webhook URLs, or sensitive sample payloads.
|
||||
- Do not imply live Atlas/Hermes/RAG routing to an OpenVINO NPU prototype unless a reviewed implementation actually enabled it; label approved prototypes as `not live` or `approval required`.
|
||||
- If a raw export or live config was used to build the diagram, commit only the sanitized diagram/docs, not the raw sensitive source.
|
||||
|
||||
## Verification before committing
|
||||
|
||||
```bash
# Check the files are valid text and do not contain obvious secret markers
python - <<'PY'
from pathlib import Path
for p in Path('docs').glob('*.html'):
    text = p.read_text()
    hits = [s for s in ['api_key', 'token', 'password', 'Authorization', 'Bearer ', 'secret'] if s.lower() in text.lower()]
    print(p, hits)
PY

# Inspect targeted diff only
git diff --stat -- docs README.md
```
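
The marker scan above can also be written as a reusable function, which makes it easier to call from a pre-commit hook or a test. This is a minimal sketch that assumes the same marker list and the same case-insensitive substring match; `find_secret_markers` and `scan_html_docs` are hypothetical helper names, not part of the repo.

```python
from pathlib import Path

# Same markers as the heredoc check; matching is case-insensitive.
MARKERS = ["api_key", "token", "password", "Authorization", "Bearer ", "secret"]


def find_secret_markers(text: str, markers=MARKERS) -> list[str]:
    """Return the markers that occur in text, ignoring case."""
    lowered = text.lower()
    return [m for m in markers if m.lower() in lowered]


def scan_html_docs(root: str = "docs") -> dict[str, list[str]]:
    """Map each *.html file directly under root to the markers found in it."""
    return {
        str(p): find_secret_markers(p.read_text(encoding="utf-8"))
        for p in sorted(Path(root).glob("*.html"))
    }
```

A substring match is deliberately coarse: a hit such as `token` is a prompt to inspect the file by hand, not proof that a secret was committed.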
|
||||
|
||||
After editing diagrams, commit with a docs-focused message, for example:
|
||||
|
||||
```bash
git add docs/*.md docs/*.html README.md
git commit -m "docs: update swarm infrastructure diagrams"
```
|
||||
@@ -0,0 +1,115 @@
|
||||
<!doctype html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1" />
|
||||
<title>Will's Swarm Infrastructure</title>
|
||||
<style>
|
||||
:root { color-scheme: dark; --bg:#020617; --panel:#0f172a; --text:#e2e8f0; --muted:#94a3b8; }
|
||||
body { margin:0; background:var(--bg); color:var(--text); font-family: ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", monospace; }
|
||||
.wrap { max-width: 1320px; margin: 0 auto; padding: 28px; }
|
||||
.header { display:flex; align-items:center; gap:12px; margin-bottom:18px; }
|
||||
.dot { width:12px; height:12px; border-radius:50%; background:#34d399; box-shadow:0 0 18px #34d399; animation:pulse 1.8s infinite; }
|
||||
@keyframes pulse { 0%,100%{opacity:.6; transform:scale(.9)} 50%{opacity:1; transform:scale(1.15)} }
|
||||
h1 { font-size: 24px; margin:0; letter-spacing:-.02em; }
|
||||
.sub { color:var(--muted); margin:4px 0 22px; font-size:13px; }
|
||||
.card { border:1px solid #1e293b; border-radius:16px; background:linear-gradient(180deg, rgba(15,23,42,.95), rgba(2,6,23,.9)); padding:18px; box-shadow:0 24px 80px rgba(0,0,0,.35); }
|
||||
svg { width:100%; height:auto; display:block; }
|
||||
.cards { display:grid; grid-template-columns: repeat(3, minmax(0,1fr)); gap:14px; margin-top:16px; }
|
||||
.info { border:1px solid #1e293b; border-radius:12px; background:#0f172a; padding:14px; }
|
||||
.info h3 { margin:0 0 8px; font-size:13px; }
|
||||
.info ul { margin:0; padding-left:0; list-style:none; color:#cbd5e1; font-size:12px; line-height:1.6; }
|
||||
.footer { color:#64748b; font-size:11px; margin-top:14px; }
|
||||
@media (max-width: 900px) { .cards { grid-template-columns: 1fr; } }
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
<div class="wrap">
|
||||
<div class="header"><div class="dot"></div><div><h1>Will's Swarm Infrastructure</h1><div class="sub">Atlas/Hermes gateway + n8n automation + agentmon monitoring + local AI/search/voice services</div></div></div>
|
||||
<div class="card">
|
||||
<svg viewBox="0 0 1280 980" xmlns="http://www.w3.org/2000/svg" role="img" aria-label="Swarm infrastructure architecture diagram">
|
||||
<defs>
|
||||
<pattern id="grid" width="40" height="40" patternUnits="userSpaceOnUse"><path d="M 40 0 L 0 0 0 40" fill="none" stroke="#1e293b" stroke-width="0.5"/></pattern>
|
||||
<marker id="arrow" markerWidth="10" markerHeight="10" refX="8" refY="3" orient="auto" markerUnits="strokeWidth"><path d="M0,0 L0,6 L9,3 z" fill="#38bdf8" /></marker>
|
||||
<marker id="arrowGreen" markerWidth="10" markerHeight="10" refX="8" refY="3" orient="auto" markerUnits="strokeWidth"><path d="M0,0 L0,6 L9,3 z" fill="#34d399" /></marker>
|
||||
<marker id="arrowOrange" markerWidth="10" markerHeight="10" refX="8" refY="3" orient="auto" markerUnits="strokeWidth"><path d="M0,0 L0,6 L9,3 z" fill="#fb923c" /></marker>
|
||||
<marker id="arrowRose" markerWidth="10" markerHeight="10" refX="8" refY="3" orient="auto" markerUnits="strokeWidth"><path d="M0,0 L0,6 L9,3 z" fill="#fb7185" /></marker>
|
||||
<filter id="glow"><feGaussianBlur stdDeviation="2.5" result="coloredBlur"/><feMerge><feMergeNode in="coloredBlur"/><feMergeNode in="SourceGraphic"/></feMerge></filter>
|
||||
<style>
|
||||
.title{font:700 13px monospace; fill:#e2e8f0}.label{font:11px monospace; fill:#cbd5e1}.tiny{font:9px monospace; fill:#94a3b8}.port{font:8px monospace; fill:#64748b}
|
||||
.edge{fill:none; stroke:#38bdf8; stroke-width:1.8; marker-end:url(#arrow); opacity:.8}.edgeG{fill:none; stroke:#34d399; stroke-width:1.8; marker-end:url(#arrowGreen); opacity:.85}.edgeO{fill:none; stroke:#fb923c; stroke-width:1.8; marker-end:url(#arrowOrange); opacity:.85}.edgeR{fill:none; stroke:#fb7185; stroke-width:1.8; stroke-dasharray:5,4; marker-end:url(#arrowRose); opacity:.85}
|
||||
</style>
|
||||
</defs>
|
||||
<rect width="1280" height="980" fill="#020617"/><rect width="1280" height="980" fill="url(#grid)" opacity="0.7"/>
|
||||
|
||||
<!-- arrows behind nodes -->
|
||||
<path class="edge" d="M140 120 C210 120 210 205 280 205"/>
|
||||
<path class="edge" d="M140 190 C210 190 210 235 280 235"/>
|
||||
<path class="edge" d="M140 260 C210 260 210 265 280 265"/>
|
||||
<path class="edgeG" d="M470 240 C545 240 545 320 620 320"/>
|
||||
<path class="edgeG" d="M470 240 C545 240 545 455 620 455"/>
|
||||
<path class="edgeO" d="M820 320 C890 320 890 210 965 210"/>
|
||||
<path class="edgeO" d="M820 320 C890 320 890 315 965 315"/>
|
||||
<path class="edgeO" d="M820 320 C890 320 890 420 965 420"/>
|
||||
<path class="edgeR" d="M820 455 C890 455 890 595 965 595"/>
|
||||
<path class="edgeR" d="M820 455 C890 455 890 705 965 705"/>
|
||||
<path class="edgeG" d="M820 455 C890 455 890 790 965 790"/>
|
||||
<path class="edge" d="M815 635 C900 635 900 650 965 650"/>
|
||||
<path class="edge" d="M815 695 C900 695 900 735 965 735"/>
|
||||
<path class="edgeG" d="M625 635 C555 635 555 720 470 720"/>
|
||||
<path class="edge" d="M470 720 C545 720 545 565 620 565"/>
|
||||
<path class="edgeR" d="M490 735 C620 735 790 880 965 880"/>
|
||||
|
||||
<!-- boundaries -->
|
||||
<rect x="250" y="80" width="250" height="260" rx="14" fill="none" stroke="#fbbf24" stroke-width="1.4" stroke-dasharray="8,5" opacity=".75"/>
|
||||
<text x="265" y="103" class="tiny" style="fill:#fbbf24">Hermes gateway layer</text>
|
||||
<rect x="590" y="105" width="260" height="655" rx="14" fill="none" stroke="#fbbf24" stroke-width="1.4" stroke-dasharray="8,5" opacity=".75"/>
|
||||
<text x="605" y="128" class="tiny" style="fill:#fbbf24">n8n + agentmon observability</text>
|
||||
<rect x="935" y="95" width="280" height="850" rx="14" fill="none" stroke="#fbbf24" stroke-width="1.4" stroke-dasharray="8,5" opacity=".75"/>
|
||||
<text x="950" y="118" class="tiny" style="fill:#fbbf24">local swarm services</text>
|
||||
|
||||
<!-- external channels -->
|
||||
<g><rect x="30" y="90" width="110" height="58" rx="8" fill="#0f172a"/><rect x="30" y="90" width="110" height="58" rx="8" fill="rgba(30,41,59,.5)" stroke="#94a3b8" stroke-width="1.5"/><text x="50" y="116" class="title">Telegram</text><text x="52" y="134" class="tiny">DM/groups</text></g>
|
||||
<g><rect x="30" y="160" width="110" height="58" rx="8" fill="#0f172a"/><rect x="30" y="160" width="110" height="58" rx="8" fill="rgba(30,41,59,.5)" stroke="#94a3b8" stroke-width="1.5"/><text x="52" y="186" class="title">Discord</text><text x="48" y="204" class="tiny">#ops-alerts</text></g>
|
||||
<g><rect x="30" y="230" width="110" height="58" rx="8" fill="#0f172a"/><rect x="30" y="230" width="110" height="58" rx="8" fill="rgba(30,41,59,.5)" stroke="#94a3b8" stroke-width="1.5"/><text x="65" y="256" class="title">Email</text><text x="48" y="274" class="tiny">Gmail IMAP</text></g>
|
||||
|
||||
<!-- Hermes -->
|
||||
<g filter="url(#glow)"><rect x="280" y="180" width="190" height="100" rx="10" fill="#0f172a"/><rect x="280" y="180" width="190" height="100" rx="10" fill="rgba(8,51,68,.4)" stroke="#22d3ee" stroke-width="1.8"/><text x="325" y="213" class="title">Atlas / Hermes</text><text x="310" y="235" class="label">default profile gateway</text><text x="318" y="258" class="tiny">tools • memory • specialists</text></g>
|
||||
|
||||
<!-- n8n and agentmon -->
|
||||
<g><rect x="620" y="280" width="200" height="80" rx="10" fill="#0f172a"/><rect x="620" y="280" width="200" height="80" rx="10" fill="rgba(6,78,59,.4)" stroke="#34d399" stroke-width="1.8"/><text x="705" y="312" text-anchor="middle" class="title">n8n-agent</text><text x="705" y="333" text-anchor="middle" class="tiny">automation workflows</text><text x="705" y="350" text-anchor="middle" class="port">:18808 host / :5678 container</text></g>
|
||||
<g><rect x="620" y="415" width="200" height="85" rx="10" fill="#0f172a"/><rect x="620" y="415" width="200" height="85" rx="10" fill="rgba(6,78,59,.4)" stroke="#34d399" stroke-width="1.8"/><text x="720" y="445" text-anchor="middle" class="title">agentmon-query</text><text x="720" y="466" text-anchor="middle" class="tiny">aggregate snapshots/API</text><text x="720" y="484" text-anchor="middle" class="port">:8081 /v1/events</text></g>
|
||||
<g><rect x="620" y="530" width="200" height="210" rx="10" fill="#0f172a"/><rect x="620" y="530" width="200" height="210" rx="10" fill="rgba(251,146,60,.14)" stroke="#fb923c" stroke-width="1.8"/><text x="720" y="560" text-anchor="middle" class="title">agentmon pipeline</text><text x="720" y="590" text-anchor="middle" class="tiny">ingest :8080</text><text x="720" y="615" text-anchor="middle" class="tiny">NATS JetStream</text><text x="720" y="640" text-anchor="middle" class="tiny">event processor</text><text x="720" y="665" text-anchor="middle" class="tiny">Postgres DB</text><text x="720" y="690" text-anchor="middle" class="tiny">web UI :8082</text><text x="720" y="720" text-anchor="middle" class="port">swarm.snapshot + openclaw.snapshot</text></g>
|
||||
|
||||
<!-- Local services -->
|
||||
<g><rect x="965" y="165" width="210" height="80" rx="9" fill="#0f172a"/><rect x="965" y="165" width="210" height="80" rx="9" fill="rgba(6,78,59,.4)" stroke="#34d399" stroke-width="1.6"/><text x="1070" y="195" text-anchor="middle" class="title">LiteLLM</text><text x="1070" y="216" text-anchor="middle" class="tiny">LLM router + DB</text><text x="1070" y="234" text-anchor="middle" class="port">:18804</text></g>
|
||||
<g><rect x="965" y="275" width="210" height="80" rx="9" fill="#0f172a"/><rect x="965" y="275" width="210" height="80" rx="9" fill="rgba(8,51,68,.4)" stroke="#22d3ee" stroke-width="1.6"/><text x="1070" y="305" text-anchor="middle" class="title">Search</text><text x="1070" y="326" text-anchor="middle" class="tiny">SearXNG + Brave MCP</text><text x="1070" y="344" text-anchor="middle" class="port">:18803 / :18802</text></g>
|
||||
<g><rect x="965" y="385" width="210" height="80" rx="9" fill="#0f172a"/><rect x="965" y="385" width="210" height="80" rx="9" fill="rgba(8,51,68,.4)" stroke="#22d3ee" stroke-width="1.6"/><text x="1070" y="415" text-anchor="middle" class="title">Voice</text><text x="1070" y="436" text-anchor="middle" class="tiny">Kokoro + Whisper</text><text x="1070" y="454" text-anchor="middle" class="port">:18805 / :18816</text></g>
|
||||
<g><rect x="965" y="555" width="210" height="80" rx="9" fill="#0f172a"/><rect x="965" y="555" width="210" height="80" rx="9" fill="rgba(76,29,149,.4)" stroke="#a78bfa" stroke-width="1.6"/><text x="1070" y="585" text-anchor="middle" class="title">Docker services</text><text x="1070" y="606" text-anchor="middle" class="tiny">agentmon.monitor=true</text><text x="1070" y="624" text-anchor="middle" class="port">swarm/service snapshots</text></g>
|
||||
<g><rect x="965" y="665" width="210" height="80" rx="9" fill="#0f172a"/><rect x="965" y="665" width="210" height="80" rx="9" fill="rgba(120,53,15,.3)" stroke="#fbbf24" stroke-width="1.6"/><text x="1070" y="695" text-anchor="middle" class="title">OpenClaw VMs</text><text x="1070" y="716" text-anchor="middle" class="tiny">currently dormant</text><text x="1070" y="734" text-anchor="middle" class="port">openclaw.snapshot</text></g>
|
||||
<g><rect x="965" y="775" width="210" height="75" rx="9" fill="#0f172a"/><rect x="965" y="775" width="210" height="75" rx="9" fill="rgba(76,29,149,.4)" stroke="#a78bfa" stroke-width="1.6"/><text x="1070" y="802" text-anchor="middle" class="title">Obsidian / RAG</text><text x="1070" y="821" text-anchor="middle" class="tiny">RAG endpoint :18810</text><text x="1070" y="840" text-anchor="middle" class="port">Chroma obsidian_bge_npu</text></g>
|
||||
<g><rect x="965" y="870" width="210" height="80" rx="9" fill="#0f172a"/><rect x="965" y="870" width="210" height="80" rx="9" fill="rgba(244,63,94,.16)" stroke="#fb7185" stroke-width="1.6" stroke-dasharray="6,4"/><text x="1070" y="896" text-anchor="middle" class="title">NPU sidecars</text><text x="1070" y="917" text-anchor="middle" class="tiny">approved prototypes; not live</text><text x="1070" y="936" text-anchor="middle" class="port">:18818/:18819/:18820/:18829</text></g>
|
||||
|
||||
<!-- host local ai box -->
|
||||
<g><rect x="280" y="675" width="210" height="145" rx="10" fill="#0f172a"/><rect x="280" y="675" width="210" height="145" rx="10" fill="rgba(76,29,149,.4)" stroke="#a78bfa" stroke-width="1.8"/><text x="385" y="706" text-anchor="middle" class="title">host local AI</text><text x="385" y="730" text-anchor="middle" class="tiny">llama.cpp :18806</text><text x="385" y="752" text-anchor="middle" class="tiny">Ollama fallback :18807</text><text x="385" y="774" text-anchor="middle" class="tiny">OpenVINO embed :18817 live</text><text x="385" y="797" text-anchor="middle" class="tiny">Whisper NPU :18816 live</text></g>
|
||||
|
||||
<!-- legend -->
|
||||
<g transform="translate(40,910)">
|
||||
<text class="tiny" fill="#94a3b8">Legend</text>
|
||||
<rect x="0" y="16" width="14" height="10" fill="rgba(8,51,68,.4)" stroke="#22d3ee"/><text x="22" y="25" class="tiny">Gateway/Search/Voice</text>
|
||||
<rect x="180" y="16" width="14" height="10" fill="rgba(6,78,59,.4)" stroke="#34d399"/><text x="202" y="25" class="tiny">Automation/API</text>
|
||||
<rect x="320" y="16" width="14" height="10" fill="rgba(76,29,149,.4)" stroke="#a78bfa"/><text x="342" y="25" class="tiny">Data/AI stores</text>
|
||||
<rect x="475" y="16" width="14" height="10" fill="rgba(251,146,60,.14)" stroke="#fb923c"/><text x="497" y="25" class="tiny">Event bus/pipeline</text>
|
||||
<line x1="650" y1="22" x2="700" y2="22" class="edgeR"/><text x="710" y="25" class="tiny">Monitoring / not-live prototype flows</text>
|
||||
</g>
|
||||
</svg>
|
||||
</div>
|
||||
<div class="cards">
|
||||
<div class="info"><h3>Monitoring model</h3><ul><li>• n8n directly probes critical ports</li><li>• agentmon aggregates Docker/OpenClaw snapshots</li><li>• n8n polls agentmon for stale/degraded state</li></ul></div>
|
||||
<div class="info"><h3>Operational endpoints</h3><ul><li>• n8n: 127.0.0.1:18808</li><li>• agentmon query/UI: 8081 / 8082</li><li>• live NPU: RAG 18810, Whisper 18816, embeddings 18817</li><li>• prototypes not live-routed: 18818/18819/18820/18829</li></ul></div>
|
||||
<div class="info"><h3>Source paths</h3><ul><li>• Swarm repo: ~/lab/swarm</li><li>• Agentmon repo: ~/lab/agentmon</li><li>• Workflows: swarm-common/n8n-workflows</li></ul></div>
|
||||
</div>
|
||||
<div class="footer">Generated as repo documentation. Open locally in a browser; no JavaScript, all SVG inline. Dashed red OpenVINO NPU sidecars are approved prototypes only and do not imply live Atlas/Hermes/RAG routing.</div>
|
||||
</div>
|
||||
</body>
|
||||
</html>
|
||||
@@ -0,0 +1,250 @@
|
||||
# Swarm Infrastructure
|
||||
|
||||
This document is the source-of-truth overview for Will's local swarm/agent infrastructure on the `zap` workstation. It focuses on the runtime services that support Atlas/Hermes, n8n automation, local model/search/voice tooling, Obsidian/RAG automation, and the new agentmon monitoring layer.
|
||||
|
||||
## High-level topology
|
||||
|
||||
```text
Telegram / Discord / Email
        |
        v
Hermes / Atlas gateway (default profile)
        |
        +--> local tools and specialist profiles
        +--> n8n automation workflows on :18808

n8n automation
        |
        +--> direct watchdog probes for key service ports
        +--> Agentmon Health Watchdog -> agentmon-query :8081
        +--> Obsidian, RAG, voice memo, URL capture, digest workflows

agentmon
        |
        +--> agentmon-swarm-monitor -> Docker labels agentmon.monitor=true
        +--> agentmon-openclaw-monitor -> OpenClaw VM snapshots
        +--> NATS JetStream -> event processor -> Postgres
        +--> query API / UI on :8081 / :8082

local AI/search/voice services
        |
        +--> LiteLLM :18804
        +--> SearXNG :18803
        +--> Brave MCP :18802
        +--> llama.cpp :18806
        +--> Ollama embeddings :18807 (legacy/CPU fallback)
        +--> OpenVINO NPU embeddings :18817
        +--> Kokoro TTS :18805
        +--> Whisper NPU :18816
        +--> approved/not-live NPU sidecars: reranker :18818, router/classifier :18819, GenAI worker :18820, doc/image triage optional :18829
```
|
||||
|
||||
See also:
|
||||
|
||||
- [`swarm-infrastructure.html`](./swarm-infrastructure.html) — visual architecture diagram
|
||||
- [`diagram-maintenance.md`](./diagram-maintenance.md) — how to keep diagrams updated and when to create new ones
|
||||
|
||||
## Runtime layers
|
||||
|
||||
### 1. Messaging and agent gateway
|
||||
|
||||
- **Hermes / Atlas default profile** is the production messaging gateway.
|
||||
- Connected platforms include Telegram, Discord, and email.
|
||||
- Atlas uses local swarm services where suitable, especially search, local LLMs, embeddings, STT/TTS, n8n, and agentmon.
|
||||
- Specialist Hermes profiles are available for delegated work, but the default profile remains the stable production gateway.
|
||||
|
||||
### 2. n8n automation
|
||||
|
||||
Container/service:
|
||||
|
||||
- `n8n-agent`
|
||||
- Host URL: `http://127.0.0.1:18808`
|
||||
- Container URL: `http://127.0.0.1:5678`
|
||||
- Compose project: `/home/will/lab/swarm/docker-compose.yaml`
|
||||
|
||||
Important workflow source exports live under:
|
||||
|
||||
- `swarm-common/n8n-workflows/`
|
||||
|
||||
Current health/automation patterns:
|
||||
|
||||
- **Swarm Health Watchdog**: direct endpoint checks for search, LLM, voice, n8n, Docker health, etc.
|
||||
- **Agentmon Health Watchdog**: polls agentmon aggregate snapshots and alerts on stale/degraded monitoring state.
|
||||
- **RAG and Embedding Health Watchdog**: checks RAG/search/embedding path.
|
||||
- Obsidian workflows: health/reindex, inbox triage, daily review, URL-to-note, chat summary capture, weekly decision/runbook extraction.
|
||||
|
||||
### 3. Agentmon monitoring layer
|
||||
|
||||
Repo:
|
||||
|
||||
- `/home/will/lab/agentmon`
|
||||
|
||||
Compose services:
|
||||
|
||||
- `agentmon-ingest` on `:8080` — ingestion gateway, `/healthz`
|
||||
- `agentmon-query` on `:8081` — query API, `/healthz`, `/v1/events`, `/v1/stats/summary`
|
||||
- `agentmon-ui` on `:8082` — web UI, `/healthz`
|
||||
- `agentmon-processor` — NATS to Postgres event processor
|
||||
- `agentmon-swarm-monitor` — monitors Docker containers labeled `agentmon.monitor=true`
|
||||
- `agentmon-openclaw-monitor` — emits OpenClaw VM snapshots
|
||||
- `agentmon-db` — Postgres
|
||||
- `agentmon-nats` — NATS JetStream
|
||||
|
||||
Key health and query endpoints:

```text
http://127.0.0.1:8080/healthz
http://127.0.0.1:8081/healthz
http://127.0.0.1:8082/healthz
http://127.0.0.1:8081/v1/stats/summary
http://127.0.0.1:8081/v1/events?event_type=swarm.snapshot&limit=1
http://127.0.0.1:8081/v1/events?event_type=swarm.service.snapshot&limit=20
http://127.0.0.1:8081/v1/events?event_type=openclaw.snapshot&limit=3
```
|
||||
|
||||
From inside `n8n-agent`, use the Docker bridge gateway:
|
||||
|
||||
```text
http://172.19.0.1:8081/v1/events?event_type=swarm.snapshot&limit=1
```
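
The host and container views of agentmon-query differ only in the base address, so a small helper can build either form of the events URL. This is a sketch, not code from the repo: `events_url` is a hypothetical name, and `172.19.0.1` is the bridge gateway address quoted in this document, which may differ on another Docker network.

```python
from urllib.parse import urlencode

HOST_BASE = "http://127.0.0.1:8081"        # agentmon-query as seen from the host
CONTAINER_BASE = "http://172.19.0.1:8081"  # same service via the Docker bridge gateway


def events_url(event_type: str, limit: int, from_container: bool = False) -> str:
    """Build the /v1/events query URL for the chosen vantage point."""
    base = CONTAINER_BASE if from_container else HOST_BASE
    query = urlencode({"event_type": event_type, "limit": limit})
    return f"{base}/v1/events?{query}"
```

For example, `events_url("swarm.snapshot", 1, from_container=True)` reproduces the URL shown above for use inside `n8n-agent`.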
|
||||
|
||||
### 4. Local AI, search, and voice services
|
||||
|
||||
Docker services:
|
||||
|
||||
- `litellm` — `:18804`, OpenAI-compatible LLM router
|
||||
- `litellm-db` — Postgres backing LiteLLM
|
||||
- `searxng` — `:18803`, local metasearch
|
||||
- `brave-search` — `:18802`, Brave Search MCP server
|
||||
- `kokoro-tts` — `:18805`, local TTS
|
||||
- `whisper-server-npu` — `:18816`, OpenVINO NPU local transcription
|
||||
- `n8n-agent` — `:18808`, automation
|
||||
|
||||
Host/user services:
|
||||
|
||||
- `llama-server.service` — `:18806`, local llama.cpp OpenAI-compatible LLM
|
||||
- `ollama.service` — `:18807`, legacy/CPU embeddings API fallback
|
||||
- `openvino-embeddings.service` — `:18817`, OpenVINO NPU embeddings API (`/v1/embeddings`, `/api/embed`, `/api/embeddings`)
|
||||
- `docker-health-endpoint.service` — `:18809`, read-only container health for n8n
|
||||
- `obsidian-reindex-endpoint.service` — `:18810`, Obsidian/RAG reindex trigger; default collection `obsidian_bge_npu` using OpenVINO NPU embeddings
|
||||
- `url-content-extractor.service` — `:18812`, YouTube/PDF/web extraction
|
||||
- `voice-memo-processor.service` — `:18813`, voice memo processing
|
||||
- `rag-embedding-health.service` — `:18814`, RAG/embedding health wrapper
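
The port assignments listed above can be kept in one table and probed with a plain TCP connect. This is a minimal sketch: the port numbers are copied from the two lists, while `SERVICE_PORTS`, `port_is_open`, and `probe_all` are hypothetical names. A successful connect only shows that something is listening; it says nothing about whether the service behind the port is healthy.

```python
import socket

# Ports copied from the Docker and host/user service lists.
SERVICE_PORTS = {
    "brave-search": 18802,
    "searxng": 18803,
    "litellm": 18804,
    "kokoro-tts": 18805,
    "llama-server": 18806,
    "ollama": 18807,
    "n8n-agent": 18808,
    "docker-health-endpoint": 18809,
    "obsidian-reindex-endpoint": 18810,
    "url-content-extractor": 18812,
    "voice-memo-processor": 18813,
    "rag-embedding-health": 18814,
    "whisper-server-npu": 18816,
    "openvino-embeddings": 18817,
}


def port_is_open(port: int, host: str = "127.0.0.1", timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe_all(ports=SERVICE_PORTS) -> dict[str, bool]:
    """Probe every listed service on localhost and report listener state."""
    return {name: port_is_open(port) for name, port in ports.items()}
```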
|
||||
|
||||
Approved but not live-routed OpenVINO NPU sidecars:
|
||||
|
||||
| Port | Component | State | Safety boundary |
| ---: | --- | --- | --- |
| `18818` | reranker | approved prototype; optional foreground/user-systemd only | request-time only; no Chroma/vector mutation; no live RAG integration unless Will approves |
| `18819` | router/classifier | approved prototype; dry-run only | no Hermes/Atlas routing, memory writes, service restarts, or outbound messages |
| `18820` | bounded GenAI worker | approved prototype | background jobs only; not primary Atlas/Hermes model routing |
| `18829` | document/image triage | CLI-first; optional localhost server | synthetic/non-private smoke data only; no private directory processing; NPU stage is embeddings via `:18817` |
|
||||
|
||||
These sidecars must bind to `127.0.0.1` by default and must not be enabled persistently or wired into live Atlas/Hermes/RAG paths without explicit approval from Will. Any claim that a request ran on the NPU requires a positive delta in `/sys/class/accel/accel0/device/npu_busy_time_us`, measured before and after inference. An HTTP 200 alone is not proof.
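
The busy-time rule can be checked mechanically by reading the counter on both sides of an inference call. This is a minimal sketch under stated assumptions: the sysfs file is assumed to hold a single integer, and `read_busy_us` and `npu_busy_delta` are hypothetical helper names. The `run_inference` argument is whatever callable issues the request, for example a POST to the embeddings service.

```python
from pathlib import Path
from typing import Callable

BUSY_PATH = "/sys/class/accel/accel0/device/npu_busy_time_us"


def read_busy_us(path: str = BUSY_PATH) -> int:
    """Read the NPU busy-time counter (microseconds) as an integer."""
    return int(Path(path).read_text().strip())


def npu_busy_delta(run_inference: Callable[[], object], path: str = BUSY_PATH) -> int:
    """Run one inference call and return how far the busy counter advanced."""
    before = read_busy_us(path)
    run_inference()
    after = read_busy_us(path)
    return after - before
```

A result greater than zero supports the claim that the NPU did work during the call. A result of zero means the request was served elsewhere, or not at all, regardless of the HTTP status.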
|
||||
|
||||
### 5. Obsidian and RAG
|
||||
|
||||
Vault:
|
||||
|
||||
- `/home/will/lab/swarm/swarm-common/obsidian-vault/will/will-shared-zap`
|
||||
|
||||
Local REST API:
|
||||
|
||||
- HTTP: `127.0.0.1:27123`
|
||||
- HTTPS: `127.0.0.1:27124`
|
||||
|
||||
RAG/vector store:
|
||||
|
||||
- ChromaDB path: `~/.hermes/data/rag-search/chroma/`
|
||||
- Reindex state/progress: active BGE/NPU state in `~/.hermes/data/rag-search/obsidian_bge_npu_index_state.json` and `obsidian_bge_npu_reindex_progress.json`; legacy Ollama state in `obsidian_index_state.json` remains for comparison/fallback.
|
||||
- Active RAG query/reindex embedding backend: OpenVINO NPU embeddings service on `:18817`, currently `bge-base-en-v1.5-int8-ov`, collection `obsidian_bge_npu`.
|
||||
- Legacy comparison/fallback collection: `obsidian`, built with Ollama on `:18807` using `nomic-embed-text`.
|
||||
- Reindex endpoint: `POST :18810/reindex` for incremental updates, `POST :18810/reindex?full=true` for full semantic rebuilds, `GET :18810/semantic-health` to verify vectors plus a search smoke test.
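
The three reindex-endpoint calls can be prepared with the standard library alone. This sketch only builds the requests and assumes they are issued from the host at `127.0.0.1`; `reindex_request` and `semantic_health_request` are hypothetical names, and the response format is not specified here, so the example does not parse it.

```python
import urllib.request

RAG_BASE = "http://127.0.0.1:18810"  # assumption: called from the host


def reindex_request(full: bool = False) -> urllib.request.Request:
    """POST /reindex for an incremental update, or /reindex?full=true for a full rebuild."""
    url = f"{RAG_BASE}/reindex" + ("?full=true" if full else "")
    return urllib.request.Request(url, method="POST")


def semantic_health_request() -> urllib.request.Request:
    """GET /semantic-health to verify vectors and run the search smoke test."""
    return urllib.request.Request(f"{RAG_BASE}/semantic-health", method="GET")
```

To send one, pass it to `urllib.request.urlopen(reindex_request(), timeout=30)`; the timeout value is illustrative, and a full rebuild may need a much longer one.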
|
||||
|
||||
## Monitoring model
|
||||
|
||||
The monitoring design is intentionally layered:
|
||||
|
||||
1. **n8n direct probes** check critical service endpoints and send deduped alerts.
|
||||
2. **agentmon** continuously observes labeled Docker services and OpenClaw state, then writes snapshots through NATS/Postgres.
|
||||
3. **n8n Agentmon Health Watchdog** polls agentmon's aggregate state and alerts if the monitoring pipeline itself becomes stale/degraded.
|
||||
4. **Hermes/Atlas** can inspect both n8n and agentmon when troubleshooting, and can use the same endpoints as part of operational checks.
|
||||
|
||||
This means a single process being alive is not enough: the important signal is whether collection, ingestion, processing, storage, query, and alerting are all functioning.
|
||||
|
||||
## Agentmon Health Watchdog
|
||||
|
||||
Workflow source:
|
||||
|
||||
- `swarm-common/n8n-workflows/agentmon-health-watchdog.json`
|
||||
|
||||
Installed n8n workflow:
|
||||
|
||||
- Name: `Agentmon Health Watchdog`
|
||||
- ID: `AgentmonHealthWatchdog`
|
||||
- Schedule: every 5 minutes
|
||||
|
||||
Alert conditions:
|
||||
|
||||
- `agentmon-ingest`, `agentmon-query`, or `agentmon-ui` `/healthz` fails.
|
||||
- Latest `swarm.snapshot` is missing.
|
||||
- Latest `swarm.snapshot` is older than 3 minutes.
|
||||
- Snapshot issues are non-empty.
|
||||
- Required agentmon services are missing or not healthy/running:
  - `agentmon-ingest`
  - `agentmon-query`
  - `agentmon-ui`
  - `agentmon-processor`
  - `agentmon-swarm-monitor`
  - `agentmon-db`
  - `agentmon-nats`
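
The snapshot-based conditions above can be sketched as a single evaluation function. The real `swarm.snapshot` payload is not shown in this document, so the dictionary keys used here (`observed_at`, `issues`, `services`) are hypothetical, as is the function name `snapshot_problems`. The `/healthz` checks are separate HTTP probes and are not covered by this sketch.

```python
from datetime import datetime, timedelta, timezone

REQUIRED_SERVICES = [
    "agentmon-ingest",
    "agentmon-query",
    "agentmon-ui",
    "agentmon-processor",
    "agentmon-swarm-monitor",
    "agentmon-db",
    "agentmon-nats",
]
MAX_AGE = timedelta(minutes=3)
OK_STATES = {"healthy", "running"}


def snapshot_problems(snapshot, now: datetime) -> list[str]:
    """Return alert reasons for one snapshot; an empty list means no alert.

    snapshot is None (nothing stored) or a dict with hypothetical keys:
    'observed_at' (aware datetime), 'issues' (list), 'services' (name -> state).
    """
    if snapshot is None:
        return ["latest swarm.snapshot is missing"]
    problems = []
    if now - snapshot["observed_at"] > MAX_AGE:
        problems.append("latest swarm.snapshot is older than 3 minutes")
    if snapshot.get("issues"):
        problems.append("snapshot issues are non-empty")
    services = snapshot.get("services", {})
    for name in REQUIRED_SERVICES:
        if services.get(name) not in OK_STATES:
            problems.append(f"{name} is missing or not healthy/running")
    return problems
```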
|
||||
|
||||
Deduplication:
|
||||
|
||||
- Alert after 2 failed checks.
|
||||
- Reminder every 6 failed runs.
|
||||
- Recovery message when state returns healthy.
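
The deduplication rules can be modelled as a small state machine that is advanced once per scheduled run. This sketch makes two assumptions that the list above does not state: the failure count is consecutive, and reminders are counted from the first alert. `WatchdogState` and `step` are hypothetical names; the installed workflow may keep its state differently.

```python
from dataclasses import dataclass

ALERT_AFTER = 2   # failed checks before the first alert
REMIND_EVERY = 6  # failed runs between reminders (assumed to count from the first alert)


@dataclass
class WatchdogState:
    failures: int = 0      # consecutive failed checks so far
    alerted: bool = False  # True once the first alert has been sent


def step(state: WatchdogState, healthy: bool) -> str:
    """Advance state by one check; return 'alert', 'reminder', 'recovery', or 'none'."""
    if healthy:
        was_alerted = state.alerted
        state.failures = 0
        state.alerted = False
        return "recovery" if was_alerted else "none"
    state.failures += 1
    if not state.alerted:
        if state.failures >= ALERT_AFTER:
            state.alerted = True
            return "alert"
        return "none"
    if (state.failures - ALERT_AFTER) % REMIND_EVERY == 0:
        return "reminder"
    return "none"
```

Under these assumptions, a single transient failure produces no message, and a recovery message is only sent if an alert went out first.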
|
||||
|
||||
## Operational quick checks
|
||||
|
||||
From the host:
|
||||
|
||||
```bash
cd /home/will/lab/swarm
make status
make local-ai-health
./scripts/npu-service-health.sh  # read-only; includes sysfs busy-time proof for :18817
curl -fsS http://127.0.0.1:18808/healthz
curl -fsS http://127.0.0.1:8081/healthz
curl -fsS 'http://127.0.0.1:8081/v1/events?event_type=swarm.snapshot&limit=1' | jq .
```
|
||||
|
||||
From inside `n8n-agent`:
|
||||
|
||||
```bash
docker exec n8n-agent /bin/sh -lc '
  wget -qO- -T 5 http://172.19.0.1:8081/healthz
  wget -qO- -T 5 "http://172.19.0.1:8081/v1/events?event_type=swarm.snapshot&limit=1" | head -c 500
'
```
|
||||
|
||||
Verify n8n workflow activation:
|
||||
|
||||
```bash
docker exec -u node n8n-agent n8n export:workflow \
  --id=AgentmonHealthWatchdog \
  --output=/tmp/agentmon-export.json

docker cp n8n-agent:/tmp/agentmon-export.json /tmp/agentmon-export.json
jq '.[0] | {id,name,active,nodes:(.nodes|length)}' /tmp/agentmon-export.json
```
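
Where `jq` is not available, the same summary can be produced in Python. This sketch mirrors the `jq` filter above and assumes, as that filter does, that the export is a JSON array with the workflow as its first element; `summarize_export` is a hypothetical helper name.

```python
import json


def summarize_export(export_json: str) -> dict:
    """Mirror jq '.[0] | {id,name,active,nodes:(.nodes|length)}'."""
    workflow = json.loads(export_json)[0]
    return {
        "id": workflow.get("id"),
        "name": workflow.get("name"),
        "active": workflow.get("active"),
        # A missing or null nodes field counts as 0, matching jq's length of null.
        "nodes": len(workflow.get("nodes") or []),
    }
```

The field to check is `active`: the workflow is only scheduled when it is `true`.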
|
||||
|
||||
## Notes and pitfalls
|
||||
|
||||
- Do not commit `.env`, decrypted credentials, raw credential exports, or runtime DB files.
|
||||
- n8n workflow backups can contain sensitive operational data; keep timestamped raw backups untracked unless intentionally sanitized.
|
||||
- From host, use `127.0.0.1:<host-port>`.
|
||||
- From `n8n-agent`, use `127.0.0.1:5678` for n8n itself and `172.19.0.1:<host-port>` for host-published swarm services.
|
||||
- Agentmon `/healthz` only proves the web/API process is alive; pair it with snapshot freshness to prove the monitoring pipeline is flowing.
|
||||
- OpenClaw is intentionally dormant unless explicitly re-enabled; do not alert on VMs being shut off by default.
|
||||
- OpenVINO NPU sidecars on `:18818`, `:18819`, `:18820`, and optional `:18829` are prototypes/not-live unless a later approved change installs and routes them. Do not draw live Atlas/Hermes/RAG arrows to them in diagrams until that approval and implementation actually exist.
|
||||
@@ -0,0 +1,115 @@
|
||||
#!/usr/bin/env bash
set -euo pipefail

# Read-only health probe for Will's local OpenVINO/NPU services.
# This script intentionally does not start, stop, restart, enable, reindex, or route anything.

BUSY_PATH=${BUSY_PATH:-/sys/class/accel/accel0/device/npu_busy_time_us}
CURL_TIMEOUT=${CURL_TIMEOUT:-8}
EMBED_MODEL=${EMBED_MODEL:-bge-base-en-v1.5-int8-ov}
EMBED_URL=${EMBED_URL:-http://127.0.0.1:18817/v1/embeddings}

have() { command -v "$1" >/dev/null 2>&1; }

json_pretty() {
  if have jq; then
    jq .
  else
    python -m json.tool
  fi
}

section() {
  printf '\n== %s ==\n' "$1"
}

http_json() {
  local name=$1 url=$2
  printf '\n[%s] %s\n' "$name" "$url"
  if ! curl -fsS --max-time "$CURL_TIMEOUT" "$url" | json_pretty; then
    printf 'status=unavailable_or_non_json\n'
    return 1
  fi
}

busy_value() {
  if [[ -r "$BUSY_PATH" ]]; then
    tr -d '\n' < "$BUSY_PATH"
  else
    printf 'missing'
  fi
}

section "NPU counter"
printf 'busy_path=%s\n' "$BUSY_PATH"
printf 'busy_time_us=%s\n' "$(busy_value)"

section "Listeners"
# Required OpenVINO/NPU program ports: live baseline 18810/18816/18817,
# approved prototypes 18818/18819/18820, and optional doc/image triage 18829.
# 18814 is the existing RAG/embedding health wrapper; 18828 is a review-only
# alternate used to avoid collisions during prior smoke tests.
ss -ltnp | grep -E ':(18810|18814|18816|18817|18818|18819|18820|18828|18829)\b' || true

section "User service states"
for unit in \
  openvino-embeddings.service \
  rag-embedding-health.service \
  openvino-reranker.service \
  openvino-router-classifier.service \
  openvino-genai-npu-worker.service; do
  active=$(systemctl --user is-active "$unit" 2>/dev/null || true)
  enabled=$(systemctl --user is-enabled "$unit" 2>/dev/null || true)
  printf '%-38s active=%-10s enabled=%s\n' "$unit" "${active:-unknown}" "${enabled:-unknown}"
done
|
||||
|
||||
section "Docker service states"
|
||||
if [[ -d /home/will/lab/swarm ]]; then
|
||||
(cd /home/will/lab/swarm && docker compose ps whisper-server-npu 2>/dev/null) || true
|
||||
fi
|
||||
|
||||
section "HTTP health"
|
||||
http_json "RAG endpoint" "http://127.0.0.1:18810/healthz" || true
|
||||
http_json "RAG/embedding health wrapper" "http://127.0.0.1:18814/healthz" || true
|
||||
http_json "Whisper NPU" "http://127.0.0.1:18816/health" || true
|
||||
http_json "OpenVINO embeddings" "http://127.0.0.1:18817/healthz" || true
|
||||
# Prototypes are expected to be unavailable until explicitly started/approved.
|
||||
http_json "NPU reranker prototype" "http://127.0.0.1:18818/readyz" || true
|
||||
http_json "NPU router classifier prototype" "http://127.0.0.1:18819/healthz" || true
|
||||
http_json "NPU GenAI worker prototype" "http://127.0.0.1:18820/healthz" || true
|
||||
http_json "NPU doc/image triage prototype" "http://127.0.0.1:18829/healthz" || true
|
||||
|
||||
section "Embeddings NPU busy-time proof"
|
||||
if [[ ! -r "$BUSY_PATH" ]]; then
|
||||
printf 'result=failed reason=missing_busy_counter\n'
|
||||
exit 2
|
||||
fi
|
||||
before=$(busy_value)
|
||||
response=$(curl -fsS --max-time "$CURL_TIMEOUT" \
|
||||
"$EMBED_URL" \
|
||||
-H 'Content-Type: application/json' \
|
||||
-d "{\"input\":\"non-private npu health probe\",\"model\":\"$EMBED_MODEL\"}" || true)
|
||||
after=$(busy_value)
|
||||
if [[ -z "$response" ]]; then
|
||||
printf 'result=failed reason=embedding_request_failed before_us=%s after_us=%s\n' "$before" "$after"
|
||||
exit 3
|
||||
fi
|
||||
delta=$((after - before))
|
||||
printf 'sysfs_before_us=%s\nsysfs_after_us=%s\nsysfs_delta_us=%s\n' "$before" "$after" "$delta"
|
||||
RESPONSE_JSON="$response" python - <<'PY' || true
|
||||
import json, os
try:
    data = json.loads(os.environ.get('RESPONSE_JSON', ''))
except Exception as exc:
    print(f'response_parse_error={type(exc).__name__}: {exc}')
    raise SystemExit(0)
print(f"response_object={data.get('object')}")
print(f"response_model={data.get('model')}")
print(f"response_npu_busy_delta_us={data.get('npu_busy_delta_us')}")
print(f"embedding_count={len(data.get('data', []))}")
|
||||
PY
|
||||
if (( delta <= 0 )); then
|
||||
printf 'result=failed reason=no_positive_sysfs_npu_delta\n'
|
||||
exit 4
|
||||
fi
|
||||
printf 'result=ok\n'
|
||||
@@ -0,0 +1,309 @@
|
||||
---
|
||||
type: service-catalog
|
||||
created: 2026-05-14T14:50:46-07:00
|
||||
updated: 2026-06-04T11:35:00-07:00
|
||||
tags:
|
||||
- service-catalog
|
||||
- swarm
|
||||
- hermes
|
||||
- automation
|
||||
---
|
||||
|
||||
# Service Catalog
|
||||
|
||||
Canonical index of local services, automation tools, Hermes capabilities, and where to find their operational docs.
|
||||
|
||||
> Generated by Atlas from live system inventory on `2026-05-14T14:50:46-07:00`; high-risk local AI/service rows refreshed on `2026-05-27T12:12:06-07:00`; Obsidian/RAG embedding path refreshed on `2026-06-03T21:31:01-07:00`. Secrets are intentionally omitted.
|
||||
|
||||
## Quick links
|
||||
|
||||
- [[Ops Home]]
|
||||
- [[Obsidian Automation Health]]
|
||||
- [[Obsidian Plugin Setup]]
|
||||
- [[Runbooks Home]]
|
||||
- [[Projects Home]]
|
||||
- [[Decisions Home]]
|
||||
|
||||
## Primary repositories and config locations
|
||||
|
||||
| Area | Path / command | Purpose |
|
||||
| --- | --- | --- |
|
||||
| Swarm repo | `~/lab/swarm` | Docker services, n8n, local AI, OpenClaw helpers, service scripts |
|
||||
| Swarm Makefile | `cd ~/lab/swarm && make help` | Authoritative operations target list |
|
||||
| n8n workflow exports | `~/lab/swarm/swarm-common/n8n-workflows/` | Versioned workflow backups |
|
||||
| Shared Obsidian vault | `~/lab/swarm/swarm-common/obsidian-vault/will/will-shared-zap` | Active API-backed vault |
|
||||
| Hermes config | `~/.hermes/config.yaml` | Atlas/Hermes model, tools, gateway, profiles |
|
||||
| Hermes env/secrets | `~/.hermes/.env` | Secrets; do not print or commit |
|
||||
| Hermes source | `~/.hermes/hermes-agent` | Atlas local source checkout |
|
||||
| Hermes skills | `~/.hermes/skills/` | Procedural docs and reusable playbooks |
|
||||
|
||||
## Local endpoints
|
||||
|
||||
| Service | Port | Status | Purpose | Health / base URL |
|
||||
| --- | --- | --- | --- | --- |
|
||||
| Brave Search MCP | 18802 | HTTP 406 on plain GET `/mcp` | Brave Search MCP server for Hermes MCP tools | `http://127.0.0.1:18802/mcp` |
|
||||
| SearXNG | 18803 | OK 200 | SearXNG metasearch | `http://127.0.0.1:18803/search?q=test&format=json` |
|
||||
| LiteLLM | 18804 | no listener / HTTP 000 on 2026-05-27 | LiteLLM OpenAI-compatible model proxy | `http://127.0.0.1:18804/health/liveliness` |
|
||||
| Kokoro TTS | 18805 | OK 200 | Kokoro local TTS | `http://127.0.0.1:18805/health` |
|
||||
| llama.cpp | 18806 | OK 200 | llama.cpp local LLM | `http://127.0.0.1:18806/v1/models` |
|
||||
| Ollama embeddings | 18807 | OK 200 | Ollama embeddings API | `http://127.0.0.1:18807/api/version` |
|
||||
| n8n | 18808 | OK 200 | n8n workflow automation | `http://127.0.0.1:18808/healthz` |
|
||||
| Docker health | 18809 | OK 200 | Docker/container health API | `http://127.0.0.1:18809/health` |
|
||||
| Obsidian reindex | 18810 | OK 200 | Obsidian/RAG reindex trigger | `http://127.0.0.1:18810/healthz` |
|
||||
| Whisper CPU | 18811 | OK 200 | Whisper.cpp CPU STT fallback | `http://127.0.0.1:18811/` |
|
||||
| URL extractor | 18812 | OK 200 | URL/PDF/YouTube content extractor | `http://127.0.0.1:18812/healthz` |
|
||||
| Voice memo processor | 18813 | OK 200 | Voice memo processor | `http://127.0.0.1:18813/healthz` |
|
||||
| RAG/embedding health | 18814 | OK 200 | RAG/OpenVINO/Obsidian health wrapper | `http://127.0.0.1:18814/healthz` |
|
||||
| Whisper OpenVINO NPU | 18816 | OK 200 / Docker healthy on 2026-06-04 | Intel NPU Whisper transcription service | `http://127.0.0.1:18816/health` |
|
||||
| OpenVINO embeddings | 18817 | OK 200 | Intel NPU embeddings service for live Obsidian RAG | `http://127.0.0.1:18817/healthz` |
|
||||
| OpenVINO NPU reranker prototype | 18818 | approved prototype; not enabled live | Optional second-stage RAG reranker | `http://127.0.0.1:18818/readyz` |
|
||||
| OpenVINO router/classifier prototype | 18819 | approved prototype; not enabled live | Dry-run Atlas/Hermes message classifier/router | `http://127.0.0.1:18819/healthz` |
|
||||
| OpenVINO GenAI NPU worker prototype | 18820 | approved prototype; not enabled live | Bounded local background generation worker | `http://127.0.0.1:18820/healthz` |
|
||||
| OpenVINO document/image triage prototype | 18828/18829 | approved foreground prototype; not enabled live | Local document/image triage with NPU embeddings stage via `:18817` | `http://127.0.0.1:<port>/healthz` |
|
||||
| Obsidian REST HTTP | 27123 | OK 200 | Obsidian Local REST API HTTP | `http://127.0.0.1:27123/` |
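
The Status column above mixes plain HTTP results (`OK 200`, `HTTP 406`) with connection failures (`HTTP 000`). curl prints `000` for `-w '%{http_code}'` when it receives no HTTP response at all. The sketch below maps a code to a short status label; the function name and the label strings are illustrative assumptions, not something the catalog generator is known to use.

```bash
# Hypothetical helper: turn a curl %{http_code} value into a catalog-style status.
# Example:
#   code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 http://127.0.0.1:18805/health)
#   classify_http_code "$code"
classify_http_code() {
  case "$1" in
    000)     printf 'no_listener\n' ;;               # curl got no HTTP response
    2??)     printf 'ok\n' ;;
    406)     printf 'reachable_not_acceptable\n' ;;  # e.g. plain GET on the MCP endpoint
    4??|5??) printf 'error\n' ;;
    *)       printf 'unknown\n' ;;
  esac
}
```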
|
||||
|
||||
## Docker services
|
||||
|
||||
| Container | Status | Ports |
|
||||
| --- | --- | --- |
|
||||
| n8n-agent | Up 21 hours (healthy) | 0.0.0.0:18808->5678/tcp, [::]:18808->5678/tcp |
|
||||
| whisper-server-gpu | Up 27 hours (healthy) | 0.0.0.0:18801->8080/tcp, [::]:18801->8080/tcp |
|
||||
| whisper-server | Up 27 hours (healthy) | 0.0.0.0:18811->8080/tcp, [::]:18811->8080/tcp |
|
||||
| kokoro-tts | Up 25 hours | 0.0.0.0:18805->8880/tcp, [::]:18805->8880/tcp |
|
||||
| brave-search | Up 25 hours | 0.0.0.0:18802->8000/tcp, [::]:18802->8000/tcp |
|
||||
| searxng | Up 25 hours | 0.0.0.0:18803->8080/tcp, [::]:18803->8080/tcp |
|
||||
|
||||
Management commands:
|
||||
|
||||
```bash
|
||||
cd ~/lab/swarm
|
||||
make ps
|
||||
make status
|
||||
make local-ai-health
|
||||
make api-health
|
||||
make timers
|
||||
./scripts/npu-service-health.sh
|
||||
```
|
||||
|
||||
## Host-side systemd/user services
|
||||
|
||||
Important known services:
|
||||
|
||||
| Unit | Purpose |
|
||||
| --- | --- |
|
||||
| `llama-server.service` | Host-side llama.cpp local LLM on 18806 |
|
||||
| `ollama.service` | Host-side Ollama embeddings on 18807 |
|
||||
| `docker-health-endpoint.service` | Container health API on 18809 |
|
||||
| `obsidian-reindex-endpoint.service` | Obsidian/RAG reindex endpoint on 18810 |
|
||||
| `url-content-extractor.service` | URL/PDF/YouTube extraction on 18812 |
|
||||
| `voice-memo-processor.service` | Voice memo processing on 18813 |
|
||||
| `rag-embedding-health.service` | RAG/OpenVINO/Obsidian health check wrapper on 18814 |
|
||||
| `openvino-embeddings.service` | Intel NPU BGE embedding service on 18817 |
|
||||
| `openvino-reranker.service` | Optional NPU reranker prototype on 18818; not installed/enabled without approval |
|
||||
| `openvino-router-classifier.service` | Optional dry-run router/classifier prototype on 18819; not installed/enabled without approval |
|
||||
| `openvino-genai-npu-worker.service` | Optional bounded GenAI worker prototype on 18820; not installed/enabled without approval |
|
||||
|
||||
Useful checks:
|
||||
|
||||
```bash
|
||||
systemctl --user list-units '*obsidian*' '*rag*' '*url-content*' '*voice-memo*' '*docker-health*' --all
|
||||
systemctl --user list-timers
|
||||
journalctl --user -u obsidian-reindex-endpoint.service -n 50 --no-pager
|
||||
```
|
||||
|
||||
## n8n workflows
|
||||
|
||||
n8n UI/API: `http://127.0.0.1:18808`
|
||||
|
||||
| Workflow | ID | State |
|
||||
| --- | --- | --- |
|
||||
| Calendar to Obsidian Notes | QRCCdHNXZUHc2Oz4 | inactive |
|
||||
| Daily OpenClaw Session Digest | qqYwAD05AvRHrHPc | inactive |
|
||||
| Evening Digest | PlZywwqL8MRNEAN6 | active |
|
||||
| Gmail Inbox Monitor + Obsidian Notes | whtdorf7yJMVYeHm | active |
|
||||
| IMAP Inbox Triage + Obsidian Notes | 9sFwRyUDz51csAp7 | active |
|
||||
| IMAP Inbox Triage + Obsidian Notes (squareffect) | xjUoQf97TkBrawc8 | inactive |
|
||||
| IMAP Inbox Triage + Obsidian Notes (wills-portal) | kHDK9QdUSiAJ8rCM | inactive |
|
||||
| Morning Brief | g3IdGZCK1EtTsv9T | active |
|
||||
| n8n Failure Digest | G9ylNbHbnJ6fWX2C | active |
|
||||
| Nightly Obsidian Vault Sync | 75JCevkdgkyCr2qH | inactive |
|
||||
| Obsidian Chat Summary Capture | LF3i86l3NkxpayxL | active |
|
||||
| Obsidian Daily Review | YZyJ5G0Ur8D6TlM8 | active |
|
||||
| Obsidian Health + Reindex | PCtD3PuQjzKLyEEE | active |
|
||||
| Obsidian Inbox Triage | 6SKSZWZwuJNwuO2P | active |
|
||||
| Obsidian URL to Note | Ori3Bu5u5ODtxxyD | active |
|
||||
| Obsidian Vault Reindex | 85ntyyphDJ4Ms2b4 | active |
|
||||
| Obsidian Weekly Decision Runbook Extractor | UWLMOQQVxbTX6Sis | active |
|
||||
| OpenClaw Action Bus | Jwi54VWMdlLqYnRo | inactive |
|
||||
| OpenClaw Reminder Webhook | RUR1CGn0ikkxbPin | inactive |
|
||||
| RAG and Embedding Health Watchdog | SwKaPtYqUJrakpFu | active |
|
||||
| Swarm Health Watchdog | lDKocSFXBQWQrDd3 | active |
|
||||
| Voice Memo Capture (Audio URL + Local Whisper) | El1BHJZ56JlzhrRZ | active |
|
||||
| Web-to-Notes Capture (Local LLM + Obsidian) | GSmzuA5dgGgyRg5v | active |
|
||||
|
||||
Obsidian webhook endpoints:
|
||||
|
||||
| Workflow | Method / URL | Input |
|
||||
| --- | --- | --- |
|
||||
| Obsidian Chat Summary Capture | `POST http://127.0.0.1:18808/webhook/obsidian-chat-summary` | JSON with `type`, `title`, `summary`, `content`, optional `tags`, `metadata` |
|
||||
| Obsidian URL to Note | `POST http://127.0.0.1:18808/webhook/obsidian-url-to-note` | JSON with `url`, optional `folder`, `tags`, `notes` |
|
||||
|
||||
## Hermes capabilities
|
||||
|
||||
### Enabled toolsets
|
||||
|
||||
| Toolset | Description |
|
||||
| --- | --- |
|
||||
| web | 🔍 Web Search & Scraping |
|
||||
| browser | 🌐 Browser Automation |
|
||||
| terminal | 💻 Terminal & Processes |
|
||||
| file | 📁 File Operations |
|
||||
| code_execution | ⚡ Code Execution |
|
||||
| vision | 👁️ Vision / Image Analysis |
|
||||
| image_gen | 🎨 Image Generation |
|
||||
| tts | 🔊 Text-to-Speech |
|
||||
| skills | 📚 Skills |
|
||||
| todo | 📋 Task Planning |
|
||||
| memory | 💾 Memory |
|
||||
| session_search | 🔎 Session Search |
|
||||
| clarify | ❓ Clarifying Questions |
|
||||
| delegation | 👥 Task Delegation |
|
||||
| cronjob | ⏰ Cron Jobs |
|
||||
| messaging | 📨 Cross-Platform Messaging |
|
||||
|
||||
### Disabled toolsets
|
||||
|
||||
| Toolset | Description |
|
||||
| --- | --- |
|
||||
| video | 🎬 Video Analysis |
|
||||
| video_gen | 🎬 Video Generation |
|
||||
| moa | 🧠 Mixture of Agents |
|
||||
| rag_search | 🧠 RAG Search |
|
||||
| rl | 🧪 RL Training |
|
||||
| homeassistant | 🏠 Home Assistant |
|
||||
| spotify | 🎵 Spotify |
|
||||
| yuanbao | 🤖 Yuanbao |
|
||||
| computer_use | 🖱️ Computer Use (macOS) |
|
||||
|
||||
### MCP servers
|
||||
|
||||
```text
|
||||
MCP Servers:
|
||||
|
||||
Name Transport Tools Status
|
||||
──────────────── ────────────────────────────── ──────────── ──────────
|
||||
brave-search http://127.0.0.1:18802/mcp all ✓ enabled
|
||||
```
|
||||
|
||||
### Hermes profiles
|
||||
|
||||
```text
|
||||
Profile Model Gateway Alias Distribution
|
||||
─────────────── ─────────────────────────── ─────────── ─────────── ────────────────────
|
||||
◆default gpt-5.5 running — —
|
||||
atlas gpt-5.5 stopped — —
|
||||
engineer gpt-5.5 stopped — —
|
||||
glm-simple glm-5.1 stopped — —
|
||||
ops gpt-5.5 stopped — —
|
||||
orchestrator gpt-5.5 stopped — —
|
||||
researcher gpt-5.5 stopped — —
|
||||
reviewer gpt-5.5 stopped — —
|
||||
writer gpt-5.5 stopped — —
|
||||
```
|
||||
|
||||
### Hermes cron jobs
|
||||
|
||||
```text
|
||||
|
||||
┌─────────────────────────────────────────────────────────────────────────┐
|
||||
│ Scheduled Jobs │
|
||||
└─────────────────────────────────────────────────────────────────────────┘
|
||||
|
||||
c515ca076b73 [active]
|
||||
Name: Hermes config git snapshot
|
||||
Schedule: 0 3 * * *
|
||||
Repeat: ∞
|
||||
Next run: 2026-05-15T03:00:00-07:00
|
||||
Deliver: discord:1494453542243532932
|
||||
Script: hermes_git_snapshot.sh
|
||||
Mode: no-agent (script stdout delivered directly)
|
||||
Last run: 2026-05-11T03:00:37.525856-07:00 ok
|
||||
|
||||
c15ee395a38d [active]
|
||||
Name: atlas-minio-self-backup
|
||||
Schedule: 0 3 * * *
|
||||
Repeat: ∞
|
||||
Next run: 2026-05-15T03:00:00-07:00
|
||||
Deliver: origin
|
||||
Script: atlas-backup-to-minio-cron.sh
|
||||
Mode: no-agent (script stdout delivered directly)
|
||||
|
||||
1ef682e65695 [active]
|
||||
Name: watch pi-agent-hermes-bound kanban
|
||||
Schedule: every 2m
|
||||
Repeat: ∞
|
||||
Next run: 2026-05-14T14:49:39.352638-07:00
|
||||
Deliver: local
|
||||
Script: watch_pi_agent_kanban.py
|
||||
Mode: no-agent (script stdout delivered directly)
|
||||
Last run: 2026-05-14T14:47:39.352638-07:00 ok
|
||||
|
||||
```
|
||||
|
||||
## Local AI and automation routing
|
||||
|
||||
| Capability | Preferred endpoint/tool | Notes |
|
||||
| --- | --- | --- |
|
||||
| Web search | SearXNG `18803` or Brave MCP `18802` | Hermes web search and MCP Brave Search are both available |
|
||||
| Model proxy | LiteLLM `18804` | Use for OpenAI-compatible routed models |
|
||||
| Direct local LLM | llama.cpp `18806` | Current model id: `gemma-4-26B-A4B-it-UD-IQ2_M.gguf`; useful for n8n/local automation |
|
||||
| Embeddings | OpenVINO NPU `18817`; Ollama `18807` fallback | Live RAG uses `bge-base-en-v1.5-int8-ov` via OpenVINO and collection `obsidian_bge_npu`; Ollama remains a legacy/CPU fallback |
|
||||
| Text-to-speech | Kokoro `18805` / Hermes TTS tool | Local speech generation |
|
||||
| Speech-to-text | Whisper OpenVINO NPU `18816`; Whisper CPU `18811` fallback | NPU service is the live default; CPU remains fallback |
|
||||
| Workflow automation | n8n `18808` | Durable jobs and webhooks |
|
||||
| Knowledge store | Obsidian REST `27123`; RAG/Chroma local store | Obsidian notes plus Hermes rag-search index |
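
The "preferred endpoint, then fallback" ordering in the table above can be illustrated with a first-match probe loop. This is only a sketch of the preference order; it is not how Hermes or n8n actually select endpoints, and `first_healthy` and `probe_http` are hypothetical names.

```bash
# Hypothetical helper: print the first URL that the probe command accepts.
# Usage: first_healthy <probe-command> <url>...
# Example:
#   first_healthy probe_http http://127.0.0.1:18817/healthz http://127.0.0.1:18807/api/version
first_healthy() {
  probe=$1
  shift
  for url in "$@"; do
    if "$probe" "$url" >/dev/null 2>&1; then
      printf '%s\n' "$url"
      return 0
    fi
  done
  return 1   # nothing answered; caller decides what to do
}

# Example probe (assumes curl is installed); any command that takes a URL works.
probe_http() { curl -fsS --max-time 5 "$1"; }
```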
|
||||
|
||||
## Obsidian integration
|
||||
|
||||
| Component | Location / endpoint | Purpose |
|
||||
| --- | --- | --- |
|
||||
| Local REST API | `http://127.0.0.1:27123` and `https://127.0.0.1:27124` | Read/write notes and execute commands |
|
||||
| Autostart entry | `~/.config/autostart/obsidian-autostart.desktop` | Launches Obsidian at graphical login |
|
||||
| Autostart script | `~/.local/bin/start-obsidian-if-needed` | Idempotent launcher for Obsidian |
|
||||
| Reindex endpoint | `http://127.0.0.1:18810/reindex` | Rebuilds/updates local Obsidian/RAG index |
|
||||
| Dataview plugin | Vault `.obsidian/plugins/dataview` | Dashboard tables |
|
||||
| Tasks plugin | Vault `.obsidian/plugins/obsidian-tasks-plugin` | Dashboard task queries |
|
||||
|
||||
## Source-of-truth docs
|
||||
|
||||
| Topic | Where |
|
||||
| --- | --- |
|
||||
| Swarm operations | Hermes skill `swarm`; `~/lab/swarm/Makefile` |
|
||||
| n8n API/workflow management | Hermes skill `swarm`, reference `n8n-api-and-workflows.md` |
|
||||
| Obsidian filesystem/API usage | Hermes skill `obsidian` |
|
||||
| Hermes CLI/toolsets/gateway/profiles | Hermes skill `hermes-agent`; `hermes --help`; `hermes tools list` |
|
||||
| Obsidian automation workflows | `~/lab/swarm/swarm-common/n8n-workflows/obsidian-*.json` |
|
||||
| Runbooks | [[Runbooks Home]] |
|
||||
| OpenVINO NPU service operations | [[OpenVINO NPU Services Runbook]]; `~/lab/swarm/scripts/npu-service-health.sh` |
|
||||
|
||||
## Safety notes
|
||||
|
||||
- Do not print `.env`, API keys, tokens, auth JSON, or decrypted n8n credentials.
|
||||
- From inside the `n8n-agent` container, host services are reached via `http://172.19.0.1:<port>`, not `127.0.0.1:<port>`.
|
||||
- Use file-based workflow updates for large n8n JSON payloads.
|
||||
- After structural n8n workflow edits, deactivate/reactivate the workflow.
|
||||
- Prefer `make` targets in `~/lab/swarm` for routine service operations.
|
||||
- OpenVINO NPU prototype sidecars `:18818`, `:18819`, `:18820`, and optional `:18829` are approved prototypes only; do not enable persistent services, live Atlas/Hermes/RAG routing, vector DB mutation, or private document/image processing without explicit approval. Verify NPU usage with `/sys/class/accel/accel0/device/npu_busy_time_us`; HTTP 200 alone is not proof.
|
||||
- Check git status before committing; commit only targeted non-secret source/config/docs.
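
The host-versus-container addressing rule in the notes above can be captured in one small helper. A minimal sketch: the function name and the `host` / `n8n-agent` context labels are illustrative, while the two addresses come from the note itself.

```bash
# Hypothetical helper: build a base URL for a host-published service.
# Usage: service_base_url <host|n8n-agent> <host-port>
service_base_url() {
  case "$1" in
    host)      printf 'http://127.0.0.1:%s\n' "$2" ;;
    n8n-agent) printf 'http://172.19.0.1:%s\n' "$2" ;;  # host as seen from the container
    *)         printf 'unknown caller context: %s\n' "$1" >&2; return 2 ;;
  esac
}
```

For example, `service_base_url n8n-agent 18810` prints `http://172.19.0.1:18810`.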
|
||||
|
||||
## Refresh procedure
|
||||
|
||||
To refresh this catalog:
|
||||
|
||||
```bash
|
||||
cd ~/lab/swarm
|
||||
make status
|
||||
hermes tools list
|
||||
hermes mcp list
|
||||
# Ask Atlas: "refresh the Obsidian Service Catalog"
|
||||
```
|
||||
+286
@@ -0,0 +1,286 @@
|
||||
---
|
||||
type: runbook
|
||||
system: openvino-npu-services
|
||||
status: draft
|
||||
created: 2026-06-04
|
||||
updated: 2026-06-04
|
||||
tags:
|
||||
- runbook
|
||||
- openvino
|
||||
- npu
|
||||
- swarm
|
||||
- atlas
|
||||
related:
|
||||
- [[Service Catalog]]
|
||||
- [[Swarm Operating Manual]]
|
||||
- [[Atlas Capability Upgrade Program]]
|
||||
---
|
||||
|
||||
# OpenVINO NPU Services Runbook
|
||||
|
||||
This runbook is the integrated operations view for Will's local Intel NPU/OpenVINO services from the `npu-capability-expansion` board.
|
||||
|
||||
Safety posture:
|
||||
- Do not restart the live Atlas/Hermes gateway from this runbook.
|
||||
- Do not change primary Atlas/Hermes routing without explicit Will approval.
|
||||
- Do not delete, overwrite, or in-place reindex existing Chroma/vector collections.
|
||||
- Treat HTTP 200 as necessary but not sufficient for NPU-backed services; verify `/sys/class/accel/accel0/device/npu_busy_time_us` before/after an inference.
|
||||
- Keep endpoints local-only unless Will explicitly approves broader exposure.
|
||||
- Keep raw prompts, private documents, OCR text, and secrets out of logs and durable handoffs.
|
||||
|
||||
## Current service map
|
||||
|
||||
| Capability | Port | Runtime / service | Path | State | Health endpoint | NPU proof |
|
||||
| --- | ---: | --- | --- | --- | --- | --- |
|
||||
| Obsidian/RAG endpoint | 18810 | `obsidian-reindex-endpoint.service` / local Python endpoint | `~/lab/swarm/scripts/` | live baseline; uses collection `obsidian_bge_npu` | `http://127.0.0.1:18810/healthz` | indirect via embeddings `:18817`; do not mutate existing collection |
|
||||
| RAG/embedding health wrapper | 18814 | `rag-embedding-health.service` | `~/lab/swarm/swarm-common/rag-embedding-health.service` | live baseline | `http://127.0.0.1:18814/healthz` | should exercise embeddings path when configured |
|
||||
| Whisper transcription, OpenVINO NPU | 18816 | Docker Compose service/container `whisper-server-npu` | `~/lab/swarm/whisper-openvino-npu/` | live baseline | `http://127.0.0.1:18816/health` | transcription response includes `npu_busy_delta_us`; sysfs delta must increase |
|
||||
| OpenVINO embeddings | 18817 | user systemd `openvino-embeddings.service` | `~/lab/swarm/scripts/openvino-embeddings-server.py`; unit in `~/lab/swarm/swarm-common/openvino-embeddings.service` | live baseline, enabled | `http://127.0.0.1:18817/healthz` | embedding response and sysfs delta must be positive |
|
||||
| NPU reranker prototype | 18818 | optional user systemd `openvino-reranker.service` | `~/lab/swarm/openvino-reranker-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18818/readyz` | `/readyz` reports `device=NPU`; `/v1/rerank` response and sysfs delta must be positive |
|
||||
| NPU router/classifier prototype | 18819 | optional user systemd `openvino-router-classifier.service` | `~/lab/swarm/openvino-classifier-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18819/healthz` | `/v1/classify` response has positive `npu_busy_delta_us` and `sysfs_npu_busy_delta_us` |
|
||||
| Small OpenVINO GenAI NPU worker | 18820 | optional user systemd `openvino-genai-npu-worker.service` | `~/lab/swarm/openvino-genai-npu-worker/` | approved prototype; not installed/enabled | `http://127.0.0.1:18820/healthz`; `GET /models` | generation response includes positive `npu_busy_delta_us` |
|
||||
| Document/image triage prototype | optional 18829 for review only; 18828 was an earlier smoke alternate | CLI-first; foreground local-only server if needed; no persistent unit yet | `~/lab/swarm/openvino-doc-image-triage-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18829/healthz`; `GET /models` | v1 NPU stage is semantic embedding through `:18817`; image classification/OCR remain CPU/local |
|
||||
|
||||
Port notes:
|
||||
- `18818`, `18819`, and `18820` are reserved prototype ports from the program plan; check listeners before binding.
|
||||
- `18820` is reserved for the GenAI worker prototype. Use optional `18829` for document/image triage foreground review until Will approves a final persistent port. `18828` was used in earlier review smoke only and should not be treated as the preferred documented port.
|
||||
- Existing `:18817` is currently bound on `0.0.0.0` by the user service; prototype services should still default to `127.0.0.1`.
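
The "check listeners before binding" step and the loopback-versus-`0.0.0.0` distinction above can be sketched as a filter over `ss -ltn` output. This is a minimal sketch, assuming the usual `ss` column layout where the local `address:port` is the fourth field; the function name and the three result labels are hypothetical.

```bash
# Hypothetical helper: read `ss -ltn` output on stdin and report whether the
# given port is free, bound to loopback only, or bound beyond loopback.
# Usage: ss -ltn | listener_scope 18818
listener_scope() {
  awk -v port="$1" '
    {
      n = split($4, parts, ":")        # local address:port is column 4
      if (parts[n] != port) next
      found = 1
      addr = substr($4, 1, length($4) - length(port) - 1)
      if (addr != "127.0.0.1" && addr != "[::1]") wide = 1
    }
    END {
      if (!found) { print "free"; exit 0 }
      if (wide)   { print "non-loopback"; exit 0 }
      print "loopback"
    }'
}
```

A prototype should only be started when the result is `free`, and a review afterwards should show `loopback`.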
|
||||
|
||||
## Read-only unified health check
|
||||
|
||||
From the swarm repo:
|
||||
|
||||
```bash
|
||||
cd ~/lab/swarm
|
||||
./scripts/npu-service-health.sh
|
||||
```
|
||||
|
||||
The script is read-only. It checks listeners on `18810`, `18816`, `18817`, `18818`, `18819`, `18820`, and `18829`, plus the existing `18814` wrapper and the `18828` review alternate. It also reports user service state, Docker Compose state for `whisper-server-npu`, and the JSON health endpoints, then sends one non-private embeddings request while reading `/sys/class/accel/accel0/device/npu_busy_time_us` before and after. The embeddings proof passes only if the sysfs delta is positive.
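
When the script is called from automation, its exit status identifies which part of the embeddings proof failed. The mapping below mirrors the `reason=` strings the script prints alongside exit codes 2, 3, and 4; the wrapper function name is hypothetical.

```bash
# Map npu-service-health.sh exit codes to the reasons the script prints.
# Example: ./scripts/npu-service-health.sh; explain_npu_health_exit "$?"
explain_npu_health_exit() {
  case "$1" in
    0) printf 'ok\n' ;;
    2) printf 'missing_busy_counter\n' ;;
    3) printf 'embedding_request_failed\n' ;;
    4) printf 'no_positive_sysfs_npu_delta\n' ;;
    *) printf 'unexpected_exit_%s\n' "$1" ;;   # not a code the script sets explicitly
  esac
}
```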
|
||||
|
||||
Manual minimal checks:
|
||||
|
||||
```bash
|
||||
BUSY=/sys/class/accel/accel0/device/npu_busy_time_us
|
||||
cat "$BUSY"
|
||||
ss -ltnp | grep -E ':(18810|18816|18817|18818|18819|18820|18829)\b' || true
|
||||
systemctl --user is-active openvino-embeddings.service rag-embedding-health.service
|
||||
cd ~/lab/swarm && docker compose ps whisper-server-npu
|
||||
curl -fsS http://127.0.0.1:18817/healthz | jq .
|
||||
```
|
||||
|
||||
Embedding NPU proof:
|
||||
|
||||
```bash
|
||||
BUSY=/sys/class/accel/accel0/device/npu_busy_time_us
|
||||
before=$(cat "$BUSY")
|
||||
curl -fsS http://127.0.0.1:18817/v1/embeddings \
|
||||
-H 'Content-Type: application/json' \
|
||||
-d '{"input":"non-private npu health probe","model":"bge-base-en-v1.5-int8-ov"}' | jq '{model, object, npu_busy_delta_us, embedding_count:(.data|length)}'
|
||||
after=$(cat "$BUSY")
|
||||
echo "sysfs_npu_busy_delta_us=$((after-before))"
|
||||
```
|
||||
|
||||
A healthy NPU path has:
|
||||
- HTTP success from the endpoint.
|
||||
- Response-level `npu_busy_delta_us > 0` when the service reports it.
|
||||
- Sysfs `after - before > 0`.
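
The three criteria above can be combined into one pass/fail check. This is a minimal sketch: the function name and argument order are assumptions, and the response-level delta is passed as an empty string when the service does not report one.

```bash
# Hypothetical helper: apply the three healthy-NPU-path criteria.
# Usage: npu_path_healthy <http_status> <response_delta_us or ""> <before_us> <after_us>
npu_path_healthy() {
  http_status=$1
  response_delta=$2
  before_us=$3
  after_us=$4
  case "$http_status" in
    2??) ;;
    *) printf 'result=failed reason=http_%s\n' "$http_status"; return 1 ;;
  esac
  # Enforce the response-level delta only when the service reported one.
  if [ -n "$response_delta" ] && [ "$response_delta" -le 0 ]; then
    printf 'result=failed reason=no_positive_response_delta\n'
    return 1
  fi
  if [ $((after_us - before_us)) -le 0 ]; then
    printf 'result=failed reason=no_positive_sysfs_npu_delta\n'
    return 1
  fi
  printf 'result=ok\n'
}
```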
|
||||
|
||||
## Service-specific smoke checks
|
||||
|
||||
For any foreground prototype server below, run it in a terminal you control, or capture its PID and stop it at the end of the smoke. Do not use `systemctl --user enable`, Docker Compose `up -d`, `nohup`, or shell disowning for these review smokes unless Will has explicitly approved persistent service enablement.
|
||||
|
||||
Safe foreground-server pattern:
|
||||
|
||||
```bash
|
||||
server_pid=""
|
||||
cleanup() {
|
||||
if [[ -n "$server_pid" ]] && kill -0 "$server_pid" 2>/dev/null; then
|
||||
kill "$server_pid"
|
||||
wait "$server_pid" 2>/dev/null || true
|
||||
fi
|
||||
}
|
||||
trap cleanup EXIT
|
||||
# start prototype server with --host 127.0.0.1 --port <port> &
|
||||
# server_pid=$!
|
||||
# run curl/smoke commands, then let trap stop it
|
||||
```
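
A backgrounded server may need a moment before it answers, so the smoke commands are best gated on a readiness poll. The helper below is a hypothetical sketch that retries any command a bounded number of times; it is not part of the swarm repo.

```bash
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_until <attempts> <delay_s> <command> [args...]
# Example: wait_until 20 1 curl -fsS http://127.0.0.1:18818/readyz
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep between attempts, but not after the final failure.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

If `wait_until` fails, exit and let the `trap cleanup EXIT` handler stop the server rather than continuing with the smoke.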
|
||||
|
||||
### Whisper NPU (`:18816`)
|
||||
|
||||
```bash
|
||||
curl -fsS http://127.0.0.1:18816/health | jq .
|
||||
# For a real transcription smoke, use a small non-private WAV fixture only.
|
||||
# Verify both response npu_busy_delta_us and sysfs busy-time delta.
|
||||
```
|
||||
|
||||
Operational notes:
|
||||
- Managed as Docker Compose service/container `whisper-server-npu` in `~/lab/swarm`.
|
||||
- This matches existing swarm service patterns: it is a containerized service with a Compose health check.
|
||||
- Do not restart it from this runbook unless Will asked for remediation.
|
||||
|
||||
### OpenVINO embeddings (`:18817`)
|
||||
|
||||
```bash
|
||||
systemctl --user status openvino-embeddings.service --no-pager
|
||||
curl -fsS http://127.0.0.1:18817/healthz | jq .
|
||||
```
|
||||
|
||||
Operational notes:
|
||||
- User systemd unit: `openvino-embeddings.service`.
|
||||
- Model: `bge-base-en-v1.5-int8-ov`.
|
||||
- Model directory: `/home/will/.cache/openvino-models/bge-base-en-v1.5-int8-ov`.
|
||||
- Live RAG `:18810` uses Chroma collection `obsidian_bge_npu` through this service. Do not reindex or replace this collection in place.
|
||||
|
||||
### Reranker prototype (`:18818`)
|
||||
|
||||
Foreground review start only, after confirming the port is free:
|
||||
|
||||
```bash
|
||||
ss -ltnp | grep ':18818\b' || true
|
||||
cd ~/lab/swarm/openvino-reranker-npu
|
||||
source /home/will/.venvs/openvino-reranker/bin/activate
|
||||
OPENVINO_RERANKER_HOST=127.0.0.1 \
|
||||
OPENVINO_RERANKER_PORT=18818 \
|
||||
OPENVINO_RERANKER_DEVICE=NPU \
|
||||
OPENVINO_RERANKER_MODEL_DIR=/home/will/.cache/openvino-models/rerankers/ms-marco-MiniLM-L6-v2-int8-ov \
|
||||
python server.py
|
||||
```
|
||||
|
||||
From another shell:
|
||||
|
||||
```bash
|
||||
curl -fsS http://127.0.0.1:18818/readyz | jq .
|
||||
python ~/lab/swarm/openvino-reranker-npu/smoke.py --url http://127.0.0.1:18818
|
||||
```
|
||||
|
||||
Approval gate:
|
||||
- May be installed as `openvino-reranker.service` only after foreground smoke and Will approval.
|
||||
- May be integrated into RAG only behind disabled-by-default knobs such as `RAG_RERANK_ENABLED=false`; request-time reranking must not mutate Chroma.
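
A disabled-by-default knob means reranking stays off unless it is switched on deliberately. The sketch below shows one conservative reading of `RAG_RERANK_ENABLED`, where only the exact value `true` enables the stage. How the RAG service actually parses the variable is not shown in this runbook, so treat the function as an assumption.

```bash
# Hypothetical gate: reranking is off unless the knob is exactly "true".
# Example: if rerank_enabled; then echo "rerank stage on"; else echo "rerank stage off"; fi
rerank_enabled() {
  [ "${RAG_RERANK_ENABLED:-false}" = "true" ]
}
```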
|
||||
|
||||
### Router/classifier prototype (`:18819`)
|
||||
|
||||
Foreground review start only, after confirming the port is free:
|
||||
|
||||
```bash
|
||||
ss -ltnp | grep ':18819\b' || true
|
||||
cd ~/lab/swarm/openvino-classifier-npu
|
||||
/home/will/.venvs/npu/bin/python router_classifier.py --host 127.0.0.1 --port 18819
|
||||
```
|
||||
|
||||
Smoke:
|
||||
|
||||
```bash
|
||||
curl -fsS http://127.0.0.1:18819/healthz | jq .
|
||||
curl -fsS http://127.0.0.1:18819/v1/classify \
|
||||
-H 'Content-Type: application/json' \
|
||||
-d '{"id":"smoke","text":"Urgent: check whether port 18817 is listening and inspect systemd logs.","options":{"include_evidence":true,"dry_run":true}}' | jq .
|
||||
```
|
||||
|
||||
Approval gate:
|
||||
- May be installed as `openvino-router-classifier.service` only after Will approves live service enablement.
|
||||
- Must remain dry-run and must not alter Hermes/Atlas routing, memory writes, safety confirmation flow, or outbound messages without a separate explicit approval.
|
||||
|
||||
### Small GenAI NPU worker (`:18820`)
|
||||
|
||||
Foreground review start only, after confirming the port is free:
|
||||
|
||||
```bash
|
||||
ss -ltnp | grep ':18820\b' || true
|
||||
cd ~/lab/swarm/openvino-genai-npu-worker
|
||||
/home/will/.venvs/npu/bin/python worker.py --host 127.0.0.1 --port 18820
|
||||
```
|
||||
|
||||
Smoke:
|
||||
|
||||
```bash
|
||||
curl -fsS http://127.0.0.1:18820/healthz | jq .
|
||||
curl -fsS http://127.0.0.1:18820/models | jq .
|
||||
curl -fsS http://127.0.0.1:18820/v1/worker/condense-notification \
|
||||
-H 'Content-Type: application/json' \
|
||||
-d '{"input":"Non-private smoke notification for local NPU worker.","max_new_tokens":64}' | jq .
|
||||
```
|
||||
|
||||
Approval gate:
|
||||
- May be installed as `openvino-genai-npu-worker.service` only after Will approves persistent service enablement.
|
||||
- Must not become primary Atlas/Hermes model routing. Use only for bounded background jobs such as title, summary, notification condensation, and memory-candidate drafting.
|
||||
|
||||
### Document/image triage prototype (`:18829` optional review port)

Foreground review start only, after confirming the port is free:

```bash
ss -ltnp | grep ':18829\b' || true
cd ~/lab/swarm/openvino-doc-image-triage-npu
/home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18829 --allowed-root "$PWD"
```
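
The `--allowed-root` flag above is tied to the approval-gate rule that allowed roots must be equal to or under configured roots. A minimal sketch of that containment check, assuming GNU `realpath` (for `-m`). The helper name `is_under_root` is hypothetical, and this is an illustration of the rule, not the server's actual implementation:

```bash
# Hypothetical helper: succeeds when CANDIDATE is equal to or under ROOT.
# Both paths are normalized first so `..` segments cannot escape the root.
is_under_root() {
  local candidate root
  candidate=$(realpath -m -- "$1") || return 2
  root=$(realpath -m -- "$2") || return 2
  root=${root%/}   # "/" becomes "" so the pattern below still works
  case "$candidate/" in
    "$root"/*) return 0 ;;   # equal to the root, or somewhere beneath it
    *) return 1 ;;
  esac
}

# Usage (sketch): refuse a root outside the service directory.
#   is_under_root "$requested_root" "$PWD" || echo "rejected: outside allowed root" >&2
```

Comparing with a trailing slash avoids the prefix trap where `/srv/rootx` would otherwise look like it sits under `/srv/root`.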

Smoke:

```bash
curl -fsS http://127.0.0.1:18829/healthz | jq .
curl -fsS http://127.0.0.1:18829/models | jq .
/home/will/.venvs/npu/bin/python tests/smoke_test.py
```

Approval gate:
- Do not point it at arbitrary directories; allowed roots must be equal to or under configured roots.
- Do not include raw OCR text or full source paths unless Will explicitly asks for a one-off response.
- v1 only uses the NPU through `:18817` embeddings for needs-attention; image category classification and OCR are CPU/local fallbacks.

## Systemd and Compose recommendations

Recommended management split:
- Keep containerized services in Docker Compose when they already have Docker build/runtime shape and Compose health (`whisper-server-npu`).
- Keep host-side OpenVINO Python prototypes as user systemd services when they depend on local venvs, sysfs NPU access, model caches, and localhost-only APIs (`openvino-embeddings`, optional reranker/classifier/GenAI worker).
- Do not add the prototypes to the live gateway or primary routing during installation. Installation and routing are separate approval gates.

User-systemd unit expectations for optional prototypes:
- `WorkingDirectory` points at the service directory under `~/lab/swarm/`.
- `ExecStart` uses the existing venv path documented by the prototype.
- `Environment` pins host to `127.0.0.1`, port, model path, device `NPU`, and any upstream endpoint.
- `Restart=on-failure`, not aggressive restart loops.
- Logs go to user journal; do not log raw request bodies.
- Start manually for smoke; enable on boot only after Will approval.

Compose expectations for existing swarm services:
- Prefer `cd ~/lab/swarm && make ps`, `make status`, and targeted `docker compose ps <service>` for read-only checks.
- Do not run `docker compose up -d`, restart containers, pull images, or prune volumes from this runbook without approval.
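
The user-systemd unit expectations in this section can be sketched as a function that only prints a candidate unit for review. The working directory, interpreter path, script name, host, and port come from the router/classifier foreground command in this runbook; the function name, `RestartSec`, and the start-limit values are assumptions chosen for illustration:

```bash
# Hypothetical helper: prints a candidate user unit to stdout. It does not
# install, enable, or start anything; those steps stay behind the approval gate.
render_router_classifier_unit() {
  cat <<'EOF'
[Unit]
Description=OpenVINO router/classifier prototype (dry-run, localhost only)
# Assumed values: give up after 3 failed starts in 5 minutes instead of looping.
StartLimitIntervalSec=300
StartLimitBurst=3

[Service]
WorkingDirectory=%h/lab/swarm/openvino-classifier-npu
ExecStart=/home/will/.venvs/npu/bin/python router_classifier.py --host 127.0.0.1 --port 18819
# Assumption: model path, device, and upstream endpoint would be pinned with
# Environment= lines; the variable names are not given in this runbook.
Restart=on-failure
RestartSec=10

[Install]
WantedBy=default.target
EOF
}
```

Writing the output to `~/.config/systemd/user/openvino-router-classifier.service` and reloading the user manager would count as installation, so that step belongs behind the approval gate.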

## Monitoring and logging notes

Minimum recurring monitoring should include:
- Listener presence for `18816`, `18817`, and any approved optional prototype ports.
- User service state for `openvino-embeddings.service` and any approved optional prototype unit.
- Docker Compose health for `whisper-server-npu`.
- HTTP health endpoint success.
- Positive sysfs NPU busy-time delta on at least one non-private inference probe, preferably embeddings `:18817` because it is already live and central.
- Journal/container logs only at summary level. Avoid raw prompts, raw OCR text, private document names, credentials, and API keys.

Useful log commands:

```bash
journalctl --user -u openvino-embeddings.service -n 100 --no-pager
journalctl --user -u rag-embedding-health.service -n 100 --no-pager
journalctl --user -u openvino-reranker.service -n 100 --no-pager
journalctl --user -u openvino-router-classifier.service -n 100 --no-pager
journalctl --user -u openvino-genai-npu-worker.service -n 100 --no-pager
cd ~/lab/swarm && docker compose logs --tail 100 whisper-server-npu
```
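
The busy-time delta check from the monitoring list can be sketched as a wrapper that reads a counter before and after a probe command. The sysfs path and the helper name `npu_busy_delta` are assumptions, not taken from this runbook; confirm the counter location on the host before relying on it:

```bash
# Assumption: the NPU driver exposes a cumulative busy-time counter in
# microseconds. This default path is a guess and must be checked on the host.
NPU_BUSY_FILE="${NPU_BUSY_FILE:-/sys/class/accel/accel0/device/npu_busy_time_us}"

# Hypothetical helper: runs the probe command given as arguments and reports
# how far the counter moved. Succeeds only when the delta is positive.
npu_busy_delta() {
  local before after delta
  before=$(cat "$NPU_BUSY_FILE") || return 2
  "$@" || return 3            # the non-private inference probe
  after=$(cat "$NPU_BUSY_FILE") || return 2
  delta=$((after - before))
  echo "npu_busy_delta_us=${delta}"
  [ "$delta" -gt 0 ]
}

# Usage (sketch): npu_busy_delta <probe command that triggers one inference>
```

A health endpoint alone may not run inference, so the probe should be a request that actually exercises the model, with its own output kept out of the logs.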

## Approval gates

Requires explicit Will approval before proceeding:
- Installing, enabling, or autostarting `openvino-reranker.service`, `openvino-router-classifier.service`, or `openvino-genai-npu-worker.service`.
- Assigning a final persistent port to document/image triage or enabling it as a persistent service.
- Enabling live RAG reranking or any request path that changes Atlas/RAG answers.
- Changing primary Atlas/Hermes routing or connecting router/classifier outputs to live decisions.
- Connecting the GenAI worker to primary Atlas chat, gateway routing, memory writes, or outbound notifications.
- Restarting the live Atlas/Hermes gateway.
- Deleting, overwriting, or in-place reindexing existing vector collections.
- Broadening bind addresses or exposure beyond local-only defaults.

Approved/parked outcomes:
- Built/approved prototypes: reranker (`:18818`), router/classifier (`:18819`), small GenAI worker (`:18820`), document/image triage (review ports `:18828`/`:18829`).
- Live baseline retained: Whisper NPU (`:18816`), OpenVINO embeddings (`:18817`), RAG endpoint (`:18810`) using `obsidian_bge_npu`.
- Parked: always-on wake-word/audio and conventional vision detection until Will wants a concrete use case.
- Rejected for this NPU program: diffusion/image generation.