feat(rag): add OpenVINO NPU embeddings service

This commit is contained in:
William Valentin
2026-06-03 18:28:16 -07:00
parent 7745648a13
commit fe4dea0f07
7 changed files with 268 additions and 14 deletions
+2 -2
View File
@@ -89,7 +89,7 @@
<g><rect x="965" y="775" width="210" height="60" rx="9" fill="#0f172a"/><rect x="965" y="775" width="210" height="60" rx="9" fill="rgba(76,29,149,.4)" stroke="#a78bfa" stroke-width="1.6"/><text x="1070" y="802" text-anchor="middle" class="title">Obsidian / RAG</text><text x="1070" y="822" text-anchor="middle" class="port">:27123/:27124 + ChromaDB</text></g>
<!-- host local ai box -->
<g><rect x="280" y="675" width="190" height="100" rx="10" fill="#0f172a"/><rect x="280" y="675" width="190" height="100" rx="10" fill="rgba(76,29,149,.4)" stroke="#a78bfa" stroke-width="1.8"/><text x="375" y="706" text-anchor="middle" class="title">host local AI</text><text x="375" y="730" text-anchor="middle" class="tiny">llama.cpp :18806</text><text x="375" y="752" text-anchor="middle" class="tiny">Ollama embed :18807</text></g>
<g><rect x="280" y="675" width="210" height="120" rx="10" fill="#0f172a"/><rect x="280" y="675" width="210" height="120" rx="10" fill="rgba(76,29,149,.4)" stroke="#a78bfa" stroke-width="1.8"/><text x="385" y="706" text-anchor="middle" class="title">host local AI</text><text x="385" y="730" text-anchor="middle" class="tiny">llama.cpp :18806</text><text x="385" y="752" text-anchor="middle" class="tiny">Ollama fallback :18807</text><text x="385" y="774" text-anchor="middle" class="tiny">OpenVINO NPU embed :18817</text></g>
<!-- legend -->
<g transform="translate(40,820)">
@@ -104,7 +104,7 @@
</div>
<div class="cards">
<div class="info"><h3>Monitoring model</h3><ul><li>• n8n direct probes critical ports</li><li>• agentmon aggregates Docker/OpenClaw snapshots</li><li>• n8n polls agentmon for stale/degraded state</li></ul></div>
<div class="info"><h3>Operational endpoints</h3><ul><li>• n8n: 127.0.0.1:18808</li><li>• agentmon query/UI: 8081 / 8082</li><li>• local LLM/embed: 18806 / 18807</li></ul></div>
<div class="info"><h3>Operational endpoints</h3><ul><li>• n8n: 127.0.0.1:18808</li><li>• agentmon query/UI: 8081 / 8082</li><li>• local LLM/embed: 18806 / 18817</li><li>• Ollama fallback: 18807</li></ul></div>
<div class="info"><h3>Source paths</h3><ul><li>• Swarm repo: ~/lab/swarm</li><li>• Agentmon repo: ~/lab/agentmon</li><li>• Workflows: swarm-common/n8n-workflows</li></ul></div>
</div>
<div class="footer">Generated as repo documentation. Open locally in a browser; no JavaScript, all SVG inline.</div>
+6 -3
View File
@@ -32,7 +32,8 @@ local AI/search/voice services
+--> SearXNG :18803
+--> Brave MCP :18802
+--> llama.cpp :18806
+--> Ollama embeddings :18807
+--> Ollama embeddings :18807 (legacy/CPU fallback)
+--> OpenVINO NPU embeddings :18817
+--> Kokoro TTS :18805
+--> Whisper NPU :18816
```
@@ -121,7 +122,8 @@ Docker services:
Host/user services:
- `llama-server.service``:18806`, local llama.cpp OpenAI-compatible LLM
- `ollama.service``:18807`, embeddings API
- `ollama.service``:18807`, legacy/CPU embeddings API fallback
- `openvino-embeddings.service``:18817`, OpenVINO NPU embeddings API (`/v1/embeddings`, `/api/embed`, `/api/embeddings`)
- `docker-health-endpoint.service``:18809`, read-only container health for n8n
- `obsidian-reindex-endpoint.service``:18810`, Obsidian/RAG reindex trigger
- `url-content-extractor.service``:18812`, YouTube/PDF/web extraction
@@ -143,7 +145,8 @@ RAG/vector store:
- ChromaDB path: `~/.hermes/data/rag-search/chroma/`
- Reindex state/progress: `~/.hermes/data/rag-search/obsidian_index_state.json` and `obsidian_reindex_progress.json`
- Embeddings backend: Ollama on `:18807`, normally `nomic-embed-text`
- RAG query/reindex embedding backend: still Ollama on `:18807` with `nomic-embed-text` until a deliberate full Chroma rebuild/migration is run.
- RAG/embedding health probe backend: OpenVINO NPU embeddings service on `:18817`, currently `bge-base-en-v1.5-int8-ov`.
- Reindex endpoint: `POST :18810/reindex` for incremental updates, `POST :18810/reindex?full=true` for full semantic rebuilds, `GET :18810/semantic-health` to verify vectors plus a search smoke test.
## Monitoring model