docs: document NPU RAG embedding path

William Valentin
2026-06-03 21:35:42 -07:00
parent 1a674e854e
commit 4a065de754
3 changed files with 36 additions and 12 deletions
@@ -225,14 +225,25 @@ Implemented 2026-05-13.
### Architecture
- **Vector store**: Hermes rag-search ChromaDB embedded at `~/.hermes/data/rag-search/chroma/` in the `obsidian` collection.
- **Embeddings**: Ollama `nomic-embed-text` on port `18807` (768-dim vectors).
- **Indexer**: `~/.hermes/skills/note-taking/rag-search/scripts/index_obsidian.py`
- **Vector store**: Hermes rag-search ChromaDB embedded at `~/.hermes/data/rag-search/chroma/`; the live Obsidian semantic endpoint uses the `obsidian_bge_npu` collection.
- **Embeddings**: OpenVINO Intel NPU service on port `18817` using `bge-base-en-v1.5-int8-ov` (768-dim vectors). Legacy Ollama `nomic-embed-text` on port `18807` remains available for rollback and comparison.
- **Indexer**: `~/.hermes/skills/note-taking/rag-search/scripts/reindex_obsidian.py`
- **Chunking**: Markdown files are split by heading sections; long sections get sliding-window chunks (max 2000 chars, 200 char overlap). YAML frontmatter is extracted and stored as metadata.
- **Search**: `~/.hermes/skills/note-taking/rag-search/scripts/search.py --index obsidian "query"`
- **Cross-collection search**: `search.py "query"` now searches all three collections (`personal`, `docs`, `obsidian`) using the appropriate embedding backend per collection.
- **Search**: `~/.hermes/skills/note-taking/rag-search/scripts/search.py --index obsidian "query"` or the native Hermes `rag_search` tool.
- **Cross-collection search**: `search.py "query"` searches `personal`, `docs`, and `obsidian` using the appropriate embedding backend per collection.
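The chunking rule in the list above (split by heading, then sliding windows of at most 2000 characters with 200 characters of overlap) can be illustrated with a minimal sketch. This is not the code in `reindex_obsidian.py`; the function names `split_sections`, `window`, and `chunk_markdown` are hypothetical, frontmatter extraction is left out, and heading-like lines inside code fences are not treated specially.

```python
import re

MAX_CHARS = 2000  # maximum chunk size stated in the doc
OVERLAP = 200     # overlap between consecutive windows stated in the doc


def split_sections(markdown: str) -> list[str]:
    """Split text at ATX heading lines, keeping each heading with its body."""
    sections: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines(keepends=True):
        # A new heading closes the section collected so far.
        if re.match(r"#{1,6} ", line) and current:
            sections.append("".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("".join(current))
    return sections


def window(text: str, max_chars: int = MAX_CHARS, overlap: int = OVERLAP) -> list[str]:
    """Return the text whole if short, else overlapping fixed-size windows."""
    if len(text) <= max_chars:
        return [text]
    step = max_chars - overlap
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


def chunk_markdown(markdown: str) -> list[str]:
    """Hypothetical entry point: heading sections first, then windows."""
    return [chunk for section in split_sections(markdown) for chunk in window(section)]
```

With these assumed parameters, each window after the first starts 1800 characters after the previous one, so neighbouring chunks share their boundary text.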
### Index stats (2026-05-13)
### Live BGE/NPU state (2026-06-03)
- Collection: `obsidian_bge_npu`
- Notes indexed: 194
- Vector count: 466
- Embedding backend: `http://127.0.0.1:18817`
- Embedding model: `bge-base-en-v1.5-int8-ov`
- OpenVINO device: Intel NPU via `openvino-embeddings.service`
- Semantic health: `curl -fsS http://127.0.0.1:18810/semantic-health`
- Embedding health: `curl -fsS http://127.0.0.1:18817/health`
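The two health URLs above can also be polled from Python. The sketch below roughly mirrors `curl -f` by treating connection errors and non-2xx responses as unhealthy. The names `is_healthy` and `check_all` are hypothetical, and the response body is ignored because its format is not documented here.

```python
import urllib.request

# Endpoints taken from the list above.
ENDPOINTS = {
    "semantic": "http://127.0.0.1:18810/semantic-health",
    "embedding": "http://127.0.0.1:18817/health",
}


def is_healthy(url: str, timeout: float = 5.0, opener=urllib.request.urlopen) -> bool:
    """Return True only if the endpoint answers with a 2xx status."""
    try:
        with opener(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # Covers refused connections, timeouts, and urllib's HTTPError/URLError.
        return False


def check_all(endpoints: dict[str, str] = ENDPOINTS, **kwargs) -> dict[str, bool]:
    """Map each endpoint name to its health result."""
    return {name: is_healthy(url, **kwargs) for name, url in endpoints.items()}
```

Calling `check_all()` on the host returns a dictionary keyed by `semantic` and `embedding`; the `opener` parameter exists only so the check can be exercised without a live service.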
### Legacy index stats (2026-05-13)
- 36 markdown files indexed
- 231 chunks
@@ -242,7 +253,7 @@ Implemented 2026-05-13.
### Incremental updates
- File content SHA-256 hashes tracked in `~/.hermes/data/rag-search/obsidian_index_state.json`.
- File content SHA-256 hashes are tracked in the collection-specific state file, e.g. `~/.hermes/data/rag-search/obsidian_bge_npu_index_state.json` for the live BGE/NPU collection.
- Only changed files are re-indexed on subsequent runs.
- Deleted files have their chunks removed from ChromaDB.
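The incremental behaviour described above comes down to comparing per-file SHA-256 hashes against the saved state. A minimal sketch follows, assuming the state is a flat mapping of relative path to hex digest; the real layout of the state JSON file may differ, and the names `file_sha256`, `scan_vault`, and `diff_against_state` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hex SHA-256 digest of a file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def scan_vault(root: Path) -> dict[str, str]:
    """Hash every markdown file under root, keyed by path relative to root."""
    return {str(p.relative_to(root)): file_sha256(p) for p in sorted(root.rglob("*.md"))}


def diff_against_state(
    current: dict[str, str], previous: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Return (changed_or_new, deleted) paths relative to the saved state."""
    # New files have no previous hash, so they also count as changed.
    changed = sorted(p for p, digest in current.items() if previous.get(p) != digest)
    # Paths present in the saved state but missing on disk were deleted.
    deleted = sorted(p for p in previous if p not in current)
    return changed, deleted
```

An indexer built this way would re-embed only the `changed` paths, remove chunks for the `deleted` paths from the collection, and then write `current` back as the new state.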
@@ -267,6 +278,10 @@ curl -X POST http://127.0.0.1:18810/reindex | python3 -m json.tool
~/.hermes/skills/note-taking/rag-search/venv/bin/python \
~/.hermes/skills/note-taking/rag-search/scripts/search.py --index obsidian "health monitoring"
# Check live semantic and NPU embedding health
curl -fsS http://127.0.0.1:18810/semantic-health | python3 -m json.tool
curl -fsS http://127.0.0.1:18817/health | python3 -m json.tool
# Check ChromaDB data
du -sh ~/.hermes/data/rag-search/chroma/
@@ -64,9 +64,16 @@ Most service containers run on Will's laptop/host network and publish local/LAN
### Ollama
- **Port:** `18807`
- **Role:** embeddings runtime for OpenClaw memory search
- **Role:** legacy/rollback embeddings runtime for memory/RAG search
- **Model:** `nomic-embed-text`
### OpenVINO embeddings
- **Port:** `18817`
- **Unit:** `openvino-embeddings.service`
- **Role:** default embeddings service for live Obsidian RAG via Intel NPU
- **Model:** `bge-base-en-v1.5-int8-ov`
- **Health:** `http://127.0.0.1:18817/health`
## Adjacent storage / infra
### MinIO
@@ -1,7 +1,7 @@
---
type: service-catalog
created: 2026-05-14T14:50:46-07:00
updated: 2026-05-27T12:12:06-07:00
updated: 2026-06-03T21:31:01-07:00
tags:
- service-catalog
- swarm
@@ -13,7 +13,7 @@ tags:
Canonical index of local services, automation tools, Hermes capabilities, and where to find their operational docs.
> Generated by Atlas from live system inventory on `2026-05-14T14:50:46-07:00`; high-risk local AI/service rows refreshed on `2026-05-27T12:12:06-07:00`. Secrets are intentionally omitted.
> Generated by Atlas from live system inventory on `2026-05-14T14:50:46-07:00`; high-risk local AI/service rows refreshed on `2026-05-27T12:12:06-07:00`; Obsidian/RAG embedding path refreshed on `2026-06-03T21:31:01-07:00`. Secrets are intentionally omitted.
## Quick links
@@ -53,7 +53,8 @@ Canonical index of local services, automation tools, Hermes capabilities, and wh
| Whisper CPU | 18811 | OK 200 | Whisper.cpp CPU STT fallback | `http://127.0.0.1:18811/` |
| URL extractor | 18812 | OK 200 | URL/PDF/YouTube content extractor | `http://127.0.0.1:18812/healthz` |
| Voice memo processor | 18813 | OK 200 | Voice memo processor | `http://127.0.0.1:18813/healthz` |
| RAG/embedding health | 18814 | OK 200 | RAG/Ollama/Obsidian health wrapper | `http://127.0.0.1:18814/healthz` |
| RAG/embedding health | 18814 | OK 200 | RAG/OpenVINO/Obsidian health wrapper | `http://127.0.0.1:18814/healthz` |
| OpenVINO embeddings | 18817 | OK 200 | Intel NPU embeddings service for live Obsidian RAG | `http://127.0.0.1:18817/health` |
| Obsidian REST HTTP | 27123 | OK 200 | Obsidian Local REST API HTTP | `http://127.0.0.1:27123/` |
## Docker services
@@ -90,7 +91,8 @@ Important known services:
| `obsidian-reindex-endpoint.service` | Obsidian/RAG reindex endpoint on 18810 |
| `url-content-extractor.service` | URL/PDF/YouTube extraction on 18812 |
| `voice-memo-processor.service` | Voice memo processing on 18813 |
| `rag-embedding-health.service` | RAG/Ollama/Obsidian health check wrapper on 18814 |
| `rag-embedding-health.service` | RAG/OpenVINO/Obsidian health check wrapper on 18814 |
| `openvino-embeddings.service` | Intel NPU BGE embedding service on 18817 |
Useful checks: