docs: document NPU RAG embedding path

William Valentin
2026-06-03 21:35:42 -07:00
parent 1a674e854e
commit 4a065de754
3 changed files with 36 additions and 12 deletions
@@ -225,14 +225,25 @@ Implemented 2026-05-13.
 ### Architecture
-- **Vector store**: Hermes rag-search ChromaDB embedded at `~/.hermes/data/rag-search/chroma/` in the `obsidian` collection.
-- **Embeddings**: Ollama `nomic-embed-text` on port `18807` (768-dim vectors).
-- **Indexer**: `~/.hermes/skills/note-taking/rag-search/scripts/index_obsidian.py`
+- **Vector store**: Hermes rag-search ChromaDB embedded at `~/.hermes/data/rag-search/chroma/`; live Obsidian semantic endpoint uses collection `obsidian_bge_npu`.
+- **Embeddings**: OpenVINO Intel NPU service on `18817` using `bge-base-en-v1.5-int8-ov` (768-dim vectors). Legacy Ollama `nomic-embed-text` on `18807` remains available as rollback/comparison data.
+- **Indexer**: `~/.hermes/skills/note-taking/rag-search/scripts/reindex_obsidian.py`
 - **Chunking**: Markdown files are split by heading sections; long sections get sliding-window chunks (max 2000 chars, 200 char overlap). YAML frontmatter is extracted and stored as metadata.
-- **Search**: `~/.hermes/skills/note-taking/rag-search/scripts/search.py --index obsidian "query"`
-- **Cross-collection search**: `search.py "query"` now searches all three collections (`personal`, `docs`, `obsidian`) using the appropriate embedding backend per collection.
+- **Search**: `~/.hermes/skills/note-taking/rag-search/scripts/search.py --index obsidian "query"` or Hermes native `rag_search` tool.
+- **Cross-collection search**: `search.py "query"` searches `personal`, `docs`, and `obsidian` using the appropriate embedding backend per collection.
-### Index stats (2026-05-13)
+### Live BGE/NPU state (2026-06-03)
+- Collection: `obsidian_bge_npu`
+- Notes indexed: 194
+- Vector count: 466
+- Embedding backend: `http://127.0.0.1:18817`
+- Embedding model: `bge-base-en-v1.5-int8-ov`
+- OpenVINO device: Intel NPU via `openvino-embeddings.service`
+- Semantic health: `curl -fsS http://127.0.0.1:18810/semantic-health`
+- Embedding health: `curl -fsS http://127.0.0.1:18817/health`
+### Legacy index stats (2026-05-13)
 - 36 markdown files indexed
 - 231 chunks
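Reviewer note: the Chunking bullet in this hunk describes heading-based splitting with a sliding window for long sections. The sketch below illustrates that rule only. It is not the code in `reindex_obsidian.py`; the function names are hypothetical, and the only values taken from the documentation are the 2000-character maximum and the 200-character overlap. Frontmatter handling is left out.

```python
import re

MAX_CHARS = 2000  # maximum chunk size stated in the Chunking bullet
OVERLAP = 200     # overlap between consecutive chunks stated in the same bullet


def split_by_heading(markdown: str) -> list[str]:
    """Split Markdown into sections, each beginning at a heading line.

    Simplification: a line starting with '#' inside a fenced code block
    is also treated as a heading.
    """
    parts = re.split(r"(?m)^(?=#{1,6} )", markdown)
    return [part.strip() for part in parts if part.strip()]


def sliding_window(text: str, max_chars: int = MAX_CHARS, overlap: int = OVERLAP) -> list[str]:
    """Return text unchanged if it fits, otherwise overlapping windows."""
    if len(text) <= max_chars:
        return [text]
    step = max_chars - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def chunk_markdown(markdown: str) -> list[str]:
    """Heading sections first, then windows for any section that is too long."""
    chunks = []
    for section in split_by_heading(markdown):
        chunks.extend(sliding_window(section))
    return chunks
```

With these numbers each window starts 1800 characters after the previous one, so the last 200 characters of one chunk repeat at the start of the next.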
@@ -242,7 +253,7 @@ Implemented 2026-05-13.
 ### Incremental updates
-- File content SHA-256 hashes tracked in `~/.hermes/data/rag-search/obsidian_index_state.json`.
+- File content SHA-256 hashes are tracked in the collection-specific state file, e.g. `~/.hermes/data/rag-search/obsidian_bge_npu_index_state.json` for the live BGE/NPU collection.
 - Only changed files are re-indexed on subsequent runs.
 - Deleted files have their chunks removed from ChromaDB.
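Reviewer note: the incremental-update bullets in this hunk amount to comparing stored SHA-256 hashes against the current files. A minimal sketch of that comparison follows. The state-file layout (a flat JSON object mapping relative path to hex digest) and all function names are assumptions made for illustration; the diff does not show the real format.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hex SHA-256 digest of a file's content."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def plan_incremental_update(vault: Path, state_file: Path):
    """Return (changed, deleted, current) for the Markdown files under vault.

    changed: new or modified files that need re-indexing
    deleted: files in the saved state that no longer exist on disk
    current: fresh path -> digest mapping to save after indexing
    """
    previous = json.loads(state_file.read_text()) if state_file.exists() else {}
    current = {
        str(path.relative_to(vault)): file_sha256(path)
        for path in sorted(vault.rglob("*.md"))
    }
    changed = [name for name, digest in current.items() if previous.get(name) != digest]
    deleted = [name for name in previous if name not in current]
    return changed, deleted, current


def save_state(state_file: Path, current: dict) -> None:
    """Persist the mapping (assumed layout, see the note above)."""
    state_file.write_text(json.dumps(current, indent=2, sort_keys=True))
```

A caller would re-embed the `changed` files, remove the chunks belonging to the `deleted` files from the collection, and only then call `save_state`, so an interrupted run is retried on the next pass.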
@@ -267,6 +278,10 @@ curl -X POST http://127.0.0.1:18810/reindex | python3 -m json.tool
 ~/.hermes/skills/note-taking/rag-search/venv/bin/python \
 ~/.hermes/skills/note-taking/rag-search/scripts/search.py --index obsidian "health monitoring"
+# Check live semantic and NPU embedding health
+curl -fsS http://127.0.0.1:18810/semantic-health | python3 -m json.tool
+curl -fsS http://127.0.0.1:18817/health | python3 -m json.tool
 # Check ChromaDB data
 du -sh ~/.hermes/data/rag-search/chroma/
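Reviewer note: the two `curl -fsS ... | python3 -m json.tool` checks added in this hunk could also be run from a script. The sketch below polls the same two URLs and assumes, as the pipe to `json.tool` implies, that both return JSON. The function names and the shape of the result dictionary are hypothetical.

```python
import json
import urllib.request

# URLs taken from the documented curl commands.
ENDPOINTS = {
    "semantic": "http://127.0.0.1:18810/semantic-health",
    "embedding": "http://127.0.0.1:18817/health",
}


def fetch_json(url: str, timeout: float = 5.0):
    """GET a URL and parse the body as JSON.

    urlopen raises on HTTP 4xx/5xx, which mirrors curl's -f flag.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def check_endpoints(endpoints=ENDPOINTS, fetch=fetch_json):
    """Return {name: {"ok": bool, ...}} without raising on a failed endpoint."""
    results = {}
    for name, url in endpoints.items():
        try:
            results[name] = {"ok": True, "body": fetch(url)}
        except (OSError, ValueError) as exc:
            # OSError covers connection, HTTP and timeout errors;
            # ValueError covers a body that is not valid JSON.
            results[name] = {"ok": False, "error": str(exc)}
    return results
```

Calling `print(json.dumps(check_endpoints(), indent=2))` on the host prints one entry per endpoint. The `fetch` parameter exists only so the function can be exercised without the services running.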
@@ -64,9 +64,16 @@ Most service containers run on Will's laptop/host network and publish local/LAN
 ### Ollama
 - **Port:** `18807`
-- **Role:** embeddings runtime for OpenClaw memory search
+- **Role:** legacy/rollback embeddings runtime for memory/RAG search
 - **Model:** `nomic-embed-text`
+### OpenVINO embeddings
+- **Port:** `18817`
+- **Unit:** `openvino-embeddings.service`
+- **Role:** default embeddings service for live Obsidian RAG via Intel NPU
+- **Model:** `bge-base-en-v1.5-int8-ov`
+- **Health:** `http://127.0.0.1:18817/health`
 ## Adjacent storage / infra
 ### MinIO
@@ -1,7 +1,7 @@
 ---
 type: service-catalog
 created: 2026-05-14T14:50:46-07:00
-updated: 2026-05-27T12:12:06-07:00
+updated: 2026-06-03T21:31:01-07:00
 tags:
 - service-catalog
 - swarm
@@ -13,7 +13,7 @@ tags:
 Canonical index of local services, automation tools, Hermes capabilities, and where to find their operational docs.
-> Generated by Atlas from live system inventory on `2026-05-14T14:50:46-07:00`; high-risk local AI/service rows refreshed on `2026-05-27T12:12:06-07:00`. Secrets are intentionally omitted.
+> Generated by Atlas from live system inventory on `2026-05-14T14:50:46-07:00`; high-risk local AI/service rows refreshed on `2026-05-27T12:12:06-07:00`; Obsidian/RAG embedding path refreshed on `2026-06-03T21:31:01-07:00`. Secrets are intentionally omitted.
 ## Quick links
@@ -53,7 +53,8 @@ Canonical index of local services, automation tools, Hermes capabilities, and wh
 | Whisper CPU | 18811 | OK 200 | Whisper.cpp CPU STT fallback | `http://127.0.0.1:18811/` |
 | URL extractor | 18812 | OK 200 | URL/PDF/YouTube content extractor | `http://127.0.0.1:18812/healthz` |
 | Voice memo processor | 18813 | OK 200 | Voice memo processor | `http://127.0.0.1:18813/healthz` |
-| RAG/embedding health | 18814 | OK 200 | RAG/Ollama/Obsidian health wrapper | `http://127.0.0.1:18814/healthz` |
+| RAG/embedding health | 18814 | OK 200 | RAG/OpenVINO/Obsidian health wrapper | `http://127.0.0.1:18814/healthz` |
+| OpenVINO embeddings | 18817 | OK 200 | Intel NPU embeddings service for live Obsidian RAG | `http://127.0.0.1:18817/health` |
 | Obsidian REST HTTP | 27123 | OK 200 | Obsidian Local REST API HTTP | `http://127.0.0.1:27123/` |
 ## Docker services
@@ -90,7 +91,8 @@ Important known services:
 | `obsidian-reindex-endpoint.service` | Obsidian/RAG reindex endpoint on 18810 |
 | `url-content-extractor.service` | URL/PDF/YouTube extraction on 18812 |
 | `voice-memo-processor.service` | Voice memo processing on 18813 |
-| `rag-embedding-health.service` | RAG/Ollama/Obsidian health check wrapper on 18814 |
+| `rag-embedding-health.service` | RAG/OpenVINO/Obsidian health check wrapper on 18814 |
+| `openvino-embeddings.service` | Intel NPU BGE embedding service on 18817 |
 Useful checks: