diff --git a/docs/swarm-infrastructure.html b/docs/swarm-infrastructure.html
index 0158a86..b56ae74 100644
--- a/docs/swarm-infrastructure.html
+++ b/docs/swarm-infrastructure.html
@@ -87,7 +87,7 @@
 Voice | Kokoro + Whisper | :18805 / :18816
 Docker services | agentmon.monitor=true | swarm/service snapshots
 OpenClaw VMs | currently dormant | openclaw.snapshot
-Obsidian / RAG | :18810 semantic search | NPU embed; optional rerank
+Obsidian / RAG | :18810 semantic search | NPU embed + rerank
 NPU sidecars | approved prototypes; not live | :18818/:18819/:18820/:18829
@@ -106,10 +106,10 @@

Monitoring model

-

Operational endpoints

+

Operational endpoints

Source paths

-
+
diff --git a/docs/swarm-infrastructure.md b/docs/swarm-infrastructure.md
index 06fc0fd..347912a 100644
--- a/docs/swarm-infrastructure.md
+++ b/docs/swarm-infrastructure.md
@@ -160,7 +160,7 @@ RAG/vector store:
 - Active RAG query/reindex embedding backend: OpenVINO NPU embeddings service on `:18817`, currently `bge-base-en-v1.5-int8-ov`, collection `obsidian_bge_npu`.
 - Legacy comparison/fallback collection: `obsidian`, built with Ollama on `:18807` using `nomic-embed-text`.
 - Reindex/search endpoint: `POST :18810/reindex` for incremental updates, `POST :18810/reindex?full=true` for full semantic rebuilds, `GET :18810/semantic-health` to verify vectors plus a search smoke test, and `POST :18810/semantic-search` for n8n/Hermes semantic context lookup.
-- Optional reranker path: `RAG_RERANK_ENABLED=false` by default. When enabled, `/semantic-search` retrieves `RAG_RERANK_INITIAL_K` vector candidates, calls `RAG_RERANK_URL` (`http://127.0.0.1:18818/rerank` by default), returns reranked `RAG_RERANK_TOP_K`, requires positive `npu_busy_delta_us` by default (`RAG_RERANK_REQUIRE_NPU_PROOF=true`), and falls back to vector order with `rerank.error` metadata on timeout/error/non-positive NPU proof. Reranking is request-time only and must not mutate Chroma/vector collections.
+- Reranker path: `RAG_RERANK_ENABLED=true` for `:18810/semantic-search` after local bake testing. `/semantic-search` retrieves `RAG_RERANK_INITIAL_K` vector candidates, calls `RAG_RERANK_URL` (`http://127.0.0.1:18818/rerank`), returns reranked `RAG_RERANK_TOP_K`, requires positive `npu_busy_delta_us` by default (`RAG_RERANK_REQUIRE_NPU_PROOF=true`), and falls back to vector order with `rerank.error` metadata on timeout/error/non-positive NPU proof. Reranking is request-time only and must not mutate Chroma/vector collections.
 ## Monitoring model
@@ -254,4 +254,4 @@ jq '.[0] | {id,name,active,nodes:(.nodes|length)}' /tmp/agentmon-export.json
 - From `n8n-agent`, use `127.0.0.1:5678` for n8n itself and `172.19.0.1:` for host-published swarm services.
 - Agentmon `/healthz` only proves the web/API process is alive; pair it with snapshot freshness to prove the monitoring pipeline is flowing.
 - OpenClaw is intentionally dormant unless explicitly re-enabled; do not alert on VMs being shut off by default.
-- OpenVINO NPU sidecars on `:18819`, `:18820`, and optional `:18829` are prototypes/not-live unless a later approved change installs and routes them. The `:18818` reranker is also a prototype service, but `:18810/semantic-search` now has a disabled-by-default request-time rerank hook that falls back safely when `:18818` is unavailable. Do not draw live Atlas/Hermes/classifier/GenAI arrows to prototypes until approval and implementation actually exist.
+- OpenVINO NPU sidecars on `:18819`, `:18820`, and optional `:18829` are prototypes/not-live unless a later approved change installs and routes them. The `:18818` reranker is live as a local request-time second stage for `:18810/semantic-search`; it still falls back to vector order on timeout/error/non-positive NPU proof. Do not draw live Atlas/Hermes/classifier/GenAI arrows to prototypes until approval and implementation actually exist.
diff --git a/swarm-common/obsidian-reindex-endpoint.service b/swarm-common/obsidian-reindex-endpoint.service
index b74ef17..8414517 100644
--- a/swarm-common/obsidian-reindex-endpoint.service
+++ b/swarm-common/obsidian-reindex-endpoint.service
@@ -11,13 +11,13 @@ Environment=PORT=18810
 Environment=RAG_COLLECTION=obsidian_bge_npu
 Environment=RAG_EMBED_MODEL=bge-base-en-v1.5-int8-ov
 Environment=OLLAMA_BASE_URL=http://127.0.0.1:18817
-# Optional request-time second-stage reranking. Disabled by default so :18810
-# keeps working when the :18818 prototype is stopped or not yet approved live.
-Environment=RAG_RERANK_ENABLED=false
+# Request-time second-stage reranking. The :18810 handler keeps vector-order
+# fallback on reranker timeout/error or missing positive NPU proof.
+Environment=RAG_RERANK_ENABLED=true
 Environment=RAG_RERANK_URL=http://127.0.0.1:18818/rerank
 Environment=RAG_RERANK_INITIAL_K=20
 Environment=RAG_RERANK_TOP_K=5
-Environment=RAG_RERANK_TIMEOUT_MS=3000
+Environment=RAG_RERANK_TIMEOUT_MS=1500
 Environment=RAG_RERANK_REQUIRE_NPU_PROOF=true
 [Install]
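
Reviewer note: the fallback contract this change relies on (rerank when possible, otherwise return vector order with `rerank.error` metadata) could look roughly like the sketch below. This is not the `:18810` handler's actual code. The function name `rerank_with_fallback`, the injected `call_reranker` callable, the `order` field in the reranker response, and the `applied` flag are all assumptions made for illustration. Only `npu_busy_delta_us`, `rerank.error`, and the top-K and NPU-proof settings come from the diff.

```python
from typing import Any, Callable, Dict, List

Candidate = Dict[str, Any]


def rerank_with_fallback(
    query: str,
    candidates: List[Candidate],
    call_reranker: Callable[[str, List[Candidate]], Dict[str, Any]],
    top_k: int = 5,                  # mirrors RAG_RERANK_TOP_K
    require_npu_proof: bool = True,  # mirrors RAG_RERANK_REQUIRE_NPU_PROOF
) -> Dict[str, Any]:
    """Return top_k candidates, reranked if possible, else in vector order.

    `candidates` is assumed to already be in vector-similarity order and is
    never mutated, matching the "request-time only" rule in the docs.
    """
    try:
        # Hypothetical response shape:
        # {"order": [candidate indices, best first], "npu_busy_delta_us": int}
        response = call_reranker(query, candidates)
        if require_npu_proof and response.get("npu_busy_delta_us", 0) <= 0:
            raise RuntimeError("non-positive npu_busy_delta_us")
        ranked = [candidates[i] for i in response["order"]]
        return {"results": ranked[:top_k], "rerank": {"applied": True}}
    except Exception as exc:
        # Timeout, transport error, malformed payload, or failed NPU proof:
        # fall back to the original vector order and record why.
        return {
            "results": candidates[:top_k],
            "rerank": {"applied": False, "error": str(exc)},
        }
```

In a real handler, `call_reranker` would presumably wrap an HTTP POST to `RAG_RERANK_URL` with a timeout of `RAG_RERANK_TIMEOUT_MS`. Injecting it as a callable keeps the fallback logic testable while the `:18818` service is stopped.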