feat(swarm): add Obsidian vault reindex endpoint + update handoff

- obsidian-reindex-server.py: HTTP endpoint on port 18810 for
  triggering incremental Obsidian vault reindex from n8n
- Updated n8n Implementation Handoff: Obsidian Semantic Index
  section, new reindex workflow, updated verification commands
William Valentin
2026-05-13 15:18:50 -07:00
parent aa77e11b3a
commit 6c13a60f57
2 changed files with 206 additions and 30 deletions
@@ -19,11 +19,13 @@ Last verified on 2026-05-13 (evening):
- Container: `n8n-agent` running and healthy.
- Health endpoint: `GET /healthz` returns `{"status":"ok"}`.
- Workflow export: `n8n export:workflow --all` succeeds.
- Active workflows: 12.
- Active workflows: 13.
- Inactive workflows: 1 (Nightly Obsidian Vault Sync replaced by Evening Digest).
- Archived workflows: 2 unrecoverable duplicate IMAP workflows archived after SQLite recovery.
- Docker health endpoint: `GET :18809/health` returns container state for 7 services.
- Systemd user service `docker-health-endpoint.service` active and enabled.
- Obsidian reindex endpoint: `POST :18810/reindex` triggers incremental vault reindex.
- Systemd user service `obsidian-reindex-endpoint.service` active and enabled.
## Implemented and active
@@ -207,6 +209,74 @@ Last verified on 2026-05-13 (evening):
- `run_health_check`
- `process_voice_memo`
### Obsidian Vault Reindex
- Workflow ID: `85ntyyphDJ4Ms2b4`
- Status: active
- Trigger: every 6 hours
- Current behavior:
- n8n schedule trigger calls `POST http://172.19.0.1:18810/reindex`.
- Host-side `obsidian-reindex-server.py` on port `18810` runs the incremental Obsidian vault indexer.
- The endpoint runs as systemd user service `obsidian-reindex-endpoint.service`.
## Obsidian Semantic Index
Implemented 2026-05-13.
### Architecture
- **Vector store**: Hermes rag-search ChromaDB embedded at `~/.hermes/data/rag-search/chroma/` in the `obsidian` collection.
- **Embeddings**: Ollama `nomic-embed-text` on port `18807` (768-dim vectors).
- **Indexer**: `~/.hermes/skills/note-taking/rag-search/scripts/index_obsidian.py`
- **Chunking**: Markdown files are split by heading sections; long sections get sliding-window chunks (max 2000 chars, 200 char overlap). YAML frontmatter is extracted and stored as metadata.
- **Search**: `~/.hermes/skills/note-taking/rag-search/scripts/search.py --index obsidian "query"`
- **Cross-collection search**: `search.py "query"` now searches all three collections (`personal`, `docs`, `obsidian`) using the appropriate embedding backend per collection.
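The chunking rule above can be sketched as follows. This is a minimal illustration, not the code in `index_obsidian.py`: the function names are hypothetical, frontmatter is returned as raw text rather than parsed YAML, and headings inside fenced code blocks are not special-cased.

```python
import re

MAX_CHARS = 2000  # maximum chunk length, from the notes above
OVERLAP = 200     # characters shared by consecutive windows

def split_frontmatter(text):
    """Return (frontmatter_text, body); frontmatter is left unparsed here."""
    match = re.match(r"\A---\n(.*?)\n---\n?", text, flags=re.DOTALL)
    if match:
        return match.group(1), text[match.end():]
    return "", text

def split_sections(body):
    """Split a Markdown body into sections that each start at a heading line."""
    sections, current = [], []
    for line in body.splitlines(keepends=True):
        if re.match(r"#{1,6} ", line) and current:
            sections.append("".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("".join(current))
    return [s for s in sections if s.strip()]

def window(section, max_chars=MAX_CHARS, overlap=OVERLAP):
    """Return the section whole if it fits, else overlapping fixed-size windows."""
    if len(section) <= max_chars:
        return [section]
    step = max_chars - overlap
    chunks = []
    for start in range(0, len(section), step):
        chunks.append(section[start:start + max_chars])
        if start + max_chars >= len(section):
            break  # the last window already reaches the end of the section
    return chunks

def chunk_markdown(text):
    """Return (frontmatter_text, list_of_chunks) for one Markdown file."""
    frontmatter, body = split_frontmatter(text)
    chunks = []
    for section in split_sections(body):
        chunks.extend(window(section))
    return frontmatter, chunks
```

With these settings each window advances by 1800 characters, so a 4005-character section yields three chunks, and the last 200 characters of each chunk repeat at the start of the next.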
### Index stats (2026-05-13)
- 36 markdown files indexed
- 231 chunks
- Embedding model: `nomic-embed-text` via Ollama
- Full index time: ~5 minutes (Ollama CPU inference at ~1.2s/text, batch=10)
- Incremental reindex (no changes): ~1.4 seconds
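As a rough sanity check on the full index time, the chunk count and per-text latency above can be multiplied out. This back-of-the-envelope sketch assumes embedding time dominates and ignores batching, file I/O, and ChromaDB writes; the function name is illustrative.

```python
def estimate_full_index_seconds(chunks, seconds_per_text=1.2):
    """Estimate embedding time as chunk count times per-text latency."""
    return chunks * seconds_per_text

seconds = estimate_full_index_seconds(231)
print(f"{seconds:.0f} s, about {seconds / 60:.1f} min")  # prints "277 s, about 4.6 min"
```

The estimate of roughly 4.6 minutes is consistent with the observed ~5 minutes.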
### Incremental updates
- File content SHA-256 hashes tracked in `~/.hermes/data/rag-search/obsidian_index_state.json`.
- Only changed files are re-indexed on subsequent runs.
- Deleted files have their chunks removed from ChromaDB.
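The hash-based change detection described above might look like the sketch below. The state file layout (a flat mapping of relative path to SHA-256 hex digest) and all function names are assumptions for illustration; the actual format of `obsidian_index_state.json` is not shown in these notes.

```python
import hashlib
import json
from pathlib import Path

def file_hash(path):
    """SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def load_state(state_path):
    """Load the previous path-to-hash mapping, or an empty one on first run."""
    p = Path(state_path)
    return json.loads(p.read_text()) if p.exists() else {}

def plan_reindex(vault_dir, state):
    """Compare current hashes with stored state.

    Returns (changed, deleted, new_state): files to re-embed, files whose
    chunks should be removed from the vector store, and the state to save.
    """
    vault = Path(vault_dir)
    current = {
        p.relative_to(vault).as_posix(): file_hash(p)
        for p in sorted(vault.rglob("*.md"))
    }
    changed = [rel for rel, digest in current.items() if state.get(rel) != digest]
    deleted = [rel for rel in state if rel not in current]
    return changed, deleted, current

def save_state(state_path, state):
    """Persist the new mapping so the next run can diff against it."""
    Path(state_path).write_text(json.dumps(state, indent=2, sort_keys=True))
```

A run with no changed or deleted files does no embedding work at all, which is why the no-change case finishes in about a second.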
### Automated reindex
- n8n workflow `Obsidian Vault Reindex` (`85ntyyphDJ4Ms2b4`) triggers every 6 hours.
- Calls `POST http://172.19.0.1:18810/reindex` (host-side endpoint).
- Host endpoint: `~/lab/swarm/scripts/obsidian-reindex-server.py` on port `18810`.
- Systemd service: `obsidian-reindex-endpoint.service` (enabled).
- Manual trigger: `curl -X POST http://127.0.0.1:18810/reindex`
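For orientation, a host-side endpoint of this shape can be sketched with the standard library alone. This is not the contents of `obsidian-reindex-server.py`: the indexer command, the JSON response bodies, the 409 response for overlapping runs, and all names below are assumptions. Only the routes (`POST /reindex`, `GET /reindex/status`, `GET /healthz`) and port `18810` come from these notes.

```python
import json
import subprocess
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical command; the real indexer invocation is not shown in these notes.
INDEX_CMD = ["python3", "index_obsidian.py"]

_lock = threading.Lock()      # prevents overlapping reindex runs
_last = {"returncode": None}  # result of the most recent run

def run_indexer():
    """Run the indexer as a subprocess and return its exit code."""
    return subprocess.run(INDEX_CMD, capture_output=True, text=True).returncode

def make_handler(runner=run_indexer):
    """Build a request handler; `runner` is injectable so it can be stubbed."""
    class Handler(BaseHTTPRequestHandler):
        def _json(self, code, payload):
            body = json.dumps(payload).encode()
            self.send_response(code)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def do_GET(self):
            if self.path == "/healthz":
                self._json(200, {"status": "ok"})
            elif self.path == "/reindex/status":
                self._json(200, dict(_last))
            else:
                self._json(404, {"error": "not found"})

        def do_POST(self):
            if self.path != "/reindex":
                self._json(404, {"error": "not found"})
                return
            if not _lock.acquire(blocking=False):
                self._json(409, {"error": "reindex already running"})
                return
            try:
                _last["returncode"] = runner()
            finally:
                _lock.release()
            self._json(200, dict(_last))

        def log_message(self, fmt, *args):
            pass  # keep the sketch quiet; a real service would log requests

    return Handler

def serve(host="0.0.0.0", port=18810):
    """Bind on all interfaces so the n8n container can reach the host."""
    ThreadingHTTPServer((host, port), make_handler()).serve_forever()
```

Calling `serve()` starts the listener. The non-blocking lock is one way to keep a manual `curl` trigger from colliding with the scheduled n8n call.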
### Verification commands
```bash
# Check index state
curl -fsS http://127.0.0.1:18810/reindex/status | python3 -m json.tool
# Trigger manual reindex
curl -X POST http://127.0.0.1:18810/reindex | python3 -m json.tool
# Search the Obsidian index
~/.hermes/skills/note-taking/rag-search/venv/bin/python \
~/.hermes/skills/note-taking/rag-search/scripts/search.py --index obsidian "health monitoring"
# Check ChromaDB data
du -sh ~/.hermes/data/rag-search/chroma/
# Check systemd service
systemctl --user status obsidian-reindex-endpoint.service
# Verify from inside n8n container
docker exec n8n-agent wget -qO- http://172.19.0.1:18810/healthz
```
## Not yet implemented
### Weekly review
@@ -226,27 +296,6 @@ Recommended implementation:
3. Use Atlas/Hermes or cloud model for final synthesis.
4. Write `Notes/YYYY-MM-DD Weekly Review.md`.
### Obsidian Semantic Index
Desired scope:
- Watch vault changes.
- Chunk changed notes.
- Embed with Ollama on `18807` using `nomic-embed-text`.
- Store vectors locally.
- Enable semantic search / RAG for Atlas.
Recommended implementation options:
1. Prefer Hermes `rag-search`/local ChromaDB if already available and stable.
2. If n8n owns the trigger, have n8n call a local indexing webhook/script rather than implementing vector DB logic entirely in n8n.
3. Use file-change polling if native file watch is unreliable in Docker/virtiofs.
Open questions:
- Which vector store should be canonical: Hermes rag-search ChromaDB, a separate Chroma instance, SQLite vector extension, or another local store?
- Should n8n trigger indexing, or should Atlas/Hermes own indexing and n8n only notify?
### Personal data routing
Desired scope:
@@ -277,11 +326,7 @@ Recommended implementation:
4. ~~Extend Evening Digest.~~ Done 2026-05-13: workflow `PlZywwqL8MRNEAN6`, daily 21:00 PT.
5. ~~Add Discord delivery to n8n Failure Digest.~~ Done 2026-05-13.
6. ~~Fix stale container URLs in IMAP workflow.~~ Done 2026-05-13.
7. Implement Obsidian Semantic Index.
- Decide canonical vector store first.
- Use Ollama embeddings on `18807`.
- Add incremental update path.
7. ~~Implement Obsidian Semantic Index.~~ Done 2026-05-13: ChromaDB `obsidian` collection, Ollama `nomic-embed-text`, automated reindex every 6h.
8. Upgrade Web-to-Notes Capture.
- Add PDF and YouTube transcript support.
@@ -292,8 +337,8 @@ Recommended implementation:
- Add optional Kokoro audio summary.
10. Define webhook action bus catalog.
- Document stable endpoints and schemas.
- Add `process_url`, `summarize_pdf`, `add_reminder`, `sync_vault`, `run_health_check`.
- Document stable endpoints and schemas.
- Add `process_url`, `summarize_pdf`, `add_reminder`, `sync_vault`, `run_health_check`.
## Verification commands
@@ -311,8 +356,13 @@ docker exec n8n-agent n8n export:workflow --all --output=/tmp/workflows-verify.j
# Docker health endpoint (host-side systemd service)
curl -fsS --max-time 3 http://127.0.0.1:18809/health | python3 -m json.tool
# Obsidian reindex endpoint
curl -fsS http://127.0.0.1:18810/healthz
curl -fsS http://127.0.0.1:18810/reindex/status | python3 -m json.tool
# Verify from inside n8n container
docker exec n8n-agent wget -qO- http://172.19.0.1:18809/health
docker exec n8n-agent wget -qO- http://172.19.0.1:18810/healthz
```
### n8n Public API access
@@ -358,4 +408,6 @@ docker logs n8n-agent --tail 120
- `N8N_API_KEY` in `~/lab/swarm/.env` is stale and returns 401. Get the working key from n8n credential `UPAHgUJVRqZQceL4` (see Verification commands).
- Do not commit DB backups, workflow execution history, secrets, or runtime state.
- The Google OAuth credential (`wpcf2epDDCT57Y5x`) cannot refresh (`invalid_client`). Gmail workflows use IMAP fallback instead.
- The Docker health endpoint (`18809`) must bind to `0.0.0.0` (not `127.0.0.1`) so the n8n container can reach it via the Docker bridge gateway.
- The Docker health endpoint (`18809`) and reindex endpoint (`18810`) must bind to `0.0.0.0` (not `127.0.0.1`) so the n8n container can reach them via the Docker bridge gateway.
- The `obsidian` ChromaDB collection uses Ollama `nomic-embed-text` embeddings, while `personal` and `docs` use Sentence Transformers `all-MiniLM-L6-v2`. Similarity scores from the two embedding backends are not directly comparable, so do not rank results across collections by raw score.
- Ollama embedding on CPU takes ~1.2s per text. A full vault reindex takes ~5 minutes for 231 chunks. An incremental run with no changes takes ~1.4 seconds.