feat(swarm): add URL content extractor + voice memo processor + webhook catalog

- url-content-extractor.py on :18812: YouTube/PDF/web content extraction
- voice-memo-processor.py on :18813: Telegram/Discord/URL voice ingress + Kokoro TTS
- Webhook Action Bus catalog in Obsidian vault
- Updated n8n Implementation Handoff: items #8-10 done
+449
@@ -0,0 +1,449 @@
---
title: Webhook Action Bus
area: infrastructure
tags: [infrastructure, automation, webhooks, n8n, api]
created: 2026-05-13
updated: 2026-05-13
status: active
related: "[[Infrastructure/Automation/n8n Workflows]], [[Infrastructure/Architecture]], [[Infrastructure/Services/Docker Services]]"
---

# Webhook Action Bus

Central catalog of all webhook endpoints in the n8n automation stack. Every webhook-triggered workflow and host-side HTTP endpoint is documented here with its URL, method, authentication, request/response schemas, and implementation status.

## Architecture

```
External Caller
    |
    v
n8n Webhook (port 18808)
    |
    +-- /webhook/openclaw-action    --> OpenClaw Action Bus (router)
    +-- /webhook/openclaw-reminder  --> Reminder Webhook
    +-- /webhook/web-to-notes       --> Web-to-Notes Capture
    +-- /webhook/voice-memo         --> Voice Memo Capture
    |
Host-side Services (from Docker: 172.19.0.1)
    +-- :18809/health               --> Docker Container Health
    +-- :18810/reindex              --> Obsidian Vault Reindex
    +-- :18810/healthz              --> Obsidian Reindex Health
    +-- :18810/reindex/status       --> Obsidian Reindex Status
```

### n8n Webhook URL Structure

All n8n webhooks follow this pattern:

```
http://{host}:18808/webhook/{path}
```

In production, n8n runs inside Docker and sees itself at `http://localhost:18808/`, configured via `WEBHOOK_URL=http://localhost:18808/`.
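
The URL pattern above can be captured in a small helper. This is a minimal sketch, not part of the stack itself: the function name `webhook_url` and the constant `N8N_WEBHOOK_PORT` are hypothetical, and only the port and the `/webhook/{path}` layout come from this catalog.

```python
from urllib.parse import quote

# Port taken from the catalog; the constant name itself is an assumption.
N8N_WEBHOOK_PORT = 18808


def webhook_url(host: str, path: str, port: int = N8N_WEBHOOK_PORT) -> str:
    """Return http://{host}:{port}/webhook/{path} for a webhook path."""
    # Strip stray slashes so "voice-memo" and "/voice-memo/" behave the same.
    clean = path.strip("/")
    if not clean:
        raise ValueError("webhook path must not be empty")
    # quote() leaves "/" and "-" untouched and escapes unsafe characters.
    return f"http://{host}:{port}/webhook/{quote(clean)}"
```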

---

## Endpoint Catalog

### 1. OpenClaw Action Bus

| Field | Value |
|-------|-------|
| **Workflow** | OpenClaw Action Bus |
| **Workflow ID** | `Jwi54VWMdlLqYnRo` |
| **Status** | Active, Implemented |
| **URL** | `POST http://{host}:18808/webhook/openclaw-action` |
| **Authentication** | Header Auth (`OpenClaw Webhook Header` credential) |
| **Auth Header** | `x-openclaw-secret: {secret}` |

**Purpose:** Central action router. Accepts a JSON body with an `action` field and routes to the appropriate handler. Supports 30+ actions including email, calendar, tasks, drive, docs, notifications, approvals, and URL fetching.

#### Request Schema

```json
{
  "action": "string (required) - one of the supported action names",
  "args": {
    "// action-specific parameters, see below"
  },
  "request_id": "string (optional) - client-supplied correlation ID"
}
```
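
As an illustration of this envelope, the sketch below builds (but does not send) a request for the `fetch_and_normalize_url` action from the Supported Actions table. The helper names `build_action_request` and `fetch_url_args` are hypothetical, the host `127.0.0.1` is taken from the Network Reference, and the client-side range check is an assumption: the catalog documents the `max_chars` range (500-20000, default 8000) but not how the workflow reacts to out-of-range values.

```python
import json
import urllib.request
import uuid

# Address from the Network Reference table; the constant name is an assumption.
ACTION_BUS_URL = "http://127.0.0.1:18808/webhook/openclaw-action"


def fetch_url_args(url, max_chars=8000):
    """Build args for fetch_and_normalize_url, checking the documented range."""
    if not 500 <= max_chars <= 20000:
        raise ValueError("max_chars must be between 500 and 20000")
    return {"url": url, "max_chars": max_chars}


def build_action_request(action, args, secret, request_id=None, url=ACTION_BUS_URL):
    """Return a POST request carrying the {action, args, request_id} envelope."""
    body = {
        "action": action,
        "args": args,
        # Generate a correlation ID when the caller does not supply one.
        "request_id": request_id or str(uuid.uuid4()),
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-openclaw-secret": secret,  # Header Auth credential value
        },
        method="POST",
    )
```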

#### Supported Actions

| Action | Args | Description |
|--------|------|-------------|
| `notify` | `{ message: string }` | Send a notification via Telegram |
| `send_notification_draft` | `{ title, message, draft_id? }` | Create & send a notification draft (requires approval) |
| `fetch_and_normalize_url` | `{ url: string, max_chars?: number (500-20000, default 8000) }` | Fetch a URL, strip HTML, return clean text |
| `send_email_draft` | `{ to: string[], cc?: string[], subject: string, body: string }` | Create an email draft for approval |
| `list_email_drafts` | `{ max?: number (1-100, default 20), page?: string }` | List pending email drafts |
| `delete_email_draft` | `{ draft_id: string }` | Delete an email draft |
| `send_gmail_draft` | `{ draft_id: string }` | Send an approved email draft |
| `send_approved_email` | `{ draft_id: string }` | Alias for `send_gmail_draft` |
| `create_calendar_event` | `{ title, start, end, description?, location?, calendar? }` | Create a calendar event |
| `list_upcoming_events` | `{ calendar?: string, max?: number (1-100, default 20), days_ahead?: number }` | List upcoming calendar events |
| `update_calendar_event` | `{ calendar?, event_id, title?, start?, end?, description?, location? }` | Update a calendar event |
| `delete_calendar_event` | `{ calendar?, event_id: string }` | Delete a calendar event |
| `tasks_add` | `{ title: string, notes?: string, due?: string, tasklist_id?: string }` | Add a Google Task |
| `tasks_list` | `{ tasklist_id?: string, max?: number (1-100, default 20) }` | List Google Tasks |
| `tasks_done` | `{ task_id: string, tasklist_id?: string }` | Mark a Google Task as done |
| `tasks_delete` | `{ task_id: string, tasklist_id?: string }` | Delete a Google Task |
| `drive_search` | `{ query: string, max?: number (1-50, default 10) }` | Search Google Drive |
| `drive_upload` | `{ local_path: string, folder_id?: string }` | Upload file to Google Drive |
| `drive_download` | `{ file_id: string, dest_path: string }` | Download file from Google Drive |
| `docs_list` | `{ max?: number (1-50, default 10) }` | List Google Docs |
| `docs_read` | `{ doc_id: string }` | Read a Google Doc |
| `docs_create` | `{ title: string, content?: string }` | Create a Google Doc |
| `docs_write` | `{ doc_id: string, content: string }` | Write to a Google Doc |
| `docs_export` | `{ doc_id: string, format?: string (default 'md') }` | Export a Google Doc |
| `approval_queue_add` | `{ kind?: string, summary: string }` | Add item to approval queue |
| `approval_queue_list` | `{ limit?: number, include_history?: boolean }` | List approval queue |
| `approval_queue_resolve` | `{ id: string, decision: 'approve' or 'reject' }` | Resolve an approval item |
| `approval_history_attach_execution` | `{ id: string, execution: object }` | Attach execution data to approval history |
| `append_log` | `{ text: string }` | Append to the action log |
| `get_logs` | `{ limit?: number (1-50, default 20) }` | Retrieve action log entries |
| `inbound_event_filter` | `{ ...event data }` | Classify an inbound event |

#### Response Schema

```json
{
  "ok": true,
  "action": "string - the action that was executed",
  "// ... action-specific response fields"
}
```

Error responses:
```json
{
  "ok": false,
  "error": "string - error description",
  "statusCode": 400
}
```
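
A caller can treat the `ok` flag as the single success signal and turn the error shape into an exception. The following is a minimal sketch under that assumption; `ActionBusError` and `parse_action_response` are hypothetical names, and only the `ok`, `error`, and `statusCode` fields come from the schemas above.

```python
import json


class ActionBusError(Exception):
    """Raised when the Action Bus answers with ok=false (hypothetical helper)."""

    def __init__(self, message, status_code=None):
        super().__init__(message)
        self.status_code = status_code


def parse_action_response(raw):
    """Decode a response body and return it, or raise on the error shape."""
    payload = json.loads(raw)
    if payload.get("ok") is True:
        return payload
    # Fall back to a generic message if the error field is missing.
    raise ActionBusError(
        payload.get("error", "unknown error"),
        payload.get("statusCode"),
    )
```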

---

### 2. OpenClaw Reminder Webhook

| Field | Value |
|-------|-------|
| **Workflow** | OpenClaw Reminder Webhook |
| **Workflow ID** | `RUR1CGn0ikkxbPin` |
| **Status** | Active, Implemented |
| **URL** | `POST http://{host}:18808/webhook/openclaw-reminder` |
| **Authentication** | Header Auth (`OpenClaw Webhook Header` credential) |
| **Auth Header** | `x-openclaw-secret: {secret}` |

**Purpose:** Accepts a reminder payload and sends it to both Telegram and Discord.

#### Request Schema

```json
{
  "title": "string (required) - reminder title",
  "dueAt": "string (optional) - due date/time, e.g. '2026-05-14T09:00:00'",
  "context": "string (optional) - additional context for the reminder"
}
```

#### Response Schema

```json
{
  "ok": true,
  "sentTelegram": true,
  "sentDiscord": true
}
```
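
The reminder payload and its two delivery flags can be handled as sketched below. The helper names are hypothetical, and formatting `dueAt` with `datetime.isoformat` is an assumption based only on the example value in the request schema.

```python
from datetime import datetime
from typing import Optional


def build_reminder(title: str, due_at: Optional[datetime] = None,
                   context: Optional[str] = None) -> dict:
    """Build a reminder payload, omitting optional fields that are not set."""
    if not title.strip():
        raise ValueError("title is required")
    payload = {"title": title}
    if due_at is not None:
        # Matches the documented example format, e.g. 2026-05-14T09:00:00.
        payload["dueAt"] = due_at.isoformat(timespec="seconds")
    if context:
        payload["context"] = context
    return payload


def delivered_everywhere(response: dict) -> bool:
    """True only if the reminder reached both Telegram and Discord."""
    return all(response.get(k) is True for k in ("ok", "sentTelegram", "sentDiscord"))
```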

---

### 3. Web-to-Notes Capture

| Field | Value |
|-------|-------|
| **Workflow** | Web-to-Notes Capture (Local LLM + Obsidian) |
| **Workflow ID** | `GSmzuA5dgGgyRg5v` |
| **Status** | Active, Implemented |
| **URL** | `POST http://{host}:18808/webhook/web-to-notes` |
| **Authentication** | None |
| **Webhook ID** | `7958ecbc-c714-41d5-a829-882447ab95f8` |

**Purpose:** Captures a URL, fetches its content, summarizes it with the local LLM (Gemma), and saves the result as an Obsidian note. Also sends a Telegram notification.

#### Request Schema

```json
{
  "url": "string (required) - HTTP(S) URL to capture",
  "title": "string (optional) - override title (default: extracted from page)",
  "notes": "string (optional) - personal notes/comment about the capture",
  "tags": "string[] | string (optional) - comma-separated or array of tags (default: ['web-capture'])"
}
```

#### Response Schema

```json
{
  "ok": true,
  "notePath": "string - Obsidian vault path, e.g. 'Notes/2026-05-13 My Page.md'",
  "title": "string - the note title",
  "source": "string - the original URL"
}
```
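
Because `tags` may arrive either as an array or as a comma-separated string, a client may want to normalize it before sending. The sketch below shows one way to do that; it is an assumption about sensible client-side behavior, not a description of how the workflow itself parses tags, and `normalize_tags` is a hypothetical name.

```python
def normalize_tags(tags, default=("web-capture",)):
    """Return a clean list of tags from a list, a comma-separated string, or None."""
    if tags is None:
        return list(default)
    # A plain string is treated as comma-separated, as the schema allows.
    items = tags.split(",") if isinstance(tags, str) else list(tags)
    cleaned = [t.strip() for t in items if isinstance(t, str) and t.strip()]
    # Fall back to the documented default when nothing usable remains.
    return cleaned or list(default)
```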

---

### 4. Voice Memo Capture

| Field | Value |
|-------|-------|
| **Workflow** | Voice Memo Capture (Audio URL + Local Whisper) |
| **Workflow ID** | `El1BHJZ56JlzhrRZ` |
| **Status** | Active, Implemented |
| **URL** | `POST http://{host}:18808/webhook/voice-memo` |
| **Authentication** | None |
| **Webhook ID** | `06796590-13b3-4347-9582-1ac92719c95d` |

**Purpose:** Downloads an audio file from a URL, transcribes it with the local Whisper service, summarizes with the local LLM, and saves as an Obsidian note. Sends a Telegram notification.

#### Request Schema

```json
{
  "audio_url": "string (required) - HTTP(S) URL to the audio file",
  "title": "string (optional) - title for the note (default: 'Voice Memo')",
  "source": "string (optional) - source attribution (default: the audio_url)",
  "tags": "string[] | string (optional) - tags (default: ['voice', 'memo'])"
}
```

#### Response Schema

```json
{
  "ok": true,
  "notePath": "string - Obsidian vault path, e.g. 'Voice Memos/2026-05-13-my-memo.md'",
  "title": "string - the note title"
}
```
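
Since `audio_url` must be an HTTP(S) URL, a caller could reject anything else before posting. This is a sketch of such a pre-flight check; `build_voice_memo_payload` is a hypothetical helper, and leaving optional fields out so the workflow applies its documented defaults is an assumption.

```python
from urllib.parse import urlparse


def build_voice_memo_payload(audio_url, title=None, source=None, tags=None):
    """Build a voice-memo payload after checking the URL scheme."""
    parsed = urlparse(audio_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("audio_url must be an absolute HTTP(S) URL")
    payload = {"audio_url": audio_url}
    # Only send optional fields that were explicitly provided.
    for key, value in (("title", title), ("source", source), ("tags", tags)):
        if value is not None:
            payload[key] = value
    return payload
```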

---

### 5. Docker Container Health

| Field | Value |
|-------|-------|
| **Status** | Active, Implemented |
| **URL** | `GET http://{host}:18809/health` |
| **Authentication** | None |

**Purpose:** Returns the health status of all Docker containers in the swarm.

#### Response Schema

```json
{
  "containers": [
    {
      "name": "string - container name",
      "status": "string - e.g. 'running'",
      "image": "string - container image"
    }
  ]
}
```
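
A consumer of this response will usually only care about containers that need attention. The sketch below filters them out, assuming that any status other than `running` counts as unhealthy; the catalog gives `running` only as an example value, so the full set of status strings is not known here. `unhealthy_containers` is a hypothetical name.

```python
def unhealthy_containers(health, ok_statuses=("running",)):
    """Return the names of containers whose status is not in ok_statuses."""
    return [
        c.get("name", "<unnamed>")
        for c in health.get("containers", [])
        if c.get("status") not in ok_statuses
    ]
```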

---

### 6. Obsidian Vault Reindex

| Field | Value |
|-------|-------|
| **Status** | Active, Implemented |
| **URL** | `POST http://{host}:18810/reindex` |
| **Authentication** | None |
| **Timeout** | 300s (5 min) |

**Purpose:** Triggers a full reindex of the Obsidian vault for search.

#### Response

Returns the reindex result (status, file count, etc.)

---

### 7. Obsidian Reindex Health Check

| Field | Value |
|-------|-------|
| **Status** | Active, Implemented |
| **URL** | `GET http://{host}:18810/healthz` |
| **Authentication** | None |

**Purpose:** Health check for the Obsidian reindex service.

#### Response Schema

```json
{
  "status": "ok"
}
```

---

### 8. Obsidian Reindex Status

| Field | Value |
|-------|-------|
| **Status** | Active, Implemented |
| **URL** | `GET http://{host}:18810/reindex/status` |
| **Authentication** | None |

**Purpose:** Returns the current reindex status including file hashes.

#### Response Schema

```json
{
  "files": {
    "path/to/file.md": "sha256-hash"
  }
}
```
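
Because the status maps each vault path to a content hash, two snapshots can be compared to see what changed between reindex runs. The following is a sketch of that comparison; `diff_status` is a hypothetical helper that relies only on the `files` mapping shown above.

```python
def diff_status(before, after):
    """Compare two /reindex/status snapshots by path and hash."""
    old = before.get("files", {})
    new = after.get("files", {})
    return {
        # Paths present only in the newer snapshot.
        "added": sorted(new.keys() - old.keys()),
        # Paths that disappeared since the older snapshot.
        "removed": sorted(old.keys() - new.keys()),
        # Paths present in both whose hash differs.
        "changed": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }
```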

---

## Gap Analysis: Endpoints That Need Implementation

The following endpoints are defined in the action bus architecture. Several do NOT yet have dedicated webhook-triggered workflows; the rest are fully or partially covered by existing endpoints or Action Bus actions.

### 1. `process_url` - Capture and Summarize a URL

| Field | Value |
|-------|-------|
| **Status** | COVERED by `Web-to-Notes Capture` endpoint AND Action Bus `fetch_and_normalize_url` |
| **Gap** | No dedicated `/webhook/process-url` endpoint exists, but the functionality is fully available via `/webhook/web-to-notes` |

**Recommendation:** Rename `web-to-notes` to `process-url` or add an alias. The current `web-to-notes` endpoint already does URL capture + LLM summary + Obsidian save.

### 2. `summarize_pdf` - Extract and Summarize a PDF

| Field | Value |
|-------|-------|
| **Status** | NOT IMPLEMENTED |
| **Gap** | No workflow exists to accept a PDF URL, extract text, and summarize it |

**Required Implementation:**
- New workflow: `POST /webhook/summarize-pdf`
- Request: `{ "pdf_url": "string (required)", "title?": "string", "tags?": "string[]" }`
- Needs a PDF text extraction service (e.g., `pdftotext`, `pymupdf`, or an HTTP microservice)
- Then summarize with local LLM and save to Obsidian

### 3. `add_reminder` - Add a Reminder/Task

| Field | Value |
|-------|-------|
| **Status** | PARTIALLY IMPLEMENTED |
| **Gap** | `POST /webhook/openclaw-reminder` sends an immediate notification but does NOT persist the reminder. Action Bus `tasks_add` adds a Google Task but has no webhook-specific endpoint |

**Current Coverage:**
- `POST /webhook/openclaw-reminder` - immediate Telegram + Discord notification (no persistence)
- Action Bus `tasks_add` - adds to Google Tasks (persistent)
- Action Bus `create_calendar_event` - creates calendar events with reminders

**Recommendation:** Consider whether a unified `/webhook/add-reminder` endpoint should both persist AND notify.

### 4. `sync_vault` - Trigger Obsidian Vault Sync/Reindex

| Field | Value |
|-------|-------|
| **Status** | COVERED by host endpoint `POST :18810/reindex` |
| **Gap** | No n8n webhook exposes this; it's available as a direct host-side HTTP endpoint only. Also covered by the scheduled `Obsidian Vault Reindex` workflow (every 6 hours) |

**Recommendation:** Either expose via Action Bus as a new action, or document that callers should use `POST http://172.19.0.1:18810/reindex` directly (host-side only, not externally accessible). To make it externally accessible, add a `sync_vault` action to the Action Bus.

### 5. `run_health_check` - Trigger a Health Check of the Swarm

| Field | Value |
|-------|-------|
| **Status** | PARTIALLY IMPLEMENTED |
| **Gap** | `GET :18809/health` returns container health but is host-side only. The `Swarm Health Watchdog` workflow (ID: `lDKocSFXBQWQrDd3`) runs on schedule but has no webhook trigger. No unified webhook endpoint for on-demand health checks |

**Recommendation:** Add a `run_health_check` action to the Action Bus, or add a webhook trigger to the Swarm Health Watchdog workflow.

### 6. `process_voice_memo` - Process a Voice Memo

| Field | Value |
|-------|-------|
| **Status** | FULLY IMPLEMENTED as `POST /webhook/voice-memo` |
| **Gap** | None. This endpoint is fully operational |

---

## Implementation Status Summary

| Endpoint | Path | Method | Implemented | Workflow ID |
|----------|------|--------|-------------|-------------|
| OpenClaw Action Bus | `/webhook/openclaw-action` | POST | Yes | `Jwi54VWMdlLqYnRo` |
| Reminder Notification | `/webhook/openclaw-reminder` | POST | Yes | `RUR1CGn0ikkxbPin` |
| Web-to-Notes Capture | `/webhook/web-to-notes` | POST | Yes | `GSmzuA5dgGgyRg5v` |
| Voice Memo Capture | `/webhook/voice-memo` | POST | Yes | `El1BHJZ56JlzhrRZ` |
| Docker Container Health | `:18809/health` | GET | Yes | (host-side) |
| Obsidian Reindex | `:18810/reindex` | POST | Yes | (host-side) |
| Obsidian Reindex Health | `:18810/healthz` | GET | Yes | (host-side) |
| Obsidian Reindex Status | `:18810/reindex/status` | GET | Yes | (host-side) |
| **Summarize PDF** | `/webhook/summarize-pdf` | POST | **No** | - |
| **Health Check (webhook)** | via Action Bus | POST | **No** | - |
| **Vault Sync (webhook)** | via Action Bus | POST | **No** | - |

### Action Bus Sub-actions Status

The OpenClaw Action Bus already implements these actions internally:
- Email: `send_email_draft`, `list_email_drafts`, `delete_email_draft`, `send_gmail_draft`, `send_approved_email`
- Calendar: `create_calendar_event`, `list_upcoming_events`, `update_calendar_event`, `delete_calendar_event`
- Tasks: `tasks_add`, `tasks_list`, `tasks_done`, `tasks_delete`
- Drive: `drive_search`, `drive_upload`, `drive_download`
- Docs: `docs_list`, `docs_read`, `docs_create`, `docs_write`, `docs_export`
- Notifications: `notify`, `send_notification_draft`
- Approvals: `approval_queue_add`, `approval_queue_list`, `approval_queue_resolve`, `approval_history_attach_execution`
- Utility: `fetch_and_normalize_url`, `append_log`, `get_logs`, `inbound_event_filter`

---

## Network Reference

| From | To | Address |
|------|----|---------|
| External/Host | n8n Webhooks | `http://127.0.0.1:18808/webhook/{path}` |
| n8n (Docker) | Host services | `http://172.19.0.1:{port}/{path}` |
| n8n (Docker) | n8n internal | `http://127.0.0.1:5678/api/v1/` |
| n8n (Docker) | Obsidian REST | `http://172.19.0.1:27123/vault/{path}` |
| n8n (Docker) | Local LLM | `http://172.19.0.1:18806/v1/` |
| n8n (Docker) | Whisper | `http://172.19.0.1:18811/v1/audio/transcriptions` |

---

## Authentication

Two authentication patterns are used:

1. **Header Auth** (`x-openclaw-secret`): Used by Action Bus and Reminder webhooks. The secret is stored in n8n credential `OpenClaw Webhook Header` (ID: `6sZd8ciia1fsItDd`) and referenced in `~/lab/swarm/openclaw/credentials/n8n.env` as `N8N_WEBHOOK_SECRET`.

2. **No Auth**: Used by Web-to-Notes and Voice Memo webhooks. These are open endpoints (consider adding auth if exposed publicly).
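
For the Header Auth pattern, a client needs to turn the stored secret into the `x-openclaw-secret` header. The sketch below assumes `n8n.env` uses plain `KEY=VALUE` lines, optionally quoted, which this catalog does not confirm; `read_env_value` and `auth_headers` are hypothetical helpers that operate on the file's text rather than reading the file themselves.

```python
def read_env_value(env_text, key):
    """Return the value for key from KEY=VALUE text, or None if absent."""
    for line in env_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            return value.strip().strip('"').strip("'")
    return None


def auth_headers(env_text):
    """Build the Header Auth header from the contents of an env file."""
    secret = read_env_value(env_text, "N8N_WEBHOOK_SECRET")
    if not secret:
        raise KeyError("N8N_WEBHOOK_SECRET is not set")
    return {"x-openclaw-secret": secret}
```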

---

## Related

- [[Infrastructure/Automation/n8n Workflows]] - Full workflow documentation
- [[Infrastructure/Architecture]] - Overall system architecture
- [[Infrastructure/Services/Docker Services]] - Docker service registry
- [[Infrastructure/Automation/Cron Jobs]] - Scheduled task documentation
+26
-23
@@ -164,16 +164,16 @@ Last verified on 2026-05-13 (evening):
 - Status: active
 - Type: webhook
 - Current behavior:
-  - Accepts an audio URL.
-  - Downloads audio.
-  - Transcribes with local Whisper on `18811`.
-  - Summarizes with local llama.cpp.
-  - Writes transcript/summary/action items to Obsidian.
-  - Sends a Telegram notification.
+  - Accepts three ingress modes: `audio_url`, `telegram_file_id`, or `discord_audio_url`.
+  - Host-side processor on port `18813` (`voice-memo-processor.py`) handles download, Whisper transcription, and local LLM summarization.
+  - Optional Kokoro TTS read-back of summary (`include_tts: true`).
+  - Writes transcript/summary to Obsidian with YAML frontmatter including `source_type`.
+  - Sends Telegram notification with source type and optional TTS audio link.
+- Host-side service: `~/lab/swarm/scripts/voice-memo-processor.py` on port `18813`.
+- Systemd user service: `voice-memo-processor.service` (enabled).
 - Remaining improvement:
-  - Add native Telegram/Discord voice-message ingress instead of requiring an audio URL.
-  - Add optional Kokoro read-back of summary.
   - Add durable action-item routing to notes/task queue.
+  - Test end-to-end with real Telegram voice messages.
 
 ### Web-to-Notes Capture
 
@@ -182,14 +182,14 @@ Last verified on 2026-05-13 (evening):
 - Type: webhook
 - Current behavior:
   - Accepts a URL.
-  - Fetches the page.
-  - Extracts readable text.
+  - Host-side content extractor on port `18812` (`url-content-extractor.py`) classifies and extracts content.
+  - Supports YouTube (transcript via `youtube-transcript-api`), PDF (text via `pymupdf`), and web (readable text via `readability-lxml`).
   - Summarizes with local llama.cpp.
-  - Writes markdown to Obsidian.
+  - Writes markdown to Obsidian with YAML frontmatter including `content_type`, `source_url`, `title`, `date`, and tags.
+- Host-side service: `~/lab/swarm/scripts/url-content-extractor.py` on port `18812`.
+- Systemd user service: `url-content-extractor.service` (enabled).
 - Remaining improvement:
-  - Add YouTube transcript handling.
-  - Add PDF handling.
-  - Add claim extraction and source metadata.
+  - Add claim extraction and source metadata enrichment.
   - Add optional Atlas/Hermes higher-quality synthesis for important captures.
 
 ### OpenClaw Action Bus / Reminder Webhook
@@ -328,17 +328,14 @@ Recommended implementation:
 6. ~~Fix stale container URLs in IMAP workflow.~~ Done 2026-05-13.
 7. ~~Implement Obsidian Semantic Index.~~ Done 2026-05-13: ChromaDB `obsidian` collection, Ollama nomic-embed-text, automated reindex every 6h.
 
-8. Upgrade Web-to-Notes Capture.
-   - Add PDF and YouTube transcript support.
-   - Add source metadata and claim extraction.
+8. ~~Upgrade Web-to-Notes Capture.~~ Done 2026-05-13: host-side content extractor on :18812, supports YouTube/PDF/web, workflow updated.
+   - Remaining: claim extraction, Atlas/Hermes synthesis.
 
-9. Upgrade Voice Memo Pipeline.
-   - Add native Telegram/Discord voice ingestion.
-   - Add optional Kokoro audio summary.
+9. ~~Upgrade Voice Memo Pipeline.~~ Done 2026-05-13: host-side processor on :18813, Telegram/Discord voice ingress, Kokoro TTS read-back.
+   - Remaining: test with real Telegram voice, action-item routing.
 
-10. Define webhook action bus catalog.
-    - Document stable endpoints and schemas.
-    - Add `process_url`, `summarize_pdf`, `add_reminder`, `sync_vault`, `run_health_check`.
+10. ~~Define webhook action bus catalog.~~ Done 2026-05-13: catalog at `Infrastructure/Automation/Webhook Action Bus.md`.
+    - Remaining: implement `summarize_pdf`, `sync_vault`, `run_health_check` webhook wrappers.
 
 ## Verification commands
 
@@ -360,6 +357,12 @@ curl -fsS --max-time 3 http://127.0.0.1:18809/health | python3 -m json.tool
 curl -fsS http://127.0.0.1:18810/healthz
 curl -fsS http://127.0.0.1:18810/reindex/status | python3 -m json.tool
 
+# URL content extractor (Web-to-Notes)
+curl -fsS http://127.0.0.1:18812/healthz
+
+# Voice memo processor
+curl -fsS http://127.0.0.1:18813/healthz
+
 # Verify from inside n8n container
 docker exec n8n-agent wget -qO- http://172.19.0.1:18809/health
 docker exec n8n-agent wget -qO- http://172.19.0.1:18810/healthz