feat(tools): add docx extraction for minio ingestion

William Valentin
2026-02-16 14:38:01 -08:00
parent e8a785b61f
commit 0548ab3833
6 changed files with 102 additions and 8 deletions
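The DOCX fallback chain named in this commit's state entry (`pandoc` first, then `docx2txt`) can be sketched roughly as below. This is a minimal illustration, not the code in `src/tools/builtin/minio-ingest.ts`: the names `Extractor`, `Runner`, `DOCX_EXTRACTORS`, and `extractDocxText` are hypothetical, and the exact command-line arguments are assumptions about how each tool is typically asked to print plain text to stdout.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical shapes for this sketch; the real module may differ.
type Extractor = { name: string; command: string; args: (path: string) => string[] };
type RunResult = { ok: boolean; stdout: string };
type Runner = (command: string, args: string[]) => RunResult;

// Tried in order. The argument lists are assumed invocations:
// pandoc converts DOCX to plain text on stdout, and docx2txt
// writes to stdout when the output file is given as "-".
const DOCX_EXTRACTORS: Extractor[] = [
  { name: "pandoc", command: "pandoc", args: (p) => ["--from=docx", "--to=plain", p] },
  { name: "docx2txt", command: "docx2txt", args: (p) => [p, "-"] },
];

// Default runner: a missing binary surfaces as `result.error` (ENOENT),
// a failed conversion as a non-zero exit status.
const defaultRunner: Runner = (command, args) => {
  const result = spawnSync(command, args, { encoding: "utf8" });
  return { ok: !result.error && result.status === 0, stdout: result.stdout ?? "" };
};

// Returns the first non-empty extraction, or null when no tool succeeds.
function extractDocxText(
  path: string,
  run: Runner = defaultRunner,
): { tool: string; text: string } | null {
  for (const extractor of DOCX_EXTRACTORS) {
    const { ok, stdout } = run(extractor.command, extractor.args(path));
    if (ok && stdout.trim().length > 0) {
      return { tool: extractor.name, text: stdout };
    }
  }
  return null;
}
```

Passing the runner as a parameter keeps the fallback order testable without either tool installed, which is one plausible way to cover the chain in unit tests.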
+2 -2
@@ -968,7 +968,7 @@ Upload a local file to MinIO and return a temporary presigned download URL.
#### `minio.ingest`
-Read a text-like object from MinIO (and PDFs when `pdftotext` is available) and write it into a memory namespace.
+Read a text-like object from MinIO (and PDF/DOCX via local extraction tools when available) and write it into a memory namespace.
```json
{
@@ -1010,7 +1010,7 @@ Read a text-like object from MinIO (and PDFs when `pdftotext` is available) and
#### `minio.sync`
-Sync text-like objects from a MinIO prefix into nested memory namespaces (with PDF extraction when available).
+Sync text-like objects from a MinIO prefix into nested memory namespaces (with PDF/DOCX extraction when available).
```json
{
+15
@@ -157,6 +157,21 @@
],
"test_status": "pnpm test:run src/tools/builtin/minio-ingest.test.ts src/tools/builtin/minio-sync.test.ts + pnpm typecheck passing"
},
"minio-docx-ingestion-support": {
"status": "completed",
"date": "2026-02-16",
"updated": "2026-02-16",
"summary": "Extended MinIO knowledge ingestion/sync to support DOCX extraction with fallback chain (`pandoc` then `docx2txt`) in both `minio.ingest` and `minio.sync` paths, plus tests/docs updates.",
"files_modified": [
"src/tools/builtin/minio-ingest.ts",
"src/tools/builtin/minio-ingest.test.ts",
"src/tools/builtin/minio-sync.test.ts",
"README.md",
"docs/api/TOOLS.md",
"docs/plans/state.json"
],
"test_status": "pnpm test:run src/tools/builtin/minio-ingest.test.ts src/tools/builtin/minio-sync.test.ts + pnpm typecheck passing"
},
"backup-session-summary-audit-trail": {
"status": "completed",
"date": "2026-02-16",