swarm-master

will/swarm-master


Commit Graph


Author: William Valentin
SHA1: 50f2640846

feat(whisper): add CUDA Blackwell server, promote to primary on :18801

Adds a custom whisper.cpp Docker image built with CMAKE_CUDA_ARCHITECTURES=120
so it actually initializes on the RTX 5070 Ti — the upstream
ghcr.io/ggml-org/whisper.cpp:main-cuda only ships kernels for sm_75/80/86/90.
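The architecture mismatch described above can be illustrated with a minimal Python sketch. It is not part of the commit. It assumes the RTX 5070 Ti reports compute capability 12.0 (sm_120) and uses a simplified compatibility rule: a binary built for sm_XY runs on a device with the same major version and an equal or higher minor version. PTX JIT fallback is deliberately ignored, and the function and constant names are hypothetical.

```python
def has_native_kernels(device_cc: int, compiled_archs: list[int]) -> bool:
    """Return True if any compiled sm_XY binary can run on the device.

    Simplified rule: a binary for sm_XY runs on a device whose major
    version equals X and whose minor version is >= Y. PTX JIT is ignored.
    """
    dev_major, dev_minor = divmod(device_cc, 10)
    for arch in compiled_archs:
        major, minor = divmod(arch, 10)
        if major == dev_major and minor <= dev_minor:
            return True
    return False


UPSTREAM_ARCHS = [75, 80, 86, 90]  # sm_75/80/86/90, per the commit message
BLACKWELL_ARCHS = [120]            # CMAKE_CUDA_ARCHITECTURES=120
RTX_5070_TI_CC = 120               # assumption: compute capability 12.0

print(has_native_kernels(RTX_5070_TI_CC, UPSTREAM_ARCHS))   # False
print(has_native_kernels(RTX_5070_TI_CC, BLACKWELL_ARCHS))  # True
```

Under these assumptions, none of the upstream architectures share major version 12, which is consistent with the upstream image failing to initialize on this GPU.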

Compose changes:
- New whisper-init one-shot service downloads ggml-medium.bin and ggml-small.bin
  into the shared volume on first run, fixing the original crash where
  whisper-server tried to load a model that was never fetched.
- New whisper-server-gpu service (image whisper.cpp:cuda-blackwell, built
  locally from ./whisper-cuda-blackwell/Dockerfile) on port 18801 — the
  benchmarked path (~150 ms per short clip, ~93x faster than CPU/medium with
  identical WER on JFK + 4 TTS samples).
- Existing whisper-server (CPU/medium) moves to port 18811 as the fallback
  for when the GPU is unavailable. Container names are unchanged so monitoring
  and volume bindings keep working.
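The primary/fallback port layout above could be consumed by a client along these lines. This is a hedged sketch, not code from the repository: the host `127.0.0.1`, the helper names, and the plain TCP probe are assumptions, and a successful TCP connection does not prove that the model has finished loading.

```python
import socket
from typing import Callable, Iterable, Optional

GPU_PORT = 18801  # whisper-server-gpu (primary), per the commit message
CPU_PORT = 18811  # whisper-server CPU/medium (fallback), per the commit message


def port_is_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_port(
    ports: Iterable[int],
    host: str = "127.0.0.1",  # assumption: servers are published on localhost
    probe: Callable[[str, int], bool] = port_is_open,
) -> Optional[int]:
    """Return the first port in preference order that the probe accepts."""
    for port in ports:
        if probe(host, port):
            return port
    return None


if __name__ == "__main__":
    chosen = pick_port([GPU_PORT, CPU_PORT])
    print("no whisper server reachable" if chosen is None else f"using :{chosen}")
```

Passing the probe as a parameter keeps the preference order testable without running containers; a stricter probe could issue an HTTP request instead.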

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Date: 2026-04-30 01:12:58 -07:00

1 Commit