Author: William Valentin
SHA1: 50f2640846

feat(whisper): add CUDA Blackwell server, promote to primary on :18801

Adds a custom whisper.cpp Docker image built with CMAKE_CUDA_ARCHITECTURES=120
so that it initializes on the RTX 5070 Ti (Blackwell, sm_120). The upstream
ghcr.io/ggml-org/whisper.cpp:main-cuda image only ships kernels for
sm_75/80/86/90.
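
For context, here is a minimal sketch of the compatibility rule behind this.
It assumes that only prebuilt SASS (cubin) matters and ignores any PTX the
driver could JIT-compile for a newer GPU. Under CUDA's binary-compatibility
rule, a cubin built for sm_XY runs only on devices with the same major version
and a minor version >= Y. None of 75/80/86/90 therefore covers the 5070 Ti's
sm_120. The helper names are hypothetical.

```python
# Hypothetical helpers: does a list of CMAKE_CUDA_ARCHITECTURES-style SASS
# targets (86 == sm_86) include a cubin that can run on a given device?
# Assumption: only prebuilt SASS counts; PTX JIT fallback is ignored.

def split_arch(arch: int) -> tuple[int, int]:
    """Split an sm_XY number like 120 into (major, minor) = (12, 0)."""
    return arch // 10, arch % 10


def sass_covers(compiled: list[int], device: int) -> bool:
    """A cubin runs only on the same major version, with minor >= its own."""
    dev_major, dev_minor = split_arch(device)
    for arch in compiled:
        major, minor = split_arch(arch)
        if major == dev_major and minor <= dev_minor:
            return True
    return False


upstream = [75, 80, 86, 90]  # targets named in this commit for main-cuda
blackwell = [120]            # CMAKE_CUDA_ARCHITECTURES=120
print(sass_covers(upstream, 120))   # False: no sm_12x cubin
print(sass_covers(blackwell, 120))  # True
```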

Compose changes:
- New whisper-init one-shot service downloads ggml-medium.bin and ggml-small.bin
  into the shared volume on first run. This fixes the original crash, in which
  whisper-server tried to load a model that had never been fetched.
- New whisper-server-gpu service on port 18801 (image whisper.cpp:cuda-blackwell,
  built locally from ./whisper-cuda-blackwell/Dockerfile). This is the
  benchmarked path: ~150 ms per short clip, ~93x faster than CPU/medium, with
  identical WER on the JFK sample plus 4 TTS samples.
- The existing whisper-server (CPU/medium) moves to port 18811 and serves as
  the fallback when the GPU is unavailable. Container names are unchanged, so
  monitoring and volume bindings keep working.
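
The whisper-init behaviour can be sketched as an idempotent fetch. This is a
minimal Python illustration, not the service's actual entrypoint. The function
name and the Hugging Face download URL are assumptions. Each model is written
to a temporary ".part" file and then renamed into place, so an interrupted
download never leaves a truncated .bin under the name whisper-server loads.
Later runs skip any file that already exists.

```python
import os
import urllib.request
from typing import Callable

# Assumed source of the ggml model files (not stated in this commit).
BASE_URL = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main"


def ensure_models(model_dir: str, names: list[str],
                  fetch: Callable[[str, str], object] = urllib.request.urlretrieve
                  ) -> list[str]:
    """Download each missing model into model_dir; return the names fetched."""
    os.makedirs(model_dir, exist_ok=True)
    fetched = []
    for name in names:
        dest = os.path.join(model_dir, name)
        if os.path.exists(dest):
            continue  # already on the shared volume: nothing to do
        tmp = dest + ".part"
        fetch(f"{BASE_URL}/{name}", tmp)  # download under a temporary name...
        os.replace(tmp, dest)             # ...then rename atomically into place
        fetched.append(name)
    return fetched
```

Called as ensure_models("/models", ["ggml-medium.bin", "ggml-small.bin"]) (the
mount path is hypothetical), the first run would fetch both files and a second
run would return an empty list.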
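
This commit does not specify how clients choose between the two ports. One
minimal sketch tries the GPU port first and falls back to the CPU port. It
assumes both servers are reachable on localhost and treats a successful TCP
connect as "up". The helper names and host are hypothetical, and a real health
check would more likely issue an HTTP request.

```python
import socket
from typing import Callable, Optional

# Hypothetical client-side routing: GPU server first, CPU fallback second.
CANDIDATES = [
    ("localhost", 18801),  # whisper-server-gpu (primary)
    ("localhost", 18811),  # whisper-server, CPU/medium (fallback)
]


def is_listening(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_endpoint(candidates=CANDIDATES,
                  probe: Callable[[str, int], bool] = is_listening
                  ) -> Optional[str]:
    """Return the base URL of the first reachable server, or None."""
    for host, port in candidates:
        if probe(host, port):
            return f"http://{host}:{port}"
    return None
```

With the GPU container stopped, pick_endpoint() would return
http://localhost:18811.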

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Date: 2026-04-30 01:12:58 -07:00