50f2640846
Adds a custom whisper.cpp Docker image built with CMAKE_CUDA_ARCHITECTURES=120 so that it actually initializes on the RTX 5070 Ti. The upstream ghcr.io/ggml-org/whisper.cpp:main-cuda image only ships kernels for sm_75/80/86/90.

Compose changes:

- New whisper-init one-shot service downloads ggml-medium.bin and ggml-small.bin into the shared volume on first run, fixing the original crash where whisper-server tried to load a model that was never fetched.
- New whisper-server-gpu service (image whisper.cpp:cuda-blackwell, built locally from ./whisper-cuda-blackwell/Dockerfile) on port 18801. This is the benchmarked path (~150 ms per short clip, ~93x faster than CPU/medium with identical WER on JFK + 4 TTS samples).
- Existing whisper-server (CPU/medium) moves to port 18811 as the fallback for when the GPU is unavailable.

Container names are unchanged so monitoring and volume bindings keep working.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>