Prompt 1: Size A Streaming ASR Fleet
A contact-center ASR service receives 12,000 concurrent calls at peak. Each call sends 16 kHz mono audio, needs first partial under 450 ms p95, and must tolerate one zone loss. Describe your capacity model and rollout checks.
Hidden answer: strong sizing outline
Start from concurrent streams and audio seconds per second, then split by language, model, region, and real-time priority. Measure max streams per replica at p95 first-partial latency with realistic noise and endpointing. Reserve zone-loss headroom, isolate eval and replay traffic, and autoscale on queue age plus active streams rather than GPU utilization alone. Rollout gates should compare WER/CER slices, partial churn, first partial p95/p99, finalization latency, endpointing errors, and cost per successful audio minute.