Drill 1: Streaming ASR Partial Latency Doubles
A streaming ASR release keeps final WER flat, but first-partial p95 latency rises from 420 ms to 930 ms for mobile users on noisy calls.
Hidden answer: strong triage outline
Slice first-partial latency by model version, device, codec, language, VAD state, chunk size, region, and connection type. Check for VAD false-silence decisions (speech classified as silence), audio chunk buffering, batching policy, GPU queue depth, and client upload cadence. Mitigate with a canary rollback or a reduced batch delay on interactive traffic. Add a release gate on first-partial latency and partial churn for noisy and mobile slices, not only on final WER.
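The slicing step and the release gate can be sketched in a few lines of Python. This is a minimal illustration, not a production gate: the record fields (`first_partial_ms`, `device`, `noise`), the 10% regression budget, and the 20-sample minimum are all assumptions chosen for the example, and partial churn is left out for brevity.

```python
from collections import defaultdict


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    # Integer form of ceil(0.95 * n) to avoid floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def group_latencies(records, keys):
    """Group first-partial latencies by the given slice keys."""
    groups = defaultdict(list)
    for record in records:
        slice_key = tuple(record[k] for k in keys)
        groups[slice_key].append(record["first_partial_ms"])
    return groups


def latency_gate(baseline, candidate, keys, max_ratio=1.10, min_samples=20):
    """Return {slice: (baseline_p95, candidate_p95)} for regressed slices.

    A slice fails when its candidate p95 exceeds baseline p95 * max_ratio.
    Slices with fewer than min_samples records on either side are skipped,
    because their p95 is too noisy to gate on (hypothetical threshold).
    """
    base_groups = group_latencies(baseline, keys)
    cand_groups = group_latencies(candidate, keys)
    failures = {}
    for slice_key, cand_values in cand_groups.items():
        base_values = base_groups.get(slice_key, [])
        if len(cand_values) < min_samples or len(base_values) < min_samples:
            continue
        base_p95, cand_p95 = p95(base_values), p95(cand_values)
        if cand_p95 > base_p95 * max_ratio:
            failures[slice_key] = (base_p95, cand_p95)
    return failures
```

Calling `latency_gate(baseline, candidate, keys=("device", "noise"))` returns an empty dict when every sufficiently sampled slice stays within budget. Gating per slice rather than on the global p95 is the point: a regression confined to noisy mobile calls can be invisible in the aggregate.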