Case 1: ASR Entity Recall Drops After Better WER
A new ASR model improves overall WER by 3 percent relative, but on support calls it now misses account numbers and medicine names more often. Product asks why the launch gate passed.
Hidden answer: strong diagnosis
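WER hid a task-critical entity regression: it weights every word equally, so gains on common words can outweigh new errors on the few tokens that carry account numbers and medicine names. Compare the old and new models on entity recall, slot edit rate, confidence calibration, escalation rate, and correction events, sliced by domain and noise condition. Check normalization, hotword biasing, decoder vocabulary, punctuation, and post-ASR extraction for changes that could explain the drop. The fix is not just another aggregate WER gate: add entity-weighted evals, domain fixtures, and rollback triggers for critical slots.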
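As a rough illustration of the per-slice entity check described above, here is a minimal Python sketch. Everything in it is an assumption for illustration, not the team's actual eval: the record fields (`slice`, `ref_entities`, `hyp_text`), the helper names, the exact-match-after-normalization rule, and the 0.01 `max_drop` threshold are all hypothetical.

```python
import re
from collections import defaultdict


def normalize(text):
    # Lowercase, drop punctuation, collapse whitespace (hypothetical rule).
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def entity_recall_by_slice(examples):
    # examples: dicts with hypothetical keys 'slice', 'ref_entities', 'hyp_text'.
    hits = defaultdict(int)
    totals = defaultdict(int)
    for ex in examples:
        hyp = f" {normalize(ex['hyp_text'])} "
        for entity in ex["ref_entities"]:
            totals[ex["slice"]] += 1
            # Count a hit only if the whole entity appears on token boundaries.
            if f" {normalize(entity)} " in hyp:
                hits[ex["slice"]] += 1
    return {s: hits[s] / totals[s] for s in totals}


def regressed_slices(baseline, candidate, max_drop=0.01):
    # Slices whose entity recall fell by more than max_drop (assumed threshold).
    return sorted(
        s for s in baseline
        if baseline[s] - candidate.get(s, 0.0) > max_drop
    )


# Tiny made-up fixtures, one utterance per slice, for illustration only.
old_model = [
    {"slice": "pharmacy/noisy", "ref_entities": ["metformin"],
     "hyp_text": "please refill my metformin"},
    {"slice": "billing/clean", "ref_entities": ["4 4 1 9 0 2"],
     "hyp_text": "my account number is 4 4 1 9 0 2."},
]
new_model = [
    {"slice": "pharmacy/noisy", "ref_entities": ["metformin"],
     "hyp_text": "please refill my met forming"},
    {"slice": "billing/clean", "ref_entities": ["4 4 1 9 0 2"],
     "hyp_text": "my account number is 4 4 1 9 0 2"},
]

blocked = regressed_slices(
    entity_recall_by_slice(old_model),
    entity_recall_by_slice(new_model),
)
print(blocked)
```

A gate built this way blocks on any critical slice that regresses, even when aggregate WER improves. A real version would likely need fuzzier matching (digit grouping, spelled-out numbers, drug-name variants) and enough utterances per slice for the recall difference to be meaningful.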