Case 1: Enterprise Streaming Dictation
Design a streaming ASR service for clinicians. The product needs low partial latency, high named-entity accuracy, tenant isolation, and a rollback path when a model update hurts medical terms.
Hidden answer: strong design outline
Separate real-time streaming from offline correction jobs. Use VAD, normalization, streaming ASR, domain contextual biasing, entity post-processing, and privacy-safe telemetry. Gate releases by WER, entity error rate, partial churn, first partial latency, finalization delay, tenant-level SLOs, and correction rate. Roll back by model version and feature flag, not by redeploying the whole service.