Release Plan: New Streaming ASR Model
The candidate model reduces offline WER by 7 percent relative, but it
uses more memory and changes how partial transcripts behave. Propose a
production rollout.
Hidden answer: Advanced rollout plan
Start in shadow mode: mirror production traffic to the new model and
compare transcripts, latency, and partial churn (how often partial
hypotheses are later revised) without exposing users to its output.
Then canary on low-risk tenants, short calls, and a single language
before expanding. Gate each step on WER proxies, entity errors, p95
and p99 latency, queue depth, OOMs, partial churn, cancellation rate,
and support tickets. Keep the old workers warm until the canary is
stable so that a rollback takes effect immediately.
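The stage gates above can be sketched as a comparison of the candidate's metrics against the old model's, with a per-metric tolerance. This is a minimal illustration, not a prescribed implementation: the StageMetrics fields, the stage names in STAGES, and every tolerance in MAX_REGRESSION are hypothetical and would come from the team's own SLOs. It treats any OOM as a hard failure because the new model needs more memory, and it normalizes support tickets per 1k calls because a canary serves far less traffic than the baseline.

```python
from dataclasses import dataclass

# Hypothetical metric snapshot for one rollout stage; names and units are assumptions.
@dataclass
class StageMetrics:
    wer_proxy: float             # e.g. WER on a small labeled sample
    entity_error_rate: float
    p95_latency_ms: float
    p99_latency_ms: float
    queue_depth: float
    oom_count: int               # worker OOM kills during the stage
    partial_churn: float         # share of partial words later revised
    cancellation_rate: float
    tickets_per_1k_calls: float

# Hypothetical tolerances: largest allowed relative regression vs. the old model.
MAX_REGRESSION = {
    "wer_proxy": 0.00,
    "entity_error_rate": 0.02,
    "p95_latency_ms": 0.05,
    "p99_latency_ms": 0.10,
    "queue_depth": 0.10,
    "partial_churn": 0.10,
    "cancellation_rate": 0.05,
    "tickets_per_1k_calls": 0.10,
}

# Hypothetical stage names, in rollout order.
STAGES = ["shadow", "canary_low_risk", "expand_languages", "full"]

def failed_gates(baseline: StageMetrics, candidate: StageMetrics) -> list[str]:
    """Names of gates the candidate fails; an empty list means the stage may expand."""
    failures = [
        name
        for name, tolerance in MAX_REGRESSION.items()
        if getattr(candidate, name) > getattr(baseline, name) * (1 + tolerance)
    ]
    # The candidate needs more memory, so treat any OOM as a hard failure.
    if candidate.oom_count > 0:
        failures.append("oom_count")
    return failures

def next_stage(current: str, baseline: StageMetrics, candidate: StageMetrics) -> str:
    """Advance one stage if all gates pass; otherwise hold or roll back."""
    if failed_gates(baseline, candidate):
        # Shadow traffic has no user exposure, so just hold there; later stages
        # roll back to the old workers, which are still warm.
        return "hold" if current == "shadow" else "rollback"
    i = STAGES.index(current)
    return STAGES[min(i + 1, len(STAGES) - 1)]
```

Comparing against the old model's metrics from the same window, rather than against fixed absolute thresholds, keeps the gates meaningful as traffic mix shifts from short low-risk calls to the full population.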
Release Plan: New TTS Voice
A new voice has better preference-test scores but a slower time to
first audio byte. The product goal is responsive conversation.
Hidden answer: Tradeoff framing
Separate quality-sensitive and latency-sensitive use cases. Canary on
non-critical flows first, stream synthesis sentence by sentence, keep
warm worker pools, cache audio for fixed prompts, and set an explicit
first-audio-byte budget. Roll back, or route to the old voice, when
p95 first-audio-byte latency breaks the conversation SLO, even if
offline preference scores look better.
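The routing rule above can be sketched as a per-flow check of recent first-audio-byte latencies against a budget. This is a minimal sketch under stated assumptions: the flow names and the budgets in FIRST_AUDIO_BUDGET_MS are hypothetical, a budget of None stands for a quality-sensitive flow with no latency gate, and the function falls back to the old voice when there are no samples yet.

```python
# Hypothetical per-flow first-audio-byte budgets in ms; None means no latency budget.
FIRST_AUDIO_BUDGET_MS = {"conversational": 500.0, "narration": None}

def p95(samples_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of the given latency samples."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math, 1-based
    return ordered[rank - 1]

def choose_voice(flow: str, recent_first_audio_ms: list[float]) -> str:
    """Pick the voice for a flow based on its first-audio-byte budget."""
    budget = FIRST_AUDIO_BUDGET_MS[flow]
    if budget is None:
        # Quality-sensitive flow: preference scores decide, latency is not gated.
        return "new_voice"
    if not recent_first_audio_ms:
        # No canary evidence yet, so stay on the known-good voice.
        return "old_voice"
    return "old_voice" if p95(recent_first_audio_ms) > budget else "new_voice"
```

Nearest-rank is only one percentile definition; a metrics backend that interpolates may report a slightly different p95 for the same samples, so the budget check should use whatever definition the SLO itself is written against.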