Prompt 1: Design A Shared Speech Serving Platform
Design a platform that hosts streaming ASR, TTS, and speech-to-speech
for multiple product teams. Cover APIs, tenancy, scheduling,
observability, release gates, privacy, and cost controls.
Hidden answer: strong outline
Discuss separate real-time and batch pools, tenant quotas, model
registry, feature extraction contracts, streaming APIs, cancellation,
canary routing, slice metrics, privacy-safe telemetry, incident
runbooks, model rollback, autoscaling, GPU utilization, and chargeback
or showback. Call out that ASR, TTS, and LLM serving have different
bottlenecks, so one shared batching or autoscaling policy is
unlikely to fit all three.
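
Two items in the outline above, tenant quotas and canary routing, are
easier to discuss with a concrete shape in mind. The sketch below is
illustrative only: the names TokenBucket and routes_to_canary, the
rate and capacity parameters, and the choice of a token bucket with
hash-based bucketing are assumptions, not part of the prompt. Time is
passed in explicitly so the behavior is deterministic.

```python
import hashlib


class TokenBucket:
    """Hypothetical per-tenant quota: `rate` requests per second,
    with bursts of up to `capacity` requests."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so a new tenant can burst
        self.last = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


def routes_to_canary(session_id: str, canary_percent: int) -> bool:
    """Deterministically map a session to the canary model.

    Hashing the session id keeps a whole streaming session on one
    model version instead of flipping between versions mid-stream.
    """
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < canary_percent


if __name__ == "__main__":
    quota = TokenBucket(rate=1.0, capacity=2.0)
    print([quota.allow(now=0.0) for _ in range(3)])
    print(routes_to_canary("session-123", canary_percent=5))
```

A strong answer would note that a rejected request should fail fast
with a clear quota error rather than queue behind real-time traffic.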
Prompt 2: Debug A Speech-To-Speech Latency Regression
After a release, p95 turn latency increased by 35 percent. The ASR,
LLM, and TTS components each report healthy local metrics.
Hidden answer: strong outline
Build an end-to-end trace waterfall and inspect queue time, network,
client buffering, orchestration gaps, retries, retrieval latency,
context growth, TTS first-audio-byte, playback start, and cancellation.
Local component health is not enough when the product SLO is a full
spoken turn.
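
The waterfall idea above can be sketched as follows. This is a
minimal illustration, assuming each stage emits a span with a name
and start and end timestamps in milliseconds; Span, waterfall,
biggest_regression, and the "orchestration_gap" label are
hypothetical names, and the numbers are made up. It shows how every
stage can keep the same duration while the time between stages grows.

```python
from typing import Dict, List, NamedTuple, Tuple


class Span(NamedTuple):
    name: str
    start_ms: float
    end_ms: float


def waterfall(spans: List[Span]) -> Tuple[Dict[str, float], float]:
    """Return per-stage durations and the time covered by no span."""
    if not spans:
        return {}, 0.0
    ordered = sorted(spans, key=lambda s: s.start_ms)
    durations: Dict[str, float] = {}
    gap_ms = 0.0
    covered_until = ordered[0].start_ms
    for span in ordered:
        durations[span.name] = (
            durations.get(span.name, 0.0) + span.end_ms - span.start_ms
        )
        # Time between the end of earlier work and this span's start
        # belongs to no component: queueing, orchestration, network.
        if span.start_ms > covered_until:
            gap_ms += span.start_ms - covered_until
        covered_until = max(covered_until, span.end_ms)
    return durations, gap_ms


def biggest_regression(
    baseline: List[Span], candidate: List[Span]
) -> Tuple[str, float]:
    """Name the stage (or the gap) whose time grew the most."""
    base_d, base_gap = waterfall(baseline)
    cand_d, cand_gap = waterfall(candidate)
    deltas = {
        name: cand_d.get(name, 0.0) - base_d.get(name, 0.0)
        for name in set(base_d) | set(cand_d)
    }
    deltas["orchestration_gap"] = cand_gap - base_gap
    return max(deltas.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    before = [Span("asr", 0, 200), Span("llm", 210, 600),
              Span("tts_first_byte", 610, 800)]
    after = [Span("asr", 0, 200), Span("llm", 350, 740),
             Span("tts_first_byte", 750, 940)]
    print(biggest_regression(before, after))
```

In the made-up data, each stage takes exactly as long as before, yet
the turn ends 140 ms later because of the wait between ASR and LLM.
A real investigation would compare p95 distributions across many
traces rather than a single pair.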
Prompt 3: Choose Local, Cloud, Or Hybrid Inference
A privacy-sensitive assistant needs wake word, short commands, long
dictation, and high-quality TTS. Decide what runs locally versus in
the cloud.
Hidden answer: strong outline
Run wake word, VAD, privacy filters, and simple commands locally
when possible. Use cloud or dedicated servers for long dictation,
large LLM reasoning, heavy retrieval, and high-quality TTS if
consent and policy allow. Add offline fallback, explicit upload
boundaries, telemetry minimization, and model/version compatibility
checks.
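
The placement rule above can be written down as a small policy
function. This is a sketch under stated assumptions: the task names,
the place function, and the "local_fallback" label are hypothetical,
and a real policy would also consult device capability and the
model/version compatibility checks mentioned in the outline.

```python
# Tasks the outline keeps on the device.
LOCAL_TASKS = {"wake_word", "vad", "privacy_filter", "simple_command"}

# Tasks the outline sends to the cloud when consent and policy allow.
CLOUD_PREFERRED_TASKS = {
    "long_dictation", "llm_reasoning", "heavy_retrieval", "hq_tts",
}


def place(task: str, upload_consent: bool, online: bool) -> str:
    """Decide where a task runs: 'local', 'cloud', or 'local_fallback'."""
    if task in LOCAL_TASKS:
        return "local"
    if task in CLOUD_PREFERRED_TASKS:
        # Audio crosses the upload boundary only with explicit consent
        # and a working connection; otherwise degrade on-device.
        if upload_consent and online:
            return "cloud"
        return "local_fallback"
    raise ValueError(f"unknown task: {task}")


if __name__ == "__main__":
    print(place("wake_word", upload_consent=False, online=True))
    print(place("long_dictation", upload_consent=True, online=True))
    print(place("hq_tts", upload_consent=True, online=False))
```

Keeping the upload decision in one function gives a single place to
audit which tasks can ever send audio off the device.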