About
Not until we are lost do we begin to understand ourselves.
Speech-focused ML engineer in the San Francisco Bay Area working on production-oriented ASR/TTS systems, speech LLM evaluation pipelines, and automated model-quality workflows. I have built across streaming ASR, TTS, full-duplex speech-to-speech systems, LLM-as-judge evaluation, RAG, keyword and hotword boosting, and neural speech interfaces.
I am targeting Applied AI, Speech LLM, and Audio ML roles focused on model training, evaluation systems, ASR/TTS workflows, LLM-as-judge systems, speech agents, multimodal model quality, and production data pipelines. TN Visa eligible.
Recent Focus
- LLM-as-judge for voice AI: led automated evaluation workflows that cut evaluation cost by 10x with private models and 1000x with open-source models, and cut turnaround time by roughly 100x, from days to under 1 hour.
- Full-duplex speech LLM quality: designed metrics for multi-turn full-duplex speech LLMs across 31 locales, improving quality tracking, iteration speed, cost efficiency, and production readiness.
- Production speech systems: shipped and evaluated ASR/TTS workflows across wearable assistants, restaurant voice AI, contact center ASR, and BCI speech interfaces.
Experience
- Improved on-device and server-side ASR/TTS for voice assistants across wearable devices and Meta AI by leading data preparation and evaluation workflows for expressive speech and Llama 4 full-duplex speech LLMs.
- Designed metrics for multi-turn full-duplex speech LLMs across 31 locales, improving cost efficiency, quality tracking, iteration speed, and production readiness.
- Led the transition from manual human evaluation to automated LLM-as-judge workflows, reducing evaluation cost by 10x with private models and 1000x with open-source models.
- Benchmarked internal models and leading external LLMs across multiple dimensions of audio-modality performance.
- Deployed chat-based agentic workflows to automate routine research and engineering support tasks, reduce repeated support requests, and enable self-serve access.
- Prototyped evaluation-driven feedback loops for autonomous agent self-improvement and workflow optimization.
Founder & CEO @ InnerSpeech Canada / Hong Kong (Sep 2023 - Jan 2025)
- Founded a non-invasive BCI speech startup focused on imagined and inner speech decoding using EEG and multimodal biosignals, with assistive communication as the initial target use case.
- Owned product framing, technical roadmap, partnership conversations, grant and startup programs, and prototype development.
- Built and open-sourced the InnerSpeech biosignal speech recognition and synthesis toolkit covering EEG, EMG, HD-EEG, MEG, fNIRS, and invasive neural speech datasets.
- Developed Brain-to-Text Benchmark 2024 systems using RNN-Transformer modeling with language-model rescoring.
- Secured founder and startup support including HKSTP Ideation, HK Tech 300 seed approval, NVIDIA Inception, AWS Activate, Google for Startups, Microsoft for Startups Founders Hub, and Communitech Founder Program.
Machine Learning Engineer @ Kea Cloud Inc. (Apr 2024 - Jul 2024)
- Improved restaurant-ordering ASR through API integration, contextual biasing, and menu/entity recognition evaluation.
- Built LLM-augmented ordering-agent workflows using RAG-style ASR-to-NLU pipelines.
- Evaluated ASR boosting and keyword strategies for restaurant-specific entities, menu items, and conversational order flows.
Speech Recognition Engineer III @ Dialpad Canada Inc. (Feb 2020 - Aug 2023)
- Built and improved production ASR systems using Kaldi, NeMo, and K2 across hybrid and end-to-end speech recognition stacks.
- Led ASR model updates and evaluation cycles, including analysis, data preparation, and deployment-oriented model comparisons.
- Developed and shipped contextual biasing methods, including n-gram and lattice boosting, to improve recognition of customer-specific vocabulary and recover failing business-call scenarios.
- Published and presented work on ASR boosting and G2P modeling at SANE 2022 and SIGMORPHON 2021.
- Co-supervised Master’s thesis projects at the University of Edinburgh and The University of British Columbia.
NLP Consultant @ The Hong Kong Polytechnic University (Sep 2020 - May 2023, part-time)
- Advised on grammar error correction and RAG-style workflows, including retrieval-augmented model design and evaluation.
- Combined rule-based and neural methods to improve system efficiency and performance.
- Investigated and implemented quantization, data augmentation, and model optimization to further improve the system.
Projects & Certifications
- Built an open-source LLM-as-judge evaluation and monitoring toolkit for voice-AI systems, using omni-model APIs and self-hosted models to assess ASR, TTS, and speech-agent outputs.
- Created and maintained an open-source course covering audio representations, ASR/TTS foundations, modern speech agents, evaluation, serving, safety, and production reliability.
NLP Consultant - UsherGPT (2023 - 2024)
- Guided development of UsherGPT at the University of Edinburgh Usher Institute, using Retrieval-Augmented Generation techniques for public health and medical data applications.
- Contributed to multilingual and low-resource ASR for Indian languages, benchmarking and open-sourcing end-to-end methods.
Education
Master of Science in Speech & Language Processing @ University of Edinburgh (2018 - 2019)
Thesis: Robust Word Recognition and Alignment of Child Speech Therapy Sessions using Audio and Ultrasound Imaging (PyTorch and Kaldi).
Coursework: Speech Synthesis, Automatic Speech Recognition, Natural Language Understanding, Generation and Machine Translation, Reinforcement Learning, and Neural Information Processing.
- The Edinburgh Award (Enterprise)
- First-place winner of the University of Edinburgh Business Ideas Competition
- Winner of Scottish Institute for Enterprise’s Fresh Ideas Competition
Bachelor of Arts in Linguistics and Language Applications, First Class Honours @ City University of Hong Kong (2014 - 2018)
Major: Linguistics and Language Applications. Minor: Translation.
Thesis: A Comparative Study of Interlingual vs. Neural Approach to Machine Translation of Numerical Expressions (TensorFlow and Java).
Core Skills
- Applied AI & systems: Python, PyTorch, Docker, data pipelines, production debugging, evaluation automation, LLM-as-judge, internal tooling, LLM/RAG workflows, agentic development tooling.
- Speech ML: ASR, TTS, speech data preparation, ASR/TTS evaluation, hotword and keyword boosting, WER/MOS/CMOS analysis, streaming ASR, speech-to-speech workflows.
- Modeling & frameworks: Kaldi, NVIDIA NeMo, K2, CTC, Transducer/RNN-T, Conformer, Thinker-Talker, n-gram rescoring.
- Research areas: speech recognition, speech synthesis, NLP, non-invasive BCI, EEG/fNIRS speech decoding, multimodal speech interfaces.
Publications