Resume (PDF) · Notes
Resources: Speech Timeline · Audio J-Lens Explorers: ASR · Speech-to-Speech · Open Audio Judge Demo: TTS · ASR · Audio Benchmark Index
Courses: Audio ML Course · LLM Course

About

Not until we are lost do we begin to understand ourselves.

Speech-focused ML engineer in the San Francisco Bay Area working on production-oriented ASR/TTS systems, speech LLM evaluation pipelines, and automated model-quality workflows. I have built systems spanning streaming ASR, TTS, full-duplex speech-to-speech, LLM-as-judge evaluation, RAG, keyword and hotword boosting, and neural speech interfaces.

I am targeting Applied AI, Speech LLM, and Audio ML roles focused on model training, evaluation systems, ASR/TTS workflows, LLM-as-judge systems, speech agents, multimodal model quality, and production data pipelines. TN Visa eligible.

Recent Focus

LLM-as-judge for voice AI: led automated evaluation workflows that reduced evaluation cost by 10x with private models and 1000x with open-source models, and cut turnaround time by roughly 100x, from days to under 1 hour.
Full-duplex speech LLM quality: designed metrics for multi-turn full-duplex speech LLMs across 31 locales, improving quality tracking, iteration speed, cost efficiency, and production readiness.
Production speech systems: shipped and evaluated ASR/TTS workflows across wearable assistants, restaurant voice AI, contact center ASR, and BCI speech interfaces.

Experience

Linguistic Engineer, Speech LLM / ASR-TTS Evaluation @ Meta, Reality Labs Wearables (Jan 2025 - Present)

Improved on-device and server-side ASR/TTS for voice assistants across wearable devices and Meta AI by leading data preparation and evaluation workflows for expressive speech and Llama 4 full-duplex speech LLMs.
Designed metrics for multi-turn full-duplex speech LLMs across 31 locales, improving cost efficiency, quality tracking, iteration speed, and production readiness.
Led the transition from manual human evaluation to automated LLM-as-judge workflows, reducing evaluation cost by 10x with private models and 1000x with open-source models.
Benchmarked internal models and leading external LLMs across multiple dimensions of audio-modality performance.
Deployed chat-based agentic workflows to automate routine research and engineering support tasks, reduce repeated support requests, and enable self-serve access.
Prototyped evaluation-driven feedback loops for autonomous agent self-improvement and workflow optimization.

Founder & CEO @ InnerSpeech Canada / Hong Kong (Sep 2023 - Jan 2025)

Founded a non-invasive BCI speech startup focused on imagined and inner speech decoding using EEG and multimodal biosignals, with assistive communication as the initial target use case.
Owned product framing, technical roadmap, partnership conversations, grant and startup programs, and prototype development.
Built and open-sourced the InnerSpeech biosignal speech recognition and synthesis toolkit covering EEG, EMG, HD-EEG, MEG, fNIRS, and invasive neural speech datasets.
Developed systems for the Brain-to-Text Benchmark 2024 using RNN-Transformer modeling with language-model rescoring.
Secured founder and startup support including HKSTP Ideation, HK Tech 300 seed approval, NVIDIA Inception, AWS Activate, Google for Startups, Microsoft Founder Hub, and Communitech Founder Program.

Machine Learning Engineer @ Kea Cloud Inc. (Apr 2024 - Jul 2024)

Improved restaurant-ordering ASR through API integration, contextual biasing, and menu/entity recognition evaluation.
Built LLM-augmented ordering-agent workflows using RAG-style ASR-to-NLU pipelines.
Evaluated ASR boosting and keyword strategies for restaurant-specific entities, menu items, and conversational order flows.

Speech Recognition Engineer III @ Dialpad Canada Inc. (Feb 2020 - Aug 2023)

Built and improved production ASR systems using Kaldi, NeMo, and K2 across hybrid and end-to-end speech recognition stacks.
Led ASR model updates and evaluation cycles, including analysis, data preparation, and deployment-oriented model comparisons.
Developed and shipped contextual biasing methods, including n-gram and lattice boosting, to improve recognition of customer-specific vocabulary and recover failing business-call scenarios.
Published and presented work on ASR boosting and G2P modeling at SANE 2022 and SIGMORPHON 2021.
Co-supervised Master’s thesis projects at the University of Edinburgh and The University of British Columbia.

NLP Consultant @ The Hong Kong Polytechnic University (Sep 2020 - May 2023, part-time)

Advised on grammatical error correction and RAG-style workflows, including retrieval-augmented model design and evaluation.
Combined rule-based and neural approaches to improve system efficiency and performance.
Investigated and implemented quantization, data augmentation, and other model optimizations to further improve the system.

Projects & Certifications

Open Audio Judge (2026 - Present)

Built an open-source LLM-as-judge evaluation and monitoring toolkit for voice AI systems, using omni-model APIs and self-hosted models to assess ASR, TTS, and speech-agent outputs.

Audio Benchmark Index (2026 - Present)

Curated a public index of speech, audio, and multimodal audio benchmarks with task coverage, license/access notes, official sources, and reproducible download helpers.

Speech ML Systems Curriculum (2025 - Present)

Created and maintain an open-source course covering audio representations, ASR/TTS foundations, modern speech agents, evaluation, serving, safety, and production reliability.

NLP Consultant - UsherGPT (2023 - 2024)

Guided development of UsherGPT at the University of Edinburgh's Usher Institute, a system built on Retrieval-Augmented Generation (RAG) techniques and tailored to public health and medical data applications.

Third Prize in MUCS 2021 (2021)

Contributed to multilingual and low-resource ASR for Indian languages, benchmarking and open-sourcing end-to-end methods.

Education

Master of Science in Speech & Language Processing @ University of Edinburgh (2018 - 2019)

Thesis: Robust Word Recognition and Alignment of Child Speech Therapy Sessions using Audio and Ultrasound Imaging (PyTorch and Kaldi).

Coursework: Speech Synthesis, Automatic Speech Recognition, Natural Language Understanding, Generation and Machine Translation, Reinforcement Learning, and Neural Information Processing.

The Edinburgh Award (Enterprise)
First place winner of the Business Ideas Competition at the University of Edinburgh
Winner of the Scottish Institute for Enterprise’s Fresh Ideas Competition

Bachelor of Arts in Linguistics and Language Applications, First Class Honours @ City University of Hong Kong (2014 - 2018)

Major: Linguistics and Language Applications. Minor: Translations.

Thesis: A Comparative Study of Interlingual vs. Neural Approach to Machine Translation of Numerical Expressions (TensorFlow and Java).

Core Skills

Applied AI & systems: Python, PyTorch, Docker, data pipelines, production debugging, evaluation automation, LLM-as-judge, internal tooling, LLM/RAG workflows, agentic development tooling.
Speech ML: ASR, TTS, speech data preparation, ASR/TTS evaluation, hotword and keyword boosting, WER/MOS/CMOS analysis, streaming ASR, speech-to-speech workflows.
Modeling & frameworks: Kaldi, NVIDIA NeMo, K2, CTC, Transducer/RNN-T, Conformer, Thinker-Talker, n-gram rescoring.
Research areas: speech recognition, speech synthesis, NLP, non-invasive BCI, EEG/fNIRS speech decoding, multimodal speech interfaces.

Publications

N-gram Boosting: Improving Contextual Biasing with Normalized N-gram Targets. Poster at SANE 2022.
Avengers, Ensemble! Benefits of ensembling in grapheme-to-phoneme prediction. Paper at the 18th SIGMORPHON Workshop, 2021.
Can linguistics help neural machine translation? Evidence from a case study of interlingual vs. neural machine translation of numerical expressions. Presentation at AI and Linguistics Conference 2018.