Wang Yau Li


Not until we are lost do we begin to understand ourselves

GitHub | LinkedIn

[Toolbox: [TTS] [TTS-VC] [ASR] [LLM] [RAG] [Multimodal-LLM]]

[Notes: [TTS] [ASR] [ML] [Others]]

About

I am passionate about applying my background in Linguistics and Machine Learning to advance Human-Computer Interaction. Aiming to contribute meaningfully to the AI industry, I focus on automatic speech recognition (ASR), text-to-speech (TTS), natural language understanding (NLU), and neurotechnology. I am committed to enhancing user experiences by developing innovative solutions in these fields.


Experience

Machine Learning Engineer / Consultant @ kea (Apr 2024 - Present)

Founder @ InnerSpeech (Sep 2023 - Present)

Speech Recognition Engineer @ Dialpad (Feb 2020 - Aug 2023)

NLP Consultant @ The Hong Kong Polytechnic University (Sep 2020 - May 2023)

Research Assistant @ City University of Hong Kong (Jul 2018 - Sep 2018)


Education

Master’s degree, Speech & Language Processing @ The University of Edinburgh (2018 - 2019)

Thesis: Robust Word Recognition and Alignment of Child Speech Therapy Sessions using Audio and Ultrasound Imaging (with Kaldi and PyTorch)

Bachelor of Arts - BA, Linguistics and Language Applications @ City University of Hong Kong (2014 - 2018)

Thesis: A Comparative Study of Interlingual vs. Neural Approaches to Machine Translation of Numerical Expressions (with Java and TensorFlow)

Conference: AI and Linguistics Conference - East China Normal University - Oct 26-28, 2018


Projects

ASR

Data augmentation with GPT-2 and out-of-domain data

Lattice rescoring with LLM

Keyword contextual biasing

Multilingual ASR

ASR with wav2vec2 / Whisper / NeMo (Demo)

TTS / Music generation

TTS on CPU (Demo)

NLP

RAG-Transcript: Essence of Presentations (Project)

Grammatical Error Correction with GECToR (Demo)

Neuroscience

Brain-to-Text

Biosignal foundation model with EEG-Conformer, GPT, wav2vec2, and VQ-VAE (EEG-foundation)


Research