Michael Min Wah Leung

Notes on post-training, sequence modelling, and the occasional brain. Code-first, results-honest, written by an ML engineer with a research background in neuroscience.

Writing

Why SFT learned the words but GRPO learned the rules

post-training

GRPO

RLHF

LLMs

Teaching a 14B-parameter model a proprietary equipment-naming taxonomy with a hand-tuned reward function, and why ~250 lines of reward code and a quarter-epoch of GRPO closed the gap that more SFT couldn’t.

From consuming a pretrained model to training my own

seq2seq

sign-language

MLX

hybrid-inference

Building a continuous-sign-language Copilot: a Transformer Seq2Seq trained from scratch on How2Sign, two training backends, and a hybrid runtime that reaches 93.6% sentence-level recognition accuracy.

Patient-specific filters as biomarkers

neuroscience

signal-processing

interpretability

ICA, FOOOF, and CSP for noisy EEG, and what spatial filters taught me about feature extraction in transformers.