Improved Training Technique for Shortcut Models
An improved version of the shortcut model that is much more competitive with other few-step and one-step diffusion models.
Hi! I'm Trung Dao — a 1st-year PhD student at UW-Madison, advised by Prof. Yong Jae Lee. These days I poke at vision-language-action models and world models — basically trying to convince a neural net to look at the world, imagine what happens next, and not bonk into the wall when it finally moves. The dream: agents that perceive, simulate, and actually do useful stuff in the physical world. Before grad school I was a Staff ML Engineer at Qualcomm AI Research (squeezing generative & multimodal models onto phone-sized silicon), and an AI Engineer / Research Resident at VinAI Research with Dr. Anh Tran & Dr. Cuong Pham, mostly hacking on diffusion distillation and GANs. Outside of research: cold brew, naps, and arguing with my cats.
A comprehensive distillation framework for latent flow matching models that excels at generating high-quality, consistent images in both one-step and few-step sampling.
{ "email": "tdao6 [at] wisc [dot] edu", "office": "CS @ UW-Madison", "github": "@trungdt880", "scholar": "FZmxEYYAAAAJ", "open_to": ["collabs", "chats about VLA / world models", "book recs"] }