The success of deep learning has hinged on learned functions dramatically outperforming hand-designed functions for many tasks. However, we still train models using hand-designed optimizers acting on hand-designed loss functions. I will argue that these hand-designed components are typically mismatched to the desired behavior, and that we can expect meta-learned optimizers to perform much better. I will discuss the challenges and pathologies that make meta-training learned optimizers difficult. These include: chaotic and high-variance meta-loss landscapes; extreme computational costs of meta-training; the lack of comprehensive meta-training datasets; the challenge of designing learned optimizers with the right inductive biases; and the challenge of interpreting the mechanism of action of learned optimizers. I will share solutions to some of these challenges, show experimental results in which learned optimizers outperform hand-designed optimizers in many contexts, and discuss novel capabilities that are enabled by meta-training learned optimizers.
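To make the idea concrete, here is a minimal sketch of what a learned optimizer can look like: instead of a fixed update rule, a small neural network (whose weights would be meta-trained; here they are just randomly initialized) maps per-parameter features such as the gradient and a momentum trace to an update step. All names and the feature choice below are illustrative assumptions, not the specific architecture discussed in the talk; note that SGD with momentum is a special case of this parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)
# Meta-parameters of the optimizer: a tiny per-parameter MLP.
# In practice these would be meta-trained; here they are untrained.
W1 = rng.normal(scale=0.1, size=(2, 8))
W2 = rng.normal(scale=0.1, size=(8, 1))

def learned_opt_step(param, grad, momentum, decay=0.9, step_scale=0.01):
    """Apply one step of the (hypothetical) learned optimizer.

    Each parameter is updated independently: the MLP sees that
    parameter's gradient and momentum, and emits an update direction.
    """
    momentum = decay * momentum + (1 - decay) * grad
    features = np.stack([grad, momentum], axis=-1)  # shape (..., 2)
    hidden = np.tanh(features @ W1)                 # shape (..., 8)
    update = (hidden @ W2)[..., 0]                  # shape (...,)
    return param - step_scale * update, momentum

# One step on a toy quadratic loss L(p) = sum(p**2), whose gradient is 2p:
p = np.array([1.0, -2.0])
m = np.zeros_like(p)
p, m = learned_opt_step(p, 2 * p, m)
```

Meta-training would then optimize `W1` and `W2` so that repeatedly applying `learned_opt_step` minimizes training loss quickly across a distribution of tasks, which is where the chaotic, high-variance meta-loss landscapes mentioned above arise.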
Jascha is most (in)famous for inventing diffusion models. His recent work has focused on the theory of overparameterized neural networks, meta-training of learned optimizers, and understanding the capabilities of large language models. Jascha will soon start a new job at Anthropic. He has previously worked as a principal scientist at Google Brain and Google DeepMind, as a visiting scholar in Surya Ganguli's lab at Stanford University, and as an academic resident at the Khan Academy education nonprofit. He earned his PhD in 2012 at the Redwood Center for Theoretical Neuroscience at UC Berkeley, in Bruno Olshausen's lab. Prior to his PhD, he worked on sending rovers to Mars.