Mathematics and Science of Large Language Models (Ernest Ryu, UCLA Applied Math Colloquium)



UCLA Applied Math Colloquium, Ernest Ryu, Oct 31, 2024.

Title:
Mathematics and Science of Large Language Models
Abstract:
Large language models (LLMs) represent an engineering marvel, but their inner workings are notoriously challenging to understand. In this talk, we present two analyses of LLMs. The first result is a mathematical guarantee on LoRA fine-tuning for LLMs, showing that the training dynamics almost surely experience no spurious local minima if a LoRA rank $r \gtrsim \sqrt{N}$ is used, where $N$ is the number of fine-tuning data points. The second result is a scientific analysis of the training dynamics of in-context learning (ICL), showing that training on multiple diverse ICL tasks simultaneously \emph{shortens} the loss plateaus, making each task easier to learn.
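To make the rank condition concrete, here is a minimal sketch of a LoRA-style low-rank update, where a frozen weight $W$ is adapted as $W + BA$ with factors of rank $r$. This is an illustration of the general LoRA parameterization, not the talk's specific construction; the dimensions and the number of data points $N$ are assumed for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in = 64, 64
N = 100                       # number of fine-tuning data points (assumed)
r = int(np.ceil(np.sqrt(N)))  # LoRA rank on the order of sqrt(N), here 10

W = rng.standard_normal((d_out, d_in))       # frozen pretrained weight
B = np.zeros((d_out, r))                     # LoRA factor, zero-initialized
A = rng.standard_normal((r, d_in)) * 0.01    # LoRA factor, small random init

def adapted_forward(x):
    """Forward pass with the low-rank adapter: (W + B A) x."""
    return W @ x + B @ (A @ x)

x = rng.standard_normal(d_in)
# With B = 0 at initialization, the adapter leaves the forward pass unchanged;
# fine-tuning would then update only A and B (2 * r * d parameters), not W.
assert np.allclose(adapted_forward(x), W @ x)
```

Only the $r(d_{\text{in}} + d_{\text{out}})$ adapter parameters are trained, which is why the achievable rank $r$, relative to the data size $N$, governs the fine-tuning loss landscape discussed in the talk.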
