In this presentation, I will provide an overview of how techniques involving self-similar analysis, computer assisted proofs and neural networks can be employed to investigate singularity formation in the context of fluids.
Tag - Machine learning
Starting from a simple deterministic model, we show that the asymptotic outcomes (as time goes to infinity) of both shallow and deep neural networks such as those used in BloombergGPT to generate economic time series are exactly the Nash equilibria of a non-potential game. We then analyse deep neural network algorithms that converge to these equilibria. The approach is extended to federated deep neural networks between clusters of regional servers and on-device clients. Finally, the variational inequalities behind large language models including encoder-decoder related transformers are established.
Transformers can be trained to solve problems of mathematics. I present two recent applications, in mathematics and physics: predicting integer sequences, and discovering the properties of scattering amplitudes in a close relative of quantum chromodynamics. Problems of mathematics can also help understand transformers. Using two examples from linear algebra and integer arithmetic, I show that model predictions can be explained, that trained models do not confabulate, and that carefully choosing the training distributions can help achieve better, and more robust, performance.
Existing research on mechanistic interpretability usually tries to develop an informal human understanding of "how a model works", making it hard to evaluate research results and raising concerns about scalability. Meanwhile formal proofs of model properties seem far out of reach both in theory and practice. In this talk I’ll discuss an alternative strategy for "explaining" a particular behaviour of a given neural network. This notion is much weaker than proving that the network exhibits the behaviour, but may still provide similar safety benefits. This talk will primarily motivate a research direction and a set of theoretical questions rather than presenting results.
Mechanistic Interpretability is a branch of machine learning that takes a trained neural network, and tries to reverse-engineer the algorithms it's learned. First, I'll discuss what we've learned by reverse-engineering tiny models trained to do mathematical operations, eg the algorithm learned to do modular addition. I'll then discuss the phenomena of superposition, where models spontaneously learn to use the geometry of high-dimensional spaces to use compression schemes and represent and compute more features than they have dimensions. Superposition is a major open problem in mechanistic interpretability, and I'll discuss some of the weird mathematical phenomena that come up with superposition, some recent work exploring it, and open problems in the field.
This talk focuses on how rigorous mathematical tools can be used to describe the optimization of large, highly non-convex neural networks. We start by covering how stochastic differential equations (SDEs) provide a rigorous yet flexible model of how deep networks change over the course of training. We then cover how the SDEs yield practical insights into scaling training to highly distributed settings while preserving generalization performance. In the second half of the talk, we will explore the new deep learning paradigm of pre-training and fine-tuning large language models. We show that fine-tuning can be described by a very simplistic mathematical model, and insights allow us to develop a highly efficient and performant optimizer to fine-tune LLMs at scale. The talk will focus on various mathematical tools and the extent to which they can describe modern day deep learning.
Recently, the theory of infinite-width neural networks led to the first technology, muTransfer, for tuning enormous neural networks that are too expensive to train more than once. For example, this allowed us to tune the 6.7 billion parameter version of GPT-3 using only 7% of its pretraining compute budget, and with some asterisks, we get a performance comparable to the original GPT-3 model with twice the parameter count. In this talk, I will explain the core insight behind this theory. In fact, this is an instance of what I call the *Optimal Scaling Thesis*, which connects infinite-size limits for general notions of "size" to the optimal design of large models in practice. I'll end with several concrete key mathematical research questions whose resolutions will have incredible impact on the future of AI.
Here we talk about two things: using LLMs for physics, and how physics can help study LLMs.

You must be logged in to post a comment.