Tag - Statistics

Huy Tuan Pham: Structures in Random Graphs: New Connections

In 1947, Paul Erdos famously gave a construction of graphs with no large cliques and independent sets via the probabilistic method. Since then, Erdos’ ingenious probabilistic insight and the emergence of structures in random graphs have motivated fundamental developments in combinatorics and probability. In this talk, I highlight several recent results in random graph studies with interesting connections to additive combinatorics, theoretical computer science and probability.

In joint work with David Conlon, Jacob Fox and Liana Yepremyan, we study structures in random Cayley graphs of general groups. Given a fixed finite group, random Cayley graphs are constructed by choosing the generating set at random. These graphs thus reflect interesting symmetries and properties of the group, at the cost of inducing complex dependencies. We will discuss results on clique and independence numbers in random Cayley graphs of general groups, as well as progress towards a conjecture of Alon on Cayley graphs with small clique and independence number. These questions are naturally connected with some fundamental problems in additive combinatorics, where we surprisingly discover that in some of them the group structure is superfluous.

Certain important aspects in the study of structures in random graphs can be phrased in terms of thresholds. In joint work with Jinyoung Park, building on insights in our resolution of Talagrand’s selector process conjecture, we prove the Kahn-Kalai conjecture, that thresholds of general monotone properties are closely predicted by expectation obstructions. The Kahn-Kalai conjecture is a beautiful milestone towards the understanding of emergence of general structures, and yet to complete the quest, it remains to study these expectation obstructions. This latter task can prove to be highly challenging in several cases and bring in interesting connections. As an illustration, I will discuss joint work with Vishesh Jain that determines the threshold for edge-colorings of complete bipartite graphs by exploiting connections to the Lovasz Local Lemma and local uniformity properties, which have played an important role in sampling solutions to constraint satisfaction problems.

Rafał Kulik: Blocks estimators in Extreme Value Theory

Extreme value theory deals with large values and rare events. These large values tend to cluster in case of temporal dependence. This clustering behaviour is widely observed in practice. I will start with a mild introduction to extreme value theory, discussing probabilistic and statistical issues. This part will be accessible to a broader audience.

Then, I will talk about a more specific problem of statistical theory for cluster functionals and rare events. Two types of estimators are of a primary importance: disjoint and sliding blocks estimators. It has been conjectured that sliding blocks estimators are “better” (to be made precise in the talk). We proved in a recent series of papers that this is not the case and in fact both disjoint and sliding blocks estimators are asymptotically equivalent. This part will be aimed at probabilistic and statisticians.

I will conclude with recent directions in extreme value theory, such as extremes in high dimension, extremes of graphs and networks.

Jana de Wiljes: Sequential Bayesian Learning

In various application areas it is crucial to make predictions or decisions based on sequentially incoming observations and previous existing knowledge on the system of interest. The prior knowledge is often given in the form of evolution equations (e.g., ODEs derived via first principles or fitted based on previously collected data), from here on referred to as model. Despite the available observation and prior model information, accurate predictions of the 'true' reference dynamics can be very difficult.

Common reasons that make this problem so challenging are: (i) the underlying system is extremely complex (e.g., highly non-linear) and chaotic (i.e., crucially dependent on the initial conditions), (ii) the associate state and/or parameter space is very high-dimensional (e.g., worst case 108) (iii) Observations are noisy, partial in space and discrete in time.

In practice these obstacles are combated with a series of approximations (the most important ones being based on assuming Gaussian densities and using Monte Carlo type estimations) and numerical tools that work surprisingly well in some settings. Yet the mathematical understanding of the signal tracking ability of a lot of these methods is still lacking. Additionally, solutions of some of the more complicated problems that require simultaneous state and parameter estimation (including control parameters that can be understood as decisions/actions performed) can still not be approximated in a computationally feasible fashion. Here we will try to address the first layer of these issues step by step and discuss the next advances that need to be made in these many layered problems. More specifically a stability and accuracy analysis of a family of the most popular sequential data assimilation methods typically used in practice.

Common reasons that make this problem so challenging are: (i) the underlying system is extremely complex (e.g., highly non-linear) and chaotic (i.e., crucially dependent on the initial conditions), (ii) the associate state and/or parameter space is very high-dimensional (e.g., worst case 10^8) (iii) Observations are noisy, partial in space and discrete in time.

In practice these obstacles are combated with a series of approximations (the most important ones being based on assuming Gaussian densities and using Monte Carlo type estimations) and numerical tools that work surprisingly well in some settings. Yet the mathematical understanding of the signal tracking ability of a lot of these methods is still lacking. Additionally, solutions of some of the more complicated problems that require simultaneous state and parameter estimation (including control parameters that can be understood as decisions/actions performed) can still not be approximated in a computationally feasible fashion. Here we will try to address the first layer of these issues step by step and discuss the next advances that need to be made in these many layered problems. More specifically a stability and accuracy analysis of a family of the most popular sequential data assimilation methods typically used in practice.

Clara Grazian: Finding structures in observations: consistent(?) clustering analysis

Clustering is an important task in almost every area of knowledge: medicine and epidemiology, genomics, environmental science, economics, visual sciences, among others.

Methodologies to perform inference on the number of clusters have often been proved to be inconsistent and introducing a dependence structure among the clusters implies additional difficulties in the estimation process. In a Bayesian setting, clustering in the situation where the number of clusters is unknown is often performed by using Dirichlet process priors or finite mixture models. However, the posterior distributions on the number of groups have been recently proved to be inconsistent.

This seminar aims at reviewing the Bayesian approaches available to perform via mixture models and give some new insights.

Sanjeev Arora: Toward Theoretical Understanding of Deep Learning

The empirical success of deep learning drives much of the excitement about machine learning today. This success vastly outstrips our mathematical understanding. This lecture surveys progress in recent years toward developing a theory of deep learning. Works have started addressing issues such as speed of optimization, sample requirements for training, effect of architecture choices, and properties of deep generative models.

Sanjeev Arora: What is Machine Learning?

Machine learning is the sub-field of computer science concerned with creating programs and machines that can improve from experience and interaction. It relies upon mathematical optimization, statistics, and algorithm design. The talk will be an introduction to machine learning for a mathematical audience. We describe the mathematical formulations of basic types of learning such as supervised, unsupervised, interactive, etc., and the philosophical and scientific issues raised by them.

Ilya Molchanov: Set-valued risk measures of non-convex portfolios

Non-convex random sets of admissible positions naturally arise in the setting of fixed transaction costs or when only a finite range of possible transactions is considered. The talk defines set-valued risk measures in such cases and explores the situations when they return convex result, namely, when Lyapunov’s theorem applies. The case of fixed transaction costs is analysed in greater details.

Daniel Balint: Discounting invariant FTAP for large financial markets

For large financial markets as introduced in Kramkov and Kabanov 94, there are several existing absence-of-arbitrage conditions in the literature. They all have in common that they depend in a crucial way on the discounting factor. We introduce a new concept, generalizing NAA1 (K&K 94) and NAA (Rokhlin 08), which is invariant with respect to discounting. We derive a dual characterization by a contiguity property (FTAP).We investigate connections to the in finite time horizon framework (as for example in Karatzas and Kardaras 07) and illustrate negative result by counterexamples.