Online Machine Learning Seminar
This seminar is held online via Zoom. To be added to the mailing list, please email Alex Kasprzyk or Lorenzo De Biase. Everybody is welcome.
Follow us on researchseminars.org to be kept up to date about upcoming talks. Recordings of talks are available on our YouTube channel.
Note: All times are UK times.
Upcoming Talks

 22 March 2023, 10-11am
 Patrizio Frosini (Bologna)
 Some recent results on the theory of GENEOs and its application to Machine Learning
 Group equivariant non-expansive operators (GENEOs) were introduced a few years ago as mathematical tools for approximating data observers when data are represented by real-valued or vector-valued functions. The use of these operators is based on the assumption that the interpretation of data depends on the geometric properties of the observers. In this talk we will illustrate some recent results in the theory of GENEOs, showing how these operators can make available a new approach to topological data analysis and geometric deep learning.
 Link to Event


 5 April 2023, 10-11am
 Yang-Hui He (LIMS)
 Universes as Bigdata: Physics, Geometry and Machine-Learning
 The search for the Theory of Everything has led to superstring theory, which then led physics, first to algebraic/differential geometry/topology, and then to computational geometry, and now to data science. With a concrete playground of the geometric landscape, accumulated by the collaboration of physicists, mathematicians and computer scientists over the last four decades, we show how the latest techniques in machine-learning can help explore problems of interest to theoretical physics and to pure mathematics. At the core of our programme is the question: how can AI help us with mathematics?
 Link to Event

 19 April 2023, 4-5pm
 Christoph Hertrich (LSE)
 Understanding Neural Network Expressivity via Polyhedral Geometry
 Neural networks with rectified linear unit (ReLU) activations are one of the standard models in modern machine learning. Despite their practical importance, fundamental theoretical questions concerning ReLU networks remain open to this day. For instance, what is the precise set of (piecewise linear) functions exactly representable by ReLU networks with a given depth? Even the special case asking for the number of layers to compute a function as simple as max{0, x_{1}, x_{2}, x_{3}, x_{4}} has not been solved yet. In this talk we will explore the relevant background to understand this question and report on recent progress using tropical and polyhedral geometry as well as a computer-aided approach based on mixed-integer programming. This is based on joint works with Amitabh Basu, Marco Di Summa, and Martin Skutella (NeurIPS 2021), as well as Christian Haase and Georg Loho (ICLR 2023).
 Link to Event
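 As a small illustration of the function named in the abstract above, the following Python sketch evaluates max{0, x_1, x_2, x_3, x_4} using only affine maps and ReLU units. It relies on the standard identity max(a, b) = ReLU(a - b) + ReLU(b) - ReLU(-b). The function names are hypothetical and not taken from the talk. The construction uses three hidden layers and says nothing about whether fewer would suffice, which is the question the abstract describes as open.

```python
def relu(t):
    """Rectified linear unit."""
    return max(0.0, t)


def relu_max2(a, b):
    """max(a, b) written as an affine combination of three ReLU units.

    relu(b) - relu(-b) equals b, so the sum is relu(a - b) + b = max(a, b).
    """
    return relu(a - b) + relu(b) - relu(-b)


def relu_max_with_zero(x1, x2, x3, x4):
    """Evaluate max{0, x1, x2, x3, x4} with three layers of ReLU units."""
    # Hidden layer 1: pairwise maxima of the inputs.
    m12 = relu_max2(x1, x2)
    m34 = relu_max2(x3, x4)
    # Hidden layer 2: maximum of the two intermediate values.
    m = relu_max2(m12, m34)
    # Hidden layer 3: a single ReLU takes the maximum with zero.
    return relu(m)


if __name__ == "__main__":
    print(relu_max_with_zero(1, -2, 3, 0))
    print(relu_max_with_zero(-1, -2, -3, -4))
```

 With floating-point inputs the result can differ from the built-in max by rounding error, since the identity adds and subtracts intermediate values.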
Past Talks

 15 March 2023
 Julia Lindberg (UT Austin)
 Estimating Gaussian mixtures using sparse polynomial moment systems
 The method of moments is a statistical technique for density estimation that solves a system of moment equations to estimate the parameters of an unknown distribution. A fundamental question critical to understanding identifiability asks how many moment equations are needed to get finitely many solutions and how many solutions there are. We answer this question for classes of Gaussian mixture models using the tools of polyhedral geometry. Using these results, we present a homotopy method to perform parameter recovery, and therefore density estimation, for high dimensional Gaussian mixture models. The number of paths tracked in our method scales linearly in the dimension.
 Video (YouTube), Video (Vimeo), Slides
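 To make the method of moments concrete in its simplest case, the sketch below fits a single one-dimensional Gaussian by matching the first two moments, using E[X] = mu and E[X^2] = mu^2 + sigma^2. This is only an illustration of the general idea: it does not cover Gaussian mixtures, and it is not the homotopy method described in the talk. The function names are hypothetical.

```python
def sample_moments(data):
    """Return the first and second empirical moments of a list of numbers."""
    n = len(data)
    m1 = sum(data) / n
    m2 = sum(x * x for x in data) / n
    return m1, m2


def gaussian_from_moments(m1, m2):
    """Solve the two moment equations of a 1D Gaussian for (mu, sigma^2).

    E[X] = mu and E[X^2] = mu^2 + sigma^2, so sigma^2 = m2 - m1^2.
    """
    mu = m1
    variance = m2 - m1 * m1
    return mu, variance


if __name__ == "__main__":
    data = [1.0, 3.0]
    mu, variance = gaussian_from_moments(*sample_moments(data))
    print(mu, variance)
```

 Here two moment equations determine two parameters uniquely. For mixtures, the question raised in the abstract is how many such equations are needed before only finitely many solutions remain.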

 8 March 2023
 Nick Vannieuwenhoven (KU Leuven)
 Group-invariant tensor train networks for supervised learning
 Invariance under selected transformations has recently proven to be a powerful inductive bias in several machine learning models. One class of such models is tensor train networks. In this talk, we impose invariance relations on tensor train networks. We introduce a new numerical algorithm to construct a basis of tensors that are invariant under the action of normal matrix representations of an arbitrary discrete group. This method can be up to several orders of magnitude faster than previous approaches. The group-invariant tensors are then combined into a group-invariant tensor train network, which can be used as a supervised machine learning model. We applied this model to a protein binding classification problem, taking into account problem-specific invariances, and obtained prediction accuracy in line with state-of-the-art invariant deep learning approaches. This is joint work with Brent Sprangers.
 Video (YouTube), Video (Vimeo), Slides

 22 February 2023
 Guido Montúfar (UCLA)
 Geometry and convergence of natural policy gradient methods
 We study the convergence of several natural policy gradient (NPG) methods in infinite-horizon discounted Markov decision processes with regular policy parametrizations. For a variety of NPGs and reward functions we show that the trajectories in state-action space are solutions of gradient flows with respect to Hessian geometries, based on which we obtain global convergence guarantees and convergence rates. In particular, we show linear convergence for unregularized and regularized NPG flows with the metrics proposed by Kakade and Morimura and coauthors by observing that these arise from the Hessian geometries of conditional entropy and entropy respectively. Further, we obtain sublinear convergence rates for Hessian geometries arising from other convex functions like log-barriers. Finally, we interpret the discrete-time NPG methods with regularized rewards as inexact Newton methods if the NPG is defined with respect to the Hessian geometry of the regularizer. This yields local quadratic convergence rates of these methods for step size equal to the penalization strength. This is work with Johannes Müller.
 Video (YouTube), Video (Vimeo), Slides

 15 February 2023
 Kathlén Kohn (KTH)
 The Geometry of Linear Convolutional Networks
 We discuss linear convolutional neural networks (LCNs) and their critical points. We observe that the function space (i.e., the set of functions represented by LCNs) can be identified with polynomials that admit certain factorizations, and we use this perspective to describe the impact of the network's architecture on the geometry of the function space. For instance, for LCNs with one-dimensional convolutions having stride one and arbitrary filter sizes, we provide a full description of the boundary of the function space. We further study the optimization of an objective function over such LCNs: We characterize the relations between critical points in function space and in parameter space and show that there do exist spurious critical points. We compute an upper bound on the number of critical points in function space using Euclidean distance degrees and describe dynamical invariants for gradient descent. This talk is based on joint work with Thomas Merkh, Guido Montúfar, and Matthew Trager.
 Video (YouTube), Video (Vimeo), Slides

 8 February 2023
 Manolis Tsakiris (Chinese Academy of Sciences)
 Unlabelled Principal Component Analysis
 This talk will consider the problem of recovering a matrix of bounded rank from a corrupted version of it, where the corruption consists of an unknown permutation of the matrix entries. Exploiting the theory of Groebner bases for determinantal ideals, recovery theorems will be given. For a special instance of the problem, an algorithmic pipeline will be demonstrated, which employs methods for robust principal component analysis with respect to outliers and methods for linear regression without correspondences.
 Video (YouTube), Video (Vimeo), Slides

 12 September 2022
 Anindita Maiti (Northeastern University)
 Non-perturbative Non-Lagrangian Neural Network Field Theories
 Ensembles of Neural Network (NN) output functions describe field theories. The Neural Network Field Theories become free, i.e. Gaussian, in the limit of infinite width and independent parameter distributions, due to the Central Limit Theorem (CLT). Interaction terms, i.e. non-Gaussianities, in these field theories arise due to violations of the CLT at finite width and/or correlated parameter distributions. In general, non-Gaussianities render Neural Network Field Theories non-perturbative and non-Lagrangian. In this talk, I will describe methods to study non-perturbative non-Lagrangian field theories in Neural Networks, via a dual framework over parameter distributions. This duality lets us study correlation functions and symmetries of NN field theories in the absence of an action; further, the partition function can be approximated as a series sum over connected correlation functions. Thus, Neural Networks allow us to study non-perturbative non-Lagrangian field theories through their architectures, and can be beneficial to both Machine Learning and physics.
 Video (YouTube), Video (Vimeo), Slides

 1 July 2022
 Alexei Vernitski (Essex)
 Using machine learning to solve mathematical problems and to search for examples and counterexamples in pure maths research
 Our recent research can be generally described as applying state-of-the-art technologies of machine learning to suitable mathematical problems. As to machine learning, we use both reinforcement learning and supervised learning (underpinned by deep learning). As to mathematical problems, we mostly concentrate on knot theory, for two reasons; firstly, we have a positive experience of applying another kind of artificial intelligence (automated reasoning) to knot theory; secondly, examples and counterexamples in knot theory are finite and, typically, not very large, so they are convenient for the computer to work with. Here are some successful examples of our recent work, which I plan to talk about.
 1) Some recent studies used machine learning to untangle knots using Reidemeister moves, but they do not describe in detail how they implemented untangling on the computer. We invested effort into implementing untangling in one clearly defined scenario, and were successful, and made our computer code publicly available.
 2) We found counterexamples showing that some recent publications claiming to give new descriptions of realisable Gauss diagrams contain an error. We trained several machine learning agents to recognise realisable Gauss diagrams and noticed that they fail to recognise correctly the same counterexamples which human mathematicians failed to spot.
 3) One problem related to (and "almost" equivalent to) recognising the trivial knot is colouring the knot diagram by elements of algebraic structures called quandles (I will define them). We considered, for some types of knot diagrams (including petal diagrams), how supervised learning copes with this problem.
 Video (YouTube), Video (Vimeo), Slides

 29 September 2021
 Tom Oliver (Nottingham)
 Supervised learning of arithmetic invariants
 We explore the utility of standard supervised learning algorithms for a range of classification problems in number theory. In particular, we will consider class numbers of real quadratic fields, ranks of elliptic curves over Q, and endomorphism types for genus 2 curves over Q. Each case is motivated by its appearance in an open conjecture. Throughout, the basic strategy is the same: we vectorize the underlying objects via the coefficients of their L-functions.
 Video (YouTube), Video (Vimeo)