This seminar is held online via Zoom, and organised by Alex Kasprzyk, Lorenzo De Biase, Tom Oliver, and Sara Veneziale.
To be added to the mailing list, please contact one of the organisers. Everybody is welcome.
Follow us on researchseminars.org to be kept up to date about upcoming talks. Recordings of talks are available on our YouTube channel.
Note: All times are UK times.
Upcoming Talks

 18 October 2023, 10-11am
 Felix Schremmer (Hong Kong)
 Machine learning assisted exploration for affine Deligne–Lusztig varieties
 In this interdisciplinary study, we describe a procedure to assist and accelerate research in pure mathematics by using machine learning. We study affine Deligne–Lusztig varieties, certain geometric objects related to a number of mathematical questions, by carefully developing a number of machine learning models. This iterated pipeline yields well interpretable and highly accurate models, thus producing strongly supported mathematical conjectures. We explain how this method could have dramatically accelerated the research in the past. A completely new mathematical theorem, found by our ML-assisted method and proved using the classical mathematical tools of the field, concludes this study. This is joint work with Bin Dong, Pengfei Jin, Xuhua He and Qingchao Yu.
 Link to Event
Past Talks

 27 September 2023
 Rahul Sarkar (Stanford)
 A framework for generating inequality conjectures
 In this talk, I'll present some recent and ongoing work, where we propose a systematic approach to finding abstract patterns in mathematical data, in order to generate conjectures about mathematical inequalities. We focus on strict inequalities of type f < g and associate them with a Banach manifold. We develop a structural understanding of this conjecture space by studying linear automorphisms of this manifold. Next, we propose an algorithmic pipeline to generate novel conjectures. As proof of concept, we give a toy algorithm to generate conjectures about the prime counting function and diameters of Cayley graphs of nonabelian simple groups. Some of these conjectures were proved while others remain unproven.
 Video (YouTube), Video (Vimeo), Slides

 20 September 2023
 Bruno Gavranović (Strathclyde)
 Fundamental Components of Deep Learning: A category-theoretic approach
 Deep learning, despite its remarkable achievements, is still a young field. Like the early stages of many scientific disciplines, it is permeated by ad hoc design decisions. From the intricacies of the implementation of backpropagation, through new and poorly understood phenomena such as double descent, scaling laws or in-context learning, to a growing zoo of neural network architectures, there are few unifying principles in deep learning, and no uniform and compositional mathematical foundation. In this talk I'll present a novel perspective on deep learning by utilising the mathematical framework of category theory. I'll identify two main conceptual components of neural networks, report on progress made in recent years by the research community in formalising them, and show how they've been used to describe backpropagation, architectures, and supervised learning in general, shedding new light on the existing field.
 Video (YouTube), Video (Vimeo), Slides

 6 September 2023
 Charlotte Aten (Denver)
 Discrete neural nets and polymorphic learning
 Classical neural network learning techniques have primarily been focused on optimization in a continuous setting. Early results in the area showed that many activation functions could be used to build neural nets that represent any function, but of course this also allows for overfitting. In an effort to ameliorate this deficiency, one seeks to reduce the search space of possible functions to a special class which preserves some relevant structure. I will propose a solution to this problem of a quite general nature, which is to use polymorphisms of a relevant discrete relational structure as activation functions. I will give some concrete examples of this, then hint that this specific case is actually of broader applicability than one might guess.
 Video (YouTube), Video (Vimeo), Slides

 2 August 2023
 Honglu Fan (Geneva)
 Local uniformization, Hilbert scheme of points and reinforcement learning
 In this talk, I will give a brief tour about how local uniformization, the Hilbert scheme of points, and reinforcement learning come together in a joint work with Gergely Berczi and Mingcong Zeng.
 Video (YouTube), Video (Vimeo), Slides

 26 July 2023
 Quoc-Tung Le (ENS Lyon)
 Algorithmic and theoretical aspects of sparse deep neural networks
 Sparse deep neural networks offer a compelling practical opportunity to reduce the cost of training, inference and storage, which are growing exponentially in the state of the art of deep learning. In this presentation, we will introduce an approach to study sparse deep neural networks through the lens of another related problem: sparse matrix factorization, i.e., the problem of approximating a (dense) matrix by the product of (multiple) sparse factors. In particular, we identify and investigate in detail some theoretical and algorithmic aspects of a variant of sparse matrix factorization named fixed support matrix factorization (FSMF), in which the set of nonzero entries of the sparse factors is known. Several fundamental questions of sparse deep neural networks, such as the existence of optimal solutions of the training problem or topological properties of its function space, can be addressed using the results of (FSMF). In addition, by applying the results of (FSMF), we also study butterfly parametrization, an approach that consists of replacing (large) weight matrices with the products of extremely sparse and structured ones in sparse deep neural networks.
 Video (YouTube), Video (Vimeo), Slides

 12 July 2023
 Challenger Mishra (Cambridge)
 Mathematical conjecture generation and Machine Intelligence
 Conjectures hold a special status in mathematics. Good conjectures epitomise milestones in mathematical discovery, and have historically inspired new mathematics and shaped progress in theoretical physics. Hilbert’s list of 23 problems and André Weil’s conjectures oversaw major developments in mathematics for decades. Crafting conjectures can often be understood as a problem in pattern recognition, for which Machine Learning (ML) is tailor-made. In this talk, I will propose a framework that allows a principled study of a space of mathematical conjectures. Using this framework and exploiting domain knowledge and machine learning, we generate a number of conjectures in number theory and group theory. I will present evidence in support of some of the resulting conjectures and present a new theorem. I will lay out a vision for this endeavour, and conclude by posing some general questions about the pipeline.
 Video (YouTube), Video (Vimeo), Slides

 5 July 2023
 Thomas Gebhart (Minnesota)
 Specifying Local Constraints in Representation Learning with Cellular Sheaves
 Many machine learning algorithms constrain their learned representations by imparting inductive biases based on local smoothness assumptions. While these constraints are often natural and effective, there are situations in which their simplicity is misaligned with the representation structure required by the task, leading to a lack of expressivity and pathological behaviors like representational oversmoothing or inconsistency. Without a broader theoretical framework for reasoning about local representational constraints, it is difficult to conceptualize and move beyond such representational misalignments. In this talk, we will see that cellular sheaf theory offers an ideal algebro-topological framework for both reasoning about and implementing machine learning models on data which are subject to such local-to-global constraints over a topological space. We will introduce cellular sheaves from a categorical perspective, observing the relationship between their definition as a limit object and the consistency objectives underlying representation learning. We will then turn to a discussion of sheaf (co)homology as a semi-computable tool for implementing these categorical concepts. Finally, we will observe two practical applications of these ideas in the form of sheaf neural networks, a generalization of graph neural networks for processing sheaf-valued signals; and knowledge sheaves, a sheaf-theoretic reformulation of knowledge graph embedding.
 Video (YouTube), Video (Vimeo), Slides

 21 June 2023
 Edward Pearce-Crump (Imperial)
 Exploring group equivariant neural networks using set partition diagrams
 What do jellyfish and an 11th century Japanese novel have to do with neural networks? In recent years, much attention has been given to developing neural network architectures that can efficiently learn from data with underlying symmetries. These architectures ensure that the learned functions maintain a certain geometric property called group equivariance, which determines how the output changes based on a change to the input under the action of a symmetry group. In this talk, we will describe a number of new group equivariant neural network architectures that are built using tensor power spaces of R^{n} as their layers. We will show that the learnable, linear functions between these layers can be characterised by certain subsets of set partition diagrams. This talk will be based on several papers that are to appear in ICML 2023.
 Video (YouTube), Video (Vimeo), Slides

 14 June 2023
 Vasco Brattka (Bundeswehr Munich)
 On the Complexity of Computing Gödel Numbers
 Given a computable sequence of natural numbers, it is a natural task to find a Gödel number of a program that generates this sequence. It is easy to see that this problem is neither continuous nor computable. In algorithmic learning theory this problem is well studied from several perspectives and one question studied there is for which sequences this problem is at least learnable in the limit. Here we study the problem on all computable sequences and we classify the Weihrauch complexity of it. For this purpose we can, among other methods, utilize the amalgamation technique known from learning theory. As a benchmark for the classification we use closed and compact choice problems and their jumps on natural numbers, and we argue that these problems correspond to induction and boundedness principles, as they are known from the Kirby–Paris hierarchy in reverse mathematics. We provide a topological as well as a computability-theoretic classification, which reveal some significant differences.
 Video (YouTube), Video (Vimeo), Slides

 31 May 2023
 Alvaro Torras Casas (Cardiff)
 Dataset comparison using persistent homology morphisms
 Persistent homology summarizes geometrical information of data by means of a barcode. Given a pair of datasets, X and Y, one might obtain their respective barcodes B(X) and B(Y). Thanks to stability results, if X and Y are similar enough one deduces that the barcodes B(X) and B(Y) are also close enough; however, the converse is not true in general. In this talk we consider the case when there is a known relation between X and Y encoded by a morphism between persistence modules. For example, this is the case when Y is a finite subset of Euclidean space and X is a sample taken from Y. As in linear algebra, a morphism between persistence modules is understood by a choice of a pair of bases together with the associated matrix. I will explain how to use this matrix to get barcodes for images, kernels and cokernels. Additionally, I will explain how to compute an induced block function that relates the barcodes B(X) and B(Y). I will finish the talk by reviewing some applications of this theory as well as future research directions.
 Video (YouTube), Video (Vimeo), Slides

 24 May 2023
 Taejin Paik (Seoul National University)
 Isometry-Invariant and Subdivision-Invariant Representations of Embedded Simplicial Complexes
 Geometric objects such as meshes and graphs are commonly used in various applications, but analyzing them can be challenging due to their complex structures. Traditional approaches may not be robust to transformations like subdivision or isometry, leading to inconsistent results. Here is a novel approach to address these limitations by using only topological and geometric data to analyze simplicial complexes in a subdivision-invariant and isometry-invariant way. This approach involves using a graph neural network to create an O(3)-equivariant operator and the Euler curve transform to generate sufficient statistics that describe the properties of the object.
 Video (YouTube), Video (Vimeo), Slides

 3 May 2023
 Daniel Platt (KCL)
 Group invariant machine learning by fundamental domain projections
 In many applications one wants to learn a function that is invariant under a group action. For example, classifying images of digits, no matter how they are rotated. There exist many approaches in the literature to do this. I will mention two approaches that are very useful in many applications, but struggle if the group is big or acts in a complicated way. I will then explain our approach which does not have these two problems. The approach works by finding some "canonical representative" of each input element. In the example of images of digits, one may rotate the digit so that the brightest quarter is in the top-left, which would define a "canonical representative". In the general case, one has to define what that means. Our approach is useful if the group is big, and I will present experiments on the Complete Intersection Calabi–Yau and Kreuzer–Skarke datasets to show this. Our approach is useless if the group is small, and the case of rotated images of digits is an example of this. This is joint work with Benjamin Aslan and David Sheard.
 Video (YouTube), Video (Vimeo), Slides

 26 April 2023
 Bastian Rieck (Munich)
 Curvature for Graph Learning
 Curvature bridges geometry and topology, using local information to derive global statements. While well-known in a differential topology context, it was recently extended to the domain of graphs. In fact, graphs give rise to various notions of curvature, which differ in expressive power and purpose. We will give a brief overview of curvature in graphs, define some relevant concepts, and show their utility for data science and machine learning applications. In particular, we shall discuss two applications: first, the use of curvature to distinguish between different models for synthesising new graphs from some unknown distribution; second, a novel framework for defining curvature for hypergraphs, whose structural properties require a more generic setting. We will also describe new applications that are specifically geared towards a treatment by curvature, thus underlining the utility of this concept for data science.
 Video (YouTube), Video (Vimeo), Slides

 19 April 2023
 Christoph Hertrich (LSE)
 Understanding Neural Network Expressivity via Polyhedral Geometry
 Neural networks with rectified linear unit (ReLU) activations are one of the standard models in modern machine learning. Despite their practical importance, fundamental theoretical questions concerning ReLU networks remain open to this day. For instance, what is the precise set of (piecewise linear) functions exactly representable by ReLU networks with a given depth? Even the special case asking for the number of layers needed to compute a function as simple as max{0, x_{1}, x_{2}, x_{3}, x_{4}} has not been solved yet. In this talk we will explore the relevant background to understand this question and report on recent progress using tropical and polyhedral geometry as well as a computer-aided approach based on mixed-integer programming. This is based on joint works with Amitabh Basu, Marco Di Summa, and Martin Skutella (NeurIPS 2021), as well as Christian Haase and Georg Loho (ICLR 2023).
 Video (YouTube), Video (Vimeo), Slides

 12 April 2023
 Vasco Portilheiro (UCL)
 Barriers to Learning Symmetries
 Given the success of equivariant models, there has been increasing interest in models which can learn a symmetry from data, rather than it being imposed a priori. We present work which formalizes a tradeoff between (a) the simultaneous learnability of symmetries and equivariant functions, and (b) universal approximation of equivariant functions. The work is motivated by an experiment which modifies the Equivariant Multilayer Perceptron (EMLP) of Finzi et al. (2021) in an attempt to learn a group together with an equivariant function. Additionally, the tradeoff is shown to not exist for group-convolutional networks.
 Video (YouTube), Video (Vimeo), Slides

 5 April 2023
 Yang-Hui He (LIMS)
 Universes as Bigdata: Physics, Geometry and Machine-Learning
 The search for the Theory of Everything has led to superstring theory, which then led physics, first to algebraic/differential geometry/topology, and then to computational geometry, and now to data science. With a concrete playground of the geometric landscape, accumulated by the collaboration of physicists, mathematicians and computer scientists over the last 4 decades, we show how the latest techniques in machine-learning can help explore problems of interest to theoretical physics and to pure mathematics. At the core of our programme is the question: how can AI help us with mathematics?
 Video (YouTube), Video (Vimeo), Slides


 22 March 2023
 Patrizio Frosini (Bologna)
 Some recent results on the theory of GENEOs and its application to Machine Learning
 Group equivariant non-expansive operators (GENEOs) were introduced a few years ago as mathematical tools for approximating data observers when data are represented by real-valued or vector-valued functions. The use of these operators is based on the assumption that the interpretation of data depends on the geometric properties of the observers. In this talk we will illustrate some recent results in the theory of GENEOs, showing how these operators can make available a new approach to topological data analysis and geometric deep learning.
 Video (YouTube), Video (Vimeo), Slides

 15 March 2023
 Julia Lindberg (UT Austin)
 Estimating Gaussian mixtures using sparse polynomial moment systems
 The method of moments is a statistical technique for density estimation that solves a system of moment equations to estimate the parameters of an unknown distribution. A fundamental question critical to understanding identifiability asks how many moment equations are needed to get finitely many solutions and how many solutions there are. We answer this question for classes of Gaussian mixture models using the tools of polyhedral geometry. Using these results, we present a homotopy method to perform parameter recovery, and therefore density estimation, for high dimensional Gaussian mixture models. The number of paths tracked in our method scales linearly in the dimension.
 Video (YouTube), Video (Vimeo), Slides

 8 March 2023
 Nick Vannieuwenhoven (KU Leuven)
 Group-invariant tensor train networks for supervised learning
 Invariance under selected transformations has recently proven to be a powerful inductive bias in several machine learning models. One class of such models are tensor train networks. In this talk, we impose invariance relations on tensor train networks. We introduce a new numerical algorithm to construct a basis of tensors that are invariant under the action of normal matrix representations of an arbitrary discrete group. This method can be up to several orders of magnitude faster than previous approaches. The group-invariant tensors are then combined into a group-invariant tensor train network, which can be used as a supervised machine learning model. We applied this model to a protein binding classification problem, taking into account problem-specific invariances, and obtained prediction accuracy in line with state-of-the-art invariant deep learning approaches. This is joint work with Brent Sprangers.
 Video (YouTube), Video (Vimeo), Slides

 22 February 2023
 Guido Montúfar (UCLA)
 Geometry and convergence of natural policy gradient methods
 We study the convergence of several natural policy gradient (NPG) methods in infinite-horizon discounted Markov decision processes with regular policy parametrizations. For a variety of NPGs and reward functions we show that the trajectories in state-action space are solutions of gradient flows with respect to Hessian geometries, based on which we obtain global convergence guarantees and convergence rates. In particular, we show linear convergence for unregularized and regularized NPG flows with the metrics proposed by Kakade and Morimura and co-authors by observing that these arise from the Hessian geometries of conditional entropy and entropy respectively. Further, we obtain sublinear convergence rates for Hessian geometries arising from other convex functions like log-barriers. Finally, we interpret the discrete-time NPG methods with regularized rewards as inexact Newton methods if the NPG is defined with respect to the Hessian geometry of the regularizer. This yields local quadratic convergence rates of these methods for step size equal to the penalization strength. This is work with Johannes Müller.
 Video (YouTube), Video (Vimeo), Slides

 15 February 2023
 Kathlén Kohn (KTH)
 The Geometry of Linear Convolutional Networks
 We discuss linear convolutional neural networks (LCNs) and their critical points. We observe that the function space (i.e., the set of functions represented by LCNs) can be identified with polynomials that admit certain factorizations, and we use this perspective to describe the impact of the network's architecture on the geometry of the function space. For instance, for LCNs with one-dimensional convolutions having stride one and arbitrary filter sizes, we provide a full description of the boundary of the function space. We further study the optimization of an objective function over such LCNs: We characterize the relations between critical points in function space and in parameter space and show that there do exist spurious critical points. We compute an upper bound on the number of critical points in function space using Euclidean distance degrees and describe dynamical invariants for gradient descent. This talk is based on joint work with Thomas Merkh, Guido Montúfar, and Matthew Trager.
 Video (YouTube), Video (Vimeo), Slides

 8 February 2023
 Manolis Tsakiris (Chinese Academy of Sciences)
 Unlabelled Principal Component Analysis
 This talk will consider the problem of recovering a matrix of bounded rank from a corrupted version of it, where the corruption consists of an unknown permutation of the matrix entries. Exploiting the theory of Groebner bases for determinantal ideals, recovery theorems will be given. For a special instance of the problem, an algorithmic pipeline will be demonstrated, which employs methods for robust principal component analysis with respect to outliers and methods for linear regression without correspondences.
 Video (YouTube), Video (Vimeo), Slides

 12 September 2022
 Anindita Maiti (Northeastern University)
 Non-perturbative Non-Lagrangian Neural Network Field Theories
 Ensembles of Neural Network (NN) output functions describe field theories. The Neural Network Field Theories become free, i.e. Gaussian, in the limit of infinite width and independent parameter distributions, due to the Central Limit Theorem (CLT). Interaction terms, i.e. non-Gaussianities, in these field theories arise due to violations of the CLT at finite width and/or correlated parameter distributions. In general, non-Gaussianities render Neural Network Field Theories non-perturbative and non-Lagrangian. In this talk, I will describe methods to study non-perturbative non-Lagrangian field theories in Neural Networks, via a dual framework over parameter distributions. This duality lets us study correlation functions and symmetries of NN field theories in the absence of an action; further, the partition function can be approximated as a series sum over connected correlation functions. Thus, Neural Networks allow us to study non-perturbative non-Lagrangian field theories through their architectures, and can be beneficial to both Machine Learning and physics.
 Video (YouTube), Video (Vimeo), Slides

 1 July 2022
 Alexei Vernitski (Essex)
 Using machine learning to solve mathematical problems and to search for examples and counterexamples in pure maths research
 Our recent research can be generally described as applying state-of-the-art technologies of machine learning to suitable mathematical problems. As to machine learning, we use both reinforcement learning and supervised learning (underpinned by deep learning). As to mathematical problems, we mostly concentrate on knot theory, for two reasons; firstly, we have a positive experience of applying another kind of artificial intelligence (automated reasoning) to knot theory; secondly, examples and counterexamples in knot theory are finite and, typically, not very large, so they are convenient for the computer to work with. Here are some successful examples of our recent work, which I plan to talk about.
 1) Some recent studies used machine learning to untangle knots using Reidemeister moves, but they do not describe in detail how they implemented untangling on the computer. We invested effort into implementing untangling in one clearly defined scenario, and were successful, and made our computer code publicly available.
 2) We found counterexamples showing that some recent publications claiming to give new descriptions of realisable Gauss diagrams contain an error. We trained several machine learning agents to recognise realisable Gauss diagrams and noticed that they fail to recognise correctly the same counterexamples which human mathematicians failed to spot.
 3) One problem related to (and "almost" equivalent to) recognising the trivial knot is colouring the knot diagram by elements of algebraic structures called quandles (I will define them). We considered, for some types of knot diagrams (including petal diagrams), how supervised learning copes with this problem.
 Video (YouTube), Video (Vimeo), Slides

 29 September 2021
 Tom Oliver (Nottingham)
 Supervised learning of arithmetic invariants
 We explore the utility of standard supervised learning algorithms for a range of classification problems in number theory. In particular, we will consider class numbers of real quadratic fields, ranks of elliptic curves over Q, and endomorphism types for genus 2 curves over Q. Each case is motivated by its appearance in an open conjecture. Throughout, the basic strategy is the same: we vectorize the underlying objects via the coefficients of their L-functions.
 Video (YouTube), Video (Vimeo), Slides