Events
ML Seminar - Thomas Harvey
Centre for Fundamental Physics
Title: Geometry and Learning
Abstract: During gradient descent, a metric is imposed on the parameter space, usually called the gradient preconditioner in the literature. This preconditioner determines how we measure distances in parameter space when taking optimisation steps. In standard stochastic gradient descent it is taken to be the Euclidean metric, but many other choices are possible: the Adam optimiser can be viewed as one such choice. With full second-order methods proving intractable for training neural networks, exploring different preconditioners offers a natural way to improve training by adapting to the curvature of the loss landscape. In this talk, I will present two geometrically inspired gradient preconditioners. The first uses the pullback metric obtained by embedding the loss landscape as a surface in a higher-dimensional space, the same metric that underlies common loss landscape visualisations. The second arises from considering functional gradient descent in an infinite-dimensional function space and then restricting to the finite-dimensional manifold of functions realisable by our neural network parameterisation. Across various tasks and architectures, we observe consistent improvements over the Adam optimiser in both convergence speed and final performance.
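As a rough illustration of the idea of a gradient preconditioner (not the speaker's implementation), the sketch below assumes the "surface" is the graph embedding theta -> (theta, L(theta)) in one extra dimension, for which the pullback metric is G = I + grad(L) grad(L)^T; by the Sherman-Morrison identity the preconditioned step G^{-1} grad(L) then reduces to grad(L) / (1 + ||grad(L)||^2). The toy loss and step size are hypothetical.

```python
import numpy as np

def loss(theta):
    # Toy quadratic loss, used only for demonstration.
    return 0.5 * np.sum(theta ** 2)

def grad_loss(theta):
    # Gradient of the toy loss above.
    return theta

def pullback_preconditioned_step(theta, lr=0.1):
    """One gradient step preconditioned by G = I + g g^T (graph pullback metric)."""
    g = grad_loss(theta)
    # Sherman-Morrison: (I + g g^T)^{-1} g = g / (1 + g^T g),
    # so the step shrinks automatically where the gradient is large.
    return theta - lr * g / (1.0 + g @ g)

theta = np.array([3.0, -2.0])
for _ in range(200):
    theta = pullback_preconditioned_step(theta)
print("final loss:", loss(theta))
```

Plain gradient descent corresponds to replacing G with the identity; the talk's second preconditioner, derived from functional gradient descent, would replace G with a different metric on the same update rule.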
Updated by: Dimitrios Bachtis

