
Vector Step-size Adaptation for Continual, Online Prediction

    Jacobsen, Andrew
  • In this thesis, we investigate different vector step-size adaptation approaches for continual, online prediction problems. Vanilla stochastic gradient descent can be considerably improved by scaling the update with a vector of appropriately chosen step-sizes. Many methods, including AdaGrad, RMSProp, and AMSGrad, keep statistics about the learning process to approximate a second-order update, that is, a vector approximation of the inverse Hessian. Another family of approaches uses meta-gradient descent to adapt the step-size parameters to minimize prediction error. These meta-descent strategies are promising for non-stationary problems, but have not been explored as extensively as quasi-second-order methods. We derive a general, incremental meta-descent algorithm, called AdaGain, designed to be applicable to a broader range of algorithms, including those with semi-gradient updates or even those with accelerations, such as RMSProp. We introduce an instance of AdaGain that combines meta-descent with RMSProp, a method we call RMSGain, which is particularly robust across several prediction problems and is competitive with the state-of-the-art method on a large-scale, time-series prediction problem on real data from a mobile robot.
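    To make the idea of a vector of step-sizes concrete, the following is a minimal sketch of an online linear predictor whose update is scaled per coordinate in the style of RMSProp. It is not AdaGain or RMSGain, whose update rules are not given in this abstract. The function name `rmsprop_online_prediction` and the parameters `base_step`, `decay`, and `eps` are illustrative assumptions, not names taken from the thesis.

```python
import math


def rmsprop_online_prediction(stream, n_features, base_step=0.01, decay=0.9, eps=1e-8):
    """Online linear prediction with an RMSProp-style vector of step-sizes.

    Illustrative sketch only: this is plain RMSProp applied to squared error,
    not the AdaGain or RMSGain algorithms from the thesis.
    `stream` yields (features, target) pairs one at a time.
    """
    weights = [0.0] * n_features
    # Running average of squared gradients, one entry per weight.
    grad_sq_avg = [0.0] * n_features
    squared_errors = []

    for features, target in stream:
        prediction = sum(w * x for w, x in zip(weights, features))
        error = target - prediction
        squared_errors.append(error * error)

        for i in range(n_features):
            # Gradient of 0.5 * error^2 with respect to weight i.
            grad = -error * features[i]
            grad_sq_avg[i] = decay * grad_sq_avg[i] + (1.0 - decay) * grad * grad
            # Each weight gets its own effective step-size.
            step_i = base_step / (math.sqrt(grad_sq_avg[i]) + eps)
            weights[i] -= step_i * grad

    return weights, squared_errors
```

    Each weight is updated with its own effective step-size `step_i`, which shrinks for coordinates with large recent gradients and grows for coordinates with small ones. A meta-descent method would instead adjust the step-sizes by gradient descent on the prediction error itself.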

    Fall 2019
    Master of Science
