Bandit Convex Optimization with Biased Noisy Gradient Oracles

  • Author / Creator
    Hu, Xiaowei
  • Abstract
    Optimizing an objective function over a convex set is a key problem underlying many machine learning models. Among the well-studied classes of objective functions, convex functions are distinguished by the property that any local minimum must be a global minimum over the domain. To find the point that minimizes a convex objective, a natural choice of search direction is the negative gradient; the resulting algorithm, usually called gradient descent, is widely used for convex optimization problems. Given the entire objective function, or at least first-order information (gradients), it is straightforward to search along the chosen direction. In another scenario, however, the algorithm has no access to the function or its gradients, only to evaluations of queried points. This is the bandit optimization problem, also known as zeroth-order or derivative-free optimization. Algorithms for bandit convex optimization often rely on constructing noisy gradient estimates, which are then used in place of actual gradients in appropriately adjusted first-order algorithms such as gradient descent. Depending on the properties of the function to be optimized and the nature of the "noise" in the bandit feedback, the bias and variance of these gradient estimates exhibit various tradeoffs: an estimate with small bias tends to have large variance, while an estimate with small variance may have large bias. Since both the bias and the variance of a gradient estimate can jeopardize the optimization algorithm, it is worthwhile to measure their influence quantitatively and to study whether the optimization error based on a given gradient estimate can be further improved. This thesis proposes a novel framework that replaces specific gradient estimation methods with an abstract oracle.
    The oracle interacts directly with the algorithm, outputting biased, noisy gradient estimates that satisfy predefined properties. In this way we abstract the tradeoffs between bias and variance, skipping the details of constructing gradient estimates. With the help of the new framework we unify previous works, reproducing their results in a clean and concise fashion and proving an upper bound on the optimization error of the Mirror Descent algorithm. Perhaps more importantly, the framework also allows us to show a lower bound on the optimization error, which matches the corresponding upper bound under certain conditions. This formally demonstrates that, to achieve the optimal root-n rate for bandit convex optimization, either the algorithms that use existing gradient estimators or the proof techniques used to analyze them must go beyond what exists today.
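    The kind of gradient estimator the abstract refers to can be sketched in a few lines. The sketch below uses a classical two-point spherical-smoothing estimator (in the spirit of prior work the thesis builds on, not the thesis's own oracle construction); all function names and parameter values here are illustrative choices, and the bias-variance tradeoff appears as the smoothing radius `delta`: shrinking it reduces the bias of the estimate while inflating its variance.

```python
import numpy as np

def smoothed_gradient_estimate(f, x, delta, rng):
    """Two-point spherical-smoothing gradient estimate of f at x.

    Unbiased for the smoothed surrogate E_u[f(x + delta*u)], hence in
    general biased for f itself; a smaller delta means less bias but a
    larger variance -- the tradeoff the thesis abstracts into an oracle.
    """
    d = x.shape[0]
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)  # uniform random direction on the unit sphere
    return (d / (2.0 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u

def zeroth_order_gd(f, x0, steps=2000, eta=0.05, delta=1e-2, seed=0):
    """Gradient descent driven only by function evaluations (bandit feedback)."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float)
    for _ in range(steps):
        g = smoothed_gradient_estimate(f, x, delta, rng)
        x = x - eta * g  # estimate replaces the true gradient
    return x

# Illustration: minimize a simple convex quadratic with optimum at (1, -2).
f = lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2
x_hat = zeroth_order_gd(f, np.zeros(2))
```

    Note how the optimization routine never touches a true gradient: it queries `f` at two perturbed points per step, which is exactly the bandit (zeroth-order) access model described above.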

  • Subjects / Keywords
  • Graduation date
    Spring 2017
  • Type of Item
  • Degree
    Master of Science
  • DOI
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.