




This file is in the following communities:

Faculty of Graduate Studies and Research


This file is in the following collections:

Theses and Dissertations

Online Learning under Partial Feedback (Open Access)


Other title
online learning
partial feedback
Type of item
Degree grantor
University of Alberta
Author or creator
Wu, Yifan
Supervisor and department
Csaba Szepesvari (Computing Science)
Andras Gyorgy (Computing Science)
Examining committee member and department
Csaba Szepesvari (Computing Science)
Andras Gyorgy (Computing Science)
Michael Bowling (Computing Science)
Department of Computing Science

Date accepted
Graduation date
2016-06 (Fall 2016)
Degree level
Master of Science
Abstract
In an online learning problem, a player makes decisions sequentially. In each round, the player receives a reward that depends on their action and on an outcome generated by the environment, while some feedback about the outcome is revealed. The player's goal varies with the setting. In this thesis we investigate several variants of online learning problems with different feedback models and objectives. First, we consider the pure exploration problem with multi-action probes; we design algorithms that find the best one or several actions with high probability while using as few probes as possible. Then we study the side-observation model in the regret-minimization setting; we derive a novel finite-time distribution-dependent lower bound and design asymptotically optimal and minimax optimal algorithms. Finally, we investigate the conservative bandit problem, where the objective is to minimize regret while keeping the cumulative reward above a baseline; we design algorithms for several variants of the problem and derive a lower bound. In each of the three variants of the online learning problem we consider, our problem setting generalizes previous work: the theoretical results recover existing results in special cases and offer novel perspectives in the more general settings.
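To make the conservative bandit objective concrete, here is a minimal toy sketch of the constraint it imposes: an exploratory arm is only played when cumulative reward stays above a fraction of what the baseline arm would have earned. This is a hypothetical illustration only, not the algorithm developed in the thesis; the function name, the uniform-exploration rule, and the Bernoulli rewards are all assumptions for the sake of the example.

```python
import random

def conservative_play(arm_means, baseline_mean, alpha=0.1, horizon=1000, seed=0):
    """Toy conservative-bandit loop (hypothetical sketch, not the
    thesis's algorithm): explore only while cumulative reward is at
    least (1 - alpha) times the expected reward of playing the
    baseline arm for t rounds; otherwise fall back to the baseline arm
    (arm 0 by convention here)."""
    rng = random.Random(seed)
    total = 0.0
    pulls = [0] * len(arm_means)
    for t in range(1, horizon + 1):
        # Conservative constraint: stay within alpha of the baseline's
        # expected cumulative reward so far.
        budget_ok = total >= (1 - alpha) * t * baseline_mean
        if budget_ok:
            arm = rng.randrange(len(arm_means))  # explore uniformly
        else:
            arm = 0  # baseline arm
        pulls[arm] += 1
        # Bernoulli reward with the chosen arm's mean.
        total += 1.0 if rng.random() < arm_means[arm] else 0.0
    return total, pulls
```

A real conservative-bandit algorithm would use confidence bounds rather than realized reward to certify the constraint, but the sketch shows the trade-off the abstract describes: exploration is throttled whenever it would risk falling below the baseline.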
This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for the purpose of private, scholarly or scientific research. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
Citation for previous publication
Yifan Wu, Roshan Shariff, Tor Lattimore and Csaba Szepesvári. Conservative Bandits. In ICML 2016.
Yifan Wu, András György and Csaba Szepesvári. Online Learning with Gaussian Payoffs and Side Observations. In NIPS 2015.
Yifan Wu, András György and Csaba Szepesvári. On Identifying Good Options under Combinatorially Structured Feedback in Finite Noisy Environments. In ICML 2015.

File Details

File format: pdf (Portable Document Format)
Mime type: application/pdf
File size: 991246 bytes (~968 KB)
Last modified: 2016-11-16 14:18:10-07:00
Filename: Wu_Yifan_201609_MSc.pdf
Original checksum: 30f62d9ea08192203ff3c30610c8f13b
Well formed: true
Valid: true
Status message: Too many fonts to report; some fonts omitted. Total fonts = 1136
File title: Introduction
Page count: 124