Languages: English (22)

Item type: Thesis (22)

Departments: Department of Computing Science (22)

Results for supervisors_tesim:"Szepesvari, Csaba (Computing Science)"

Optimal Mechanisms for Machine Learning: A Game-Theoretic Approach to Designing Machine Learning Competitions

Spring 2013

Ajallooeian, Mohammad Mahdi

In this thesis we consider problems where a self-interested entity, called the principal, has private access to some data that she wishes to use to solve a prediction problem by outsourcing the development of the predictor to some other parties. Assuming the principal, who needs the machine...
Multi-Armed Bandit Problems under Delayed Feedback

Fall 2012

Joulani, Pooria

In this thesis, the multi-armed bandit (MAB) problem in online learning is studied, when the feedback information is not observed immediately but rather after arbitrary, unknown, random delays. In the "stochastic" setting when the rewards come from a fixed distribution, an algorithm is given that...
Pure Exploration in Multi-Armed Bandits

Spring 2023

Stephens, Connor J

Many practical problems in fields ranging from online advertising to genomics can be framed as the task of selecting the best option from among several choices, based on a limited number of noisy evaluations of the quality of each choice. Pure exploration in multi-armed bandits is an...
Exploiting Symmetries to Construct Efficient MCMC Algorithms With an Application to SLAM

Spring 2015

Shariff, Roshan

Sampling from a given probability distribution is a key problem in many different disciplines. Markov chain Monte Carlo (MCMC) algorithms approach this problem by constructing a random walk governed by a specially constructed transition probability distribution. As the random walk progresses, the...
Online Learning for Linearly Parametrized Control Problems

Spring 2013

Abbasi-Yadkori, Yasin

In a discrete-time online control problem, a learner makes an effort to control the state of an initially unknown environment so as to minimize the sum of the losses he suffers, where the losses are assumed to depend on the individual state-transitions. Various models of control problems have...
Towards Sample Efficient Reinforcement Learning with Function Approximation

Fall 2021

Ayoub, Alex

This thesis proposes novel algorithmic ideas in reinforcement learning for regret minimization. These algorithmic ideas enjoy nice theoretical guarantees and are more practical in large problems than their alternatives. We focus on finite-horizon episodic RL. We propose model-based and model-free...
The Role of Information in Online Learning

Fall 2012

Bartók, Gábor

In a partial-monitoring game a player has to make decisions in a sequential manner. In each round, the player suffers some loss that depends on his decision and an outcome chosen by an opponent, after which he receives "some" information about the outcome. The goal of the player is to keep the...
Primal-Dual Algorithms for Learning in Constrained Markov Decision Processes

Fall 2023

Liu, Chang

Many real-world tasks in fields such as robotics and control can be formulated as constrained Markov decision processes (CMDPs). In CMDPs, the objective is usually to optimize the return while ensuring that some constraints are satisfied at the same time. The primal-dual approach is a common...
Optimized Batch Policy Evaluation in the Presence of Monotone Responses

Spring 2021

Dong, Wang

In batch policy evaluation the goal is to predict the value of a policy given some historical data. A specific example, which motivated the approach pursued in this thesis, is to predict the probability of putting a natural wildfire out given some specific configuration of dispatched resources,...
Bregman Divergence Clustering: A Convex Approach

Fall 2013

Cheng, Hao

Due to its wide application in various fields, clustering, as a fundamental unsupervised learning problem, has been intensively investigated over the past few decades. Unfortunately, standard clustering formulations are known to be computationally intractable. Although many convex relaxations of...