- Joulani, Pooria (2)
- Abbasi-Yadkori, Yasin (1)
- Afkanpour, Arash (1)
- Ajallooeian, Mohammad Mahdi (1)
- Aslan, Ozlem (1)
- Ayoub, Alex (1)

- Machine learning (4)
- Online Learning (4)
- Reinforcement Learning (4)
- Learning theory (3)
- Machine Learning (3)
- Online learning (2)
Results for supervisor "Szepesvari, Csaba (Computing Science)"
-
Optimal Mechanisms for Machine Learning: A Game-Theoretic Approach to Designing Machine Learning Competitions
Spring 2013
In this thesis we consider problems where a self-interested entity, called the principal, has private access to some data that she wishes to use to solve a prediction problem by outsourcing the development of the predictor to some other parties. Assuming the principal, who needs the machine...
-
Fall 2012
In this thesis, the multi-armed bandit (MAB) problem in online learning is studied when the feedback information is not observed immediately but only after arbitrary, unknown, random delays. In the stochastic setting, where the rewards come from a fixed distribution, an algorithm is given that...
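As a rough illustration of the delayed-feedback setting (not the algorithm developed in the thesis), the sketch below runs the standard UCB1 index policy but updates an arm's statistics only when its delayed reward actually arrives. The Bernoulli arms, the uniform delay model, and the names `delayed_ucb1`, `arm_means`, and `max_delay` are hypothetical choices made for the example.

```python
import math
import random
from collections import defaultdict


def delayed_ucb1(arm_means, horizon, max_delay, seed=0):
    """Run UCB1 on Bernoulli arms whose rewards arrive after random delays.

    Hypothetical setup: a reward generated in round t becomes visible in round
    t + 1 + d, where d is drawn uniformly from {0, ..., max_delay}.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k             # rewards observed so far, per arm
    sums = [0.0] * k             # sum of observed rewards, per arm
    pulls = [0] * k              # times each arm was chosen
    pending = defaultdict(list)  # arrival round -> [(arm, reward), ...]

    for t in range(horizon):
        # Absorb every reward whose delay expires in this round.
        for arm, reward in pending.pop(t, []):
            counts[arm] += 1
            sums[arm] += reward

        unobserved = [i for i in range(k) if counts[i] == 0]
        if unobserved:
            # No feedback yet for some arm: pull the least-pulled such arm.
            arm = min(unobserved, key=lambda i: pulls[i])
        else:
            # UCB1 index computed from observed feedback only.
            n = sum(counts)
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i] + math.sqrt(2 * math.log(n) / counts[i]),
            )

        pulls[arm] += 1
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        arrival = t + 1 + rng.randint(0, max_delay)  # feedback is hidden until then
        pending[arrival].append((arm, reward))

    return pulls


pulls = delayed_ucb1([0.2, 0.8], horizon=2000, max_delay=20)
print(pulls)
```

Delays here only postpone updates: an arm's index stays frozen while its rewards are in flight, so the policy may keep choosing an arm for several rounds before the new evidence reaches it.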
-
Spring 2015
Sampling from a given probability distribution is a key problem in many different disciplines. Markov chain Monte Carlo (MCMC) algorithms approach this problem by constructing a random walk governed by a specially constructed transition probability distribution. As the random walk progresses, the...
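For readers new to MCMC, the simplest instance of the construction described above is random-walk Metropolis: propose a Gaussian step from the current point and accept it with probability min(1, p(y)/p(x)), which leaves the target density p invariant. This is a generic textbook sketch, not the sampler studied in the thesis; the target (a standard normal), the step size, and the function name `random_walk_metropolis` are assumptions for illustration.

```python
import math
import random


def random_walk_metropolis(log_density, x0, n_samples, step=1.0, seed=0):
    """Sample from an unnormalized 1-D density given by its logarithm.

    Returns the chain of samples and the fraction of accepted proposals.
    """
    rng = random.Random(seed)
    x, log_px = x0, log_density(x0)
    samples, accepted = [], 0
    for _ in range(n_samples):
        y = x + rng.gauss(0.0, step)   # symmetric Gaussian proposal
        log_py = log_density(y)
        # Metropolis rule: the symmetric proposal cancels in the acceptance ratio.
        if rng.random() < math.exp(min(0.0, log_py - log_px)):
            x, log_px = y, log_py
            accepted += 1
        samples.append(x)              # a rejected move repeats the current state
    return samples, accepted / n_samples


# Target: standard normal, known only up to its normalizing constant.
chain, acc_rate = random_walk_metropolis(lambda x: -0.5 * x * x, 0.0, 50_000)
mean = sum(chain) / len(chain)
var = sum((s - mean) ** 2 for s in chain) / len(chain)
print(f"mean={mean:.3f} var={var:.3f} acceptance={acc_rate:.2f}")
```

Larger `step` values propose bolder moves but are rejected more often.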
-
Spring 2013
In a discrete-time online control problem, a learner tries to control the state of an initially unknown environment so as to minimize the sum of the losses he suffers, where the losses are assumed to depend on the individual state transitions. Various models of control problems have...
-
Fall 2021
This thesis proposes novel algorithmic ideas in reinforcement learning for regret minimization. These ideas enjoy good theoretical guarantees and are more practical in large problems than the alternatives. We focus on finite-horizon episodic RL. We propose model-based and model-free...
-
Fall 2012
In a partial-monitoring game a player has to make decisions in a sequential manner. In each round, the player suffers some loss that depends on his decision and an outcome chosen by an opponent, after which he receives "some" information about the outcome. The goal of the player is to keep the...
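The protocol just described can be written as a short loop. The sketch below uses a hypothetical example, label-efficient prediction with two outcomes, in which actions 0 and 1 predict an outcome and action 2 pays a fixed loss to reveal it; the matrices `LOSS` and `FEEDBACK` and the toy player `query_then_copy` are illustrative assumptions, not the games or algorithms from the thesis.

```python
# Hypothetical label-efficient prediction game: rows are actions, columns outcomes.
LOSS = [
    [0.0, 1.0],  # action 0: predict outcome 0
    [1.0, 0.0],  # action 1: predict outcome 1
    [1.0, 1.0],  # action 2: query, always costs 1
]
FEEDBACK = [
    [None, None],  # predicting reveals nothing
    [None, None],
    [0, 1],        # querying reveals the outcome
]


def play_partial_monitoring(loss, feedback, player, outcomes):
    """Run the protocol against a fixed outcome sequence; return total loss."""
    total = 0.0
    history = []  # the player sees only (action, feedback) pairs, never outcomes
    for outcome in outcomes:
        action = player(history)
        total += loss[action][outcome]  # suffered, but not observed directly
        history.append((action, feedback[action][outcome]))
    return total


def query_then_copy(history):
    """Toy player: query until an outcome is revealed, then predict the latest one."""
    revealed = [fb for _, fb in history if fb is not None]
    return 2 if not revealed else revealed[-1]


print(play_partial_monitoring(LOSS, FEEDBACK, query_then_copy, [1, 1, 1, 0]))  # prints 2.0
```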
-
Fall 2023
Many real-world tasks in fields such as robotics and control can be formulated as constrained Markov decision processes (CMDPs). In CMDPs, the objective is usually to optimize the return while ensuring that certain constraints are satisfied. The primal-dual approach is a common...
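To make the primal-dual idea concrete, here is a minimal Lagrangian sketch under strong simplifying assumptions: the CMDP is summarized by a hypothetical, already-evaluated set of deterministic policies with known expected return R and cost C, the primal step is an exact best response to the Lagrangian R - λ(C - b), and the dual step is projected subgradient descent on λ ≥ 0. In a real CMDP the primal step would itself be a planning or RL solve, and this is not the specific method studied in the thesis; the function name and numbers are illustrative.

```python
def lagrangian_primal_dual(policies, budget, steps=2000, lr=0.05):
    """Approximately solve max R(pi) s.t. C(pi) <= budget over mixtures of policies.

    `policies` maps a name to (expected return, expected cost). Returns the
    time-averaged mixture, its return and cost, and the final multiplier.
    """
    lam = 0.0
    usage = {name: 0 for name in policies}
    for _ in range(steps):
        # Primal step: best response to the Lagrangian R - lam * (C - budget).
        best = max(
            policies,
            key=lambda n: policies[n][0] - lam * (policies[n][1] - budget),
        )
        usage[best] += 1
        # Dual step: increase lam if this policy's cost exceeds the budget,
        # decrease it otherwise, and clip at zero.
        lam = max(0.0, lam + lr * (policies[best][1] - budget))
    mixture = {n: count / steps for n, count in usage.items()}
    avg_return = sum(w * policies[n][0] for n, w in mixture.items())
    avg_cost = sum(w * policies[n][1] for n, w in mixture.items())
    return mixture, avg_return, avg_cost, lam


# Hypothetical numbers: "fast" earns more but violates the budget on its own.
policies = {"fast": (1.0, 1.0), "safe": (0.0, 0.0)}
mixture, ret, cost, lam = lagrangian_primal_dual(policies, budget=0.3)
print(mixture, round(ret, 3), round(cost, 3), round(lam, 3))
```

Each individual best response is deterministic and either ignores or overshoots the budget; it is the averaged mixture (here close to 30% "fast") that approximately meets the constraint.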
-
Spring 2021
In batch policy evaluation, the goal is to predict the value of a policy given some historical data. A specific example, which motivated the approach pursued in this thesis, is to predict the probability of extinguishing a natural wildfire given some specific configuration of dispatched resources,...
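One standard baseline for batch (off-policy) evaluation, sketched here only to make the problem concrete and not as the approach pursued in the thesis, is the importance-sampling estimator: reweight each logged reward by the ratio of the target policy's action probability to the logging policy's. The one-step setting, the tuple layout of `logged`, and the function names are assumptions; the estimator also requires the logging probabilities to be known and positive for every action the target policy might take.

```python
def importance_sampling_value(logged, target_prob):
    """Estimate a target policy's expected reward from logged one-step data.

    `logged` holds (context, action, logging_prob, reward) tuples, where
    logging_prob is the probability the data-collecting policy gave `action`.
    `target_prob(context, action)` is the target policy's probability.
    """
    total = 0.0
    for context, action, logging_prob, reward in logged:
        weight = target_prob(context, action) / logging_prob  # importance ratio
        total += weight * reward
    return total / len(logged)


# Hypothetical log: the logging policy chose between two actions uniformly.
logged = [
    ("fire-A", 0, 0.5, 0.0),
    ("fire-A", 1, 0.5, 1.0),
    ("fire-B", 1, 0.5, 1.0),
    ("fire-B", 0, 0.5, 1.0),
]


def always_one(context, action):
    """Hypothetical target policy: always take action 1."""
    return 1.0 if action == 1 else 0.0


print(importance_sampling_value(logged, always_one))  # prints 1.0
```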
-
Fall 2013
Clustering, a fundamental unsupervised learning problem with wide application in various fields, has been intensively investigated over the past few decades. Unfortunately, standard clustering formulations are known to be computationally intractable. Although many convex relaxations of...