  • Fall 2012

    Joulani, Pooria

    This thesis studies the multi-armed bandit (MAB) problem in online learning when feedback is not observed immediately but only after arbitrary, unknown, random delays. In the stochastic setting, where the rewards come from a fixed distribution, an algorithm is given that...
