
NOVEL MACHINE LEARNING ALGORITHMS

  • Author / Creator
    Farhangfar, Alireza
  • Many machine learning algorithms learn from data by capturing certain interesting
    characteristics. Decision trees are used in many classification tasks. In some circumstances,
    we only want to consider fixed-depth trees. Unfortunately, finding the optimal
    depth-d decision tree can require time exponential in d. In the first part of this dissertation,
    we present the OPTNBDT algorithm, a fast way to produce a near-optimal
    fixed-depth decision tree under the Naïve Bayes assumption. We apply this technique to
    real-world datasets and find that our model reduces the computational cost significantly
    while yielding relatively high accuracy.
    In the second part of this dissertation, we present two novel algorithms for active
    learning. There are scenarios where we have access to a large amount of unlabeled data;
    however, obtaining labels for that data is difficult. An active learning process addresses
    this issue by selectively requesting the labels of a few unlabeled instances, with the goal
    of using these newly labeled instances to produce an effective classifier. First, we focus
    on active learning for image segmentation, which requires producing a label for each
    pixel in an image. We provide an active learner (LMU) that first selects the image whose
    expected label will reduce the uncertainty of the other unlabeled images the most, and
    thereafter greedily selects the most informative image. The results of our experiments over
    real-world datasets show that training on very few informative images can produce a
    segmenter that is as good as one trained on the entire dataset.
    Finally, we present the importance sampling algorithm (ISAL) for active learning
    in the standard classification framework, and demonstrate its sample and label efficiency.
    In particular, on each iteration, ISAL identifies a distribution that puts large weight
    on instances whose labels are uncertain, then requests the label of an instance drawn from
    that distribution. We prove that ISAL can be more sample- and label-efficient than passive
    learning, with an exponential convergence rate to the Bayes classifier on noise-free data.
    We also provide empirical studies showing that ISAL is more effective than many other active
    learning algorithms.
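    The greedy step the abstract attributes to LMU — picking the most informative
    unlabeled instance — can be illustrated with a minimal uncertainty-sampling sketch.
    This is not the dissertation's implementation: the function names and the use of
    prediction entropy as the informativeness score are assumptions for illustration only.

    ```python
    import numpy as np

    def entropy(probs):
        """Shannon entropy (in nats) of a predicted class distribution."""
        p = np.clip(probs, 1e-12, 1.0)
        return -np.sum(p * np.log(p))

    def select_most_informative(unlabeled_probs):
        """Greedily pick the pool index whose prediction is most uncertain.

        unlabeled_probs: (n_instances, n_classes) array of model-predicted
        class probabilities for the unlabeled pool.
        """
        scores = np.array([entropy(p) for p in unlabeled_probs])
        return int(np.argmax(scores))

    # The near-uniform prediction (index 1) is the most uncertain, so it is queried.
    pool = np.array([[0.95, 0.05],
                     [0.50, 0.50],
                     [0.80, 0.20]])
    print(select_most_informative(pool))  # -> 1
    ```

    An active learner would query the label of the selected instance, retrain, and repeat.
    
    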
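    The per-iteration step the abstract describes for ISAL — building a distribution that
    up-weights uncertain instances and drawing the query from it — can be sketched as
    follows. This is a hedged illustration, not the thesis's algorithm: the margin-based
    uncertainty score and all function names are assumptions introduced here.

    ```python
    import numpy as np

    def margin_uncertainty(probs):
        """Uncertainty as 1 minus the margin between the top two class probabilities."""
        top2 = np.sort(probs)[::-1][:2]
        return 1.0 - (top2[0] - top2[1])

    def sample_query_index(unlabeled_probs, rng):
        """Draw one pool index from a distribution that up-weights uncertain instances.

        Returns the drawn index and its query probability, which an
        importance-sampling scheme can use to reweight the labeled example.
        """
        w = np.array([margin_uncertainty(p) for p in unlabeled_probs])
        q = w / w.sum()                   # query distribution over the pool
        idx = rng.choice(len(q), p=q)     # request the label of the drawn instance
        return int(idx), float(q[idx])

    rng = np.random.default_rng(0)
    probs = np.array([[0.90, 0.10], [0.55, 0.45], [0.60, 0.40]])
    idx, q_idx = sample_query_index(probs, rng)
    ```

    Sampling (rather than always taking the argmax) keeps every instance reachable,
    which is what makes importance-weighted correction of the resulting label set possible.
    
    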

  • Subjects / Keywords
  • Graduation date
    Spring 2013
  • Type of Item
    Thesis
  • Degree
    Doctor of Philosophy
  • DOI
    https://doi.org/10.7939/R3C07X
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.