NOVEL MACHINE LEARNING ALGORITHMS

Farhangfar, Alireza

doi:doi:10.7939/R3C07X

This decommissioned ERA site remains active temporarily to support our final migration steps to https://ualberta.scholaris.ca, ERA's new home. All new collections and items, including Spring 2025 theses, are at that site. For assistance, please contact erahelp@ualberta.ca.

View

Download

Communities and Collections

Graduate and Postdoctoral Studies (GPS), Faculty of / Theses and Dissertations

Usage

589 views
541 downloads

NOVEL MACHINE LEARNING ALGORITHMS

Author / Creator

Farhangfar, Alireza
Many machine learning algorithms learn from the data by capturing certain interesting
characteristics. Decision trees are used in many classification tasks. In some circumstances,
we only want to consider fixed-depth trees. Unfortunately, finding the optimal
depth-d decision tree can require time exponential in d. In the first part of this dissertation,
we present OPTNBDT algorithm, which is a fast way to produce a near optimal
fixed-depth decision tree under the Naıve Bayes assumption. We apply this technique to
real-world datasets and find that our model improves the computational costs significantly
while yielding relatively high accuracy.
In the second part of this dissertation, we present two novel algorithms for active
learning. There are scenarios where we have access to many unlabeled data; however,
obtaining the labels for the data is difficult. An active learning process tries to address
this issue by selectively requesting the label of few unlabeled instances, with the goal
of using these newly labeled instances to produce an effective classifier. First, we focus
on active learning for image segmentation, which requires producing a label for each
pixel in an image. We provide an active learner (LMU) that first selects the image whose
expected label will reduce the uncertainty of other unlabeled images the most, and then
after greedily selects the most informative image. The results of our experiments, over
real-world datasets show that training on very few informative images can produce a
segmenter that is as good as training on the entire dataset.
Finally, we present the importance sampling algorithm (ISAL) for actively learning
in the standard classification framework, which we demonstrate its sample and label efficiency.
In particular, on each iteration, ISAL identifies a distribution that puts large weight
on instances whose labels are uncertain, then requests the label of an instance drawn from
that distribution. We prove that ISAL can be more sample and label efficient than passive
learning with an exponential convergence rate to the Bayes classifier on noise-free data.
We also provide empirical studies that show ISAL is more effective than many other active
learning algorithms.
Subjects / Keywords
- Active learning
- Fixed-depth decision tree
Graduation date

Spring 2013
Type of Item

Thesis
Degree

Doctor of Philosophy
DOI

https://doi.org/10.7939/R3C07X
License

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

English
Institution

University of Alberta
Degree level

Doctoral
Department
- Department of Computing Science
Supervisor / co-supervisor and their department(s)
- Szepesvari, Csaba (Computing Science)
- Greiner, Russell (Computing Science)
Examining committee members and their departments
- Schuurmans, Dale (Computing Science)
- Neville, Jennifer (Purdue University)
- Ray, Nilanjan (Computing Science)