NOVEL MACHINE LEARNING ALGORITHMS
-
- Author / Creator
- Farhangfar, Alireza
-
Many machine learning algorithms learn from data by capturing certain interesting
characteristics of it. Decision trees are used in many classification tasks. In some circumstances,
we only want to consider fixed-depth trees. Unfortunately, finding the optimal
depth-d decision tree can require time exponential in d. In the first part of this dissertation,
we present the OPTNBDT algorithm, a fast way to produce a near-optimal
fixed-depth decision tree under the Naïve Bayes assumption. We apply this technique to
real-world datasets and find that our model significantly reduces computational cost
while maintaining relatively high accuracy.
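The depth constraint above can be made concrete with a toy sketch. This is not the OPTNBDT algorithm itself (which exploits the Naïve Bayes assumption to approximate the exact search); it is a standard greedy, entropy-based splitter on binary features, shown only to illustrate what a fixed-depth tree looks like and why it is cheap to build, whereas exact depth-d optimization must consider the feature placed at every internal node and so grows exponentially with d.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label multiset."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def greedy_tree(X, y, depth):
    """Greedily pick the locally best binary feature at each node.
    Fast, but may miss the optimal depth-d tree, whose exact search
    is exponential in the depth."""
    if depth == 0 or len(set(y)) <= 1:
        return Counter(y).most_common(1)[0][0]      # leaf: majority label

    def weighted_entropy(f):
        return sum(
            entropy([yi for xi, yi in zip(X, y) if xi[f] == v])
            * sum(1 for xi in X if xi[f] == v) / len(X)
            for v in (0, 1))

    best = min(range(len(X[0])), key=weighted_entropy)
    children = {}
    for v in (0, 1):
        Xv = [xi for xi in X if xi[best] == v]
        yv = [yi for xi, yi in zip(X, y) if xi[best] == v]
        # empty branch falls back to the parent's majority label
        children[v] = greedy_tree(Xv, yv, depth - 1) if yv \
            else Counter(y).most_common(1)[0][0]
    return (best, children)

def predict(tree, x):
    """Follow splits until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, children = tree
        tree = children[x[f]]
    return tree
```

On an XOR-style toy dataset this greedy tree happens to recover the target at depth 2, but in general greedy splitting carries no optimality guarantee — the gap that a near-optimal fixed-depth method aims to close cheaply.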
In the second part of this dissertation, we present two novel algorithms for active
learning. In many scenarios we have access to abundant unlabeled data; however,
obtaining labels for the data is difficult. An active learning process addresses
this issue by selectively requesting the labels of a few unlabeled instances, with the goal
of using these newly labeled instances to produce an effective classifier. First, we focus
on active learning for image segmentation, which requires producing a label for each
pixel in an image. We provide an active learner (LMU) that first selects the image whose
expected label will most reduce the uncertainty of the other unlabeled images, and
thereafter greedily selects the most informative images. The results of our experiments
over real-world datasets show that training on very few informative images can produce a
segmenter that is as good as one trained on the entire dataset.
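The query loop behind such methods can be sketched generically. The following is not the LMU criterion from the dissertation (LMU additionally scores how an image's expected label reduces the uncertainty of the *other* unlabeled images); it is plain pool-based uncertainty sampling with a hypothetical 1-D nearest-centroid learner, chosen only to keep the loop self-contained.

```python
def train(labeled):
    """Fit a toy nearest-centroid model from (x, y) pairs, y in {0, 1}."""
    cents = {}
    for c in (0, 1):
        pts = [x for x, y in labeled if y == c]
        cents[c] = sum(pts) / len(pts) if pts else 0.0
    return cents

def uncertainty(x, cents):
    """Points near the decision boundary (similar distance to both
    centroids) are the most uncertain; higher value = more uncertain."""
    return -abs(abs(x - cents[0]) - abs(x - cents[1]))

def active_learn(pool, oracle, seed, budget):
    """Repeatedly query the label of the most uncertain unlabeled point."""
    labeled = list(seed)
    unlabeled = [x for x in pool if x not in [p for p, _ in labeled]]
    for _ in range(budget):
        cents = train(labeled)
        x = max(unlabeled, key=lambda p: uncertainty(p, cents))
        labeled.append((x, oracle(x)))      # request this label only
        unlabeled.remove(x)
    return train(labeled)

def classify(x, cents):
    return 0 if abs(x - cents[0]) <= abs(x - cents[1]) else 1
```

With two seed labels and a budget of two queries on a 1-D pool, the loop spends its budget near the class boundary and recovers the full labeling — the "few informative examples" effect the abstract describes, in miniature.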
Finally, we present the importance sampling algorithm (ISAL) for active learning
in the standard classification framework, and demonstrate its sample and label efficiency.
In particular, on each iteration, ISAL identifies a distribution that puts large weight
on instances whose labels are uncertain, then requests the label of an instance drawn from
that distribution. We prove that ISAL can be more sample and label efficient than passive
learning, with an exponential convergence rate to the Bayes classifier on noise-free data.
We also provide empirical studies showing that ISAL is more effective than many other active
learning algorithms. -
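The distribution-then-sample step can be illustrated with a minimal sketch. This is not ISAL's actual weighting scheme (the inverse-margin weights and the `1e-3` floor below are illustrative assumptions); it only shows the mechanism the abstract describes: build a distribution concentrated on uncertain instances, then draw the query from it instead of deterministically taking the argmax, keeping every instance reachable and returning the draw probability for importance-weighted estimates.

```python
import random

def uncertainty_weights(margins):
    """Nonnegative weights: instances with small |margin| (where the
    current classifier is unsure) receive large weight.  The 1e-3
    floor is an arbitrary smoothing constant for this sketch."""
    return [1.0 / (abs(m) + 1e-3) for m in margins]

def sample_query(unlabeled, margins, rng):
    """Draw ONE instance from the distribution induced by the weights
    and return it together with its draw probability, which downstream
    importance-weighted estimators would use to stay unbiased."""
    w = uncertainty_weights(margins)
    total = sum(w)
    probs = [wi / total for wi in w]
    i = rng.choices(range(len(unlabeled)), weights=probs, k=1)[0]
    return unlabeled[i], probs[i]
```

Here an instance with margin 0.01 dominates the distribution and is queried almost every time, while large-margin (confidently classified) instances are still drawn occasionally — the randomization that distinguishes this family of methods from greedy uncertainty sampling.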
- Subjects / Keywords
-
- Graduation date
- Spring 2013
-
- Type of Item
- Thesis
-
- Degree
- Doctor of Philosophy
-
- License
- This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.