Download the full-sized PDF of NOVEL MACHINE LEARNING ALGORITHMSDownload the full-sized PDF



Permanent link (DOI):


Export to: EndNote  |  Zotero  |  Mendeley


This file is in the following communities:

Graduate Studies and Research, Faculty of


This file is in the following collections:

Theses and Dissertations



Other title
Fixed-depth decision tree
Active learning
Type of item
Degree grantor
University of Alberta
Author or creator
Farhangfar, Alireza
Supervisor and department
Greiner, Russell (Computing Science)
Szepesvari, Csaba (Computing Science)
Examining committee member and department
Neville, Jennifer (Purdue University)
Ray, Nilanjan (Computing Science)
Schuurmans, Dale (Computing Science)
Department of Computing Science

Date accepted
Graduation date
Doctor of Philosophy
Degree level
Many machine learning algorithms learn from the data by capturing certain interesting characteristics. Decision trees are used in many classification tasks. In some circumstances, we only want to consider fixed-depth trees. Unfortunately, finding the optimal depth-d decision tree can require time exponential in d. In the first part of this dissertation, we present OPTNBDT algorithm, which is a fast way to produce a near optimal fixed-depth decision tree under the Naıve Bayes assumption. We apply this technique to real-world datasets and find that our model improves the computational costs significantly while yielding relatively high accuracy. In the second part of this dissertation, we present two novel algorithms for active learning. There are scenarios where we have access to many unlabeled data; however, obtaining the labels for the data is difficult. An active learning process tries to address this issue by selectively requesting the label of few unlabeled instances, with the goal of using these newly labeled instances to produce an effective classifier. First, we focus on active learning for image segmentation, which requires producing a label for each pixel in an image. We provide an active learner (LMU) that first selects the image whose expected label will reduce the uncertainty of other unlabeled images the most, and then after greedily selects the most informative image. The results of our experiments, over real-world datasets show that training on very few informative images can produce a segmenter that is as good as training on the entire dataset. Finally, we present the importance sampling algorithm (ISAL) for actively learning in the standard classification framework, which we demonstrate its sample and label efficiency. In particular, on each iteration, ISAL identifies a distribution that puts large weight on instances whose labels are uncertain, then requests the label of an instance drawn from that distribution. We prove that ISAL can be more sample and label efficient than passive learning with an exponential convergence rate to the Bayes classifier on noise-free data. We also provide empirical studies that show ISAL is more effective than many other active learning algorithms.
Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.
Citation for previous publication
Alireza Farhangfar, Russell Greiner, and Csaba Szepesvari. Learning to segment from a few well-selected training images. In International Conference on Machine Learning (ICML), 2009.Alireza Farhangfar, Russell Greiner, and Martin Zinkevich. Near optimal fixed-depth decision tree. In International Symposium on Artificial Intelligence and Mathematics, 2008.

File Details

Date Uploaded
Date Modified
Audit Status
Audits have not yet been run on this file.
File format: pdf (Portable Document Format)
Mime type: application/pdf
File size: 2203655
Last modified: 2015:10:12 12:49:58-06:00
Filename: Dissertation_submitted.pdf
Original checksum: 630efbbcf9680b6ff27df1e7dbdb2651
Well formed: true
Valid: true
File title: C:/Alireza/1Research/papers/my-work/Dissertation/Dissertation.dvi
Page count: 108
Activity of users you follow
User Activity Date