Multiple Kernel Learning with Many Kernels

Afkanpour, Arash

doi:doi:10.7939/R3K01H

This decommissioned ERA site remains active temporarily to support our final migration steps to https://ualberta.scholaris.ca, ERA's new home. All new collections and items, including Spring 2025 theses, are at that site. For assistance, please contact erahelp@ualberta.ca.

View

Download

Communities and Collections

Graduate and Postdoctoral Studies (GPS), Faculty of / Theses and Dissertations

Usage

598 views
868 downloads

Multiple Kernel Learning with Many Kernels

Author / Creator

Afkanpour, Arash
Multiple kernel learning (MKL) addresses the problem of learning the kernel function from data. Since a kernel function is associated with an underlying feature space, MKL can be considered as a systematic approach to feature selection. Many of the existing MKL algorithms perform kernel learning by combining a given set of base kernels. While efficient MKL algorithms are capable of handling large training sets, their computational complexity depends linearly on the number of base kernels. Hence, these algorithms are not scalable to problems with many kernels.

In this thesis, we investigate MKL when the number of kernels is very large. We aim to exploit the structure of kernel space to design efficient MKL algorithms. Such a structure exists for some of the important families of kernels, such as polynomial kernels and Gaussian kernels. We propose efficient gradient-based algorithms, with convergence guarantees, for the two prominent problem formulations of MKL, i.e. one-stage and two-stage MKL. We show that stochastic gradient descent and greedy coordinate descent are suitable algorithms to deal with very large kernel sets.

We propose an efficient stochastic gradient descent method for one-stage MKL. We show that sampling
a coordinate with a probability proportional to the magnitude of gradient results in a low-variance estimate of gradient. For the case of learning polynomial kernels we show that the computational complexity of our algorithm has only a logarithmic dependence on the number of kernels. We show how greedy coordinate descent can be applied to one-stage and two-stage MKL. Greedy coordinate descent is in particular useful when the goal is to combine continuously-parameterized kernels, such as Gaussian kernels.

We perform thorough empirical study on synthetic and real data to compare the new gradient-based algorithms against several MKL algorithms. Our experiments demonstrate that our new methods are competitive in terms of generalization performance, while their computational cost is significantly less than other methods that enjoy similarly good generalization performance.
Subjects / Keywords
Graduation date

Spring 2013
Type of Item

Thesis
Degree

Doctor of Philosophy
DOI

https://doi.org/10.7939/R3K01H
License

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

English
Institution

University of Alberta
Degree level

Doctoral
Department
- Department of Computing Science
Supervisor / co-supervisor and their department(s)
- Szepesvari, Csaba (Computing Science)
- Bowling, Michael (Computing Science)
Examining committee members and their departments
- Szepesvari, Csaba (Computing Science)
- Bowling, Michael (Computing Science)
- Pontil, Massimiliano (Computer Science, University College London)
- Schuurmans, Dale (Computing Science)
- Mizera, Ivan (Mathematical and Statistical Sciences)