A Parameter Selection Framework For Semi-Supervised Clustering Algorithms

  • Author / Creator
    Pourrajabi, Mojgan
  • Many clustering techniques require parameter settings, and depending on an algorithm's sensitivity to a parameter, the choice of its value can be very important. Several approaches have been proposed to find the “best” value of the clustering parameter for the different unsupervised clustering methods.
    We introduce a general method, denoted as “Cross-validation framework for finding clustering parameters” (CVCP). Given a data set, CVCP selects the “best” parameter value for a semi-supervised clustering method based on the available constraints or labels that are given as input to that method. CVCP is evaluated on selecting the “best” value of k for a semi-supervised k-means-based clustering algorithm and the “best” value of MinPts for a semi-supervised density-based clustering algorithm. Our experimental results show that using the framework to select parameters can significantly improve the expected performance of a semi-supervised clustering method in settings where appropriate parameter values would otherwise have to be “guessed”.

  • Subjects / Keywords
  • Graduation date
    Fall 2013
  • Type of Item
  • Degree
    Master of Science
  • DOI
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
  • Language
  • Institution
    University of Alberta
  • Degree level
  • Department
  • Supervisor / co-supervisor and their department(s)
  • Examining committee members and their departments
    • Goebel, Randolph (Computing Science)
    • Campello, Ricardo (Computing Science, University of São Paulo at São Carlos, Brazil)
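
The cross-validation idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names (`seeded_kmeans`, `pair_agreement`, `cvcp_select`), the toy data, the use of a seeded k-means as the semi-supervised clusterer, and the pairwise-agreement score on held-out labels are all assumptions made for illustration. The sketch splits the labelled objects into folds, gives the training labels to the clusterer, scores each candidate k on the held-out labels, and keeps the candidate with the best average score.

```python
import random
from itertools import combinations

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def seeded_kmeans(data, seeds, k, iters=20):
    """Stand-in semi-supervised clusterer (an assumption, not the thesis's):
    k-means whose initial centers are the means of the labelled seed classes.
    `seeds` maps object index -> class label."""
    by_class = {}
    for i, lab in seeds.items():
        by_class.setdefault(lab, []).append(data[i])
    centers = [mean(by_class[lab]) for lab in sorted(by_class)][:k]
    # If k exceeds the number of seed classes, add farthest points as centers.
    while len(centers) < k:
        centers.append(max(data, key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in data]
        for j in range(k):
            members = [p for p, a in zip(data, assign) if a == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = mean(members)
    return assign

def pair_agreement(assign, held_out):
    """Fraction of held-out labelled pairs on which clustering and labels
    agree (same label <=> same cluster). Hypothetical scoring choice."""
    pairs = list(combinations(held_out.items(), 2))
    ok = sum((la == lb) == (assign[a] == assign[b]) for (a, la), (b, lb) in pairs)
    return ok / len(pairs)

def cvcp_select(data, labels, candidates, n_folds=3):
    """Pick the candidate with the best mean held-out score over the folds."""
    idx = list(labels)  # labelled object indices, in insertion order
    scores = {}
    for k in candidates:
        fold_scores = []
        for f in range(n_folds):
            test = {i: labels[i] for pos, i in enumerate(idx) if pos % n_folds == f}
            train = {i: labels[i] for i in idx if i not in test}
            assign = seeded_kmeans(data, train, k)
            fold_scores.append(pair_agreement(assign, test))
        scores[k] = sum(fold_scores) / n_folds
    # max() returns the first maximal candidate, so ties favour the smaller k.
    return max(candidates, key=lambda k: scores[k]), scores

# Toy data: three well-separated 2-D blobs, six labelled objects per blob.
rng = random.Random(0)
data, labels = [], {}
for c, (cx, cy) in enumerate([(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]):
    for n in range(30):
        data.append((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)))
        if n < 6:
            labels[len(data) - 1] = c

best_k, scores = cvcp_select(data, labels, candidates=[2, 3, 4, 5])
print(best_k, scores)
```

On this toy data the sketch selects k = 3, since k = 2 is forced to merge two labelled classes. The same loop could in principle be applied to another parameter, such as MinPts for a density-based method, by swapping in a different clusterer; that variant is not shown here.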