Contrasting sequence groups by emerging sequences

  • Author / Creator
    Deng, Kang
  • Group comparison per se is a fundamental task in many scientific endeavours but is also the basis of any classifier. Comparing groups of sequence data is a relevant task. To contrast sequence groups, we define Emerging Sequences (ESs) as subsequences that are frequent in sequences of one group and less frequent in another, and thus distinguishing sequences of different classes. There are two challenges to distinguish sequence classes by ESs: the extraction of ESs is not trivially efficient and only exact matches of sequences are considered. In our work we address those problems by a suffix tree-based framework and a sliding window matching mechanism. A classification model based on ESs is also proposed. Evaluating against several other learning algorithms, the experiments on two datasets show that our similar ESs-based classification model outperforms the baseline approaches. With the ESs' high discriminative power, our proposed model achieves satisfactory F-measures on classifying sequences.

  • Subjects / Keywords
  • Graduation date
  • Type of Item
  • Degree
    Master of Science
  • DOI
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
  • Language
  • Institution
    University of Alberta
  • Degree level
  • Department
    • Department of Computing Science
  • Supervisor / co-supervisor and their department(s)
    • Osmar R. Zaiane, Computing Science
  • Examining committee members and their departments
    • Paul Lu, Computing Science
    • Scott Dick, Electrical and Computer Engineering