Topic Modelling via Community Mining of Term Co-occurrence Networks

  • Author / Creator
    Austin, Eric
  • Topic modelling seeks to uncover the conceptual and thematic content of collections of documents. These topics can be used as features for document indexing and classification. Beyond these uses, topic models are increasingly important as tools for applied research. As we seek to develop agents capable of holding real conversations with humans, topic models are needed to control topic drift and guide the conversation. Unfortunately, the most popular topic models in use today do not provide a suitable topic structure for these purposes, and state-of-the-art models based on neural networks suffer from many of the same drawbacks while requiring specialized hardware and many hours to train.

    We take a fundamentally different approach to topic modelling. Our algorithm, Community Topic, is based on mining communities of terms from term co-occurrence networks extracted from the documents. In addition to providing interpretable collections of terms as topics, the network representation provides a natural topic structure. The topics themselves form a network, so topic similarity can be inferred from the weights of the edges between them. Super-topics can be found by iteratively applying community detection to the topic network, grouping similar topics together. Sub-topics can be found by iteratively applying community detection to a single topic community. This can be done dynamically, with the user or conversational agent moving up and down the topic hierarchy as desired.
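    The abstract does not state how edges are weighted or which community detection algorithm Community Topic uses, so the following is only a minimal, dependency-free sketch of the pipeline it describes, not the thesis implementation. It assumes that an edge's weight is the number of documents in which both terms appear, and it substitutes simple weighted label propagation as a stand-in community detection method. The function names (build_cooccurrence, label_propagation, topic_network, subgraph) and the toy corpus TOY_DOCS are hypothetical. The same detection routine is reused at every level: on the term network to find topics, on the collapsed topic network to find super-topics, and on one topic's induced subnetwork to find sub-topics.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy corpus: each document is a list of already preprocessed terms.
TOY_DOCS = [
    ["cat", "dog", "pet"],
    ["dog", "pet", "vet"],
    ["cat", "vet", "pet"],
    ["stock", "bond", "market"],
    ["bond", "market", "trade"],
    ["stock", "trade", "market"],
    ["pet", "market"],  # a weak bridge between the two themes
]


def build_cooccurrence(docs):
    """Term co-occurrence network as a symmetric adjacency dict.

    Assumed weighting: edge weight = number of documents containing both terms."""
    graph = defaultdict(lambda: defaultdict(float))
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def label_propagation(graph, max_iter=100):
    """Stand-in community detection: asynchronous weighted label propagation.

    Each node adopts the label with the largest total edge weight among its
    neighbours. Nodes are visited in sorted order and ties go to the smallest
    label, so results are deterministic. Returns a sorted list of node sets."""
    labels = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            scores = defaultdict(float)
            for nbr, weight in graph[node].items():
                scores[labels[nbr]] += weight
            if not scores:
                continue  # an isolated node keeps its own label
            best = max(scores.values())
            new_label = min(lbl for lbl, s in scores.items() if s == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    communities = defaultdict(set)
    for node, lbl in labels.items():
        communities[lbl].add(node)
    return sorted(communities.values(), key=sorted)


def topic_network(graph, topics):
    """Collapse each topic into one node, indexed by its position in `topics`.

    The weight between two topics is the total weight of term edges crossing
    between them. Term edges are stored in both directions, so each direction
    of a topic edge receives that weight exactly once."""
    member = {term: i for i, topic in enumerate(topics) for term in topic}
    tgraph = {i: defaultdict(float) for i in range(len(topics))}
    for a, nbrs in graph.items():
        for b, weight in nbrs.items():
            if member[a] != member[b]:
                tgraph[member[a]][member[b]] += weight
    return tgraph


def subgraph(graph, nodes):
    """Network induced by `nodes`, used to search for sub-topics inside one topic."""
    return {n: {m: w for m, w in graph[n].items() if m in nodes} for n in nodes}


if __name__ == "__main__":
    g = build_cooccurrence(TOY_DOCS)
    topics = label_propagation(g)                          # topics
    supers = label_propagation(topic_network(g, topics))   # one level up
    subs = label_propagation(subgraph(g, topics[0]))       # one level down
    print("topics:", [sorted(t) for t in topics])
    print("super-topics (topic indices):", [sorted(s) for s in supers])
    print("sub-topics of topic 0:", [sorted(s) for s in subs])
```

    On this toy corpus the sketch recovers a finance topic and a pets topic. The single bridging document links them, so one level up they merge into one super-topic, and each four-term topic is too small to split further. Label propagation was chosen here only to keep the sketch free of third-party dependencies; a modularity-based method such as Louvain or Leiden is a common alternative for this step.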

    We evaluate Community Topic against two competing topic models. We find that our algorithm detects topics with the highest coherence as measured by two standard automated metrics. Our algorithm has the fastest run time of the models compared and detects topics in a few seconds with no specialized hardware required. It is hyperparameter-free and can detect topics at multiple scales, finding coherent sub- and super-topics at several levels of the hierarchy. This makes Community Topic an ideal topic modelling algorithm for both applied research and practical applications like conversational agents.

  • Graduation date
    Fall 2022
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-8vzt-9a11
  • License
    This thesis is made available by the University of Alberta Library with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.