Towards Understanding Latent Semantic Indexing

  • Author(s) / Creator(s)
  • Technical report TR03-03. The growing volume of available information has made information retrieval tools increasingly important. Traditionally, these tools retrieve information by literally matching terms in the documents against the terms in the query. Unfortunately, because of synonymy and polysemy, the results of lexical matching are sometimes incomplete and inaccurate. Conceptual-indexing techniques such as Latent Semantic Indexing (LSI) have been used to overcome these problems. The LSI model uses a statistical technique, singular value decomposition (SVD), to reveal the "latent" semantic structure and eliminate much of the "noise" (variability of word choice). LSI is therefore able to deal with the problems caused by synonymy and polysemy. Experiments show that LSI outperforms lexical matching methods on some well-known test document collections. In this essay, we develop a complete retrieval system based on the LSI model. The experimental results show that the system can retrieve documents effectively. We also vary parameters such as the rank, the similarity threshold, and the term composition to find a setting that gives the best retrieval results. Furthermore, we apply different performance-enhancing techniques to the system. The experimental results demonstrate that relevance feedback and query expansion yield significant improvement in retrieval effectiveness. We also exploit the folding-in method to append new documents and new index terms to the collection, saving the time and effort required by frequent SVD recomputation.

  • Date created
  • Subjects / Keywords
  • Type of Item
  • DOI
  • License
    Attribution 3.0 International
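
The abstract's description of LSI (a rank-reduced SVD of the term-document matrix, with folding-in for queries and new documents) can be illustrated with a minimal sketch. This is not the report's system: the toy matrix, the term list, and the helper names `build_lsi`, `fold_in`, and `cosine_scores` are all assumptions made for illustration, and the projection used here (multiplying by U_k and dividing by the singular values) is one common folding-in convention, not necessarily the one the report uses.

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents).
# The counts are invented for illustration only.
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 1, 0],  # car
    [0, 1, 1, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 0, 1],  # flower
    [0, 0, 0, 1],  # petal
], dtype=float)


def build_lsi(matrix, k):
    """Return the rank-k factors U_k, s_k, V_k of the term-document matrix."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T  # rows of V_k represent documents


def fold_in(term_vector, U_k, s_k):
    """Project a query or new document into the k-dimensional latent space."""
    return (term_vector @ U_k) / s_k


def cosine_scores(q_hat, V_k):
    """Cosine similarity between a folded-in vector and every document."""
    norms = np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_hat)
    return (V_k @ q_hat) / np.maximum(norms, 1e-12)


U_k, s_k, V_k = build_lsi(A, k=2)

# Query containing only the term "car".
query = np.zeros(len(terms))
query[terms.index("car")] = 1.0

lexical = query @ A  # plain term overlap with each document
scores = cosine_scores(fold_in(query, U_k, s_k), V_k)

# Folding in a new document avoids recomputing the SVD.
new_doc = np.zeros(len(terms))
new_doc[terms.index("flower")] = 1.0
new_doc_hat = fold_in(new_doc, U_k, s_k)
new_doc_scores = cosine_scores(new_doc_hat, V_k)
```

In this toy collection the second document contains "automobile" and "engine" but not "car", so its lexical overlap with the query is zero, yet its latent-space score is high because the three vehicle terms co-occur across documents. The rank `k` and any similarity threshold applied to `scores` are the kind of parameters the abstract says were tuned experimentally.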