Focused Co-citation: Improving the Retrieval of Related Pages on the Web

  • Technical report TR04-24. This thesis studies the problem of effectively finding related pages on the Web: given the URL of a page, one wants to find other pages on the same topic. This is both a simple and a natural way of searching for resources, since it does not require formulating a keyword query. The thesis identifies a number of problems that often arise on the Web and reduce the precision of algorithms that use the Web's link structure to find related pages. To address these problems, several new notions of "focus" of a collection of links are proposed and embedded within the Co-citation algorithm. The goal is that, when searching for related pages, an algorithm should give more focused collections of links a higher influence on the final ranking than less focused collections. Our experiments show that the "focused" versions of Co-citation outperform the unfocused version.
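
The idea of weighting link collections inside Co-citation can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not define its notions of "focus", so the `inverse_size_weight` function below is a hypothetical placeholder that simply gives smaller link collections more influence. The names `cocitation`, `outlinks`, and `weight` are likewise assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple


def uniform_weight(parent: str, links: Set[str]) -> float:
    """Plain (unfocused) co-citation: every linking page counts equally."""
    return 1.0


def inverse_size_weight(parent: str, links: Set[str]) -> float:
    """Hypothetical stand-in for "focus": smaller link collections count more.

    The thesis proposes its own focus measures; this is only a placeholder.
    """
    return 1.0 / len(links)


def cocitation(
    query: str,
    outlinks: Dict[str, Set[str]],
    weight: Callable[[str, Set[str]], float] = uniform_weight,
) -> List[Tuple[str, float]]:
    """Rank pages by how strongly they are co-cited with `query`.

    `outlinks` maps each parent page to the set of pages it links to.
    Each parent that links to `query` adds its weight to every other
    page it links to.
    """
    scores: Dict[str, float] = defaultdict(float)
    for parent, links in outlinks.items():
        if query not in links:
            continue  # this parent does not cite the query page
        w = weight(parent, links)
        for page in links:
            if page != query:
                scores[page] += w
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy link graph (invented data, for illustration only).
outlinks = {
    "p1": {"q", "a", "b"},
    "p2": {"q", "a"},
    "p3": {"b", "c"},
    "p4": {"q", "b", "c", "d", "e"},
}

unfocused = cocitation("q", outlinks)
focused = cocitation("q", outlinks, weight=inverse_size_weight)
```

In this toy graph, pages `a` and `b` tie under plain co-citation because each shares two parents with `q`. With the placeholder weight, `a` ranks above `b`, since one of `b`'s co-citations comes from the larger, less selective collection `p4`.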

