Analyzing Controversy in Wikipedia

  • Author / Creator
    Sepehri Rad, Hoda
  • This thesis describes a novel controversy model that supports the current manual process by automatically identifying controversial Wikipedia articles and warning readers about disputable information they contain. The model is based on identifying collaboration patterns among editors and inferring their attitudes towards one another. These patterns and attitudes are represented as a social network that captures the overall structure of the collaboration history among the editors of each article. A set of features, rooted in sound theories of social behavior, is extracted from each network to train a classifier that distinguishes controversial articles from other articles. To infer attitudes, a novel supervised approach is employed based on votes cast in Wikipedia admin elections. The combination of structural features extracted from each network and the method for inferring editors' attitudes provides an accurate and efficient controversy model, as demonstrated by several experiments and by comparison with other methods. A systematic evaluation and comparison of previous controversy models is also provided. The results show that most of these models fall short in capturing the complex process by which controversy forms, and demonstrate the power of editor collaboration networks for modeling this process. Finally, to give more insight into controversial topics, a novel framework is proposed to analyze controversy at a more fine-grained level. Using this framework, two different approaches are proposed. The first approach aims to separate the most controversial parts of each article from the non-controversial, reliable parts. This task is shown to be challenging, both in designing a suitable method and in providing a quantitative evaluation. The second approach, on the other hand, produces a ranked list of the revisions that contributed most to the controversy of the article. For this approach, a solution based on the maximum coverage problem is proposed, and its usefulness is shown by quantitative results and some case studies.

  • Subjects / Keywords
  • Graduation date
  • Type of Item
  • Degree
    Doctor of Philosophy
  • DOI
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
  • Language
  • Institution
    University of Alberta
  • Degree level
  • Department
    • Department of Computing Science
  • Supervisor / co-supervisor and their department(s)
    • Barbosa, Denilson (computing science)
  • Examining committee members and their departments
    • Barbosa, Denilson (computing science)
    • Harms, Janelle (computing science)
    • Inkpen, Diana (computing science)
    • Zaiane, Osmar (computing science)
    • Schuurmans, Dale (computing science)
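
The abstract mentions that the ranked list of revisions is obtained with a solution based on the maximum coverage problem. The thesis's actual formulation is not given in this record, so the following is only a minimal sketch of the standard greedy heuristic for maximum coverage. It assumes, purely for illustration, that each revision can be mapped to a set of "controversial elements" it touches; the function name, the revision identifiers, and the element labels are all hypothetical.

```python
from typing import Dict, Hashable, List, Set, Tuple


def greedy_max_coverage(
    candidates: Dict[str, Set[Hashable]], k: int
) -> List[Tuple[str, int]]:
    """Pick up to k candidates greedily, each time taking the one that
    covers the most elements not yet covered.

    Returns (candidate_id, marginal_gain) pairs in selection order, which
    can be read as a ranking. This is the textbook greedy heuristic, not
    necessarily the method used in the thesis.
    """
    covered: Set[Hashable] = set()
    remaining = dict(candidates)  # shallow copy so the input is not mutated
    ranking: List[Tuple[str, int]] = []

    for _ in range(min(k, len(candidates))):
        best_id, best_gain = None, 0
        # Iterate in sorted order so ties are broken deterministically.
        for cand_id in sorted(remaining):
            gain = len(remaining[cand_id] - covered)
            if gain > best_gain:
                best_id, best_gain = cand_id, gain
        if best_id is None:
            break  # no remaining candidate adds new coverage
        ranking.append((best_id, best_gain))
        covered |= remaining.pop(best_id)

    return ranking


# Hypothetical data: revision id -> controversial elements it touches.
revisions = {
    "r1": {"a", "b", "c"},
    "r2": {"c", "d"},
    "r3": {"d", "e", "f", "g"},
    "r4": {"a"},
}
print(greedy_max_coverage(revisions, k=2))
```

The greedy choice is a common starting point because it is simple and gives a well-known approximation guarantee for maximum coverage; how revisions are actually mapped to covered elements is the modeling question the thesis itself addresses.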