Analyzing Controversy in Wikipedia

Sepehri Rad, Hoda

doi:10.7939/R3CJ87R31

Communities and Collections

Graduate and Postdoctoral Studies (GPS), Faculty of / Theses and Dissertations

Author / Creator

Sepehri Rad, Hoda
This thesis describes a novel controversy model that supports the current manual process by automatically identifying controversial Wikipedia articles and warning readers about disputable information in them. The model is based on identifying collaboration patterns among editors and inferring their attitudes towards one another. These patterns and attitudes are captured in a social network representing the overall structure of the collaboration history among the editors of each article. A set of features, rooted in sound theories of social behavior, is extracted from each network to train a classifier that distinguishes controversial articles from other articles. To infer attitudes, a novel supervised approach is employed, based on votes cast in Wikipedia admin elections. The combination of structural features extracted from each network and the method for inferring editors' attitudes provides an accurate and efficient controversy model, as demonstrated by several experiments and by comparison with other methods. A systematic evaluation and comparison of previous controversy models is also provided. The results show the inefficiency of most of these models in capturing the complex process by which controversy forms, and demonstrate the power of editor collaboration networks for modeling this process. Finally, to give more insight into controversial topics, a novel framework is proposed to analyze controversy at a finer-grained level. Two approaches are proposed within this framework. The first aims to separate the most controversial parts of each article from the other, non-controversial and reliable parts. This is shown to be a challenging problem, both in designing a suitable method and in providing a quantitative evaluation. The second approach, on the other hand, produces a ranked list of the revisions that contributed most to the controversy of the article. For this approach, a solution based on the maximum coverage problem is proposed, and its usefulness is shown by quantitative results and some case studies.
Subjects / Keywords
Graduation date

Spring 2016
Type of Item

Thesis
Degree

Doctor of Philosophy
DOI

https://doi.org/10.7939/R3CJ87R31
License

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

English
Institution

University of Alberta
Degree level

Doctoral
Department
- Department of Computing Science
Supervisor / co-supervisor and their department(s)
- Barbosa, Denilson (computing science)
Examining committee members and their departments
- Zaiane, Osmar (computing science)
- Schuurmans, Dale (computing science)
- Inkpen, Diana (computing science)
- Harms, Janelle (computing science)
- Barbosa, Denilson (computing science)