






This file is in the following communities:

Graduate Studies and Research, Faculty of


This file is in the following collections:

Theses and Dissertations

Extracting Structured Knowledge from Textual Data in Software Repositories Open Access


Subject / Keyword
Mining Software Repositories, Textual Data, Text Mining, Knowledge Extraction
Type of item
Thesis
Degree grantor
University of Alberta
Author or creator
Hasan, Maryam
Supervisor and department
Stroulia, Eleni (Computing Science)
Barbosa, Denilson (Computing Science)
Examining committee member and department
Reformat, Marek (Electrical and Computer Engineering)
Wong, Ken (Computing Science)
Department
Computing Science
Degree
Master of Science
Abstract
Software team members generate many kinds of textual artifacts as they communicate and coordinate their work with others throughout the life cycle of their projects. Despite the considerable body of work on mining software artifacts, relatively little research has focused on communication artifacts. Alongside source code artifacts, software communication artifacts contain useful semantic information that existing approaches have not fully explored. This thesis presents a text analysis method and tool that extract and represent useful pieces of information from a wide range of textual data sources associated with software projects. Our text analysis system integrates Natural Language Processing techniques and statistical text analysis methods with software domain knowledge. The extracted information is represented as RDF-style triples that capture relations between developers and software products. We applied the system to five kinds of textual data: source code commits, bug reports, email messages, chat logs, and wiki pages. In our evaluation, the system achieved a precision of 82%, a recall of 58%, and an F-measure of 68%.
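To make the triple representation and the reported scores concrete, here is a minimal Python sketch. It is not the thesis's system, which combines NLP techniques, statistical text analysis, and software domain knowledge; it swaps in a single regular expression over commit messages. The commit-message phrasing it matches, the `fixed` predicate, the `bug#<id>` object naming, and all function names are hypothetical. The scoring helper assumes the reported F-measure is the balanced F1 (the harmonic mean of precision and recall), which reproduces 68% from 82% precision and 58% recall.

```python
import re
from typing import NamedTuple


class Triple(NamedTuple):
    """An RDF-style statement: (subject, predicate, object)."""
    subject: str
    predicate: str
    obj: str


# Hypothetical pattern: "fix/fixes/fixed/close/closes/closed bug #<id>".
# Real commit messages are far more varied; this only illustrates the idea.
FIX_PATTERN = re.compile(r"\b(?:fix(?:es|ed)?|close[sd]?)\s+bug\s*#(\d+)", re.IGNORECASE)


def extract_triples(author: str, message: str) -> set[Triple]:
    """Emit one (developer, "fixed", bug) triple per bug id the message mentions."""
    return {Triple(author, "fixed", f"bug#{bug_id}") for bug_id in FIX_PATTERN.findall(message)}


def f1(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def precision_recall_f1(predicted: set[Triple], gold: set[Triple]) -> tuple[float, float, float]:
    """Score extracted triples against a hand-labelled gold set."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall, f1(precision, recall)


if __name__ == "__main__":
    commits = [("alice", "Fixes bug #101 and closes bug #102"), ("bob", "Refactor the parser")]
    predicted = set().union(*(extract_triples(a, m) for a, m in commits))
    # Toy gold set: the third relation is one the regex has no pattern for.
    gold = {
        Triple("alice", "fixed", "bug#101"),
        Triple("alice", "fixed", "bug#102"),
        Triple("bob", "modified", "parser"),
    }
    print(tuple(round(x, 2) for x in precision_recall_f1(predicted, gold)))  # (1.0, 0.67, 0.8)
    print(round(f1(0.82, 0.58), 2))  # 0.68
```

In the toy run the regex never emits a wrong triple but misses the relation it cannot match, so precision stays at 1.0 while recall falls; this illustrates, without claiming to explain, how a pattern-driven extractor can score higher on precision than on recall.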
License granted by Maryam Hasan on 2011-01-27T19:42:44Z (GMT): Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of the above terms. The author reserves all other publication and other rights in association with the copyright in the thesis, and except as herein provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.

File Details

File format: pdf (Portable Document Format)
Mime type: application/pdf
File size: 1,316,990 bytes (about 1.3 MB)
Last modified: 2015-10-12 10:40:48 (UTC-06:00)
Filename: Hasan_Maryam_Spring 2011.pdf
Original checksum: 69190d2b3c2e7f392ed93226fd70f186
Well formed: false
Valid: false
Status message: Lexical error offset=1302073
Page count: 72