Applications of the Naturalness of Software

  • Author / Creator
    Santos, Eddie
  • There is a wealth of software development artifacts, such as source code, issue reports, and revision histories, contained within publicly accessible and privately accessible repositories. Mining this data presents myriad opportunities that may benefit future software development efforts; however, it is unclear exactly how to leverage this data. This thesis explores the naturalness of software: the assertion that source code, much like natural languages, is regular and predictable. This regularity enables the application of techniques borrowed from the fields of natural language processing (NLP) and information retrieval (IR) to gain insight from existing software repositories. This thesis demonstrates different methods of mapping software artifacts to natural language models and full-text databases. Then, we show how to use these models and databases in the tasks of predicting whether a commit may break the build; clustering crash reports in a scalable and time-efficient manner; and detecting and correcting syntax errors in code written by novices. This thesis empirically demonstrates the effectiveness of these tools on real-world software repositories. I conclude by suggesting ways of further exploiting the data contained within software repositories.

  • Subjects / Keywords
  • Graduation date
    Fall 2018
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/R3W37MB7X
  • License
    Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.