Predicting Textual Merge Conflicts

  • Author / Creator
    Moein Owhadi Kareshk
  • During collaborative software development, developers often use branches to add features or fix bugs. When changes from two branches are merged, conflicts may occur if the changes are inconsistent. Developers must resolve these conflicts before completing the merge, which is an error-prone and time-consuming process. One way of dealing with this problem is early detection of merge conflicts, which warns developers about conflicts before they become large and complicated.
    Existing techniques do this by continuously pulling and merging all combinations of branches in the background so that developers are notified as soon as a conflict occurs, which is computationally expensive. One potential way to reduce this cost is to use a machine-learning-based conflict predictor that filters out merge scenarios that are unlikely to have conflicts, i.e., safe merge scenarios. In this thesis, we assess whether conflict prediction is feasible. We employ binary classifiers to predict merge conflicts based on 9 lightweight Git feature sets, and we train and test predictors for each repository separately.
    To evaluate our predictors, we perform a large-scale study on 147,967 merges from 105 GitHub repositories in seven programming languages. Our results show that decision trees can achieve high F1-scores when predicting safe merges, varying from 0.93 to 0.95 across the seven programming languages. For conflicting merges, the F1-score is between 0.45 and 0.71. Our results indicate that predicting conflicts is feasible, which suggests that it may successfully be used as a pre-filtering criterion for speculative merging.
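
The abstract reports separate F1-scores for safe and conflicting merges. As a minimal sketch of how such per-class scores can be computed, the Python snippet below treats each class in turn as the positive class. The function name `f1_for_class`, the labels `"safe"` and `"conflict"`, and the ten toy merges are illustrative assumptions, not data or code from the thesis.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """Return the F1-score obtained when `positive` is treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 8 safe merges and 2 conflicting merges.
y_true = ["safe"] * 8 + ["conflict"] * 2
# The hypothetical predictor mislabels one safe merge and one conflicting merge.
y_pred = ["safe"] * 7 + ["conflict"] + ["conflict"] + ["safe"]

f1_safe = f1_for_class(y_true, y_pred, "safe")
f1_conflict = f1_for_class(y_true, y_pred, "conflict")
print(f"F1 (safe merges):        {f1_safe:.3f}")
print(f"F1 (conflicting merges): {f1_conflict:.3f}")
```

In this toy example the same two mistakes cost the minority class far more than the majority class, which is one plausible reason why scores for conflicting merges can sit well below those for safe merges when conflicts are rare.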

  • Subjects / Keywords
  • Graduation date
    Spring 2020
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-pzb7-2y14
  • License
    Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.