Investigating the Quality of Bindings for Machine Learning Libraries in Software Package Ecosystems
-
- Author / Creator
- Li, Hao
-
Machine learning (ML) has revolutionized many domains, and developers often rely on open source ML libraries to integrate ML capabilities into their projects. However, these libraries typically support only a single programming language, limiting their availability to projects written in other languages. Bindings bridge this gap by exposing the interface of a host ML library in another programming language. This thesis investigates the quality of bindings for ML libraries in software package ecosystems, focusing on their maintenance and software quality.
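For illustration, a minimal sketch of what using such a binding can look like, assuming the npm package @tensorflow/tfjs-node as a representative community binding that exposes the native TensorFlow library to Node.js (the abstract itself does not name this particular package):

```typescript
// Minimal sketch: using an npm binding for TensorFlow from TypeScript.
// Assumes the @tensorflow/tfjs-node package, which wraps the native
// TensorFlow C library, is installed; the thesis abstract does not
// prescribe this specific binding.
import * as tf from '@tensorflow/tfjs-node';

// The same tensor operations that the default Python binding exposes
// are available through the JavaScript/TypeScript interface.
const x = tf.tensor2d([[1, 2], [3, 4]]);
x.square().print();
```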
The first study presented in this thesis introduces BindFind, an automated approach to identify bindings and link them with their corresponding host libraries across various software package ecosystems. By analyzing 2,436 bindings for 546 ML libraries, we find that most bindings are community-maintained and that npm is the most popular ecosystem for publishing them. These bindings usually cover only a limited range of their host library's releases and experience significant delays in supporting new releases.
In the second study, we investigate the usage and rationale behind release-level deprecation in bindings for ML libraries within the Cargo and npm ecosystems. We find that bindings in Cargo have a higher percentage of deprecated releases than general packages, whereas in npm the percentages are similar for bindings and general packages. In both ecosystems, the primary reasons for deprecation are package removal or replacement and defects. We also identify implicitly deprecated releases in Cargo, which arise when deprecation propagates through the dependency network.
The third study evaluates the impact of using different bindings on the software quality of ML systems through experiments on model training and inference with TensorFlow and PyTorch across four programming languages. The results show that models trained with one binding perform consistently when used for inference through another binding. Furthermore, non-default bindings can outperform the default Python bindings on specific tasks without sacrificing accuracy. We also find significant differences in inference time across bindings, highlighting the benefit of choosing a binding that matches a project's specific performance requirements.
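A hedged sketch of the kind of cross-binding workflow this study evaluates: a model trained and exported with the default Python binding (e.g. as a TensorFlow SavedModel) is loaded for inference through a Node.js binding. The package name, path, and input shape below are illustrative assumptions, not details from the thesis.

```typescript
// Sketch: inference through a non-default binding on a model trained
// with the default Python binding. Assumes a TensorFlow SavedModel was
// exported from Python (e.g. model.save("./saved_model")) and that the
// hypothetical path and input shape below match that model.
import * as tf from '@tensorflow/tfjs-node';

async function main(): Promise<void> {
  // Load the Python-trained SavedModel via the Node.js binding.
  const model = await tf.node.loadSavedModel('./saved_model');

  // Run inference; the underlying TensorFlow C library executes the
  // same graph regardless of which binding issued the call.
  const input = tf.zeros([1, 224, 224, 3]);
  const output = model.predict(input) as tf.Tensor;
  output.print();
}

main().catch(console.error);
```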
The work presented in this thesis provides deep insights, actionable recommendations, and effective and thoroughly evaluated approaches for assessing and improving the quality of bindings for ML libraries in software package ecosystems.
-
- Subjects / Keywords
-
- Graduation date
- Fall 2024
-
- Type of Item
- Thesis
-
- Degree
- Doctor of Philosophy
-
- License
- This thesis is made available by the University of Alberta Library with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.