Investigating the Quality of Bindings for Machine Learning Libraries in Software Package Ecosystems
-
- Author / Creator
- Li, Hao
-
Machine learning (ML) has revolutionized many domains, and developers often rely on open source ML libraries to integrate ML capabilities into their projects. However, these libraries typically support only a single programming language, limiting their availability to projects written in other languages. Bindings bridge this gap by exposing the interface of a host ML library in another programming language. This thesis investigates the quality of bindings for ML libraries in software package ecosystems, focusing on their maintenance and software quality.
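For illustration, a minimal sketch of what using such a binding can look like, assuming the npm package @tensorflow/tfjs-node as a representative community binding that exposes the native TensorFlow library to Node.js (the abstract itself does not name this particular package):

```typescript
// Minimal sketch: using an npm binding for TensorFlow from TypeScript.
// Assumes the @tensorflow/tfjs-node package, which wraps the native
// TensorFlow C library, is installed; the thesis abstract does not
// prescribe this specific binding.
import * as tf from '@tensorflow/tfjs-node';

// The same tensor operations that the default Python binding exposes
// are available through the JavaScript/TypeScript interface.
const x = tf.tensor2d([[1, 2], [3, 4]]);
x.square().print();
```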
The first study presented in this thesis introduces BindFind, an automated approach to identify bindings and link them with their corresponding host libraries across various software package ecosystems. By analyzing 2,436 bindings for 546 ML libraries, we find that most bindings are community-maintained and that npm is the most popular ecosystem for publishing them. These bindings usually cover only a limited range of their host library's releases and experience significant delays in supporting new releases.
In the second study, we investigate the usage and rationale behind release-level deprecation in bindings for ML libraries within the Cargo and npm ecosystems. We find that bindings in Cargo have a higher percentage of deprecated releases than general packages, whereas in npm the percentages are similar for bindings and general packages. In both ecosystems, the primary reasons for deprecation are package removal or replacement and defects. We also identify implicitly deprecated releases in Cargo, which arise when deprecation propagates through the dependency network.
The third study evaluates the impact of using different bindings on the software quality of ML systems through experiments on model training and inference with TensorFlow and PyTorch across four programming languages. The results show that models trained with one binding perform consistently when used for inference through another binding. Furthermore, non-default bindings can outperform the default Python bindings on specific tasks without sacrificing accuracy. We also find significant differences in inference time across bindings, highlighting the benefit of choosing a binding that matches a project's specific performance requirements.
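A hedged sketch of the kind of cross-binding workflow this study evaluates: a model trained and exported with the default Python binding (e.g. as a TensorFlow SavedModel) is loaded for inference through a Node.js binding. The package name, path, and input shape below are illustrative assumptions, not details from the thesis.

```typescript
// Sketch: inference through a non-default binding on a model trained
// with the default Python binding. Assumes a TensorFlow SavedModel was
// exported from Python (e.g. model.save("./saved_model")) and that the
// hypothetical path and input shape below match that model.
import * as tf from '@tensorflow/tfjs-node';

async function main(): Promise<void> {
  // Load the Python-trained SavedModel via the Node.js binding.
  const model = await tf.node.loadSavedModel('./saved_model');

  // Run inference; the underlying TensorFlow C library executes the
  // same graph regardless of which binding issued the call.
  const input = tf.zeros([1, 224, 224, 3]);
  const output = model.predict(input) as tf.Tensor;
  output.print();
}

main().catch(console.error);
```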
The work presented in this thesis provides deep insights, actionable recommendations, and effective and thoroughly evaluated approaches for assessing and improving the quality of bindings for ML libraries in software package ecosystems.
-
- Subjects / Keywords
-
- Graduation date
- Fall 2024
-
- Type of Item
- Thesis
-
- Degree
- Doctor of Philosophy
-
- License
- This thesis is made available by the University of Alberta Library with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.