Usage
  • 63 views
  • 65 downloads

On Coding Schemes for Straggler Mitigation in Distributed Computing

  • Author / Creator
    Fetrat Qharabagh, Muhammad
  • One of the main challenges in distributed computing are straggling servers i.e. servers that have temporary or permanent delay in sending the result of the computations assigned to them. One of the proposed methods to address the straggler problem is using forward error correction codes to encode the subtasks before distributing them among workers where the results of the lagging servers are treated as erasure. This method is referred to as coded distributed computing (CDC). In this thesis we study CDC in general and address two straggler related issues.

    First we consider a master-worker setting to perform a matrix-vector multiplication. The main focus in prior CDC approaches for this problem is on minimising the computation time using a family of codes known as minimum distance separable(MDS) codes. The assumption is that the encoding and the decoding time in the master is negligible. However, when the task is large or the number of workers is high, and the encoding/decoding can not be done offline, this time is not small compared to the computation time anymore. Hence, we introduce a new family of binary locally repairable codes (BLRCs) specifically designed for CDC. Being binary removes the costly multiplication operations in the encoding/decoding process and the locally repairable nature of the code further reduces the decoding complexity. Therefore, due to the lower encoding/decoding complexity in our codes, the overall time delay is lower than the conventional MDS scheme. Secondly, we consider a general master-worker setting where each worker is assigned with multiple subtasks. Accordingly, we propose a CDC scheme based on the systematic MDS codes. Then we derive and calculate the analytical probabilities of finishing the coded distributed computation before a deadline in our scheme and obtain the optimal subtask loads and the code rate for it. The overall goal is to reduce the computation time compared to conventional schemes for a multi-subtask per worker distributed computing.

  • Subjects / Keywords
  • Graduation date
    Fall 2023
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-m1rv-tr83
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.