Multi-Agent Deep Reinforcement Learning for Autonomous Energy Coordination in Demand Response Methods for Residential Distribution Networks

  • Author / Creator
    Atrazhev, Peter
  • In the field of collaborative learning and decision-making, this thesis aims to explore the effects of individual and joint rewards on the performance and coordination of agents in complex environments. The research objectives encompass two main aspects: firstly, to determine the objective superiority of joint rewards over individual rewards in centralized learning decentralized execution (CLDE) algorithms; secondly, to apply CLDE algorithms, both with individual and joint rewards, to evaluate the potential improvement in coordination for district energy management through centralized learning systems.

    To achieve these objectives, an empirical analysis was conducted by varying the reward function between individual and joint rewards, as well as adjusting the episode length in the range of 25 to 50, using a selection of CLDE algorithms in the Level-Based Foraging (LBF) environment. Subsequently, the same experimental framework was applied to the CityLearn Challenge 2022.

    The results of the first study reveal that different CLDE algorithms respond unpredictably when transitioning from joint to individual rewards in the LBF environment. Specifically, multi-agent proximal policy optimization (MAPPO) and QMIX demonstrate an ability to leverage the additional variance present in individual rewards, resulting in improved policies. Conversely, value decomposition networks (VDN) and multi-agent synchronous advantage actor critic (MAA2C) experience performance degradation due to increased variance. Notably, it was observed that centralized critic algorithms require a delicate balance, wherein the critic converges slowly enough to find optimal joint policies without being excessively sensitive to variance increases. Furthermore, value decomposition methods exhibit a need for additional state information to effectively condition agent coordination for optimal policy learning. These findings indicate that the choice of reward function holds significant importance in Multi-Agent Reinforcement Learning (MARL) environments, potentially influencing the emergence of desired behavior.

    The results of the second study shed light on the role and effectiveness of various CLDE algorithms and reward structures within the context of the CityLearn task. Comparisons between MAA2C, independent proximal policy optimization (IPPO), independent synchronous advantage actor critic (IA2C), and MAPPO, under individual and joint rewards, reveal substantial impacts on algorithm performance and efficiency. Specifically, MAA2C with individual rewards emerges as the most effective algorithm across multiple key performance indicators (KPIs), surpassing its competitors in peak demand and district ramping, and outperforming the random agent in all district KPIs. While IPPO and IA2C demonstrate strengths under individual rewards, they exhibit deficiencies compared to the random agent in certain areas. Conversely, MAPPO performs better with joint rewards, underscoring the nuanced differences between algorithms and the contextual conditions in which they excel.

    This work highlights the significance of reconsidering joint rewards as the default choice for collaborative tasks. The findings suggest that individual rewards can be effectively employed in collaborative settings, with the choice between individual and joint rewards potentially presenting a bias-variance tradeoff. Further research is necessary to fully ascertain the implications of these results and refine the understanding of reward structures in collaborative learning and decision-making environments.
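
    The distinction between individual and joint rewards discussed above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the joint reward is the sum of the per-agent rewards and that every agent receives that same team value; the abstract does not state how the thesis aggregates rewards, and the function names and sample values below are hypothetical.

```python
from typing import List


def to_joint_rewards(individual_rewards: List[float]) -> List[float]:
    """Give every agent the same team reward.

    Assumption for this sketch: the team reward is the plain sum of the
    per-agent rewards. Other aggregations (e.g. the mean) are possible.
    """
    team_reward = sum(individual_rewards)
    return [team_reward] * len(individual_rewards)


def select_rewards(individual_rewards: List[float], use_joint: bool) -> List[float]:
    """Return the reward vector used for learning under the chosen scheme."""
    if use_joint:
        return to_joint_rewards(individual_rewards)
    # Individual scheme: each agent keeps only what it earned itself.
    return list(individual_rewards)


# Hypothetical per-agent rewards from a single environment step.
step_rewards = [0.5, 0.0, 0.25]

individual = select_rewards(step_rewards, use_joint=False)
joint = select_rewards(step_rewards, use_joint=True)
```

    Under the individual scheme, the second agent sees a reward of zero for this step even though the team as a whole was rewarded; under the joint scheme, all three agents see the same value regardless of which of them earned it. The two schemes therefore give each agent a different learning signal for the same step.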

  • Subjects / Keywords
  • Graduation date
    Fall 2023
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-49x3-b275
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.