Abstract
This paper introduces a two-level optimization strategy for smart energy communities, designed to improve energy distribution, reduce reliance on the external grid, and lower operational costs. Our approach combines Multiagent Reinforcement Learning (MARL) with Distributed Model Predictive Control (DMPC), leveraging peer-to-peer (P2P) energy trading to achieve these goals. Each participant in the energy community is modeled as an aggregator reinforcement learning (RL) agent, with a local DMPC controller serving as a Local Energy Scheduler (LES) that optimizes battery and load dispatch. Our novel MARL-MPC framework employs the Soft Actor-Critic (SAC) algorithm to improve the efficiency of RL decision-making. We evaluate this approach against conventional Deep Reinforcement Learning (DRL) and Proximal Policy Optimization (PPO)-based baselines in simulations driven by historical household data. The results show that, through coordinated P2P energy sharing, our method significantly reduces grid dependency and operational costs while improving learning convergence relative to both baselines.
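To make the two-level structure concrete, the sketch below shows one way the hierarchy could be wired together: a per-household RL policy proposes a P2P trade, and a local scheduler (standing in for the DMPC-based LES) dispatches the battery against the residual demand. All names here (Household, sac_policy_stub, local_energy_scheduler, step_community) are hypothetical placeholders for illustration, not the authors' implementation; in particular, the SAC actor is replaced by a stub and the DMPC optimization by a simple SOC-constrained dispatch rule.

```python
# Illustrative sketch only: a minimal two-level control loop assuming
# hypothetical components. A trained SAC actor and a DMPC-based LES would
# replace the stubs below; the structure mirrors the paper's hierarchy.
import numpy as np

class Household:
    def __init__(self, capacity_kwh=10.0, soc=0.5):
        self.capacity = capacity_kwh   # battery capacity [kWh]
        self.soc = soc                 # state of charge in [0, 1]

    def sac_policy_stub(self, net_load):
        # Placeholder for a trained SAC actor: propose a P2P trade [kWh]
        # (positive = buy from peers, negative = sell to peers).
        return float(np.clip(-net_load, -2.0, 2.0))

    def local_energy_scheduler(self, net_load, trade):
        # Stand-in for the DMPC-based LES: cover the residual demand with
        # the battery where SOC limits allow; the rest falls to the grid.
        residual = net_load - trade                     # demand after P2P trade
        max_discharge = self.soc * self.capacity        # energy available
        max_charge = (1.0 - self.soc) * self.capacity   # charging headroom
        battery = float(np.clip(residual, -max_charge, max_discharge))
        self.soc -= battery / self.capacity             # update state of charge
        return residual - battery                       # remaining grid import

def step_community(households, net_loads):
    # One coordination step: agents propose trades, trades are balanced so
    # the community clears internally, then each LES dispatches locally.
    trades = np.array([h.sac_policy_stub(d) for h, d in zip(households, net_loads)])
    trades -= trades.mean()  # enforce community-wide P2P energy balance
    return [h.local_energy_scheduler(d, t)
            for h, d, t in zip(households, net_loads, trades)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    community = [Household() for _ in range(4)]
    demands = rng.normal(1.0, 1.5, size=4)  # net load per household [kWh]
    print("grid imports [kWh]:", np.round(step_community(community, demands), 2))
```

The separation of concerns shown here is the key design choice: the RL layer handles the strategic, market-facing decision (how much to trade), while the local scheduler handles the fast, constrained dispatch problem, so each layer can be improved independently.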