Enhancing Performance and Energy Optimization in Serverless Computing
Abstract
Serverless computing has been recognized as a transformative paradigm within cloud computing, offering Function-as-a-Service (FaaS) capabilities that allow developers to deploy applications without managing underlying infrastructure. Despite its advantages in scalability and cost-effectiveness, serverless computing still faces significant challenges related to workload unpredictability, inefficient resource utilization, energy consumption, and a lack of intelligent performance modeling. These issues are especially critical in serverless environments that demand dynamic autoscaling and precise workload management.
This thesis presents a comprehensive study of performance modeling and energy optimization in serverless systems, focusing on autoscaling mechanisms based on learning-driven approaches. Initially, a detailed literature review has been conducted to investigate the performance metrics in serverless computing—such as response time, cost, energy consumption, cold start frequency, resource utilization, and fault tolerance—and to assess the limitations of existing autoscaling strategies. The findings emphasize the need for intelligent, adaptive autoscaling models that efficiently manage fluctuating workloads and thereby enhance Quality of Service (QoS) adherence.
Conventional approaches often fail to adapt effectively to sudden workload variations and lack the ability to learn from past performance data, which motivated the design of a more adaptive, learning-based autoscaling mechanism. Several models have been proposed and systematically evaluated throughout this research to address these concerns. Firstly, an auto-scalable model based on Q-learning has been introduced, enabling dynamic adjustment of compute resources in response to varying workload intensities. The model continuously monitors incoming request rates and the current state of function instances, selecting scaling actions according to policies learned from historical performance data, and thereby maximizes resource utilization by automatically scaling resources up or down as needed. The effectiveness of this model has been demonstrated on AWS Lambda, showing improvements in key metrics: average response time reduced by 35.62%, mean number of idle instances minimized by 3.37%, probability of cold starts decreased by 38.5%, and energy consumption lowered by 46.15%.
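The monitor-select-update loop described above can be sketched as a minimal tabular Q-learning autoscaler. The state, action set, reward weighting, and training environment below are illustrative assumptions for exposition, not the thesis's exact formulation:

```python
import random

ACTIONS = [-1, 0, 1]  # remove an instance, hold, add an instance

class QLearningAutoscaler:
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = {}  # Q-table: (state, action) -> learned value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state):
        # Epsilon-greedy selection over the learned scaling policy.
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q.get((state, a), 0.0))

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning update from observed performance.
        best_next = max(self.q.get((next_state, a), 0.0) for a in ACTIONS)
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + self.alpha * (
            reward + self.gamma * best_next - old)

def reward(load, instances):
    # Placeholder reward: penalize under-provisioning (cold-start risk)
    # and idle instances (wasted energy) symmetrically.
    return -abs(load - instances)

# Toy training loop: the state pairs a request-rate bucket with the
# current instance count; the agent learns to match capacity to load.
random.seed(0)
agent = QLearningAutoscaler()
instances = 1
for step in range(5000):
    load = random.choice([1, 3])        # simulated request-rate buckets
    state = (load, instances)
    action = agent.choose(state)
    instances = max(1, min(5, instances + action))
    agent.update(state, action, reward(load, instances), (load, instances))
```

After training, the greedy policy scales up when observed load exceeds capacity and scales down when instances sit idle, which is the behavior driving the reported reductions in cold starts and energy use.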
While the Q-learning–based autoscalable model improved performance and energy consumption, its single-agent nature limited scalability and hindered coordinated decision-making across distributed instances. To overcome this limitation, a Multi-Agent Deep Q-Learning (MADQL) model has been proposed that enables cooperative learning among agents, allowing them to make real-time scaling decisions and thereby mitigating both overutilization and underutilization. Extensive experimentation on a real-world e-commerce dataset using AWS Lambda revealed significant improvements over the existing model: average response time reduced by 0.96%, cost lowered by 1.46%, energy consumption minimized by 2.43%, throughput increased by 0.44%, and CPU utilization improved by 15.79%.
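The cooperative structure can be sketched schematically: each agent manages its own instance pool, observes the global load, and learns from a shared reward that couples their decisions. Tabular Q-tables stand in here for the deep Q-networks of MADQL, and the environment, reward, and two-agent setup are illustrative assumptions:

```python
import random

ACTIONS = [-1, 0, 1]  # scale own pool down, hold, scale up

class Agent:
    def __init__(self):
        self.q = {}  # per-agent value table (stand-in for a deep Q-network)

    def act(self, obs, epsilon):
        if random.random() < epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q.get((obs, a), 0.0))

    def learn(self, obs, action, reward, next_obs, alpha=0.1, gamma=0.9):
        best = max(self.q.get((next_obs, a), 0.0) for a in ACTIONS)
        old = self.q.get((obs, action), 0.0)
        self.q[(obs, action)] = old + alpha * (reward + gamma * best - old)

def shared_reward(total_load, instances):
    # One global reward couples the agents: combined capacity should track
    # total load, discouraging both over- and under-provisioning.
    return -abs(total_load - sum(instances))

random.seed(1)
agents = [Agent(), Agent()]
instances = [1, 1]
for step in range(8000):
    load = random.choice([2, 4])
    # Each agent observes the global load plus its own pool size.
    obs = [(load, instances[i]) for i in range(2)]
    acts = [agents[i].act(obs[i], 0.1) for i in range(2)]
    instances = [max(1, min(4, instances[i] + acts[i])) for i in range(2)]
    r = shared_reward(load, instances)
    for i in range(2):
        agents[i].learn(obs[i], acts[i], r, (load, instances[i]))
```

Because the reward is shared, no single agent can profit by over-provisioning its own pool, which is the coordination property that lets MADQL avoid the over- and underutilization a lone agent cannot balance across distributed instances.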
Although MADQL provided cooperative learning and better workload distribution, it lacked the predictive capability to anticipate workload surges, leading to reactive rather than proactive scaling. Building upon this, a hybrid learning model, LMP-Opt, has been introduced that integrates Long Short-Term Memory (LSTM) for workload prediction, Multi-Agent Deep Q-Learning (MADQL) for resource autoscaling, and Proximal Policy Optimization (PPO) for optimizing energy consumption through fine-tuned policy decisions. The LSTM component captures temporal workload patterns to enable predictive autoscaling, MADQL dynamically allocates jobs by scaling resources up or down in response to workload fluctuations, and PPO refines MADQL's discrete actions into continuous ones, optimizing energy consumption and enhancing convergence.
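The three-stage control loop can be sketched at a high level. Each learned component is reduced to a deliberately simple stand-in — a moving-average forecaster for the LSTM, a threshold rule for the MADQL scaler, and an exponential-smoothing step for PPO's refinement of discrete actions into continuous allocations. All names, parameters, and the workload trace are illustrative assumptions, not the thesis's design:

```python
def predict_next_load(history, window=3):
    # Stand-in for the LSTM: forecast next load as a recent moving average.
    recent = history[-window:]
    return sum(recent) / len(recent)

def discrete_scale_action(predicted_load, capacity, per_instance=10):
    # Stand-in for MADQL: choose a discrete scale-up/hold/scale-down action
    # so that whole instances cover the forecast demand.
    needed = -(-predicted_load // per_instance)  # ceiling division
    if needed > capacity:
        return +1
    if needed < capacity:
        return -1
    return 0

def refine_allocation(current_alloc, target_alloc, smoothing=0.5):
    # Stand-in for PPO: turn the discrete target into a continuous
    # allocation, smoothing transitions to avoid energy-costly oscillation.
    return current_alloc + smoothing * (target_alloc - current_alloc)

# Demo loop: predict, scale discretely, then refine continuously.
history = [18, 22, 21, 24, 26]       # requests per interval (toy trace)
capacity, alloc = 2, 2.0
for _ in range(5):
    forecast = predict_next_load(history)
    capacity = max(1, capacity + discrete_scale_action(forecast, capacity))
    alloc = refine_allocation(alloc, capacity)
    history.append(forecast)         # feed the forecast forward for the demo
```

The key design point the sketch illustrates is the division of labor: prediction makes scaling proactive rather than reactive, the discrete policy handles coarse instance counts, and the continuous refinement converges smoothly on the target instead of oscillating between whole-instance steps.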
The proposed model has been further validated on AWS Lambda and ServerlessSimPro using dynamic e-commerce workloads, demonstrating improvements of up to 6.09% in response time, 6.14% in energy consumption, and 7.82% in cost, while improving CPU utilization by 4.93% and reducing the required number of nodes by 5.59%.