Abstract
Residential electricity consumption datasets are essential for
applications such as smart grid management, home automation, renewable
energy integration, infrastructure planning, and policy-making. However,
obtaining high-resolution residential datasets remains challenging
because sensor installation, monitoring, and maintenance are costly and
complex, and because obtaining approvals and related human factors add
further hurdles. To address this problem in the Indian residential
context, where such data is scarce, we propose a Residential
Electricity Usage Simulator (REUS) to generate synthetic residential
electricity usage data. Our approach models electricity usage for seven
categories of homes using hourly data collected over one year from
65 residences. In addition to energy data, we collected
18 features for each home to improve our modeling. The
data and features are preprocessed using feature selection with Probabilistic
Finite State Machines (validated through Multiple Correspondence
Analysis) and further refined through systematic data cleaning
and imputation. We built simulation models using popular machine
learning techniques such as Long Short-Term Memory (LSTM) networks,
including Vanilla, Stacked, Bidirectional, and Encoder-Decoder LSTMs, and
Transformer models, including the Vanilla Transformer and the Temporal
Fusion Transformer. In addition, statistical techniques such as Markov
chains of orders 0 to 3 and ARIMA were used as benchmarks to evaluate
the models’ ability to generate a synthetic residential electricity dataset
that is close to real data. Extensive experimentation and analysis
show that Bidirectional LSTMs capture the trends in electricity
consumption more effectively than the other models, achieving the
lowest RMSE. Simulation and analysis of this nature enable broader,
region-specific energy research, reducing the need for costly or
intrusive data collection.