Abstract
Developing a reinforcement learning (RL) framework that performs satisfactorily in both deterministic and stochastic environments is challenging. To address this problem, a twin agent RL framework is proposed in this work, wherein we amalgamate the actions of a stochastic agent and a deterministic agent in a multiagent setting with a feedback mechanism that actively monitors the output. The proposed algorithm employs twin actor networks, one for each agent, and an action selection critic network that chooses the better of the two actions. Specifically, the algorithm blends the outcomes of two RL agents, a stochastic agent and a deterministic agent, namely Proximal Policy Optimization (PPO) and Twin Delayed Deep Deterministic Policy Gradient (TD3), respectively. We assess the effectiveness of the proposed algorithm on two case studies: (i) monoclonal antibody (mAb) production and (ii) production of propylene glycol (PG). The studies are conducted under nominal conditions as well as in the presence of parametric uncertainties and measurement noise. For case study 1, the root-mean-square error (RMSE) of the proposed algorithm is reduced by 40.9% compared with TD3 and by 27.57% compared with PPO. Similarly, for case study 2, the RMSE of the proposed algorithm is reduced by 8.87% compared with TD3 and by 5.8% compared with PPO. Based on extensive simulations, it is found that the proposed twin agent algorithm has faster convergence and better set-point tracking than either agent operating individually.
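To make the action-selection idea in the abstract concrete, a minimal sketch is given below. It assumes PyTorch-style actor and critic networks; the class and function names (Actor, Critic, select_action), network sizes, and the exploration noise added to the stochastic actor are illustrative placeholders and not the authors' implementation.

```python
# Illustrative sketch (not the paper's code): a critic scores the actions
# proposed by a stochastic (PPO-style) actor and a deterministic (TD3-style)
# actor, and the higher-valued action is executed.
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Simple placeholder actor mapping state -> bounded action."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, action_dim), nn.Tanh())

    def forward(self, state):
        return self.net(state)


class Critic(nn.Module):
    """Action-selection critic estimating Q(s, a)."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


def select_action(state, ppo_actor, td3_actor, critic, noise_std=0.1):
    """Return whichever agent's action the critic values more highly.

    The PPO-style actor is treated as stochastic by perturbing its mean
    output; the TD3-style actor is deterministic.
    """
    with torch.no_grad():
        mean = ppo_actor(state)
        a_stoch = mean + noise_std * torch.randn_like(mean)  # stochastic proposal
        a_det = td3_actor(state)                             # deterministic proposal
        q_stoch = critic(state, a_stoch)
        q_det = critic(state, a_det)
    return a_stoch if q_stoch > q_det else a_det


# Usage with randomly initialised (untrained) networks:
state_dim, action_dim = 4, 1
state = torch.zeros(1, state_dim)
action = select_action(state,
                       Actor(state_dim, action_dim),
                       Actor(state_dim, action_dim),
                       Critic(state_dim, action_dim))
print(action)
```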