Free energy-based reinforcement learning (FERL) with clamped quantum Boltzmann machines (QBMs) has been shown to improve learning efficiency significantly compared to classical Q-learning, but it has so far been restricted to environments with discrete state-action spaces. We extended FERL to continuous state-action spaces by developing a hybrid actor-critic scheme that combines a classical actor network with a QBM-based critic. Results obtained with quantum annealing, both simulated and on D-Wave quantum annealing hardware, are discussed, and the performance is compared to that of classical reinforcement learning methods. The method is applied to a variety of particle accelerator environments, including the actual electron beam line of the Advanced Plasma Wakefield Experiment (AWAKE) at CERN.