Uncovering Instabilities in Variational-Quantum Deep Q-Networks

Maja Franz, Lucas Wolf, Maniraman Periyasamy, Christian Ufrecht, Daniel D. Scherer, Axel Plinge, Christopher Mutschler, Wolfgang Mauerer

Deep Reinforcement Learning (RL) has considerably advanced over the past decade. At the same time, state-of-the-art RL algorithms require a large computational budget in terms of training time to converge. Recent work has started to approach this problem through the lens of quantum computing, which promises theoretical speed-ups for several traditionally hard tasks. In this work, we examine a class of hybrid quantum-classical RL algorithms that we collectively refer to as variational quantum deep Q-networks (VQ-DQN). We show that VQ-DQN approaches are subject to instabilities that cause the learned policy to diverge, study the extent to which this afflicts reproducibility of established results based on classical simulation, and perform systematic experiments to identify potential explanations for the observed instabilities. Additionally, and in contrast to most existing work on quantum reinforcement learning, we execute RL algorithms on an actual quantum processing unit (an IBM Quantum Device) and investigate differences in behavior between simulated and physical quantum systems that suffer from implementation deficiencies. Our experiments show that, contrary to opposite claims in the literature, it cannot be conclusively decided if known quantum approaches, even if simulated without physical imperfections, can provide an advantage as compared to classical approaches. Finally, we provide a robust, universal and well-tested implementation of VQ-DQN as a reproducible testbed for future experiments.

Quantum Policy Iteration via Amplitude Estimation and Grover Search -- Towards Quantum Advantage for Reinforcement Learning

Simon Wiedemann, Daniel Hein, Steffen Udluft, Christian Mendl

We present a full implementation and simulation of a novel quantum reinforcement learning (RL) method and mathematically prove a quantum advantage. Our approach shows in detail how to combine amplitude estimation and Grover search into a policy evaluation and improvement scheme. We first develop quantum policy evaluation (QPE) which is quadratically more efficient compared to an analogous classical Monte Carlo estimation and is based on a quantum mechanical realization of a finite Markov decision process (MDP). Building on QPE, we derive a quantum policy iteration that repeatedly improves an initial policy using Grover search until the optimum is reached. Finally, we present an implementation of our algorithm for a two-armed bandit MDP which we then simulate. The results confirm that QPE provides a quantum advantage in RL problems.

1-2-3 Reproducibility for Quantum Software Experiments

Wolfgang Mauerer, Stefanie Scherzinger

Various fields of science face a reproducibility crisis. For quantum software engineering as an emerging field, it is therefore imminent to focus on proper reproducibility engineering from the start. Yet the provision of reproduction packages is almost universally lacking. Actionable advice on how to build such packages is rare, particularly unfortunate in a field with many contributions from researchers with backgrounds outside computer science. In this article, we argue how to rectify this deficiency by proposing a 1-2-3~approach to reproducibility engineering for quantum software experiments: Using a meta-generation mechanism, we generate DOI-safe, long-term functioning and dependency-free reproduction packages. They are designed to satisfy the requirements of professional and learned societies solely on the basis of project-specific research artefacts (source code, measurement and configuration data), and require little temporal investment by researchers. Our scheme ascertains long-term traceability even when the quantum processor itself is no longer accessible. By drastically lowering the technical bar, we foster the proliferation of reproduction packages in quantum software experiments and ease the inclusion of non-CS researchers entering the field.

Peel | Pile? Cross-Framework Portability of Quantum Software

Manuel Schönberger, Maja Franz, Stefanie. Scherzinger, Wolfgang Mauerer

In recent years, various vendors have made quantum software frameworks available. Yet with vendor-specific frameworks, code portability seems at risk, especially in a field where hardware and software libraries have not yet reached a consolidated state, and even foundational aspects of the technologies are still in flux. Accordingly, the development of vendor-independent quantum programming languages and frameworks is often suggested. This follows the established architectural pattern of introducing additional levels of abstraction into software stacks, thereby piling on layers of abstraction. Yet software architecture also provides seemingly less abstract alternatives, namely to focus on hardware-specific formulations of problems that peel off unnecessary layers. In this article, we quantitatively and experimentally explore these strategic alternatives, and compare popular quantum frameworks from the software implementation perspective. We find that for several specific, yet generalisable problems, the mathematical formulation of the problem to be solved is not just sufficiently abstract and serves as precise description, but is likewise concrete enough to allow for deriving framework-specific implementations with little effort. Additionally, we argue, based on analysing dozens of existing quantum codes, that porting between frameworks is actually low-effort, since the quantum- and framework-specific portions are very manageable in terms of size, commonly in the order of mere hundreds of lines of code. Given the current state-of-the-art in quantum programming practice, this leads us to argue in favour of peeling off unnecessary abstraction levels.


Modelling and Solving Reinforcement Learning Problems on Quantum Computers

Simon Wiedemann

Master Thesis at the Technische Universität München 15.11.2021

In the thesis, a new approach to use Quantum Computing for Reinforcement Learning is developed. The procedure is analogous to policy iteration that is using policy evaluation and policy improvement. These are realized by the quantum subroutines of quantum phase estimation and Grover search.

To this end, we first describe a quantum MDP model and based on it develop a quantum policy iteration algorithm. Allowing quantum agent-environment interaction via the exchange of states, actions and rewards in superposition to reduce the (q)sample complexity of learning appears to be a promising way to obtain quantum advantage in RL.

An implementation for two rounds of two-armed bandit with discount.