Adversarial Attacks on the Visual Perception of Reinforcement Learning Agents

Convolutional neural networks (CNNs) are widely used for image classification, but they are highly susceptible to adversarial attacks: targeted manipulations of the input data that lead to incorrect classifications. This thesis investigates how such attacks affect a system in which a CNN performs state recognition for a reinforcement learning agent. A tabular Q-learning agent and a deep Q-learning network (DQN) are evaluated in the FrozenLake environment. The results show that the agents' decision-making ability is significantly impaired, especially at high attack rates. While the DQN approach remains fairly stable even at an attack rate of 100%, with a success rate of approximately 22%, the tabular Q-learning agent is no longer able to navigate to the goal at this attack rate.
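The thesis does not specify the attack method in this summary; as an illustration of the general idea of a gradient-based adversarial perturbation, the following minimal sketch applies an FGSM-style attack (perturbing the input in the direction of the loss gradient's sign) to a linear softmax classifier standing in for the CNN state recognizer. The classifier, the toy "state image", and all names here are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def fgsm(x, W, b, true_label, eps):
    """FGSM-style attack: nudge x in the sign of the loss gradient
    to push a linear softmax classifier toward misclassification."""
    p = softmax(W @ x + b)
    # gradient of cross-entropy loss w.r.t. the input x
    grad = W.T @ (p - np.eye(len(p))[true_label])
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

# toy "state image": 4 pixel intensities mapped to one of 4 states
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
b = np.zeros(4)
x = np.array([1.0, 0.1, 0.0, 0.2])

clean_pred = int(np.argmax(softmax(W @ x + b)))
x_adv = fgsm(x, W, b, true_label=clean_pred, eps=0.5)
adv_pred = int(np.argmax(softmax(W @ x_adv + b)))
```

The perturbation is bounded by `eps` per pixel; at small `eps` it is barely visible, while large values (as the thesis notes for its 100%-success attacks) become obvious to a human observer.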

To increase robustness, a state tracking system, the state history, was developed. It checks the observed states for consistency in order to take corrective action in the event of misclassifications. Whether the state history has a positive influence on the system depends heavily on the learned policy: for the tabular Q-learning agent, it improves the success rate under attack by over 50% in some cases, whereas for the DQN it worsens the success rate up to an attack rate of 70%. At even higher attack rates, however, it slightly stabilizes the success rate. The work also shows that CNNs that require little generalization are difficult to manipulate with adversarial attacks. Parameters were found for which the attack success probability is 100%, but the resulting perturbations are clearly perceptible as manipulation to human observers.
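The consistency check behind the state history can be sketched as follows: in FrozenLake, one step can only move the agent to an adjacent cell (or leave it in place), so an observed state that is unreachable from the last trusted state is likely a misclassification and can be rejected. This is a minimal sketch of that idea under assumed details (a 4x4 grid, falling back to the last consistent state); the function names are illustrative, not taken from the thesis.

```python
GRID = 4  # assumed 4x4 FrozenLake layout

def plausible_successors(state):
    """States reachable from `state` in one step: the cell itself
    (blocked move) and its four grid neighbours."""
    r, c = divmod(state, GRID)
    succ = {state}
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < GRID and 0 <= nc < GRID:
            succ.add(nr * GRID + nc)
    return succ

def corrected_state(history, observed):
    """Reject an observed (possibly attacked) state that is not
    reachable from the last trusted state; fall back to it instead."""
    if not history or observed in plausible_successors(history[-1]):
        history.append(observed)
    else:
        history.append(history[-1])  # keep the last consistent state
    return history[-1]
```

Such a filter can only help when the policy tolerates occasionally repeating a state, which is consistent with the thesis's finding that the benefit of the state history depends heavily on the learned policy.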