Adversarial Attacks, Robustness and Generalization in Deep Reinforcement Learning
By Ezgi Korkmaz, on 20 December 2023
The vulnerabilities of deep reinforcement learning policies against adversarial attacks have been demonstrated in prior studies [1,2,3,4]. However, a recent study takes these vulnerabilities one step further and introduces natural attacks (i.e. natural changes to the environment given that these changes are imperceptible) while providing a contradistinction between adversarial attacks and natural attacks. The instances of such changes include, but are not limited to creating a blur, introduction of compression artifacts, or perspective projection of the state observations at a level that humans cannot perceive the change.
Intriguingly, the results reported demonstrate that these natural attacks are at least equally, and often more imperceptible compared to adversarial attacks, while causing larger drop in policy performance. While these results carry significant concerns regarding artificial intelligence safety [5,6,7], they further raise questions on the model’s security. Note that the prior studies on adversarial attacks on deep reinforcement learning rely on the strong adversary assumption, in which the adversary has access to the policy’s perception system, training details of the policy (e.g. algorithm, neural network architecture, training dataset), and the ability to alter observations in real time with simultaneous modifications to the observation system of the policy with computationally demanding adversarial formulations. Thus, the fact that natural attacks described in [8] are black-box adversarial attacks, i.e. the adversary does not have access to the training details of the policy and the policy’s perception system to compute the adversarial perturbations, raises further questions on machine learning safety and responsible artificial intelligence.
Furthermore, the second part of the paper investigates the robustness of adversarially trained deep reinforcement learning policies (i.e. robust reinforcement learning) under natural attacks, and demonstrates that vanilla trained deep reinforcement learning policies are more robust than adversarially, i.e. robust, trained policies. While these results reveal further security concerns regarding the robust reinforcement learning algorithms, they further demonstrate that adversarially trained deep reinforcement learning policies cannot generalize at the same level as straightforward vanilla trained deep reinforcement learning algorithms.
This study overall, while providing a contradistinction between adversarial attacks and natural black-box attacks, further reveals the connection between generalization in reinforcement learning and the adversarial perspective.
Paper Abstract: https://ojs.aaai.org/index.php/AAAI/article/view/26009
Paper Link: https://ojs.aaai.org/index.php/AAAI/article/view/26009/25781
Reading List on Adversarial Reinforcement Learning: https://github.com/EzgiKorkmaz/adversarial-reinforcement-learning
[1] Adversarial Attacks on Neural Network Policies, ICLR 2017.
[2] Investigating Vulnerabilities of Deep Neural Policies. Conference on Uncertainty in Artificial Intelligence (UAI), PMLR 2021.
[3] Deep Reinforcement Learning Policies Learn Shared Adversarial Features Across MDPs. AAAI Conference on Artificial Intelligence, AAAI 2022. [Paper Link]
[4] Detecting Adversarial Directions in Deep Reinforcement Learning to Make Robust Decisions. International Conference on Machine Learning, ICML 2023. [Paper Link]
[5] New York Times. Global Leaders Warn A.I. Could Cause ‘Catastrophic’ Harm, November 2023.
[6] The Washington Post. 17 fatalities, 736 crashes: The shocking toll of Tesla’s Autopilot, June 2023.
[7] The Guardian. UK, US, EU and China sign declaration of AI’s ‘catastrophic’ danger, November 2023.
[8] Adversarial Robust Deep Reinforcement Learning Requires Redefining Robustness, AAAI Conference on Artificial Intelligence, AAAI 2023. [Paper Link]