UCL Department of Science, Technology, Engineering and Public Policy

Adversarial Attacks, Robustness and Generalization in Deep Reinforcement Learning

By Ezgi Korkmaz, on 20 December 2023

Deep reinforcement learning, which uses deep neural networks as function approximators to learn from raw high-dimensional data, has made substantial progress on a wide range of tasks, from solving complex games to training large language models (e.g. GPT-4), with applications in fields ranging from medicine to self-driving vehicles and finance.

The vulnerabilities of deep reinforcement learning policies to adversarial attacks have been demonstrated in prior studies [1,2,3,4]. A recent study [8] takes these vulnerabilities one step further: it introduces natural attacks, i.e. natural changes to the environment that are imperceptible, and contrasts them with adversarial attacks. Examples of such changes include, but are not limited to, blurring, compression artifacts, or a perspective projection of the state observations, applied at a level at which humans cannot perceive the change.
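The kinds of change listed above can be illustrated on a toy grayscale frame. The Python sketch below applies a box blur and a coarse quantization (a crude stand-in for compression artifacts, not a real codec) and reports the largest per-pixel change. The function names, parameters and the L-infinity size measure are assumptions for illustration only; they are not the transformations, settings or imperceptibility measure used in [8], and perspective projection is omitted.

```python
from typing import List

# A grayscale observation frame; pixel values are assumed to lie in [0, 1].
Frame = List[List[float]]


def box_blur(frame: Frame, radius: int = 1) -> Frame:
    """Replace each pixel with the mean of its (2*radius+1) x (2*radius+1)
    neighbourhood, shrinking the window at the frame borders."""
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total, count = 0.0, 0
            for y in range(max(0, i - radius), min(h, i + radius + 1)):
                for x in range(max(0, j - radius), min(w, j + radius + 1)):
                    total += frame[y][x]
                    count += 1
            out[i][j] = total / count
    return out


def quantize(frame: Frame, levels: int = 16) -> Frame:
    """Crude stand-in for lossy-compression artifacts (hypothetical): snap
    every pixel to one of `levels` evenly spaced grey values."""
    step = 1.0 / (levels - 1)
    return [[round(p / step) * step for p in row] for row in frame]


def max_pixel_change(a: Frame, b: Frame) -> float:
    """L-infinity distance between two frames (largest per-pixel change)."""
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


# Usage: perturb a small synthetic frame and report how much it changed.
frame = [[((i + j) % 4) / 3 for j in range(6)] for i in range(6)]
for name, perturbed in [("blur", box_blur(frame)), ("quantize", quantize(frame, 8))]:
    print(name, round(max_pixel_change(frame, perturbed), 3))
```

Note that none of these functions needs any information about the policy; they act on the observation alone.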

Intriguingly, the reported results show that these natural attacks are at least as imperceptible as adversarial attacks, and often more so, while causing a larger drop in policy performance. These results raise significant concerns regarding artificial intelligence safety [5,6,7], and they also raise questions about the security of such models. Note that prior studies on adversarial attacks against deep reinforcement learning rely on a strong adversary assumption: the adversary has access to the policy’s perception system and to its training details (e.g. the algorithm, neural network architecture and training dataset), and can alter observations in real time, simultaneously modifying the policy’s observation system through computationally demanding adversarial formulations. The natural attacks described in [8], in contrast, are black-box attacks: the adversary has access neither to the policy’s training details nor to its perception system when computing the perturbations. This raises further questions about machine learning safety and responsible artificial intelligence.
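To make the contrast between the two threat models concrete, here is a minimal sketch assuming a hypothetical linear Q-function in place of a deep network. The white-box attack is an FGSM-style step in the spirit of [1], simplified here to push against the gap between the two highest Q-values, so it must read the policy's weights. The black-box perturbation is a simple moving-average blur that never touches the policy. All names and numbers are illustrative and are not taken from [8].

```python
from typing import List, Sequence

Vector = List[float]


class LinearQPolicy:
    """Hypothetical stand-in for a deep Q-network: Q(s, a) = w_a . s,
    with the policy acting greedily on Q."""

    def __init__(self, weights: Sequence[Sequence[float]]):
        self.weights = [list(w) for w in weights]

    def q_values(self, s: Vector) -> Vector:
        return [sum(wi * si for wi, si in zip(w, s)) for w in self.weights]

    def act(self, s: Vector) -> int:
        q = self.q_values(s)
        return max(range(len(q)), key=q.__getitem__)


def _sign(x: float) -> float:
    return float((x > 0) - (x < 0))


def white_box_fgsm(policy: LinearQPolicy, s: Vector, eps: float) -> Vector:
    """Strong (white-box) adversary: reads the policy's weights to take an
    FGSM-style step of size eps that shrinks the margin
    Q(s, best) - Q(s, runner_up). Assumes at least two actions."""
    q = policy.q_values(s)
    best, runner_up = sorted(range(len(q)), key=q.__getitem__, reverse=True)[:2]
    # For a linear Q-function, the margin's gradient w.r.t. s is w_best - w_runner_up.
    grad = [b - r for b, r in zip(policy.weights[best], policy.weights[runner_up])]
    return [si - eps * _sign(g) for si, g in zip(s, grad)]


def black_box_blur(s: Vector) -> Vector:
    """Black-box natural perturbation: a 3-tap moving average applied to the
    raw observation. It takes no policy argument at all."""
    n = len(s)
    out = []
    for i in range(n):
        window = s[max(0, i - 1):min(n, i + 2)]
        out.append(sum(window) / len(window))
    return out


# Usage: a small white-box step flips the greedy action of the toy policy.
policy = LinearQPolicy([[1.0, 0.0], [0.0, 1.0]])
s = [0.6, 0.5]
print(policy.act(s), policy.act(white_box_fgsm(policy, s, eps=0.1)))  # prints: 0 1
```

The point of the design is in the signatures: `white_box_fgsm` cannot be written without `policy.weights`, whereas `black_box_blur` takes only the observation, which is what makes this kind of attack cheap to mount.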

The second part of the paper investigates the robustness of adversarially trained deep reinforcement learning policies (i.e. robust reinforcement learning) under natural attacks, and shows that vanilla-trained deep reinforcement learning policies are more robust to these attacks than adversarially trained ones. These results reveal further security concerns about robust reinforcement learning algorithms, and they also show that adversarially trained policies do not generalize as well as straightforward vanilla-trained policies.
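A comparison like this amounts to evaluating each policy twice, once on clean observations and once with a natural transformation applied, and comparing the fraction of performance lost. The sketch below does this for a hypothetical one-step task with a hand-written toy policy. The task, the one-pixel shift (a crude stand-in for a viewpoint change) and the relative-drop formula are assumptions, not the environments, attacks or metric of [8]. To compare a vanilla and an adversarially trained policy, one would call `relative_drop` on each with the same transform and seed.

```python
import random
from typing import Callable, List, Tuple

Obs = List[float]
Policy = Callable[[Obs], int]
Transform = Callable[[Obs], Obs]


def sample_task(rng: random.Random, width: int = 8) -> Tuple[Obs, int]:
    """Hypothetical one-step task: the observation has a single bright pixel;
    the rewarded action is 0 if it is in the left half, 1 if in the right."""
    pos = rng.randrange(width)
    obs = [0.0] * width
    obs[pos] = 1.0
    return obs, int(pos >= width // 2)


def average_return(policy: Policy, transform: Transform,
                   episodes: int = 1000, seed: int = 0) -> float:
    """Mean reward (1 for the rewarded action, else 0) when every
    observation passes through `transform` before the policy sees it."""
    rng = random.Random(seed)  # same seed -> same task sequence on every call
    total = 0.0
    for _ in range(episodes):
        obs, target = sample_task(rng)
        total += float(policy(transform(obs)) == target)
    return total / episodes


def relative_drop(policy: Policy, transform: Transform, **kwargs) -> float:
    """(clean - attacked) / clean: the fraction of clean performance lost."""
    clean = average_return(policy, lambda o: o, **kwargs)
    attacked = average_return(policy, transform, **kwargs)
    return (clean - attacked) / clean if clean > 0 else 0.0


def shift_right(obs: Obs) -> Obs:
    """Crude stand-in for a small viewpoint change: translate the frame one
    pixel to the right, filling with zero (the rightmost pixel falls off)."""
    return [0.0] + obs[:-1]


def half_sum_policy(obs: Obs) -> int:
    """Toy policy: choose the half of the frame with more brightness."""
    mid = len(obs) // 2
    return int(sum(obs[mid:]) > sum(obs[:mid]))


print("half_sum", round(relative_drop(half_sum_policy, shift_right), 3))
```

Evaluating clean and transformed runs on the same seeded task sequence keeps the comparison paired, so the measured drop reflects the transformation rather than sampling noise between runs.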

Overall, by contrasting adversarial attacks with natural black-box attacks, this study also reveals a connection between generalization in reinforcement learning and the adversarial perspective.

Author’s Note: This blog post is based on the paper ‘Adversarial Robust Deep Reinforcement Learning Requires Redefining Robustness’ published in AAAI 2023.

Author Website: https://ezgikorkmaz.github.io/

Twitter: https://twitter.com/EzgiKorkmazAI
Paper Abstract: https://ojs.aaai.org/index.php/AAAI/article/view/26009
Paper Link: https://ojs.aaai.org/index.php/AAAI/article/view/26009/25781
Reading List on Adversarial Reinforcement Learning: https://github.com/EzgiKorkmaz/adversarial-reinforcement-learning

References:
[1] Adversarial Attacks on Neural Network Policies. ICLR, 2017.
[2] Investigating Vulnerabilities of Deep Neural Policies. Conference on Uncertainty in Artificial Intelligence (UAI).
[3] Deep Reinforcement Learning Policies Learn Shared Adversarial Features Across MDPs. AAAI Conference on Artificial Intelligence, 2022.
[4] Detecting Adversarial Directions in Deep Reinforcement Learning to Make Robust Decisions. International Conference on Machine Learning (ICML), 2023.
[5] Global Leaders Warn A.I. Could Cause ‘Catastrophic’ Harm. The New York Times, November 2023.
[6] 17 fatalities, 736 crashes: The shocking toll of Tesla’s Autopilot. The Washington Post, June 2023.
[7] UK, US, EU and China sign declaration of AI’s ‘catastrophic’ danger. The Guardian, November 2023.
[8] Adversarial Robust Deep Reinforcement Learning Requires Redefining Robustness. AAAI Conference on Artificial Intelligence, 2023.

Filed under 21st Century Decision Making, Digital Technology and Policy Laboratory, Education, Machine Learning and Research, Master's of Public Administration, Policy Impact Unit, Public Policy Processes and Knowledge Systems

Tags: AI alignment, AI risks, AI safety, Artificial Intelligence, Machine Learning, Reinforcement learning, Responsible AI, Trustworthy AI
