“Safe Trajectory Sampling in Model-based Reinforcement Learning for Robotic Systems” By Sicelukwanda Zwane
By sharon.betts, on 29 September 2023
In the exciting realm of Model-based Reinforcement Learning (MBRL), researchers are constantly pushing the boundaries of what robots can learn to achieve when given access to an internal model of the environment. One key challenge in this field is ensuring that robots can perform tasks safely and reliably, especially in situations where they lack prior data or knowledge about the environment. That’s where the work of Sicelukwanda Zwane comes into play.
Background
In MBRL, robots use small sets of data to learn a dynamics model. This model is like a crystal ball that predicts how the system will respond to a given sequence of different actions. With MBRL, we can train policies from simulated trajectories sampled from the dynamics model instead of first generating them by executing each action on the actual system, a process that can take extremely long periods of time on a physical robot and possibly cause wear and tear.
One of the tools often used in MBRL is the Gaussian process (GP) dynamics model. GPs are fully-Bayesian models that not only model the system but also account for the uncertainty in state observations. Additionally, they are flexible and are able to learn without making strong assumptions about the underlying system dynamics [1].
The Challenge of Learning Safely
When we train robots to perform tasks, it’s not enough to just predict what will happen; we need to do it safely. As with most model classes in MBRL, GPs don’t naturally incorporate safety constraints. This means that they may produce unsafe or infeasible trajectories. This is particularly true during the early stages of learning: when the model hasn’t seen much data, it can produce unsafe and seemingly random trajectories.
For a 7 degree of freedom (DOF) manipulator robot, bad trajectories may contain self-collisions.
Distributional Trajectory Sampling
In standard GP dynamics models, the posterior is represented in distributional form, that is, through its parameters: the mean vector and covariance matrix. In this form, it is difficult to reason about the safety of entire trajectories. This is because trajectories are generated through iterative random sampling. Furthermore, this kind of trajectory sampling is limited to cases where the intermediate state marginal distributions are Gaussian distributed.
Pathwise Trajectory Sampling
Zwane uses an innovative alternative called “pathwise sampling” [3]. This approach draws samples from GP posteriors using an efficient method called Matheron’s rule. The result is a set of smooth, deterministic trajectories that aren’t confined to Gaussian distributions and are temporally correlated.
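To make the idea concrete, here is a minimal sketch of Matheron's rule for a GP with one-dimensional inputs. It is not the implementation from the paper: the function names, the squared-exponential kernel, and the hyperparameter values are all assumptions chosen for illustration, and the prior sample is drawn exactly with a Cholesky factor rather than with the more efficient approximations discussed in [3].

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def pathwise_posterior_samples(x_train, y_train, x_test, noise_var=1e-2,
                               n_samples=5, seed=0):
    """Draw GP posterior samples at x_test via Matheron's rule (illustrative)."""
    rng = np.random.default_rng(seed)
    n, m = len(x_train), len(x_test)

    # Joint prior sample over training and test inputs.
    x_all = np.concatenate([x_train, x_test])
    k_all = rbf_kernel(x_all, x_all) + 1e-6 * np.eye(n + m)  # jitter for stability
    chol = np.linalg.cholesky(k_all)
    f_prior = chol @ rng.standard_normal((n + m, n_samples))
    f_train, f_test = f_prior[:n], f_prior[n:]

    # Simulated observation noise, one draw per prior sample.
    eps = np.sqrt(noise_var) * rng.standard_normal((n, n_samples))

    # Matheron update: prior sample + K_*n (K_nn + noise I)^{-1} (y - f_n - eps).
    k_nn = rbf_kernel(x_train, x_train) + noise_var * np.eye(n)
    k_sn = rbf_kernel(x_test, x_train)
    residual = y_train[:, None] - f_train - eps
    return f_test + k_sn @ np.linalg.solve(k_nn, residual)

# Hypothetical toy data: three noisy observations of a sine function.
x_train = np.array([-2.0, 0.0, 1.0])
y_train = np.sin(x_train)
x_test = np.linspace(-3.0, 3.0, 50)
samples = pathwise_posterior_samples(x_train, y_train, x_test)  # one column per sampled function
```

Each column is one sampled function evaluated on the test grid. Because the randomness is fixed once the prior draw and noise draw are made, each sample behaves as a deterministic function, which is the property the pathwise view relies on.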
Adding Safety
The beauty of pathwise sampling [3] is that it has a particle representation of the GP posterior, where individual trajectories are smooth, differentiable, and deterministic functions. This allows for the isolation of constraint-violating trajectories from safe ones. For safety, rejection sampling is performed on trajectories that violate safety constraints, leaving behind only the safe ones to train the policy. Additionally, soft constraint penalty terms are added to the reward function.
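The filtering step can be sketched as follows. This is only an illustration under assumed details: the post does not give the exact constraint or penalty used, so the ceiling-height constraint, the `margin` and `weight` parameters, and both function names are hypothetical.

```python
import numpy as np

def safe_trajectory_mask(heights, ceiling):
    """Mark trajectories (rows) that stay strictly below the ceiling at every step."""
    return np.all(heights < ceiling, axis=1)

def penalised_reward(reward, heights, ceiling, margin=0.1, weight=10.0):
    """Subtract a soft penalty that grows once a state enters the margin below the ceiling."""
    violation = np.maximum(0.0, heights - (ceiling - margin))
    return reward - weight * violation

# Hypothetical end-effector heights for three sampled trajectories (rows) over three steps.
heights = np.array([[0.2, 0.4, 0.5],
                    [0.3, 0.9, 1.2],    # crosses the ceiling at the last step
                    [0.1, 0.6, 0.85]])
ceiling = 1.0

mask = safe_trajectory_mask(heights, ceiling)
safe_trajectories = heights[mask]  # only these would be used to train the policy
```

Rejection removes whole trajectories that violate the hard constraint, while the penalty term shapes the reward for the trajectories that remain, discouraging states that come close to the limit.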
Sim-Real Robot Experiments
To put this approach to the test, Zwane conducted experiments involving a 7-DoF robot arm in a simulated constrained reaching task, where the robot has to avoid colliding with a low ceiling. The method successfully learned a reaching policy that adhered to safety constraints, even when starting from random initial states.
In this constrained manipulation task, the robot is able to reach the goal (shown by the red sphere – bottom row) without colliding with the ceiling (blue – bottom row) using less than 100 seconds of data in simulation.
Summary
Sicelukwanda Zwane’s research makes incremental advances on the safety of simulated trajectories by incorporating safety constraints while keeping the benefits of using fully-Bayesian dynamics models such as GPs. This method promises to take MBRL out of simulated environments and make it more applicable to real-world settings. If you’re interested in this work, we invite you to dive into the full paper, published at the recent IEEE CASE 2023 conference.
References
[1] M. P. Deisenroth and C. E. Rasmussen. PILCO: A Model-based and Data-efficient Approach to Policy Search. ICML, 2011.
[2] S. Kamthe and M. P. Deisenroth. Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control. AISTATS, 2018.
[3] J. T. Wilson, V. Borovitskiy, A. Terenin, P. Mostowsky, and M. P. Deisenroth. Pathwise Conditioning of Gaussian Processes. JMLR, 2021.
Student-Led Workshop – Distance-based Methods in Machine Learning – Review by Masha Naslidnyk
By sharon.betts, on 3 July 2023
We are delighted to announce the successful conclusion of our recent workshop on Distance-based Methods in Machine Learning. Held at the historical Bentham House on 27-28th of June, the event brought together approximately 60 delegates, including leading experts and researchers from statistics and machine learning. The workshop showcased a diverse range of speakers who shared their knowledge and insights on the theory and methodology behind machine learning approaches utilising kernel-based and Wasserstein distances. Topics covered included parameter estimation, generalised Bayes, hypothesis testing, optimal transport, optimization, and more. The interactive sessions and engaging discussions created a vibrant learning environment, fostering networking opportunities and collaborations among participants. We extend our gratitude to the organising committee, speakers, and attendees for their valuable contributions to this successful event. Stay tuned for future updates on similar initiatives as we continue to explore the exciting possibilities offered by distance-based methods in machine learning.
AI Hackathon at Cumberland Lodge – Recap of Student Led Event
By sharon.betts, on 2 June 2023
We recently organised an AI hackathon, attended by both the members of our CDT and students from AI-focused CDTs at other universities. The hackathon was the main component of a two-night retreat hosted at Cumberland Lodge, a country house and conference venue in the beautiful Windsor Great Park. The event was student-led, and an exciting opportunity to explore new research directions, brainstorm start-up ideas, and build connections with other PhD students in the field.
During the hackathon we split into small groups, each working on their own projects, which had been proposed in advance by the attendees. Lots of ambitious projects were suggested, and it was impressive to see them carried out successfully. These included a web app for language learners that uses speech recognition to judge and correct Mandarin tone pronunciation; an investigation into the capabilities of large language models for solving cryptic crosswords, culminating in a thrilling live demo; and mapping out gaps in the market for a waste manipulation robotics start-up.
Most excitingly, a couple of the teams have decided to continue developing their projects after the event, with new apps and conference papers in the works!
In addition to the hackathon, the students attending from outside of the CDT in Foundational AI presented their PhD research during a poster session. G-Research also attended the retreat, kindly providing welcome drinks on the first night, and hosting a prize giving for their research competition. There were also ample opportunities for socialising over meals and in the bar, and exploring the sunny surroundings of the park.
Thank you to the CDT management for helping with organising the event, and all the attendees for making it a success. We hope to arrange something similar next year!
Authors
Oscar Key and Robert Kirk
Day Dream Believing? Thinking about World Models. By Rokas Bendikas
By sharon.betts, on 23 May 2023
I am interested in discussing an intriguing concept in machine learning, which promises to revolutionize the way we approach learning in robotics: World Models.
At a high level, World Models aim to create a compact and controllable representation of the world. Think of it as a mental simulation or an internal mini-world where AI can experiment, explore, and ‘imagine’ different scenarios, all without the need for real-world interactions. It’s like creating a sandbox game for AI, where it can learn the ropes before stepping out into the real world.
Let’s contrast this with the conventional end-to-end learning methods. These traditional approaches typically require vast amounts of real-world data and intensive training, which can be time-consuming, computationally expensive, and let’s face it, data-inefficient.
That’s where the beauty of World Models shines. By allowing AI to ‘dream’ or simulate possible scenarios in their internal model of the world, they can learn faster and more efficiently. They can plan and strategize better by running various ‘what-if’ scenarios within their world model. Imagine playing chess and being able to simulate all possible moves in your mind before making your move – that’s the advantage of World Models in a nutshell!
The ‘DayDreamer’ paper is a fantastic resource if you’re keen to delve into the specifics of this innovative approach. It opens up new vistas in our quest for smarter and more data-efficient learning in robotics.
In a world where data is king but also a constraint, World Models are pioneering a path towards more strategic, efficient, and thoughtful AI. So, let’s continue learning, exploring, and innovating. After all, the future of AI is as exciting as we dare to imagine!
Student success in G-Research PhD Competition
By sharon.betts, on 16 May 2023
G-Research, an industry partner and supporter of our CDT, recently ran a competition for our PhD students, in which they summarised their PhD research field and explained how their work is novel and has an impact in its field.
At G-Research we value supporting talent and innovation at institutions world-wide. We were thrilled to sponsor the UCL CDT in Foundational AI PhD prize. The video submission format provided the opportunity for PhD students to present their research in a concise way. We are grateful for the chance we were given to join in on talks and the poster session at Cumberland Lodge. We look forward to repeating the experience next year and meeting more of the staff and students. – Dr Charles Martinez, Academic Relations Manager
There were three prize winners overall:
1st Jakob Zeitler
2nd Augustine Mavor-Parker
3rd Jake Cunningham
The standard of submission was excellent. All videos were very high quality, interesting and informative and we’re very proud of our students.
The prize is important since it has encouraged students to focus on learning the skills to communicate scientific ideas well and to a broader audience. The entries were also quite creative and we hope that all entrants found this a useful and enjoyable process.
Below is the winning entry video from Jakob Zeitler
We are delighted that our students are excelling in their field and providing new ideas for the future of AI research.
The CDT wishes to thank G-Research for its support.
Presenting to UltraLeap, Reflections on giving talks to Industry Partners by Zak Morgan
By sharon.betts, on 18 January 2023
During an EU research project meeting, I was invited by one of the industry partners (UltraLeap) to give a talk at their company about my PhD project so far. Since it was my first talk aimed specifically at industry, I was unsure what to expect, so here I’ll lay out some of my reflections on the process, with the aim of helping those who find themselves in my shoes in the future.
As in my previous blog post, I’d emphasize that the content you prepare, whether it be a slide deck, poster or other materials, is only the springboard for discussion. Whilst my slide deck for the formal presentation was 10 slides, I ended up showing many other results and pieces of work to help answer questions in the Q&A afterwards. I’d highly recommend having a collection of supplementary material, like in an academic paper, to help answer any questions that might arise from the discussion.
Overall, I found the talk incredibly productive, and I’d like to think that both the employees and I found it helpful. In particular, I can share findings from work that I am able to carry out because my research is not profit-focused, whilst they can provide engineering details and polish on ideas that are simply not valued in a research context.
I’d also advise talking to your supervisors and the company beforehand to make sure you know what they want you to talk about, and what you can talk about. Sometimes this may require signing an NDA if you want to see the particularly cool stuff that goes on in industry; not doing so will result in you missing out on a lot of the cool inner workings at these industrial partners!
On a final note about preparation: if you find yourself in this position, make sure you’re free for a much longer period than the half hour or hour scheduled for the talk. A chat over lunch beforehand, or with certain people afterwards, can be just as productive as the meeting itself!
A massive thanks to UltraLeap for the opportunity. I’d highly recommend that other PhD students make the most of the great connections they can make during their research.
This work was supported by the Royal Academy of Engineering Chairs in Emerging Technology Scheme (CiET1718/14)
CDT Collaboration – Inter CDT Conference at Bristol Hotel with ART-AI and Interactive AI CDTs 7-8 Nov 2022
By sharon.betts, on 29 November 2022
On 7th and 8th November 2022, three of the UKRI CDTs in Artificial Intelligence hosted an Inter-CDT conference for our students and industry partners at The Bristol Hotel. The UKRI CDT in Foundational AI worked alongside our sister CDTs at the University of Bath (ART-AI) and the University of Bristol (Interactive AI) to produce a two-day event that covered AI from deep tech entrepreneurship to AI Ethics and Defence.
Turnout from all three CDTs was excellent, and it was a wonderful opportunity for students across the three institutions to meet and collaborate with one another, sharing their knowledge and research of AI, both theoretical and applied.
UCL were delighted to host two panel sessions, the first being on Deep Tech entrepreneurship with Dr. Riam Kanso from Conception X, Dr. Stacy-Ann Sinclair from CodeREG and Dr. Thomas Stone from Kintsugi (ad)Ventures. Hosted by our CDT Director, Prof David Barber, this interactive panel session saw our specialists discuss the pathways into start-ups and entrepreneurship, and the perils, pitfalls and positives that follow! It was wonderful to hear industry experts describe their personal journeys to successful business ventures, and great to have such an engaged and enquiring audience, who were keen to ask numerous questions and gain further insight into future possibilities.
Our second panel closed the event and was a student-led initiative discussing large scale datasets and massive computational modelling in AI.
For a more detailed review of the event we highly recommend you read the review by ART-AI on their website.
We were delighted to celebrate our student Dennis Hadjivelichkov’s second place in the poster session that took place at the MShed in Bristol as well as enjoy the fine food and fabulous company of our CDT peers.
With thanks to ART-AI and Interactive AI CDTs for their co-hosting and co-organising skills. It was a delight to be able to share time and work with our sister CDTs and we hope to collaborate again in the not too distant future.
Conferences and Workshops – GOFCP, MLF & EDS 2022 – Recap of events by Antonin Schrab
By sharon.betts, on 16 November 2022
In September 2022 I had the amazing opportunity to participate in workshops in Rennes and in Sophia Antipolis, and in a doctoral symposium in Alicante!
In poster sessions and talks, I presented my work on Aggregated Kernel Tests, which covers three of my papers. The first is MMD Aggregated Two-Sample Test, which considers the two-sample problem: given samples from two distributions, one wants to detect whether they come from the same distribution or from different ones. The second is KSD Aggregated Goodness-of-fit Test, which considers the goodness-of-fit problem: given some samples, one asks whether they come from a given model (with access to its density or score function). In the third, Efficient Aggregated Kernel Tests using Incomplete U-statistics, we propose computationally efficient tests for the two-sample, goodness-of-fit, and independence problems; the last of these consists in detecting dependence between the two components of paired samples. We tackle these three testing problems using kernel-based statistics. In this setting, the performance of the tests is known to depend heavily on the choice of kernel or kernel parameters (such as the bandwidth). We propose tests which aggregate over a collection of kernels and retain test power. We prove theoretically that our tests are optimal under some regularity assumptions, and show empirically that our aggregated tests outperform other state-of-the-art kernel-based tests.
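To illustrate the two-sample setting and why the bandwidth matters, here is a minimal sketch of a single-kernel MMD permutation test for one-dimensional samples. It is not the aggregated procedure from the papers, which combines several kernels with adjusted test levels; the function names, the Gaussian kernel, the biased MMD estimate, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(z, bandwidth):
    """Gaussian kernel matrix for a 1-D sample z."""
    sq_dist = (z[:, None] - z[None, :]) ** 2
    return np.exp(-sq_dist / (2.0 * bandwidth**2))

def mmd_squared(k, idx_x, idx_y):
    """Biased (V-statistic) estimate of squared MMD from a pooled kernel matrix."""
    k_xx = k[np.ix_(idx_x, idx_x)].mean()
    k_yy = k[np.ix_(idx_y, idx_y)].mean()
    k_xy = k[np.ix_(idx_x, idx_y)].mean()
    return k_xx + k_yy - 2.0 * k_xy

def mmd_permutation_test(x, y, bandwidth, n_permutations=500, alpha=0.05, seed=0):
    """Return (reject, p_value) for H0: x and y come from the same distribution."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    k = gaussian_kernel(pooled, bandwidth)  # computed once, reused for every permutation
    n = len(x)
    idx = np.arange(len(pooled))
    observed = mmd_squared(k, idx[:n], idx[n:])

    # Simulate the null distribution by reshuffling which points belong to which sample.
    count = 0
    for _ in range(n_permutations):
        perm = rng.permutation(idx)
        if mmd_squared(k, perm[:n], perm[n:]) >= observed:
            count += 1
    p_value = (count + 1) / (n_permutations + 1)
    return p_value <= alpha, p_value

# Hypothetical toy data: two Gaussians whose means differ by 0.5.
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=100)
y = rng.normal(0.5, 1.0, size=100)
for bandwidth in (0.01, 1.0, 100.0):
    reject, p_value = mmd_permutation_test(x, y, bandwidth)
    print(f"bandwidth={bandwidth}: reject={reject}, p-value={p_value:.3f}")
```

Running the loop with very small, moderate, and very large bandwidths gives a feel for how sensitive a single-kernel test can be to this choice, which is the motivation for aggregating over a collection of kernels.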
I am extremely grateful to Valentin Patilea, Motonobu Kanagawa and Aditya Gulati for the respective invitations, and to my CDT (UCL CDT in Foundational AI with funding from UKRI) which allowed me to participate in those workshops/symposium!
CDT Students shine at poster showcase event
By sharon.betts, on 4 November 2022
Tuesday 1st November was a busy day at the CDT and UCL Centre for Artificial Intelligence with our joint UKRI CDT poster showcase and AI demo event. Together with the UKRI CDT in AI-Enabled Healthcare we put on an event featuring posters, demos, AI art and robots.
The afternoon began with presentations by the CDT centre directors Prof David Barber and Prof Paul Taylor, as well as our industry sponsor Ulrich Paquet from DeepMind. In attendance were students, academics and industry partners, keen to understand what we have been doing and where our research will take us in the future.
We had approximately 40 posters on display, with a further 19 demonstrations of AI by a variety of groups, from Vision to Natural Language Processing. Engagement with the poster presenters was high across the board, and the event was a wonderful opportunity for our students to talk with others about the work that they have undertaken over the last few years.
We were honoured to have the Provost in attendance to witness just how vibrant and stimulating our centres are as part of a dynamic and successful Computer Science department.
The UCL Centre for Artificial Intelligence has been donated a rare 3D-generated, AI-generated painting of an Amedeo Modigliani, which started as a Masters and then PhD project for Dr. Anthony Bouchard and Dr. George Cann. It will be displayed at the AI Centre for all to see.
The day ended with a robot display in the Function Space, showcasing the quadruped robots that our students are working with both at the AI Centre and at the soon-to-be-opened UCL East.
It was wonderful to witness all the different ways in which AI is being applied and developed to help solve some of society’s greatest needs, and to have the opportunity to share the work of our students with a wider audience.
With thanks to those who attended, our students, director David Barber, AI Centre manager Sarah Bentley and the TSG team for their time, patience and support in helping to make this a hugely successful event.
Conference on Learning Theory COLT 2022 by Antonin Schrab
By sharon.betts, on 14 October 2022
« COLT has been the prime annual meeting of the growing learning theory community for 35 years now, and that London edition has been beyond our expectations. We have been planning COLT 2022 since late 2019, and due to Covid it was unclear until a few weeks before the conference how many people would be able or willing to join. Our optimistic scenario was 150 on-site attendees; we ended up at more than 270! COLT 2022 featured the highest number ever of papers (155) in a dual track format. I am especially proud that over 50% of attendees were MSc, PhD and postdocs: COLT has long been a welcoming and inclusive forum for early-career researchers. As local chair, COLT has eaten up a lot of my days and nights recently, but it certainly was worth it! » Benjamin Guedj, Inria and University College London, COLT 2022 Local Chair.
This July, I had the great pleasure of participating in the Conference on Learning Theory COLT 2022, which was held in person in London! I found the conference to be a real success: it was wonderful to finally be able to meet so many people sharing the same interests in learning theory! It was amazing to follow talks held in the historic Royal Institution of Great Britain, which is the location of the famous televised Christmas Lectures!
The conference kicked off with a joint workshop between COLT and the IMS (Institute of Mathematical Statistics) Annual Meeting, with tutorials and talks by Emmanuel Candès, Nati Srebro and Vladimir Vovk on the topics of conformal prediction and the mathematics of deep learning. This workshop brought together the two audiences (IMS and COLT), whose interests in statistics and learning theory are aligned. This was a great initiative which was really appreciated by all the participants I talked to, and I hope the joint workshop between IMS and COLT will remain in future editions of the conferences!
During the four following days, all papers accepted to COLT 2022 were presented by their authors. Each talk was ten minutes long, a format that gave a good overview of each of the 155 papers. Topics included Online Learning, Statistics, Privacy, Robustness, Computational Complexity, Deep Learning, Generalization, Bandits, Sampling, Optimization, Graphs, Information Theory, Reinforcement Learning and Control. It was also very interesting to listen to longer talks such as those of the two papers which received the best paper and best student paper awards of COLT 2022 (Efficient Convex Optimization Requires Superlinear Memory by Annie Marsden, Vatsal Sharan, Aaron Sidford, and Gregory Valiant, and New Projection-Free Algorithms for Online Convex Optimization with Adaptive Regret Guarantees by Ben Kretzu and Dan Garber), as well as those given by plenary speakers: Jelani Nelson from University of California, Berkeley, Maryam Fazel from University of Washington, and Alon Orlitsky from University of California San Diego.
I also really enjoyed the open problem sessions, in which unsolved problems were presented in the hope that they can be solved in future editions of COLT; it was great to see which learning theory problems people currently find challenging! Other events were also organised, such as the LeT-All career panel providing advice to early-career researchers, the Women in Machine Learning Theory luncheon discussing everyday challenges women are facing in academic and industrial Machine Learning research, the business meeting with COLT announcements about future editions of the conference, and the workshop reception and the conference gala dinner, which were the perfect opportunity to engage with other participants!
COLT 2022 was made possible thanks to the hard-working organizing committee: program chair Po-Ling Loh from University of Cambridge, program chair Maxim Raginsky from University of Illinois at Urbana-Champaign, local chair Benjamin Guedj from Inria and University College London, local chair Ciara Pike-Burke from Imperial College London, open problems chair Clément Canonne from University of Sydney, online experience chair Claire Vernade from DeepMind, and publication chair Suriya Gunasekar from Microsoft Research. Thank you all for making COLT 2022 possible and such a success!
I am now looking forward to COLT 2023!