Experiments in synthetic video
By Steve Rowett, on 18 October 2024
AI is a fast-moving field, and just a year after we wrote the text for the UCL Generative AI hub, some of it was looking out of date. We wanted to give it a refresh, but also make it less text-heavy. One of the team suggested we might do a series of short (90 second) video clips, similar to the popular UCL Micro-CPD series.
This was a great idea, but videos of real people always cause a problem. They are time-consuming to make, and even harder to update. You have to get the same people together again, and editing in a new clip can break the continuity of the original. Even something as simple as wearing the same clothes can be a challenge. The web is full of listings of continuity bloopers, even for major movies.
So we looked for an alternative, and saw a growing market in synthetic video generators. One of them, Synthesia, was co-founded by UCL’s Professor Lourdes Agapito, so it seemed a natural choice for us to try. We particularly valued Synthesia’s extensive discussion around its own views on ethical AI use, in keeping with UCL’s values.
So, we started building a trial video in Synthesia. At the start, it feels like a bit of a cut-down PowerPoint. You have a scene (similar to a slide) and you can add text and graphics to it, with animations. But then you can do something that PowerPoint doesn’t do: add one of many avatars to the scene. Then at the bottom (similar to where PowerPoint notes would be) you add the text that the avatar will speak. You then link the objects on your scene to the point in the text at which they should appear. And for a simple video, that is it.
You can see how this looks in the screenshot below:
Once done, you can preview how your avatar will sound. This is useful, as the AI-generated speech will not be perfect. Where it doesn’t sound quite right, you can add an extra pause, or specify the exact pronunciation it should use for a word or phrase. Once you are happy with the speech, press a button and your video will be generated in the background. Our 90-second videos took about 10-15 minutes to be ready to download.
You can watch the first video that we made below, and visit the UCL Generative AI hub to see the full set of eight videos.
The nice thing about this is that when you need to make a change, you just open it up, change your graphic or text, and generate the video again. It’s really simple – in fact, getting the captions right for each video was the most time-consuming part of the process. I’d estimate each video took about a day to discuss, write, make, amend, caption and deploy.
For us, this is a bit of an experiment. We know some people prefer video to text, and vice versa. But we’ve got very little experience of how people – particularly our university community – will respond to synthetic videos. If you have any thoughts on whether this format works well for you, or how it could be improved, please do let us know in the comments below.