This week we will be talking about some exciting breakthroughs in video generation and understanding. The main splash was OpenAI’s announcement of Sora last week, but there have also been related releases and updates from Google, Apple, and UC Berkeley. There’s a description of each below if you want to read ahead. It will be impossible to cover all of these papers in an hour, so let me know if you want to do a deeper dive into any of them at a later meetup.
February 15 – OpenAI Sora – “Sora can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt.” There is no paper linked to the release, but OpenAI’s post covering the technical details is really good.
February 13 – UC Berkeley – Large World Model – this is a multimodal (text and video) model with a context window of up to 1M tokens. It can generate images and videos as well as describe the content of videos or answer questions about what happened in a video.
February 8 – Apple Keyframer – “A design tool for animating static images (SVGs) with natural language”. This isn’t necessarily a video tool, but it generates CSS code to animate an SVG.
January 23 – Google Lumiere – Text to Video, Image to Video, and Stylized Generation.
October 31, 2023 – SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction – I’m not sure how to attribute this to a particular organization, since it appears that several Chinese universities contributed. While not quite as impressive as the other releases above, it reflects a broader move toward video generation.
Details: