Google DeepMind has introduced Veo 2, its latest AI video generation model and the successor to Veo. The model is designed to surpass OpenAI’s Sora in both resolution and duration: it can generate videos of up to two minutes in 4K resolution, which is four times the resolution and about six times the duration of Sora’s current output.
For now, Veo 2 is available only through Google’s experimental VideoFX tool, where generated videos are capped at 720p and eight seconds. That is below what Sora offers today: up to 1080p resolution with clips lasting 20 seconds. DeepMind plans to roll out Veo 2 on its Vertex AI platform, enabling broader use as the model becomes ready for large-scale implementation.
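The resolution and duration comparisons above come down to simple arithmetic, which the following sketch works through. The pixel dimensions are assumptions, since the article names only "4K", "1080p", and "720p": 4K is taken here as UHD (3840x2160), and the ratio would be slightly higher with the wider 4096x2160 variant. The dictionary and function names are illustrative only.

```python
# Assumed pixel dimensions for the resolution labels used in the article.
RESOLUTIONS = {
    "4K": (3840, 2160),     # assumption: UHD, not the 4096x2160 cinema variant
    "1080p": (1920, 1080),
    "720p": (1280, 720),
}

# Figures quoted in the article (resolution label, maximum clip length in seconds).
VEO2_MAX = {"resolution": "4K", "seconds": 120}
VEO2_VIDEOFX = {"resolution": "720p", "seconds": 8}
SORA = {"resolution": "1080p", "seconds": 20}


def pixel_count(label):
    """Return the total number of pixels per frame for a resolution label."""
    width, height = RESOLUTIONS[label]
    return width * height


def compare(model, baseline):
    """Return (pixel ratio, duration ratio) of model relative to baseline."""
    pixel_ratio = pixel_count(model["resolution"]) / pixel_count(baseline["resolution"])
    duration_ratio = model["seconds"] / baseline["seconds"]
    return pixel_ratio, duration_ratio


if __name__ == "__main__":
    print("Veo 2 (stated maximum) vs Sora:", compare(VEO2_MAX, SORA))
    print("Veo 2 (VideoFX cap) vs Sora:", compare(VEO2_VIDEOFX, SORA))
```

Under these assumptions, the stated maximum works out to 4.0 times Sora's pixel count and 6.0 times its clip length, while the VideoFX cap comes in below Sora on both measures.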
Eli Collins, Vice President of Product at DeepMind, stated, “Over the coming months, we’ll continue to iterate based on feedback from users and [we’ll] look to integrate Veo 2’s updated capabilities into compelling use cases across the Google ecosystem … [W]e expect to share more updates next year.”
The Veo 2 model introduces significant advancements over its predecessor. It boasts improved physics understanding, enhanced camera controls, and sharper textures in motion-heavy scenes. With precise virtual camera positioning, Veo 2 captures objects and people from various angles, while also offering realistic modeling of motion, fluid dynamics, and light properties. These improvements include support for cinematic effects and nuanced human expressions.
Despite these advancements, challenges remain. Collins admitted that “coherence and consistency are areas for growth,” particularly in maintaining character consistency and adhering to complex prompts over long durations. He also noted room for improvement in generating intricate details and handling complex motions.
DeepMind collaborates with creatives and producers, including artists like Donald Glover and The Weeknd, to refine Veo 2’s capabilities further. According to Collins, their feedback has been instrumental in shaping both the development of Veo 2 and its application for artistic use cases.
Veo 2 was trained on high-quality pairings of videos and their descriptions, which DeepMind emphasized as a way to improve the model’s ability to generate realistic outputs. DeepMind did not disclose the specific sources of this training data. Concerns persist because it does not currently offer mechanisms for creators to opt out of having their content included in training datasets.
To mitigate risks such as deepfake misuse, DeepMind employs its proprietary watermarking technology, SynthID, embedding invisible markers into generated frames. However, like other watermarking technologies, SynthID is not entirely foolproof.
The launch of Veo 2 highlights Google DeepMind’s ambition to lead the AI video generation space. The model is a significant step up from its predecessor in realism and functionality, even as its technical limitations and the ethical questions around training data and misuse remain open.
TECHCRUNCH