Create your own worlds
Project Genie is an experimental research prototype that lets you create and explore infinitely diverse worlds.
A new frontier for world models
Create and explore infinitely diverse worlds.
Experience the natural world from desert to sea – or witness extreme weather up close.
Generate vibrant ecosystems, from animal behaviors to intricate plant life.
Conjure imaginary worlds, fantastical scenarios and expressive animated characters.
Genie 3 is a general-purpose world model. It uses simple text descriptions to generate photorealistic environments that can be explored in real-time.
Towards world simulation
World models use their deep understanding of physical environments to simulate them. Genie 3 represents a major leap in capabilities – allowing agents to predict how a world evolves, and how their actions affect it.
Genie 3 makes it possible to explore an unlimited range of realistic environments. This is a key stepping stone on the path to AGI – enabling AI agents capable of reasoning, problem solving, and real-world actions.
Allows for fluid, real-time interaction within the generated world, operating at 20-24 frames per second.
Generates interactive worlds from text, transforming envisioned landscapes into controllable realities ready to be explored.
Renders rich, photorealistic worlds at 720p resolution. This high-fidelity output provides crucial visual detail for training agents on real-world complexities.
Previously seen details are recalled when revisited – and environments can handle sustained interaction without degrading.
Genie is grounded in Street View data from Google Maps, so you can create new, unexpected worlds anchored in reality.
Advancing real-time interactivity
To achieve real-time controllability, Genie 3 has to recall previous environments and actions.
So, if the user is revisiting a location after a minute, the model needs to refer back to information from a minute ago. For real-time interactivity, this needs to happen multiple times per second in response to user instructions.
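As a rough back-of-the-envelope illustration, the sketch below takes the 20-24 frames per second figure quoted earlier and works out how much time is available per frame, and how many frames back a one-minute-old detail sits. The function names are illustrative only and are not part of any Genie interface.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to produce one frame, in milliseconds."""
    return 1000.0 / fps


def frames_elapsed(seconds: float, fps: float) -> int:
    """Number of frames generated over a given span of time."""
    return round(seconds * fps)


for fps in (20, 24):
    print(f"{fps} fps: {frame_budget_ms(fps):.1f} ms per frame, "
          f"{frames_elapsed(60, fps)} frames per minute")
```

At these rates, a location seen a minute ago is more than a thousand frames in the past, and each new frame has to be produced in tens of milliseconds.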
One of the main challenges of generating AI worlds is keeping them consistent over time. This is harder than generating an entire video, because inaccuracies tend to accumulate the longer the world is actively generated.
Genie 3 environments are far more dynamic and detailed than those produced by other methods, such as NeRFs and Gaussian Splatting. This is because Genie 3 environments are “auto-regressive” – created frame by frame based on the world description and user actions. The environments remain largely consistent for several minutes, with memory recalling changes from specific interactions for up to a minute.
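The frame-by-frame idea can be illustrated with a toy loop. The sketch below is not Genie 3's architecture: `next_frame` is a hypothetical stand-in for a learned model, a "frame" is reduced to a grid position, and the bounded `history` window is an assumed simplification of the memory described here.

```python
from collections import deque
from typing import Deque, List, Tuple

Frame = Tuple[int, int]  # toy "frame": the (x, y) position shown on screen

MOVES = {"left": (-1, 0), "right": (1, 0), "up": (0, 1), "down": (0, -1), "stay": (0, 0)}


def next_frame(description: str, history: Deque[Frame], action: str) -> Frame:
    """Hypothetical stand-in for a learned model.

    The new frame depends on the most recent frame and the action.
    The description is accepted only to mirror the conditioning inputs;
    this toy ignores it.
    """
    x, y = history[-1] if history else (0, 0)
    dx, dy = MOVES[action]
    return (x + dx, y + dy)


def rollout(description: str, actions: List[str], memory_frames: int) -> List[Frame]:
    """Generate frames one at a time, feeding each output back as input."""
    history: Deque[Frame] = deque(maxlen=memory_frames)  # bounded memory window
    frames: List[Frame] = []
    for action in actions:
        frame = next_frame(description, history, action)
        history.append(frame)  # the oldest frame drops out once the window is full
        frames.append(frame)
    return frames


print(rollout("a desert canyon at dusk", ["right", "right", "up", "stay"], memory_frames=3))
# prints [(1, 0), (2, 0), (2, 1), (2, 1)]
```

Because each frame is computed from earlier outputs, any error in one frame is carried into the next, which is one way to see why consistency over long horizons is difficult.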
Promptable world events make it possible to change the generated world – such as altering weather conditions or introducing new objects and characters.
This increases the range of scenarios agents can use to learn about handling unexpected situations.
Prompting Genie 3 involves two core elements: the world you want to build, and the character you're bringing to life.
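One way to keep those two elements separate when drafting prompts is a small container like the one below. This is a hypothetical sketch: the `WorldPrompt` class, its field names and the combined text layout are assumptions made for illustration, not a documented Genie 3 prompt format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorldPrompt:
    """Hypothetical container for the two prompt elements."""
    world: str      # the environment to build
    character: str  # the character to bring to life

    def as_text(self) -> str:
        """Combine both elements into one text block (assumed layout)."""
        world = self.world.strip()
        character = self.character.strip()
        if not world or not character:
            raise ValueError("both world and character descriptions are required")
        return f"World: {world}\nCharacter: {character}"


prompt = WorldPrompt(
    world="a misty pine forest at dawn",
    character="a small paper boat drifting along a stream",
)
print(prompt.as_text())
```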
Real-world applications
The potential uses for Genie 3 go well beyond gaming.
Genie 3’s realistic, controllable environments could offer new ways for people to learn – allowing students to explore historical eras, like Ancient Rome. These simulated environments can also be used to train autonomous vehicles in realistic scenarios, in a completely safe setting.
Genie 3 can maintain consistent worlds, making it possible to explore more complex goals, longer sequences of actions, and real-world complexities. It can also help researchers evaluate agents’ performance, and explore their weaknesses.
SIMA is an agent capable of carrying out tasks in virtual environments – we set it goals to complete within Genie 3. Genie 3 isn’t aware of the goal – but it simulates the future based on the agent's actions.
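The separation between agent and world model can be sketched with a toy loop. Everything here is a simplified assumption: the world is a position on a line, `world_step` stands in for the world model, and `goal_seeking_agent` stands in for an agent such as SIMA. The point it illustrates is only that the goal is held by the agent, while the world function receives the action alone.

```python
from typing import Callable, List

Observation = int  # toy observation: a position on a 1-D line
Action = int       # toy action: -1, 0 or +1


def world_step(obs: Observation, action: Action) -> Observation:
    """Toy world: advances the state from the action alone; it never sees the goal."""
    return obs + action


def goal_seeking_agent(goal: Observation) -> Callable[[Observation], Action]:
    """Toy agent: holds the goal privately and steps toward it."""
    def policy(obs: Observation) -> Action:
        # +1 if the goal is ahead, -1 if behind, 0 once reached
        return (goal > obs) - (goal < obs)
    return policy


def run_episode(policy: Callable[[Observation], Action],
                start: Observation, max_steps: int) -> List[Observation]:
    """Alternate agent decisions and world updates until the agent stops."""
    trajectory = [start]
    for _ in range(max_steps):
        action = policy(trajectory[-1])
        if action == 0:  # the agent considers its goal reached
            break
        trajectory.append(world_step(trajectory[-1], action))
    return trajectory


print(run_episode(goal_seeking_agent(goal=3), start=0, max_steps=10))
# prints [0, 1, 2, 3]
```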
Although promptable world events allow for a wide range of environmental interventions, they're not necessarily performed by the agent itself. For now, there's a limited range of actions agents can carry out.
Accurately modeling interactions between multiple independent agents in shared environments is an ongoing research challenge.
Genie 3 is currently unable to simulate real-world locations with perfect accuracy.
Clear and legible text is often only generated when it's in the input world description.
The model can support a few minutes of continuous interaction, rather than extended hours.
We believe foundational technologies, like Genie 3, require a deep commitment to responsibility from the very beginning. Technical innovations, particularly open-ended and real-time capabilities, introduce new challenges for safety and responsibility. To address these unique risks while aiming to maximize the benefits, we have worked closely with our Responsible Development & Innovation Team.
At Google DeepMind, we're dedicated to developing our best-in-class models in a way that amplifies human creativity, while limiting unintended impacts. We continue to build our understanding of risks and their appropriate mitigations as we explore the potential applications for Genie 3, to develop this technology in a responsible way.
Acknowledgements
Genie 3 was made possible due to key research and engineering contributions from Phil Ball, Jakob Bauer, Frank Belletti, Yonathan Bornfeld, Bethanie Brownfield, Kan Chen, Yutian Chen, Yoni Choukroun, Matan Cohen, Kurtis David, Ariel Ephrat, Shlomi Fruchter, Liangke Gui, Agrim Gupta, Shan Han, Kristian Holsheimer, Aleks Holynski, Jiri Hron, Christos Kaplanis, Siavash Khodadadeh, Congtao Kuang, José Lezama, Marjorie Limont, Matt McGill, Barak Meiri, Kangfu Mei, Mark Murphy, Yanko Oliveira, Roni Paiss, Jack Parker-Holder, Frank Perbet, Ben Poole, Hang Qi, Diego Rivas, Guy Scully, Jeremy Shar, Asaf Shul, Stephen Spencer, Omer Tov, Ruben Villegas, Emma Wang, Hongjie Wang, Rundi Wu, Joyce (Jingjing) Xie, Minkai Xu, Keting Yang, Jessica Yung, Shiran Zada, Yuan Zhong.
Street View grounding in Genie was made possible due to key research and engineering contributions from Ben Poole, Jonathan Herbert, Mira Leung, Linyi Jin, Michelle Zhu and Xiangzhou Kong, as well as the Google Maps leadership team.
We thank Andrew Audibert, Cip Baetu, Jordi Berbel, David Bridson, Jake Bruce, Gavin Buttimore, Sarah Chakera, Bilva Chandra, Donghyun Cho, Paul Collins, Alex Cullum, Bogdan Damoc, Vibha Dasagi, Maxime Gazeau, Charles Gbadamosi, Woohyun Han, Dave Hawkey, Ed Hirst, Tingbo Hou, Ashyana Kachra, Lucie Kerley, Kristian Kjems, Eva Knoepfel, Vika Koriakin, Jessica Lo, Cong Lu, Zeb Mehring, Alexandre Moufarek, Henna Nandwani, Valeria Oliveira, Joseph Ortiz, Fabio Pardo, Jane Park, Andrew Pierson, Helen Ran, Nilesh Ray, Tim Salimans, Manuel Sanchez, Igor Saprykin, Amy Shen, Ashish Shenoy, Sailesh Sidhwani, Duncan Smith, Michael Chang, Joe Stanton, Hamish Tomlinson, Dimple Vijaykumar, Luyu Wang, Miaosen Wang, Qifei Wang, Will Whitney, Nat Wong, Keyang Xu, Nick Young, Vadim Zubov, Nicole Segaran, Pavan Kumar, Annie Zhou, Tiffany Hu, Ethelia Lung, Ezra Gorman, Randeep Katari, Chelsea Handler, Ian Wilkinson, Hector Hinestroza, Andrey Ryabtsev, Tyler Holland, Shivani Ghanta, Melissa Byun, Emil Bergner, Rod Strougo, Elias Roman, Carlos Hernandez, Steve Seitz.
Thanks to Tim Rocktäschel, Satinder Singh, Adrian Bolton, Inbar Mosseri, Luis C. Cobo, Aäron van den Oord, Douglas Eck, Dumitru Erhan, Raia Hadsell, Zoubin Ghahramani, Koray Kavukcuoglu and Demis Hassabis for their insightful guidance and support throughout the research process.
Finally, we extend our gratitude to Mohammad Babaeizadeh, Gabe Barth-Maron, Parker Beak, Jenny Brennan, Tim Brooks, Max Cant, Harris Chan, Jeff Clune, Kaspar Daugaard, Dumitru Erhan, Ashley Feden, Simon Green, Nik Hemmings, Michael Huber, Jony Hudson, Dirichi Ike-Njoku, Hernan Moraldo, Bonnie Li, Yuchi Liu, Yixuan Huang, Eric Paskie, Kriti Saxena, Johnny Søraker, Josh Cowls, Simon Osindero, Georg Ostrovski, Ryan Poplin, Alex Rizkowsky, Giles Ruscoe, Ana Salazar, Guy Simmons, Jeff Stanway, Metin Toksoz-Exley, Xinchen Yan, Petko Yotov, Mingda Zhang and Martin Zlocha for their insights and support.