Genie 3 is a general-purpose world model. It uses simple text descriptions to generate photorealistic environments that can be explored in real time.

Towards world simulation

World models use their deep understanding of physical environments to simulate them. Genie 3 represents a major leap in capabilities – allowing agents to predict how a world evolves, and how their actions affect it.

Genie 3 makes it possible to explore an unlimited range of realistic environments. This is a key stepping stone on the path to AGI – enabling AI agents capable of reasoning, problem solving, and real-world actions.

Capabilities

Genie 3 is the first real-time, interactive world model that generates photorealistic worlds from a simple text description.





Advancing real-time interactivity

To achieve real-time controllability, Genie 3 has to recall the environment it has already generated and the actions taken in it.

So, if the user is revisiting a location after a minute, the model needs to refer back to information from a minute ago. For real-time interactivity, this needs to happen multiple times per second in response to user instructions.
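The idea of referring back to earlier frames can be illustrated with a toy sketch. This is not Genie 3's architecture, which is not described here: the class name, the frame rate, the 60-second memory window, and the one-dimensional "world" are all assumptions made for illustration. The sketch generates a new scene label the first time a location is visited, and on every later step looks back through a bounded history so that a revisited location shows what was generated before.

```python
import random
from collections import deque

FPS = 24             # assumed frame rate, for illustration only
MEMORY_SECONDS = 60  # assumed memory horizon, for illustration only


class ToyWorldModel:
    """Hypothetical stand-in for a world model with a bounded memory."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.position = 0
        # Each entry is (position, action, frame); the oldest entries
        # are dropped once the window is full.
        self.history = deque(maxlen=FPS * MEMORY_SECONDS)

    def step(self, action):
        """Apply one user action and return the next frame."""
        self.position += {"left": -1, "right": 1, "stay": 0}[action]
        frame = self._recall(self.position)
        if frame is None:
            # Nothing remembered for this location: generate something new.
            frame = f"scene-{self.rng.randrange(10**6)}"
        self.history.append((self.position, action, frame))
        return frame

    def _recall(self, position):
        """Search the history, newest first, for a frame at this position."""
        for past_position, _, past_frame in reversed(self.history):
            if past_position == position:
                return past_frame
        return None


model = ToyWorldModel()
first = model.step("right")    # visit location 1
model.step("right")            # move away to location 2
for _ in range(100):
    model.step("stay")         # linger there for a few seconds
revisited = model.step("left") # come back to location 1
print(revisited == first)      # True: the earlier frame was recalled
```

In this toy, a visit that falls outside the memory window is forgotten and the location is generated afresh, which mirrors why the lookup has to reach back far enough and still run on every frame.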



Pioneering promptable world events

Genie 3 enables a more expressive form of text-based interaction, called "promptable world events".

Promptable world events make it possible to change the generated world – such as altering weather conditions or introducing new objects and characters.

This increases the range of scenarios agents can use to learn about handling unexpected situations.

Real-world applications

The potential uses for Genie 3 go well beyond gaming.

Genie 3’s realistic, controllable worlds could offer new ways for people to learn – allowing students to explore historical eras, like Ancient Rome. These simulated environments can also be used to train autonomous vehicles in realistic scenarios, in a completely safe setting.


Fueling embodied agent research

Prototyping training environments with Genie 3 and SIMA.

Genie 3 can maintain consistent worlds, making it possible to explore more complex goals, longer sequences of actions, and real-world complexities. It can also help researchers evaluate agents’ performance, and explore their weaknesses.

SIMA is an agent capable of carrying out tasks in virtual environments – we set it goals to complete within Genie 3. Genie 3 isn’t aware of the goal – but it simulates the future based on the agent's actions.
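The division of roles described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the actual SIMA or Genie 3 interface: `rollout`, `make_toy_world`, and `toy_agent` are hypothetical names, and the world is reduced to a single number. The point it shows is that the goal is passed only to the agent, while the world simulator receives nothing but the agent's actions.

```python
from typing import Callable, List


def rollout(
    world_step: Callable[[str], int],
    agent_policy: Callable[[int, int], str],
    goal: int,
    first_observation: int,
    num_steps: int,
) -> List[int]:
    """Run an agent inside a simulated world and return what it observed."""
    observation = first_observation
    trajectory = [observation]
    for _ in range(num_steps):
        # Only the agent sees the goal.
        action = agent_policy(goal, observation)
        # The world simulator sees only the action, never the goal.
        observation = world_step(action)
        trajectory.append(observation)
    return trajectory


def make_toy_world(start: int = 0) -> Callable[[str], int]:
    """Hypothetical world: a position on a line that actions move along."""
    state = {"x": start}

    def world_step(action: str) -> int:
        state["x"] += {"forward": 1, "back": -1}.get(action, 0)
        return state["x"]

    return world_step


def toy_agent(goal_x: int, observed_x: int) -> str:
    """Hypothetical agent: step towards the goal, then wait."""
    if observed_x < goal_x:
        return "forward"
    if observed_x > goal_x:
        return "back"
    return "wait"


trajectory = rollout(make_toy_world(0), toy_agent, goal=3,
                     first_observation=0, num_steps=5)
print(trajectory)  # [0, 1, 2, 3, 3, 3]
```

Keeping the goal out of the simulator's inputs means the world cannot "help" the agent, so whether the goal is reached reflects the agent's behavior alone.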


Limitations

Limited action space

Although promptable world events allow for a wide range of environmental interventions, they're not necessarily performed by the agent itself. For now, there's a limited range of actions agents can carry out.

Interaction and simulation of other agents

Accurately modeling interactions between multiple independent agents in shared environments is an ongoing research challenge.

Accurate representation of real-world locations

Genie 3 is currently unable to simulate real-world locations with perfect accuracy.

Text rendering

Clear and legible text is often generated only when it is provided in the input world description.

Limited interaction duration

The model can currently support a few minutes of continuous interaction, rather than hours.


Responsibility

We believe foundational technologies, like Genie 3, require a deep commitment to responsibility from the very beginning. Technical innovations, particularly open-ended and real-time capabilities, introduce new challenges for safety and responsibility. To address these unique risks while aiming to maximize the benefits, we have worked closely with our Responsible Development & Innovation Team.

At Google DeepMind, we're dedicated to developing our best-in-class models in a way that amplifies human creativity, while limiting unintended impacts. We continue to build our understanding of risks and their appropriate mitigations as we explore the potential applications for Genie 3, to develop this technology in a responsible way.


Acknowledgements

Genie 3 was made possible due to key research and engineering contributions from Phil Ball, Jakob Bauer, Frank Belletti, Yonathan Bornfeld, Bethanie Brownfield, Kan Chen, Yutian Chen, Yoni Choukroun, Matan Cohen, Kurtis David, Ariel Ephrat, Shlomi Fruchter, Liangke Gui, Agrim Gupta, Shan Han, Kristian Holsheimer, Aleks Holynski, Jiri Hron, Christos Kaplanis, Siavash Khodadadeh, Congtao Kuang, José Lezama, Marjorie Limont, Matt McGill, Barak Meiri, Kangfu Mei, Mark Murphy, Yanko Oliveira, Roni Paiss, Jack Parker-Holder, Frank Perbet, Ben Poole, Hang Qi, Diego Rivas, Guy Scully, Jeremy Shar, Asaf Shul, Stephen Spencer, Omer Tov, Ruben Villegas, Emma Wang, Hongjie Wang, Rundi Wu, Joyce (Jingjing) Xie, Minkai Xu, Keting Yang, Jessica Yung, Shiran Zada, Yuan Zhong.

Street View grounding in Genie was made possible due to key research and engineering contributions from Ben Poole, Jonathan Herbert, Mira Leung, Linyi Jin, Michelle Zhu and Xiangzhou Kong, as well as the Google Maps leadership team.

We thank Andrew Audibert, Cip Baetu, Jordi Berbel, David Bridson, Jake Bruce, Gavin Buttimore, Sarah Chakera, Bilva Chandra, Donghyun Cho, Paul Collins, Alex Cullum, Bogdan Damoc, Vibha Dasagi, Maxime Gazeau, Charles Gbadamosi, Woohyun Han, Dave Hawkey, Ed Hirst, Tingbo Hou, Ashyana Kachra, Lucie Kerley, Kristian Kjems, Eva Knoepfel, Vika Koriakin, Jessica Lo, Cong Lu, Zeb Mehring, Alexandre Moufarek, Henna Nandwani, Valeria Oliveira, Joseph Ortiz, Fabio Pardo, Jane Park, Andrew Pierson, Helen Ran, Nilesh Ray, Tim Salimans, Manuel Sanchez, Igor Saprykin, Amy Shen, Ashish Shenoy, Sailesh Sidhwani, Duncan Smith, Michael Chang, Joe Stanton, Hamish Tomlinson, Dimple Vijaykumar, Luyu Wang, Miaosen Wang, Qifei Wang, Will Whitney, Nat Wong, Keyang Xu, Nick Young, Vadim Zubov, Nicole Segaran, Pavan Kumar, Annie Zhou, Tiffany Hu, Ethelia Lung, Ezra Gorman, Randeep Katari, Chelsea Handler, Ian Wilkinson, Hector Hinestroza, Andrey Ryabtsev, Tyler Holland, Shivani Ghanta, Melissa Byun, Emil Bergner, Rod Strougo, Elias Roman, Carlos Hernandez, Steve Seitz.

Thanks to Tim Rocktäschel, Satinder Singh, Adrian Bolton, Inbar Mosseri, Luis C. Cobo, Aäron van den Oord, Douglas Eck, Dumitru Erhan, Raia Hadsell, Zoubin Ghahramani, Koray Kavukcuoglu and Demis Hassabis for their insightful guidance and support throughout the research process.

Finally, we extend our gratitude to Mohammad Babaeizadeh, Gabe Barth-Maron, Parker Beak, Jenny Brennan, Tim Brooks, Max Cant, Harris Chan, Jeff Clune, Kaspar Daugaard, Dumitru Erhan, Ashley Feden, Simon Green, Nik Hemmings, Michael Huber, Jony Hudson, Dirichi Ike-Njoku, Hernan Moraldo, Bonnie Li, Yuchi Liu, Yixuan Huang, Eric Paskie, Kriti Saxena, Johnny Søraker, Josh Cowls, Simon Osindero, Georg Ostrovski, Ryan Poplin, Alex Rizkowsky, Giles Ruscoe, Ana Salazar, Guy Simmons, Jeff Stanway, Metin Toksoz-Exley, Xinchen Yan, Petko Yotov, Mingda Zhang and Martin Zlocha for their insights and support.