Genie 3

A new frontier for world models

Project Genie

Create and explore infinitely diverse worlds.

Modelling the physical world

Experience the natural world from desert to sea – or witness extreme weather up close.

Simulating nature

Generate vibrant ecosystems, from animal behaviors to intricate plant life.

Producing animation and fiction

Conjure imaginary worlds, fantastical scenarios and expressive animated characters.

Explore the latest

Genie 3 is a general-purpose world model. It uses simple text descriptions to generate photorealistic environments that can be explored in real-time.

Towards world simulation

World models use their deep understanding of physical environments to simulate them. Genie 3 represents a major leap in capabilities – allowing agents to predict how a world evolves, and how their actions affect it.

Genie 3 makes it possible to explore an unlimited range of realistic environments. This is a key stepping stone on the path to AGI – enabling AI agents capable of reasoning, problem solving, and real-world actions.

Create your own worlds

Project Genie is an experimental research prototype that lets you create and explore infinitely diverse worlds.

Capabilities

Genie 3 is the first real-time, interactive world model that generates photorealistic worlds from a simple text description.

Real-time

Allows for fluid, real-time interaction within the generated world, operating at 20-24 frames per second.

Interactive and controllable

Generates interactive worlds from text, transforming envisioned landscapes into controllable realities ready to be explored.

Photorealistic quality

Renders rich, photorealistic worlds at 720p resolution. This high-fidelity output provides crucial visual detail for training agents on real-world complexities.

World consistency and stability

Previously seen details are recalled when a location is revisited, and environments can handle sustained interaction without degrading.

Grounded in Street View

Genie is grounded in Street View data from Google Maps, so you can create new, unexpected worlds anchored in reality.

Modelling the physical world

Experience the natural world from desert to sea – or witness extreme weather up close.

Environment prompt: An endless ocean of immense, thundering waves features colossal turquoise barrels breaking under dramatic sun flares, swarming with hundreds of seagulls that fly close enough to momentarily obscure the view.
Character prompt: The nose of a white surfboard that slices through the water, pointed directly into the hollow barrel of a massive, breaking wave.

Environment prompt: A high-altitude open world featuring deformable snow terrain.
Character prompt: An agile alpinist with omni-directional movement and jump mechanics.

Environment prompt: A tranquil waterfall cliff area featuring dynamic water physics and interactive surface wakes.
Character prompt: A high-speed paper airplane with responsive jet-like controls and aerodynamic banking.

Environment prompt: The vibrant amazon rainforest on a beautiful and clear sunny day, high among the canopies, the light of the sun dappling through the leaves, causing rays of sunshine on the ground the river nearby. The terrain is distinctly amazonian, beautiful and lush. Hidden in the canopies is a clearing revealing a large stone temple ruin, covered in vines, foliage, dilapidated and structurally unsound.
Character prompt: A beautiful amazonian parrot, blue colored, with a black beak, majestic.

Simulating nature

Generate vibrant ecosystems, from animal behaviors to intricate plant life.

Environment prompt: A photorealistic alpine meadow with wildflowers. Among the evergreen pine trees is a rustic log cabin with a front porch. A split-rail fence meanders near the cabin. In the background there are three jagged mountain peaks covered in snow.
Character prompt: A shiba inu centered in the frame, angled like a 3rd person video game, with highly responsive controls.

Environment prompt: A natural, real-world landscape with a grassy field, flowing stream, and trees in the distance. It's nighttime with some ambient moonlight. There are dozens of adorable Red foxes running in different directions between the grass and across the stream. They look around curiously as if searching for something. Leafless trees line the far bank of the stream. The sky is a deep blue-black, with a few faint stars and occasional flying comets.
Character prompt: A person gripping a high-intensity flashlight with only their right-hand visible, spotting foxes in the dark by aiming the beam of light in any direction. The flashlight projects a broad, powerful beam washing the environment in bright light to provide immediate, high-contrast visibility as you move.

Environment prompt: A sepia-filtered mountain valley featuring high-contrast terrain and cinematic film artifacts.
Character prompt: A large soaring bald eagle with aerodynamic and realistic gliding physics.

Producing animation and fiction

Conjure imaginary worlds, fantastical scenarios and expressive animated characters.

Environment prompt: A backyard race track.
Character prompt: A blue toy car.

Environment prompt: Wide angled, a vast, ornate one-story well-lit library interior fills the scene. Tall, arched windows and dark wooden bookshelves packed with books line the walls. A grand, double curved staircase with stone balustrades sweeps up to the second-floor balcony. The wooden floor is scattered with wooden tables in the foreground, holding globes, scrolls, stacks of books, and lit candelabras. There are multiple passageways to other rooms. The atmosphere is warm and historic, rendered in a style resembling stop-motion animation or claymation with rich colors and a slightly soft texture. The walls and desks are solid and any character will bump into them.
Character prompt: A small, stylized, black claymation-style plasticine cat explores and walks around on the floor. It has large, prominent green eyes.

Environment prompt: A 3D landscape rendered with the aesthetic of a watercolor painting, characterized by soft edges and fluid color bleeds. Despite the painted style, the world has deep, traversable geometry. The terrain consists of a wet, grey asphalt road that recedes into the distance in true perspective. Puddles on the ground plane reflect the surroundings in blurred washes of color. On the left, a rustic wooden bus shelter provides a physical, volumetric structure with a dry platform. In the background, a yellow school bus drives away toward rolling green hills under a gloomy grey sky with visible rain strokes.
Character prompt: The character is a large, light pink balloon animal rabbit without a tail and with no facial features. The character runs like a human, physically stepping onto the wet road surface rather than sliding over it. The character is capable of moving deeply into the background and laterally across the street. As the character moves further away, it scales down correctly with perspective. Its feet make distinct contact with the ground, creating splashes in the puddles and casting a shadow that tracks its position on the pavement to ground it in the 3D space.

Environment prompt: A fantastical urban environment, made entirely of ice. Buildings, cars, and trees are sculpted from translucent ice, illuminated by a vibrant spectrum of colored lights. The ground is a slick, frozen surface, reflecting the dazzling colors. An archway made of ice stands in the distance. The city extends far into the distance. Beyond the ice buildings, more icy structures are visible.
Character prompt: The character is riding a snowmobile, covered in frost. The character is controlling the snowmobile, capable of driving forward and backward. Driving over the icy surface creates a thin spray of ice particles.

Environment prompt: A photorealistic pristine white landscape with rolling hills and reflective pools of water. The areas around the pools of water have a white, frosted snow-like texture. The dunes in the background have subtle shadows defining their form against the bright white sky.
Character prompt: A blue rolling ball that leaves a permanent, continuous trail of wet blue paint. The ball is capable of interacting with the water puddles, skimming the surface and creating tiny sprays of water behind it as it rolls through them.

Environment prompt: A tactile needle-felted diorama featuring wool terrain and animated fabric crowds.
Character prompt: A fuzzy snail with responsive sliding mechanics.

Environment prompt: This is a macro-scale makerspace workbench. The ground is a vast, polished light-brown wood table with realistic grain and friction. The surface is scattered with passive physics objects: a white cardboard car, a soft-serve ice cream cone, a cubic puzzle, and alphabet blocks. These objects are distinct from the player character. The background is a soft-focus workshop with a pegboard of tools and a sunlit window. The lighting creates a sharp shadow beneath the central box to visually ground it. Omnidirectional traversal is possible across the wooden plain.
Character prompt: The user controls a featureless, rectangular cardboard box with two fat squat legs and no arms or face. The character is capable of a heavy, grounded walk and vertical jumping. The camera follows the box's movement closely, keeping the character centered. The character moves with a stop-motion aesthetic, pressing into the wood surface without sliding. The action command triggers a 'Head-Butt,' where the rectangular torso lunges forward to shove objects with kinetic force. The character is a solid physical object.

Environment prompt: A macro claymation garden featuring soft, deformable terrain.
Character prompt: A modeled clay ladybug with stop-motion walk physics.

Exploring locations

Transcend the limits of time and space to explore past eras and distant lands.

Environment prompt: A vast snowy mountain range featuring a suspended aerial ring course.
Character prompt: A high-velocity wingsuit flyer with responsive aerodynamic physics.

Environment prompt: A rugged alien landscape with traversable terrain and reactive dust physics.
Character prompt: A vintage roadster with high-speed off-road handling.

Environment prompt: A modern interior featuring reflective hardwood floors and beautiful light-rays.
Character prompt: A gray robot vacuum cleaner with an adult Tabby cat in a definitive sitting position on its center. The camera follows the robot vacuum as it responsively navigates the environment, it is highly controllable.

Environment prompt: A hyperrealistic video game set at an abandoned concrete monument covered in graffiti under a cloudy sky. A cracked road leads to the monument's grand staircase, flanked by dry, overgrown grass and scattered rocks.
Character prompt: A fast, rugged remote-controlled vehicle with large tires. The bright headlights of the vehicle turn on automatically if it enters a dark area and turn off if not needed.

Advancing real-time interactivity

To achieve real-time controllability, Genie 3 has to recall previous environments and actions.

So, if the user is revisiting a location after a minute, the model needs to refer back to information from a minute ago. For real-time interactivity, this needs to happen multiple times per second in response to user instructions.
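As rough arithmetic, the 20-24 frames per second quoted under Capabilities imply both a per-frame time budget and a count of frames separating a revisit from the original view. The short sketch below works these out; the function names are illustrative and say nothing about how Genie 3 is implemented.

```python
# Back-of-the-envelope arithmetic only. The frame rates come from the
# "Real-time" capability above; the helper names are illustrative assumptions.

def frame_budget_ms(fps: float) -> float:
    """Time available to produce one frame, in milliseconds."""
    return 1000.0 / fps


def frames_elapsed(seconds: float, fps: float) -> int:
    """Number of frames generated over a span of `seconds`."""
    return round(seconds * fps)


for fps in (20, 24):
    print(f"{fps} fps: {frame_budget_ms(fps):.1f} ms per frame, "
          f"{frames_elapsed(60, fps)} frames per minute")
```

Under these assumptions, a location revisited after one minute was last seen between 1,200 and 1,440 frames earlier, and each new frame has roughly 42 to 50 milliseconds in which to be produced.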

Enabling environmental consistency over a long horizon

One of the main challenges of generating AI worlds is keeping them consistent over time. This is harder than generating an entire video, because inaccuracies tend to accumulate the longer the world is actively generated.

Genie 3 environments are far more dynamic and detailed than those produced by other methods, such as NeRFs and Gaussian Splatting. This is because Genie 3 worlds are “auto-regressive”: each frame is created in turn, based on the world description and user actions. The environments remain largely consistent for several minutes, with memory recalling changes from specific interactions for up to a minute.
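The general shape of auto-regressive generation with a bounded memory can be sketched in a few lines. This is a toy illustration, not Genie 3's architecture: `Frame`, `next_frame` and `rollout` are hypothetical names, the "model" is a trivial function that moves a position left or right, and it reads only the latest frame where a real model would condition on the whole history window.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    index: int
    position: int      # toy stand-in for everything visible in the frame
    description: str


def next_frame(description: str, history: deque, action: str) -> Frame:
    """Toy 'model': the next frame depends on the latest frame and the action."""
    last = history[-1]
    step = {"left": -1, "right": 1}.get(action, 0)
    return Frame(last.index + 1, last.position + step, description)


def rollout(description: str, actions: list[str], memory_frames: int) -> list[Frame]:
    """Generate frames one at a time, conditioning on a bounded history window."""
    history = deque([Frame(0, 0, description)], maxlen=memory_frames)
    frames = list(history)
    for action in actions:
        frame = next_frame(description, history, action)
        history.append(frame)   # the oldest frame drops out once the window is full
        frames.append(frame)
    return frames


frames = rollout("a snowy valley", ["right", "right", "left", "noop"], memory_frames=3)
print([f.position for f in frames])  # [0, 1, 2, 1, 1]
```

Because every frame is built from earlier output, any error in one frame is carried into the next, which is why consistency becomes harder to maintain as the rollout gets longer.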

Prompt: POV action camera of a tan house being painted by a first person agent with a paint roller

Prompt: A Victorian street with a grey house. The grey house has a portal ringed by magical sparks. The portal leads to a vast desert filled with dunes, and that desert is visible from the outside. The agent can walk into the portal and is teleported to the desert.

Prompt: This is a fantastical, whimsical forest environment. The lighting is bright and cheerful, suggesting a sunny day with dappled light filtering through a dense canopy of lush, oversized leaves. The air is clear and still. The ground is a soft, verdant carpet of moss and unusually large, brightly coloured mushrooms in shades of red and blue, their caps dotted with white. Winding dirt paths, well-trodden and narrow, weave between towering, ancient trees with smooth, grey bark. Interspersed throughout the forest are charming, mushroom-shaped houses, with intricate wooden doors and tiny, circular windows, each one unique in its design and colour palette, ranging from vibrant reds to gentle blues and greens. Various small, friendly forest creatures, such as colourful butterflies and tiny singing birds, flit amongst the foliage, adding to the lively atmosphere. There is an abundance of peculiar, oversized flowers blooming in an array of pastel and bright hues, releasing a gentle glow.

Prompt: An extremely enormous, realistic gorilla, draped in a flamboyant, emerald red vest with ornate brass buttons and an elaborate, feathered bicorne hat, brandishing only a vintage silk parasol, navigates a series of outrageously extravagant, moss-laden McMansions where grand marble structures are subtly embraced by sprawling, ancient rose bushes and creeping ivy.

Prompt: Walking around ancient Athens, Greek architecture, marble

Pioneering promptable world events

Genie 3 enables a more expressive form of text-based interaction, called "promptable world events".

Promptable world events make it possible to change the generated world – such as altering weather conditions or introducing new objects and characters.

This increases the range of scenarios agents can use to learn about handling unexpected situations.

Effective prompting with Genie

Prompting Genie 3 involves two core elements: the world you want to build, and the character you're bringing to life.
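One way to keep the two elements separate when drafting prompts is a small record type. The sketch below is purely an organizational aid: `WorldPrompt` is a hypothetical name, not part of any Genie or Project Genie interface, and the sample text is taken from the examples earlier on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorldPrompt:
    environment: str   # the world you want to build
    character: str     # the character you are bringing to life

    def render(self) -> str:
        """Format the pair the way the examples on this page are written."""
        return (f"Environment prompt: {self.environment}\n"
                f"Character prompt: {self.character}")


prompt = WorldPrompt(environment="A backyard race track.",
                     character="A blue toy car.")
print(prompt.render())
```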

Real-world applications

The potential uses for Genie 3 go well beyond gaming.

Genie 3’s realistic, controllable environments could offer new ways for people to learn, allowing students to explore historical eras such as Ancient Rome. These simulated environments can also be used to train autonomous vehicles in realistic scenarios, in a completely safe setting.

Fueling embodied agent research

Prototyping training environments with Genie 3 and SIMA.

Genie 3 can maintain consistent worlds, making it possible to explore more complex goals, longer sequences of actions, and real-world complexities. It can also help researchers evaluate agents’ performance, and explore their weaknesses.

SIMA is an agent capable of carrying out tasks in virtual environments – we set it goals to complete within Genie 3. Genie 3 isn’t aware of the goal – but it simulates the future based on the agent's actions.
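The division of responsibility described here, where the goal belongs to the agent and the world model only responds to actions, can be illustrated with a toy loop. `ToyWorld`, `ToyAgent` and `run_episode` are hypothetical stand-ins and do not reflect the actual SIMA or Genie 3 interfaces.

```python
class ToyWorld:
    """Stand-in for a world model: it only ever sees actions, never the goal."""

    def __init__(self) -> None:
        self.position = 0

    def step(self, action: int) -> int:
        self.position += action
        return self.position       # the "observation" returned to the agent


class ToyAgent:
    """Stand-in for a goal-conditioned agent; the goal is stored here only."""

    def __init__(self, goal: int) -> None:
        self.goal = goal

    def act(self, observation: int) -> int:
        if observation < self.goal:
            return 1
        if observation > self.goal:
            return -1
        return 0


def run_episode(agent: ToyAgent, world: ToyWorld, max_steps: int) -> list[int]:
    """Alternate agent actions and world updates until the agent stops."""
    observation = world.position
    trajectory = [observation]
    for _ in range(max_steps):
        action = agent.act(observation)
        if action == 0:            # the agent judges its goal reached
            break
        observation = world.step(action)
        trajectory.append(observation)
    return trajectory


trajectory = run_episode(ToyAgent(goal=3), ToyWorld(), max_steps=10)
print(trajectory)  # [0, 1, 2, 3]
```

Note that `ToyWorld.step` takes no goal argument: in this sketch, as in the setup described above, the world is simulated purely from the agent's actions.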

Limitations

Limited action space

Although promptable world events allow for a wide range of environmental interventions, they're not necessarily performed by the agent itself. For now, there's a limited range of actions agents can carry out.

Interaction and simulation of other agents

Accurately modeling interactions between multiple independent agents in shared environments is an ongoing research challenge.

Accurate representation of real-world locations

Genie 3 is currently unable to simulate real-world locations with perfect accuracy.

Text rendering

Clear and legible text is often generated only when it is provided in the input world description.

Limited interaction duration

The model can currently support a few minutes of continuous interaction, rather than hours.

Responsibility

We believe foundational technologies, like Genie 3, require a deep commitment to responsibility from the very beginning. Technical innovations, particularly open-ended and real-time capabilities, introduce new challenges for safety and responsibility. To address these unique risks while aiming to maximize the benefits, we have worked closely with our Responsible Development & Innovation Team.

At Google DeepMind, we're dedicated to developing our best-in-class models in a way that amplifies human creativity, while limiting unintended impacts. We continue to build our understanding of risks and their appropriate mitigations as we explore the potential applications for Genie 3, to develop this technology in a responsible way.

Acknowledgements

Genie 3 was made possible due to key research and engineering contributions from Phil Ball, Jakob Bauer, Frank Belletti, Yonathan Bornfeld, Bethanie Brownfield, Kan Chen, Yutian Chen, Yoni Choukroun, Matan Cohen, Kurtis David, Ariel Ephrat, Shlomi Fruchter, Liangke Gui, Agrim Gupta, Shan Han, Kristian Holsheimer, Aleks Holynski, Jiri Hron, Christos Kaplanis, Siavash Khodadadeh, Congtao Kuang, José Lezama, Marjorie Limont, Matt McGill, Barak Meiri, Kangfu Mei, Mark Murphy, Yanko Oliveira, Roni Paiss, Jack Parker-Holder, Frank Perbet, Ben Poole, Hang Qi, Diego Rivas, Guy Scully, Jeremy Shar, Asaf Shul, Stephen Spencer, Omer Tov, Ruben Villegas, Emma Wang, Hongjie Wang, Rundi Wu, Joyce (Jingjing) Xie, Minkai Xu, Keting Yang, Jessica Yung, Shiran Zada, Yuan Zhong.

Street View grounding in Genie was made possible due to key research and engineering contributions from Ben Poole, Jonathan Herbert, Mira Leung, Linyi Jin, Michelle Zhu and Xiangzhou Kong, as well as the Google Maps leadership team.

We thank Andrew Audibert, Cip Baetu, Jordi Berbel, David Bridson, Jake Bruce, Gavin Buttimore, Sarah Chakera, Bilva Chandra, Donghyun Cho, Paul Collins, Alex Cullum, Bogdan Damoc, Vibha Dasagi, Maxime Gazeau, Charles Gbadamosi, Woohyun Han, Dave Hawkey, Ed Hirst, Tingbo Hou, Ashyana Kachra, Lucie Kerley, Kristian Kjems, Eva Knoepfel, Vika Koriakin, Jessica Lo, Cong Lu, Zeb Mehring, Alexandre Moufarek, Henna Nandwani, Valeria Oliveira, Joseph Ortiz, Fabio Pardo, Jane Park, Andrew Pierson, Helen Ran, Nilesh Ray, Tim Salimans, Manuel Sanchez, Igor Saprykin, Amy Shen, Ashish Shenoy, Sailesh Sidhwani, Duncan Smith, Michael Chang, Joe Stanton, Hamish Tomlinson, Dimple Vijaykumar, Luyu Wang, Miaosen Wang, Qifei Wang, Will Whitney, Nat Wong, Keyang Xu, Nick Young, Vadim Zubov, Nicole Segaran, Pavan Kumar, Annie Zhou, Tiffany Hu, Ethelia Lung, Ezra Gorman, Randeep Katari, Chelsea Handler, Ian Wilkinson, Hector Hinestroza, Andrey Ryabtsev, Tyler Holland, Shivani Ghanta, Melissa Byun, Emil Bergner, Rod Strougo, Elias Roman, Carlos Hernandez, Steve Seitz.

Thanks to Tim Rocktäschel, Satinder Singh, Adrian Bolton, Inbar Mosseri, Luis C. Cobo, Aäron van den Oord, Douglas Eck, Dumitru Erhan, Raia Hadsell, Zoubin Ghahramani, Koray Kavukcuoglu and Demis Hassabis for their insightful guidance and support throughout the research process.

Finally, we extend our gratitude to Mohammad Babaeizadeh, Gabe Barth-Maron, Parker Beak, Jenny Brennan, Tim Brooks, Max Cant, Harris Chan, Jeff Clune, Kaspar Daugaard, Dumitru Erhan, Ashley Feden, Simon Green, Nik Hemmings, Michael Huber, Jony Hudson, Dirichi Ike-Njoku, Hernan Moraldo, Bonnie Li, Yuchi Liu, Yixuan Huang, Eric Paskie, Kriti Saxena, Johnny Søraker, Josh Cowls, Simon Osindero, Georg Ostrovski, Ryan Poplin, Alex Rizkowsky, Giles Ruscoe, Ana Salazar, Guy Simmons, Jeff Stanway, Metin Toksoz-Exley, Xinchen Yan, Petko Yotov, Mingda Zhang and Martin Zlocha for their insights and support.
