Understanding AI World Models and Their Significance

By Tanu Chahal

28/10/2024

AI "world models," also called "world simulators," are generating excitement as a potential major leap in artificial intelligence. Prominent AI organizations are investing in them: World Labs, the startup founded by AI pioneer Fei-Fei Li, has raised funding to build large-scale world models, and DeepMind has hired one of the creators of OpenAI's video generator, Sora, to work on world simulators.

But what exactly are these world models?

World models mimic the natural process by which humans develop mental representations of their surroundings. Humans unconsciously interpret sensory information and form models that guide their perception and behavior. For example, a baseball player can instinctively predict the path of a pitch, allowing for a quick response. This ability, observed by researchers David Ha and Jürgen Schmidhuber, operates without conscious thought, relying instead on the brain’s internal models to assess and react to the environment.

These AI models aim to replicate this human-like reasoning and predictive power, which some experts believe is necessary for AI to reach human-level intelligence.

While the concept has existed for decades, world models are gaining attention due to their transformative potential, particularly in fields like generative video. AI-generated videos often exhibit unnatural movements, such as distorted limbs or implausible physics. Traditional generative models can predict that a basketball will bounce, but they lack an understanding of why. A world model, however, could internalize the underlying physics, resulting in more realistic simulations.

To develop this capability, world models are trained on various media—images, videos, audio, and text—enabling them to form representations of the physical world and anticipate the outcomes of actions.
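As a rough illustration of what "anticipating the outcomes of actions" means, the sketch below fits a tiny transition model from observed (state, action, next state) examples and then uses it to predict where a sequence of actions will lead. Everything in it is a hypothetical toy: the cart environment, the function names, and the single learned parameter stand in for the large neural networks and rich media that real world models rely on.

```python
import random

def true_environment(position, push):
    """Hidden dynamics the model only sees through examples:
    each unit of push moves the cart 0.5 units (an assumed toy rule)."""
    return position + 0.5 * push

def collect_transitions(n, seed=0):
    """Gather (position, push, next_position) observations from the environment."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        position = rng.uniform(-10.0, 10.0)
        push = rng.uniform(-2.0, 2.0)
        data.append((position, push, true_environment(position, push)))
    return data

def fit_gain(transitions):
    """Least-squares estimate of k in: next_position = position + k * push."""
    numerator = sum(push * (nxt - pos) for pos, push, nxt in transitions)
    denominator = sum(push * push for _, push, _ in transitions)
    return numerator / denominator

def predict(position, pushes, gain):
    """Roll the learned model forward over a sequence of actions."""
    for push in pushes:
        position = position + gain * push
    return position

transitions = collect_transitions(200)
gain = fit_gain(transitions)
predicted = predict(0.0, [1.0, 1.0, -0.5], gain)
print(f"learned gain: {gain:.3f}, predicted final position: {predicted:.3f}")
```

The point of the toy is the workflow rather than the math: the model is never told the rule that governs the cart, yet after seeing enough transitions it can forecast the result of actions it has not observed in that exact combination.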

For example, the video generation startup Runway uses world models to create smoother, more realistic animations. According to Runway, such models reduce the need for creators to specify every detail, as the AI understands how different objects should move and interact based on learned physics principles.

World models have far-reaching applications beyond video generation. AI researchers, including Yann LeCun of Meta, suggest that world models could eventually support advanced forecasting and planning in digital and physical settings. For instance, a model tasked with cleaning a room could devise a sequence of actions (e.g., vacuuming, trash removal) not by replaying patterns it has observed but by understanding the process needed to achieve the goal.
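The room-cleaning idea can be sketched as planning by search: a model predicts what each action would change, and a planner explores imagined futures until it finds a sequence that reaches the goal. The example below is a minimal sketch under loud assumptions. The action names, their preconditions and effects, and the hand-written `predict_next_state` function are all hypothetical; in a real system that function would be a learned world model, and the search would be far more sophisticated than breadth-first search.

```python
from collections import deque

# Hypothetical room-cleaning actions: name -> (preconditions, effects).
ACTIONS = {
    "pick_up_clutter": (frozenset(), frozenset({"floor_clear"})),
    "vacuum": (frozenset({"floor_clear"}), frozenset({"floor_vacuumed"})),
    "bag_trash": (frozenset(), frozenset({"trash_bagged"})),
    "take_out_trash": (frozenset({"trash_bagged"}), frozenset({"trash_removed"})),
}

def predict_next_state(state, action):
    """Stand-in for a world model: predict the state after an action,
    or return None if the action cannot be performed yet."""
    preconditions, effects = ACTIONS[action]
    if not preconditions <= state:
        return None
    return state | effects

def plan(start, goal):
    """Breadth-first search over imagined futures.
    Returns a shortest action sequence reaching the goal, or None."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, steps = queue.popleft()
        if goal <= state:
            return steps
        for action in ACTIONS:
            nxt = predict_next_state(state, action)
            if nxt is not None and nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, steps + [action]))
    return None

goal = frozenset({"floor_vacuumed", "trash_removed"})
steps = plan(frozenset(), goal)
print(steps)
```

Nothing in the planner memorizes a cleaning routine. The ordering (clutter before vacuuming, bagging before removal) falls out of what the model predicts each action will do, which is the distinction the paragraph above draws between pattern matching and understanding the process.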

LeCun believes that achieving true intelligence requires AI to grasp the nature of its environment, possess common sense, and engage in reasoning and planning. However, he estimates that AI capable of these tasks may still be a decade away.

Today, simpler world models are already demonstrating abilities in basic simulation tasks. OpenAI's Sora model, for instance, can render video game environments and simulate actions within these worlds.

Creating functional world models presents significant technical challenges. Developing and running these models requires extensive computing power, far more than current generative models need. Even a model like Sora demands thousands of GPUs to train, which makes broader deployment complex and resource-intensive.

Moreover, world models, like other AI systems, are prone to biases and errors that stem from the limitations of their training data. A world model trained predominantly on videos of sunny weather in Europe, for example, might struggle to accurately depict snow in a different region, such as Korea.

This lack of diversity in training data can lead to limited generative capabilities, particularly when portraying diverse human and cultural scenarios. Cristóbal Valenzuela, CEO of the AI startup Runway, highlights that data and engineering constraints currently prevent these models from fully capturing realistic interactions between people and their environment.

If technical hurdles are overcome, world models could serve as a bridge between AI and real-world applications, advancing virtual world generation, robotics, and AI decision-making. In robotics, for example, world models could enable machines to develop an awareness of their surroundings, enhancing their functionality.

With a more sophisticated world model, an AI system could navigate complex scenarios and reason through potential solutions. This leap could mark a significant advancement in AI, enabling applications that integrate deeply with human environments.