“Intuitive physics” enables our pragmatic engagement with the physical world and forms a key component of the “common sense” aspects of thought. Current artificial intelligence systems pale in their understanding of intuitive physics in comparison with even very young children. Here, we address this gap between humans and machines by drawing on the field of developmental psychology. First, we introduce and make available a machine learning dataset designed to evaluate conceptual understanding of intuitive physics, adopting the violation of expectation (VoE) paradigm from developmental psychology. Second, we build a deep learning system that learns intuitive physics directly from visual data, inspired by studies of visual cognition in children. We show that our model can learn a diverse set of physical concepts that critically depend on object-level representations, consistent with findings from developmental psychology. We consider the implications of these results for both artificial intelligence and human cognition research.
Artificial intelligence (AI) has made remarkable progress in recent years, mastering an ever-increasing range of tasks that now includes Atari video games1, board games such as chess and Go2, scientific problems including protein folding3, and language modeling4. At the same time, success in these narrow areas has made it increasingly clear that something fundamental is still missing. In particular, state-of-the-art AI systems still struggle to capture the “common sense” knowledge that guides predictions, inferences, and actions in everyday human scenarios5,6. In this work, we focus on one particular domain of common sense knowledge: intuitive physics, the network of concepts that underlies reasoning about the properties and interactions of macroscopic objects7. Intuitive physics is fundamental to embodied intelligence, most obviously because it is necessary for all practical action, but also because it provides one basis for conceptual knowledge and compositional representation in general8. However, despite considerable effort, recent advances in AI have not yet yielded a system that exhibits an understanding of intuitive physics comparable to that of even very young children.
To pursue a richer common-sense physical intuition in artificial intelligence systems, we draw inspiration at several points in our work from developmental psychology, where the acquisition of intuitive physical knowledge has been intensively studied9,10,11,12. We build a deep learning system that integrates a central insight of the developmental literature: physics is understood at the level of discrete objects and their interactions. We also draw on developmental psychology in a second way, which concerns the problem of behaviorally probing whether an artificial intelligence system (or, in the case of developmental psychology, an infant or a child) possesses knowledge of intuitive physics.
In developing behavioral probes for research on children, developmental psychologists have based their approach on two principles. First, the core of intuitive physics rests on a set of discrete concepts11,13 (for example, object permanence, object solidity, continuity, etc.) that can be differentiated, operationalized, and individually probed. By explicitly focusing on discrete concepts, our work differs markedly from standard approaches in AI for learning intuitive physics, which measure progress using video or state prediction metrics14,15,16, binary outcome prediction17, question-answering performance18,19 or high reward in reinforcement learning tasks20. These alternative approaches intuitively seem to require an understanding of some aspects of intuitive physics, but they do not clearly operationalize or strategically probe an explicit set of such concepts.
A second principle developmental psychologists use to investigate physical concepts is that possessing a physical concept corresponds to forming a set of expectations about how the future may unfold. If human viewers have the concept of object permanence21, they will expect objects not to “disappear from existence” when they are out of sight. If they expect objects not to interpenetrate one another, then they have a concept of solidity22. If they expect objects not to teleport magically from one place to another but to follow continuous paths through time and space, then they have a concept of continuity11. From this conceptual scaffolding emerges a method for measuring knowledge of a specific physical concept: the violation of expectation (VoE) paradigm21.
Using the VoE paradigm to investigate a specific concept, researchers show infants visually similar displays (called probes) that are either consistent (physically possible) or inconsistent (physically impossible) with that physical concept. If infants are more surprised by the impossible display, this provides evidence that their expectations, derived from their knowledge of the physical concept under investigation, have been violated. In this paradigm, surprise is purportedly measured by gaze duration, but see Refs. 23,24,25 for further discussion. As an example, consider the concept of continuity: objects follow a continuous path through time and space. For the possible probe, researchers26 showed an object moving horizontally behind a pillar, where it was occluded, then emerging from occlusion and traveling toward a second pillar, where it was occluded again before finally emerging on the far side. In the impossible probe, once the first pillar occludes the object, the object does not immediately emerge from occlusion. Instead, after a certain delay, it emerges from behind the second pillar: it never appears in the space between the two pillars, so it appears to teleport from one pillar to the other. Experiments with infants have shown that by 2.5 months of age, they look longer at an object that teleports between two screens than at an object that moves continuously from one screen to the next26. Developmental researchers have used the same strategy to accumulate strong evidence that infants acquire a wide range of distinct physical concepts during the first year of life9,10,11,12.
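The logic of a VoE comparison can be illustrated with a minimal sketch. We assume, purely for illustration, that each probe yields a single scalar surprise score (gaze duration for an infant, or some prediction-error measure for a model) and that possible and impossible probes come in matched pairs. The function name `voe_effect` and the numeric scores below are hypothetical placeholders, not measurements and not part of any dataset or model described here.

```python
from statistics import mean


def voe_effect(possible, impossible):
    """Summarize a VoE comparison over matched probe pairs.

    `possible[i]` and `impossible[i]` are surprise scores for the
    physically possible and impossible versions of the same probe.
    Returns (mean paired difference, fraction of pairs in which the
    impossible probe was more surprising).
    """
    if len(possible) != len(impossible) or not possible:
        raise ValueError("need equal-length, non-empty paired scores")
    # Positive difference: the impossible probe was more surprising.
    diffs = [imp - pos for pos, imp in zip(possible, impossible)]
    return mean(diffs), sum(d > 0 for d in diffs) / len(diffs)


# Placeholder surprise scores in arbitrary units (invented for illustration).
possible_scores = [1.0, 2.0, 1.5, 2.5]
impossible_scores = [2.0, 2.5, 1.0, 4.0]

effect, fraction = voe_effect(possible_scores, impossible_scores)
print(f"mean VoE effect: {effect}, pairs with impossible > possible: {fraction}")
```

Under these assumptions, a mean difference reliably greater than zero would be read as evidence that expectations tied to the probed concept were violated. Pairing matters because the possible and impossible probes are built to be visually similar, so the difference isolates the physical violation from other properties of the scene.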