The Computational Bases of Natural Intelligence in the Human Brain


We are at a remarkable junction in the study of the human mind and brain.  With increasing frequency, machines are exceeding human performance on tasks once considered to be the exclusive province of human cognition (from the powerful arithmetic capabilities of traditional symbolic computers to the efficiency and sophistication of face recognition, game playing, and natural language processing in modern deep learning neural networks).  There are also other species that excel at tasks that people (and machines) can only approximate (e.g., building a spider web).  However, no other type of agent – natural or artificial – exhibits the stunning range of capabilities that humans do (see figure), coupled with the efficiency with which the human brain learns and executes them.  That is, the human brain seems to embody both the flexibility of symbolic computation and the efficiency of artificial neural networks.


Our lab has begun to explore the mechanisms responsible for this unique profile of flexibility and efficiency of cognitive function, focusing on two critical factors:  i) the ability to learn abstract, relational representations, and to use these for generalization in order to flexibly adapt to novel settings; and ii) the ability to regulate the use of such representations – which rely on serial, control-dependent processing – when they are needed, but to strategically engage in self-reconfiguration, investing the time and effort to acquire task-dedicated representations when more efficient, automatic processing is sufficiently advantageous.

Flexibility of human performance.  Performance of various kinds of “agents” (colors) on different tasks, relative to the best-performing agent on each task.  Pink line shows the “curve” of human performance.  Note that the area under the curve is qualitatively greater than for any other agent.

Abstraction.  Our work suggests that the first factor is subserved by computational primitives that serve as inductive biases, promoting the acquisition of low-dimensional representations that capture widespread forms of structure present in the natural world.  These include divisive normalization, selective attention over grid-cell-like coding, object-centric processing, and contrastive predictive learning.  We have shown that implementing these in neural networks greatly improves their sample efficiency (i.e., reduces the number of training examples needed) for learning regularities, as well as their ability to use these regularities for systematic generalization (e.g., out-of-range extrapolation) across a wide variety of cognitive tasks, including visual analogical reasoning and function learning.  However, while these mechanisms can explain how regularities can be learned in specific domains (e.g., from visual or auditory information), humans have a unique ability to generalize across domains (e.g., to abstractly apply the idea of a parabola to the trajectory of an object or the frequency of a sound).  Recently, we have begun to explore the idea that this may be explained by a particular set of structural and functional characteristics of brain organization: the (partial) isolation of areas responsible for abstract processing in prefrontal cortex from the most domain-specific areas of posterior neocortex, and their coupling through episodic memory mechanisms in medial temporal cortex (including hippocampus).  We have implemented these features in an architecture we refer to as the Emergent Symbols through Binding Network (ESBN), and shown that it implements a form of “relational bottleneck” that induces abstract processing areas to learn functions based strictly on the relational structure among domain-specific representations.  The architecture uses episodic memory to implement a form of variable binding between these (i.e., coupling domain-specific values with the inputs to the learned abstract functions), allowing the system to discover a form of genuinely symbolic computation and emulating the learning efficiency and flexibility of humans on several abstract reasoning tasks.

The ESBN Architecture.  Processing mechanisms in anterior (prefrontal) areas are (partially) isolated from domain-specific processing mechanisms in posterior areas, but coupled via episodic memory binding mechanisms in medial temporal structures, which induce a “relational bottleneck” and can be used for variable binding.
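To make the binding mechanism just described concrete, the following is a minimal, purely illustrative sketch in Python (the class, names, dimensions, and similarity/softmax details are hypothetical simplifications, not the published ESBN implementation).  It shows the key idea of the relational bottleneck: the abstract stream supplies keys, the domain-specific stream supplies values, and the abstract stream only ever receives keys retrieved by similarity among stored values, so any function it learns must be expressed over relations between values rather than their domain-specific content.

import numpy as np

class EpisodicBindingMemory:
    """Stores (key, value) pairs: keys from the abstract stream,
    values from the domain-specific (perceptual) stream."""

    def __init__(self):
        self.keys = []    # abstract-stream vectors
        self.values = []  # domain-specific embeddings

    def write(self, key, value):
        # Bind an abstract key to a domain-specific value (variable binding).
        self.keys.append(np.asarray(key, dtype=float))
        self.values.append(np.asarray(value, dtype=float))

    def read(self, query_value):
        # Compare the current domain-specific embedding with stored values and
        # return a similarity-weighted sum of the bound keys.  Only this
        # relational signal (which past value the current one resembles)
        # crosses the bottleneck to the abstract stream.
        if not self.keys:
            return None
        sims = np.array([
            np.dot(query_value, v) /
            (np.linalg.norm(query_value) * np.linalg.norm(v) + 1e-8)
            for v in self.values
        ])
        weights = np.exp(10.0 * sims)        # sharpened softmax over matches
        weights = weights / weights.sum()
        return np.sum(weights[:, None] * np.array(self.keys), axis=0)

# Toy usage: bind two arbitrary "perceptual" embeddings to two abstract keys,
# then probe with a noisy copy of the first embedding.
rng = np.random.default_rng(0)
mem = EpisodicBindingMemory()
key_a, key_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
val_a, val_b = rng.normal(size=8), rng.normal(size=8)
mem.write(key_a, val_a)
mem.write(key_b, val_b)
retrieved = mem.read(val_a + 0.05 * rng.normal(size=8))
print(retrieved)  # close to key_a: "same as the first item," regardless of domain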

Self-reconfiguration.  While flexibility is an undeniable virtue, it is closely associated with reliance on control, and subject to a seriality constraint that makes it inefficient.  For example, while the ability to mentally solve an arbitrary arithmetic problem is a classic example of human cognitive flexibility, most people cannot do so while carrying on a conversation.  This is in stark contrast to other kinds of tasks, often referred to as “automatic,” that can be carried out simultaneously, such as walking and talking.  While it has widely been assumed that the seriality constraints associated with control-dependent processes reflect reliance on a central, capacity-limited control mechanism (akin to the CPU of a traditional computer), work in our laboratory has suggested a radical alternative: that a fundamental factor determining whether a set of processes must be performed serially (i.e., are control-dependent) or can be performed in parallel (i.e., are automatic) is whether they rely on shared (general-purpose) or separated (task-dedicated) representations.  Tasks that share representations must be serialized in order to avoid conflicting simultaneous demands on those representations, and therefore require control.  That is, serialization is the purpose, not the fault, of control.  Serialization can be avoided by taking the time to learn new, task-dedicated representations that permit parallel execution, and thus more efficient performance; however, that comes at the cost of more training and poorer generalization.
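The core constraint can be illustrated with a toy example (the task names and representation labels below are purely illustrative, not drawn from our models): two tasks can be performed in parallel only if the sets of representations they rely on are disjoint; otherwise they must be serialized.

from itertools import combinations

# Each (hypothetical) task is mapped to the internal representations it relies on.
tasks = {
    "mental_arithmetic": {"working_memory", "verbal"},
    "conversation":      {"verbal", "auditory"},
    "walking":           {"motor_gait"},
}

def can_run_in_parallel(task_set, task_reps):
    """Parallel execution is possible only if no representation is shared."""
    for a, b in combinations(task_set, 2):
        if task_reps[a] & task_reps[b]:   # shared representation -> conflict
            return False
    return True

for pair in combinations(tasks, 2):
    mode = ("parallel (automatic)" if can_run_in_parallel(pair, tasks)
            else "serial (control-dependent)")
    print(f"{pair[0]} + {pair[1]}: {mode}")

# mental_arithmetic + conversation share the "verbal" representation -> serial;
# walking + conversation use disjoint representations -> parallel.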


Work in our lab has shown that analysis of this tradeoff — between the representational flexibility of shared representations and the processing efficiency of separated representations — provides a normative, formally rigorous account of the distinction between control-dependent and automatic processing in human performance [6,19].  It also suggests a fundamental tradeoff in network architectures between flexibility and efficiency, similar to the one between interpreted and compiled forms of processing in traditional symbolic architectures: more abstract representations are valuable precisely because they are shared — they can flexibly be used by a broad range of processes and tasks; but, as a consequence, they require serialization and therefore regulation by control.  The learning of abstract functions in the ESBN architecture provides an extreme example of this, and may explain why, in humans, the most abstract forms of processing (such as mathematical reasoning) also appear to be the most control-dependent, requiring serial processing, and consistently engage frontal lobe function.  However, as shown in classic cognitive studies, when the exact same task is performed repeatedly, it can become automatized [30], presumably through the formation of dedicated representations, allowing it to be performed efficiently in parallel with others.  In recent work, we have begun to consider how a system can strategically decide, when acquiring a new task, whether to rely on shared representations for more rapid acquisition but serial execution, or to expend the additional training effort to acquire task-dedicated representations for more efficient execution.  To date, this work has been restricted to a formal analysis of an abstract form of the problem [28] and an implementation in a deep learning network trained on a set of simple sensorimotor tasks [26].  In current work, we are integrating these strategic decision-making mechanisms into the ESBN architecture, allowing the system to make strategic decisions between the flexible use of abstract, symbolic forms of computation and committing to reconfiguring itself through the acquisition of task-dedicated function approximation.
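A toy calculation (all quantities below are hypothetical, chosen only to illustrate the form of the tradeoff, not taken from the formal analysis cited above) makes the strategic decision concrete: relying on shared representations incurs no additional training cost but forces serial execution, whereas acquiring task-dedicated representations incurs an up-front training cost that is repaid only if the tasks will be performed together often enough.

def total_time(n_trials, n_tasks, trial_time=1.0, training_cost=0.0, parallel=False):
    """Total time to perform n_tasks together for n_trials trials.
    Serial execution costs n_tasks * trial_time per trial; parallel execution
    costs trial_time per trial but may require an up-front training investment."""
    per_trial = trial_time if parallel else n_tasks * trial_time
    return training_cost + n_trials * per_trial

n_tasks, training_cost = 2, 50.0   # hypothetical numbers
for n_trials in (10, 50, 100, 500):
    shared    = total_time(n_trials, n_tasks)                                  # serial, no extra training
    dedicated = total_time(n_trials, n_tasks, training_cost=training_cost,
                           parallel=True)                                      # parallel after training
    better = "dedicated" if dedicated < shared else "shared"
    print(f"{n_trials:4d} trials: shared={shared:6.1f}  dedicated={dedicated:6.1f}  -> {better}")

# With few expected repetitions, flexible shared representations win; with many,
# the up-front cost of self-reconfiguration pays for itself.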