
New research from the University of the Witwatersrand offers a fresh explanation for one of the most-discussed questions in AI: why large language models develop structured, increasingly capable behaviour as they grow in scale.
The answer, the Wits team argues, lies in a process that also governs how human children acquire language – and how language itself evolves over generations to become easier to learn.
The work, published in the journal Proceedings of the National Academy of Sciences, combines two ideas that have existed separately for years. The first is “iterated learning”, a concept from linguistics that holds that language becomes more structured as it is passed from one generation to the next, because each generation introduces small, non-random errors that favour the parts of language easiest to learn. The second is the use of deep neural networks as models of how the brain processes information.
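The iterated-learning idea can be illustrated with a toy simulation. The Python sketch below is not the model used in the paper; it is a minimal, hypothetical example in which each "generation" sees only part of the previous generation's language, memorises it, and fills the gaps by generalising from the patterns it did see. The function names (`learn`, `structure`, `iterate`), the six-letter alphabet, the nine two-feature meanings and the bottleneck size are all assumptions made for illustration.

```python
import random
from collections import Counter
from itertools import product

# Hypothetical toy world: nine meanings, each a (shape, colour) pair.
MEANINGS = list(product(range(3), range(3)))
ALPHABET = "abcdef"


def random_language(rng):
    """Unstructured start: every meaning gets an arbitrary two-letter signal."""
    return {m: rng.choice(ALPHABET) + rng.choice(ALPHABET) for m in MEANINGS}


def learn(observed, rng):
    """Memorise the observed pairs and generalise to unseen meanings."""
    # For each signal position, count which letter went with each feature value.
    votes = [{}, {}]
    for meaning, signal in observed.items():
        for pos in (0, 1):
            votes[pos].setdefault(meaning[pos], Counter())[signal[pos]] += 1

    language = {}
    for meaning in MEANINGS:
        if meaning in observed:
            language[meaning] = observed[meaning]
            continue
        letters = []
        for pos in (0, 1):
            counter = votes[pos].get(meaning[pos])
            # Reuse the most familiar letter; guess only if the value was never seen.
            letters.append(counter.most_common(1)[0][0] if counter
                           else rng.choice(ALPHABET))
        language[meaning] = "".join(letters)
    return language


def structure(language):
    """Share of signal letters that match the majority letter for their feature value."""
    matches = 0
    for pos in (0, 1):
        for value in range(3):
            letters = Counter(sig[pos] for m, sig in language.items()
                              if m[pos] == value)
            matches += letters.most_common(1)[0][1]
    return matches / (2 * len(language))


def iterate(generations=20, bottleneck=5, seed=0):
    """Pass the language down the generations through a limited sample."""
    rng = random.Random(seed)
    language = random_language(rng)
    scores = [structure(language)]
    for _ in range(generations):
        shown = dict(rng.sample(list(language.items()), bottleneck))
        language = learn(shown, rng)
        scores.append(structure(language))
    return language, scores


if __name__ == "__main__":
    final_language, scores = iterate()
    print(f"structure score: {scores[0]:.2f} -> {scores[-1]:.2f}")
```

A score of 1.0 means every shape and every colour is always expressed by the same letter. In this toy, a language that is already fully structured is reproduced exactly whenever the learner has seen each shape and each colour at least once, which is one way to picture why the easy-to-learn parts survive the hand-over between generations.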
“We built a computer brain with similar characteristics to a child’s and compared it to behaviours we see in children’s brains. We then fed it data with similar properties found in human language and watched how the generations of the computer brain learn,” said lead author Devon Jarvis, a lecturer in the school of computer science and applied mathematics at Wits and a fellow of the university’s Machine Intelligence and Neural Discovery (Mind) Institute.
“It turns out, computer brains find the structure in the data in the same way that children favour certain properties of language in learning. It also showed that the dataset becomes more structured over generations because it makes learning easier.”
Depth
Jarvis uses the example of a child learning that birds have wings and can fly, then being confused by a penguin, which cannot. The child over-generalises, makes a mistake, and in correcting it builds a more precise understanding of the world. Language passed between generations behaves similarly: the easy-to-learn portions are remembered and reused, while the more unstructured parts are gradually forgotten.
The finding most relevant to AI is that the effect depends on depth. The researchers found that iterated learning produced structured, compositional behaviour only when the network was deep enough, with multiple layers of processing, and the language was sufficiently complex. Shallow networks failed to capture the regularities that make language learnable, a result that echoes how today’s generative AI models rely heavily on scale for their emergent capabilities.
An important caveat is that the work was done using deep linear networks – deliberately simplified mathematical models – rather than the large language models that power tools such as ChatGPT. The value of the result is theoretical: it points to a mechanism that may underpin why scale matters, demonstrated in a system simple enough to analyse rather than observed directly in a frontier model.
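For readers unfamiliar with the term, a deep linear network is a stack of layers that each apply only a matrix multiplication, with no non-linear step in between. The Python sketch below uses made-up integer weights (not taken from the paper) to show why such a network is simple enough to analyse: however many layers it has, the output is the same as that of a single combined matrix. What depth changes is how the network learns, not what it can ultimately represent. The helper names `matmul`, `forward` and `collapse` are assumptions for this example.

```python
from functools import reduce


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def forward(layers, x):
    """Pass a column vector through each linear layer in turn (no activation)."""
    for weights in layers:
        x = matmul(weights, x)
    return x


def collapse(layers):
    """Combine all layers into the single matrix the network computes."""
    return reduce(lambda acc, weights: matmul(weights, acc), layers)


# Illustrative weights only: a 2-input, 1-output network with two hidden layers.
layers = [
    [[1, 2], [0, 1], [1, 0]],           # first layer, 3 x 2
    [[1, 0, 1], [2, 1, 0], [0, 1, 1]],  # second layer, 3 x 3
    [[1, 1, 1]],                        # output layer, 1 x 3
]
x = [[3], [4]]

deep_output = forward(layers, x)
single_matrix = collapse(layers)
shallow_output = matmul(single_matrix, x)

if __name__ == "__main__":
    print(deep_output, single_matrix, shallow_output)
```

Here the three-layer network and the single collapsed matrix give identical outputs for the same input.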

“The fact that this was shown in a very simple version of the technology underpinning the modern boom in AI tools is also encouraging and suggests that in the intersection of multiple fields lies the fundamental principles of cognition,” Jarvis said.
The paper’s co-authors are Richard Klein, head of the school of computer science and applied mathematics at Wits; Benjamin Rosman, director of the Wits Mind Institute; and Andrew Saxe of the Gatsby Computational Neuroscience Unit and Sainsbury Wellcome Centre at University College London. — © 2026 NewsCentral Media
