Mathematics Professor Studies the Origins of Language through Numbers

Language is a gift that gives voice to our innermost hopes and dreams. How we acquire those words, however, remains a bit of a mystery. And the unlikeliest of champions, a math professor, is using equations to probe how we learn language and the biological origins of the abilities that turn gibberish into prose.

By Garrett Mitchener

When I introduce myself as a mathematician, I usually follow up by saying that I specialize in dynamical systems and probability, with applications to biology and linguistics. After an awkward pause and a glassy-eyed stare comes the inevitable question: “How do those subjects go together?”

The biology part is normally no surprise, since the study of life involves taking measurements and managing data. The linguistics part, however, stirs up a lot of conversation. After all, language is human, ubiquitous, natural and familiar. Math is formal and abstract. So how do they go together?

Linguistics is inherently an interdisciplinary field. It involves the study of literature, culture, social forces and cognition. The anatomy of the vocal tract, ears and brain is important, too. So are social networks (physical and digital) and slang.

In some ways, language is very precise. For example, native speakers of English would agree that this sentence is a decidedly wrong way to ask who went to the store with Chris:

Who did Chris and go to the store?

In other ways, language is kind of fuzzy. Think about how different English sounds with a Charleston accent compared with the English you might hear in Mumbai. Old English texts look as if they were written in a foreign language. My friends and I even turn on subtitles to better understand some TV shows from the BBC. Yet, it’s all English somehow.

To those who don’t use mathematics on a daily basis, it may not be immediately clear what the study of numbers, quantity and space has to do with linguistics. However, in many ways, mathematics is a natural tool for understanding both the precise and the fuzzy aspects of human language. Formal languages are mathematical systems for representing the form and meaning of highly structured data (think computer programming languages), and they have long been used as models for the more precise aspects of natural speech. Time-frequency analysis, which breaks a sound into the frequencies it contains from moment to moment, is an essential mathematical tool for helping computers translate natural speech into commands and data they can understand. Ever heard of Watson, Siri, Google Assistant or Cortana?
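For the curious, here is a toy version of the core idea behind time-frequency analysis, written in Python. It chops a signal into short frames and uses a discrete Fourier transform to find the strongest frequency in each frame, so you can watch the pitch change over time. It is a minimal sketch under simplifying assumptions (a clean synthetic tone, non-overlapping frames, a slow textbook transform), and the function names are made up for this illustration; real speech recognizers rely on the fast Fourier transform and far more elaborate processing.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Return |X[k]| for k = 0..N/2 using a direct (slow) discrete Fourier transform."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        total = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(total))
    return mags


def dominant_frequencies(signal, sample_rate, frame_size):
    """Split the signal into non-overlapping frames and report the strongest frequency (Hz) in each."""
    peaks = []
    for start in range(0, len(signal) - frame_size + 1, frame_size):
        mags = dft_magnitudes(signal[start:start + frame_size])
        k = max(range(1, len(mags)), key=mags.__getitem__)  # skip the constant (k = 0) bin
        peaks.append(k * sample_rate / frame_size)  # convert bin index to hertz
    return peaks


def tone(freq, n_samples, sample_rate):
    """A pure sine wave at `freq` hertz, `n_samples` long."""
    return [math.sin(2 * math.pi * freq * t / sample_rate) for t in range(n_samples)]


if __name__ == "__main__":
    rate = 8000  # samples per second
    # 0.1 seconds at 440 Hz followed by 0.1 seconds at 880 Hz
    signal = tone(440, 800, rate) + tone(880, 800, rate)
    print(dominant_frequencies(signal, rate, frame_size=400))  # [440.0, 440.0, 880.0, 880.0]
```

Each 400-sample frame lasts 0.05 seconds, so the output lists four frames: two dominated by the lower tone and two by the higher one. Shorter frames would track changes more quickly but pin down each frequency less precisely, which is the basic trade-off in this kind of analysis.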

Probability theory is a great tool for modeling random variation within language, such as whether you say “it isn’t” or “it’s not.” The two mean the same thing, so each time you speak, you effectively flip a mental coin to decide which to say. One of my projects, for example, looks at whether multiple versions of a language can coexist within a population. This includes the possibility that individual speakers randomly use more than one variant in their speech, and even change the rates at which they use each one over the course of their lives.
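The coin-flip idea is easy to write down. The Python sketch below is a toy model invented for this illustration, not the model used in that project: a speaker says “it isn’t” with some fixed probability, and a listener keeps a running estimate of their own rate, nudging it a little toward whatever they just heard. The names `speak`, `update_rate` and `simulate_listener`, the 30 percent speaker rate and the step size of 0.01 are all assumptions chosen for illustration.

```python
import random


def speak(p_isnt, rng):
    """Produce one utterance: "it isn't" with probability p_isnt, otherwise "it's not"."""
    return "it isn't" if rng.random() < p_isnt else "it's not"


def update_rate(rate, heard, step=0.01):
    """Move the listener's own rate of saying "it isn't" a small step toward what was just heard.

    The step size is a hypothetical learning rate; the result always stays between 0 and 1.
    """
    target = 1.0 if heard == "it isn't" else 0.0
    return rate + step * (target - rate)


def simulate_listener(start_rate, speaker_rate, utterances, seed=1):
    """Let a listener hear `utterances` sentences and return the listener's final rate."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    rate = start_rate
    for _ in range(utterances):
        rate = update_rate(rate, speak(speaker_rate, rng))
    return rate


if __name__ == "__main__":
    # A listener who starts out saying "it isn't" 90% of the time drifts toward
    # the 30% rate of the speaker they hear, but keeps fluctuating a little.
    print(round(simulate_listener(0.9, 0.3, 5000), 3))
```

Because this listener weights recent sentences more heavily than old ones, their rate never settles completely and can keep shifting over a lifetime, which is one simple way to picture speakers whose usage changes as they age.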

Mathematics can help us delve deeper into the science of language (its structure, its form, its function), which in turn can tell us more about how we communicate and how we share our perspectives. For example, until recently, it was widely accepted that children use only positive information when they begin acquiring their native language. That is, they hear utterances that adults find meaningful and try to speak the same way: mimicking at its finest. However, that isn’t the whole story. My linguist colleague Misha Becker at the University of North Carolina found that children sometimes do something a little unexpected as they learn to understand and use certain verbs.

Take the verb seem, for instance. As a raising verb, seem has no semantic subject, but borrows the subject of the phrase it modifies when syntactically necessary:

It seems that Charleston is a beautiful city.

Charleston seems to be a beautiful city.

On the other hand, try is a control verb, which means it has a semantic subject that is also the semantic subject of the phrase it modifies:

The mayor tries to be polite.

Becker discovered that young children will use control verbs in circumstances that don’t make sense:

The door tries to be purple.

Children assume the word try can be used like a raising verb, but once they’ve heard it used often enough, they learn not to use it in these nonsense ways. This is a hard problem to solve, because adults rarely point out which sentences are wrong; children have to notice what they never hear. So, how do they do it? To answer that, we applied several advanced statistical methods to two collections of natural speech. Although children probably don’t use a learning process quite like the statistical methods we tried, the results pointed to a likely explanation for how our little ones gather and process the information that helps them master these verbs.

We discovered that whether a verb’s syntactic subject is animate (a person, say, rather than a door) offers enough clues for children to eventually catch on to the proper use of each type of verb. And that’s huge. Why? Because this discovery shows that acquiring language is a more complex process than was originally thought. Children apparently need to keep track of several potential meanings for words and filter out the ones they never hear.
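To make the animacy idea concrete, here is a small Bayesian sketch in Python. It is not one of the statistical methods from the study, just a simplified illustration under invented assumptions: raising verbs such as seem are heard with an inanimate subject 30 percent of the time, and control verbs such as try only 1 percent of the time (a small nonzero rate standing in for noise and figurative speech). Given how many animate and inanimate subjects a learner has heard with a verb, the hypothetical function `posterior_raising` computes how confident the learner should be that it is a raising verb.

```python
import math

# Hypothetical rates (assumptions, not values from the study): how often each kind of
# verb is heard with an inanimate subject such as "the door".
P_INANIMATE = {"raising": 0.30, "control": 0.01}


def posterior_raising(animate, inanimate, prior_raising=0.5):
    """Probability that a verb is a raising verb, given counts of animate and inanimate subjects."""
    log_score = {}
    for kind, q in P_INANIMATE.items():
        # Log-likelihood of the observed counts if the verb is of this kind
        log_score[kind] = inanimate * math.log(q) + animate * math.log(1 - q)
    log_r = math.log(prior_raising) + log_score["raising"]
    log_c = math.log(1 - prior_raising) + log_score["control"]
    # Normalize in log space so large counts do not underflow to zero
    m = max(log_r, log_c)
    return math.exp(log_r - m) / (math.exp(log_r - m) + math.exp(log_c - m))


if __name__ == "__main__":
    # Toy counts, not data from the study.
    print(posterior_raising(animate=2, inanimate=0))    # early on, "try" might still be a raising verb
    print(posterior_raising(animate=50, inanimate=0))   # after many animate-only uses, almost surely control
    print(posterior_raising(animate=40, inanimate=15))  # inanimate subjects point strongly to raising
```

The first case leaves roughly a one-in-three chance that the verb is a raising verb, which suggests why a young learner might still say “The door tries to be purple.” Only as animate-only uses pile up does that possibility fade, much like filtering out a meaning the child never hears.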

But the relationship between equations and words goes even deeper than that. Mathematical methods can also help shed light on the biological origins of human language. Because the fossil record gives very little information on how and when the cognitive and physical capabilities for language evolved in hominins, researchers often use mathematical models and simulations to test their hypotheses about when our ability to communicate first emerged. I’ve used dynamical systems theory to understand when one form of language faculty might edge out another. Eventually, I hope to use computer simulations of evolving neural networks and learning mechanisms to gain new insights into the origins of our intellectual capacity for language.
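Here is a flavor of what such a model can look like. The Python sketch below is a textbook two-type replicator equation, not the specific model behind the work described above. Imagine a population in which each individual has one of two hypothetical language faculties, A or B, and communication success depends on the partner, summarized in a made-up payoff table: A with A scores 1.0, B with B scores 0.8, and mismatched pairs score 0.2. If x is the fraction of type A, the equation dx/dt = x(1 − x)(f_A − f_B), stepped forward with Euler’s method, tracks whether A edges out B.

```python
def fitnesses(x, payoff):
    """Average payoff of types A and B when a fraction x of the population is type A."""
    (aa, ab), (ba, bb) = payoff
    f_a = aa * x + ab * (1 - x)
    f_b = ba * x + bb * (1 - x)
    return f_a, f_b


def simulate(x0, payoff, dt=0.01, steps=20000):
    """Integrate dx/dt = x (1 - x) (f_A - f_B) with Euler steps and return the final x."""
    x = x0
    for _ in range(steps):
        f_a, f_b = fitnesses(x, payoff)
        x += dt * x * (1 - x) * (f_a - f_b)
    return x


# Hypothetical payoffs: row = individual's type, column = partner's type (A, B).
PAYOFF = ((1.0, 0.2),
          (0.2, 0.8))

if __name__ == "__main__":
    print(simulate(0.5, PAYOFF))  # A starts common enough to take over
    print(simulate(0.3, PAYOFF))  # A starts too rare and dies out, even though A-A pairs do best
```

With these numbers, f_A − f_B = 1.4x − 0.6, so the tipping point sits at x = 3/7, about 0.43. Above it, faculty A takes over; below it, A disappears. In a model like this, whether one faculty edges out another can depend as much on how common it already is as on how well it works.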

Although linguistics and mathematics might at first glance seem about as compatible as oil and water, they really do have a lot to offer each other. There are plenty of interesting mathematical problems hiding in human language, and plenty of mathematical tools that can shed light on the hidden stories of our words.

– Garrett Mitchener is an associate professor of mathematics.

Illustration by Adam Koon.