World's largest grammar database reveals accelerating loss of language diversity
There's a crisis unfolding in the field of linguistics: Global language experts estimate that, without intervention, about one language will be lost every month for the next 40 years.
A study published today debuts a grammatical database that documents the enormous diversity of the world's current languages, highlighting just how much humanity stands to lose and why that diversity is worth saving.
Known as Grambank, it is now the world's largest publicly available comparative grammatical database. Initiated by scholars in the Department of Linguistic and Cultural Evolution in Leipzig, Germany, the years-long global data project drew contributions from more than 100 authors at 68 institutions (including CU Boulder).
The analysis, which covers more than 400,000 data points across 2,400 separate languages and dialects, reveals that language loss is occurring unevenly across major linguistic regions of the world, with Indigenous languages in northeast South America, from Alaska to Oregon, and in northern Australia at highest risk.
"Grambank is showing us the importance of working on language documentation and revitalization in order to preserve this legacy of human communication, culture and cognition," said Hannah Haynie, co-first author of the study and assistant professor in the Department of Linguistics at CU Boulder.
Grammar 101
Grammar is simply the rules of a language: the words and sounds used and how they are combined and interpreted. Grammatical elements of a language include word order (whether the subject goes before or after the verb), tense (present, past or future), comparatives (words that express "bigger" or "smaller") and whether a language has gendered pronouns.
Over the past century, many researchers have studied languages, worked with their speakers and published books or other types of grammatical descriptions. Grambank builds on both these analyses and earlier language databases, but it is larger in scale and more thorough than its predecessors. It encodes 195 possible grammatical features across about 215 language families.
"Our understanding of grammar and what that tells us about humans is limited by what we can observe," said Haynie. "We're putting those observations into this data set, and that allows for comparison."
About 4,300 of the roughly 7,000 known languages in the modern world currently have published grammatical descriptions, so Grambank is over halfway to encoding all the grammatical information that can be extracted from existing data sources, said Haynie.
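The "over halfway" figure can be checked with a quick back-of-the-envelope calculation. The sketch below uses only the rounded numbers quoted in this article; the variable names are illustrative, and it assumes the 2,400 languages and dialects in Grambank can be compared directly with the 4,300 languages that have published descriptions.

```python
# Rounded figures quoted in the article; variable names are illustrative.
languages_in_grambank = 2_400     # languages and dialects coded so far
languages_with_grammars = 4_300   # languages with published grammatical descriptions
known_languages = 7_000           # approximate number of known languages

# Share of the describable languages that Grambank already covers
share_of_described = languages_in_grambank / languages_with_grammars

# Share of all known languages, described or not
share_of_known = languages_in_grambank / known_languages

print(f"Share of described languages coded: {share_of_described:.0%}")
print(f"Share of all known languages coded: {share_of_known:.0%}")
```

On these rounded inputs, a little over half of the languages with published descriptions are covered, and roughly a third of all known languages.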
'Unusual' languages
Using Grambank, the team found they could identify "unusual" languages: those that stray furthest from the patterns of variation typically found in language, and which often have no known sister languages. But they also found there's nothing particularly unusual about endangered languages compared with those that are not endangered.
"A lot of fairly ordinary languages, in terms of their basic grammar, happen to be endangered for a variety of reasons," said Haynie.
English, spoken around the world by 1.5 billion people, is actually "a pretty weird language" by Grambank's standards.
"Some of the places with more 'unusual' languages are places like Europe and Northern Africa, languages that we, as English speakers, tend to be more familiar with," said Haynie.
The bigger takeaway for Haynie is that almost no two languages in the data set are identical. Of all 2,400 languages and dialects in the data set, only five have matching codes under the grammatical coding scheme Grambank uses to document and analyze them. Though vocabulary may play a big role in the mutual unintelligibility that linguists rely on to determine what counts as separate languages, Grambank shows that the grammatical "fingerprints" of languages are also typically unique, she said.
"It means that every language is pretty darn special," said Haynie.
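To make the "fingerprint" idea concrete, here is a minimal sketch of how exact matches between coded languages could be found. It is not Grambank's actual pipeline: the language names and the four binary features are made up for illustration, and real codings would also have to handle missing or uncertain values.

```python
from collections import defaultdict


def find_identical_fingerprints(codings):
    """Group languages whose feature codings match exactly."""
    groups = defaultdict(list)
    for language, features in codings.items():
        # A tuple of feature values acts as the language's "fingerprint"
        groups[tuple(features)].append(language)
    # Keep only fingerprints shared by more than one language
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Toy data: hypothetical languages with four made-up binary features each
toy_codings = {
    "Lang A": [1, 0, 1, 1],
    "Lang B": [1, 0, 1, 1],
    "Lang C": [0, 1, 1, 0],
    "Lang D": [1, 1, 0, 0],
}

duplicates = find_identical_fingerprints(toy_codings)
print(duplicates)  # [['Lang A', 'Lang B']]
```

With only four features, collisions like this are easy to produce. With 195 features per language, exact matches become far less likely, which is consistent with the small number of matches reported above.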
Language loss
Language extinction has occurred throughout human history, but its speed has been accelerating due to social, political and economic pressures, said Haynie.
It's as if, while mapping the human genome, scientists saw the genes themselves rapidly disappearing before their eyes.
"Right now we're at a critical state in terms of language endangerment," said Haynie, noting that the United Nations has made a declaration to try to promote language preservation, documentation and revitalization.
This global language loss is also not evenly distributed. Several regions are at higher risk of losing Indigenous languages: Aleut in Alaska and the Salish languages of the Pacific Northwest; Yagua and Tariana, spoken in South America; and Kuuk-Thayorre and Wardaman, native to Northern Australian communities.
"Indigenous languages here in North America, languages around us and on our continent, are some of the most endangered languages in the world," she said.
Genealogy versus geography
One element that has been "hotly debated" within linguistics for years is the relationship between genealogy and geography in the development of language. That is: Which features in language are inherited from family and culture (genealogy) and which are more likely to be shared through contact among neighbors (geography)?
The Grambank analysis found genealogy seems to be consistently more important than geography. In other words, the faithful inheritance of ancestral language plays a stronger role in shaping the grammar of languages still spoken today than who someone's geographical neighbors were and how they talked, said Haynie.
While language crossover and bilingualism are well documented throughout history, this finding shows how much we can still learn about human history, and about the ways we communicate today, from the words of our ancestors.
"Language always finds a way," said Haynie.
Grambank is an open-access, comprehensive resource maintained by the Max Planck Society.