Meta has announced new artificial intelligence models capable of recognizing and reproducing speech in more than 1,100 languages. The project, Massively Multilingual Speech (MMS), is built to protect and preserve languages. “We’re open-sourcing our models and code so that others in the research community can build on our work and help preserve the world’s languages and bring the world closer together,” said Meta.
Collecting Audio Data
To collect audio data, Meta said it turned to religious texts such as the Bible, which have been translated into many different languages and whose translations have been widely studied for text-based language translation research. “These translations have publicly available audio recordings of people reading these texts in different languages. As part of the MMS project, we created a dataset of readings of the New Testament in more than 1,100 languages,” said Meta.
Massively Multilingual Speech to Preserve Languages
There are reportedly around 573 known extinct languages worldwide, some of which were major languages used by large communities in the ancient world. Meta said that many of the world’s languages are in danger of disappearing, and that the limitations of current speech recognition and generation technology will only accelerate this trend.
Massively Multilingual Speech (MMS) has been made available to the public through GitHub. In the future, Meta wants to increase MMS’s coverage to support even more languages and to tackle the challenge of handling dialects, which is often difficult for existing speech technology.