Meta, the parent company of Facebook, Instagram, and WhatsApp, led by Mark Zuckerberg, has developed an artificial intelligence model called SEAMLESSM4T, which incorporates several innovations and surpasses existing models. The system translates between multiple languages, taking either text or audio as input and producing either text or audio as output, in any combination.
The SEAMLESSM4T model, developed by Meta's AI division, FAIR, is an evolution of the company's previous model presented in August 2023 and aims to be a real-world "Babel Fish" that helps translate speech between any two languages, El Faro de Vigo reported. This advancement brings the concept of instantaneous universal translation closer to reality.
SEAMLESSM4T facilitates voice-to-voice translation, recognizing 101 languages and translating into 36 languages; voice-to-text translation from 101 to 96 languages; text-to-voice translation from 96 to 36 languages; text-to-text translation among 96 languages; and automatic speech recognition for 96 languages. This capability accelerates the translation process by performing translations without intermediate steps.
SEAMLESSM4T scores 8% to 23% higher than state-of-the-art translation systems on the Bilingual Evaluation Understudy (BLEU) metric, a standard measure of translation accuracy, as reported by Tech Xplore.
Furthermore, SEAMLESSM4T is 50% more resistant to background noise and speaker variations in voice-to-text conversion tasks than previous state-of-the-art systems, and its background noise filtering is improved by 42% to 66%, according to 20 Minutos. This robustness improves its performance in real-world scenarios where such challenges are common.
According to the journal Nature, as reported by Tech Xplore, the SEAMLESSM4T system promises to revolutionize global communications by imitating the tone and voice of the speakers, and it represents a step forward in communication across linguistic barriers. Readers of science fiction might be familiar with the Babel Fish from Douglas Adams' The Hitchhiker's Guide to the Galaxy, a small fish that could be inserted into an ear and simultaneously translate from one spoken language to another, as noted by HuffPost Spain.
Meta has made resources related to SEAMLESSM4T publicly available for non-commercial use, according to El Periódico. "All contributions to this work are publicly available for non-commercial use in order to promote further research on inclusive speech translation technologies," the company stated, as reported by 20 Minutos.
In an article published in Nature, Tanel Alumäe from the Language Technology Laboratory at Tallinn University of Technology (TalTech) in Estonia highlights that the model is capable of translating directly into 36 languages. Alumäe describes this capability as "impressive because it can—for example—translate spoken English to spoken German without having to transcribe it first into English to translate it afterward."
Alumäe points out that although the SEAMLESSM4T model translates around a hundred languages, the number of spoken languages in the world is about 7,000. He notes that the tool still has difficulties in many situations that humans handle with relative ease, such as conversations in noisy places or between people with strong accents, according to Diario de Sevilla. He predicts that "the authors' methods to leverage real-world data will open a promising path towards speech technology that rivals science fiction," as reported by El Periódico.
Allison Koenecke from the Department of Computer Science at Cornell University in New York warns that although speech technologies may be more efficient and cost-effective than humans, "it is imperative to understand the ways in which these technologies fail disproportionately for some demographic groups," especially in sensitive contexts like medicine or the legal field. Koenecke emphasizes that it is essential for future researchers in speech technologies to reduce performance disparities, as noted by El Periódico. She states that users should be well-informed about the possible benefits and harms associated with these models.
SEAMLESSM4T includes languages with limited data available for training AI models, addressing the shortcomings of other models regarding "languages with fewer speakers or less available digital data," as reported by PSN Noticias. Meta has also prioritized eliminating toxic output that may incite hate, violence, or abuse in translations, and has implemented a dedicated tool called Etox for this purpose, according to the same outlet.
The article was written with the assistance of a news analysis system.