Specialists at the Research Institute in Applied Mathematics and Systems (IIMAS) of the National Autonomous University of Mexico (UNAM) are developing a program that automatically translates Wixárika (Nayarit), Ayuuk (Oaxaca), Nahuatl (classical and modern), Mexicanero (Durango) and Yorinoqui (State of Mexico) into Spanish, just as is done today with English or French.

Ivan Vladimir Meza Ruiz, of the IIMAS Department of Computer Science and head of the project, said that people are accustomed to using the translators that large international companies offer for Spanish-English and other language pairs.

According to the catalog of the National Institute of Indigenous Languages, Mexico has 68 linguistic groups with 364 variants. Until recently, only Microsoft had developed software for the interpretation of Otomí and Maya, in collaboration with the universities of Querétaro and Yucatán and as part of its Heritage program.

“How do you help when a language like Ayapaneco has very few speakers left? There are few records of it, so the technology probably comes too late for some, and we cannot cover all 68 official ones, but there are others that do have speakers and are flourishing,” Meza Ruiz said.

The artificial intelligence specialist explained that he began the work in 2014 thanks to a student with ties to the Wixárika community, known to most as the Huichol, who wanted to support it.

Little by little, volunteers joined the work, mainly people who have ties to indigenous communities, are studying a technical degree and work with Nahuatl, Mexicanero and Yoem Noki. For example, the IIMAS researcher is advising his undergraduate student Cesar Cruz on documenting an intelligent system for Mazahua, or Jñatio as its speakers call it, which the student developed as a mobile application called MazahuApp, available through GoogleApps.

Another case is that of his master’s student Delfino Zacarías Márquez Cruz, a speaker of Ayuuk (Mixe), who is working on an interpretation method; several members of his home community took part in the data collection.

“The idea arose because I had wanted to design a translator for a long time, but I did not know how to make it concrete, so I approached Dr. Iván, who proposed that I build the neural network. But it required field work, because when I started there were no resources to train the model and something called a corpus was needed, that is, texts paired between Spanish and the language you want to work on. The challenge was to assemble them, to find someone to translate and people willing to share,” said Zacarías Márquez.

Meza Ruiz explained that this work uses neural networks, computational models that imitate a process, in this case translation from one language to another. They therefore require examples, namely sentences translated between the two languages. The mathematics involved is common and, to some extent, basic, such as matrix operations and vector calculus.
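To give a sense of what “matrix operations” means here, the following is a minimal sketch in Python and not the IIMAS system. The vocabulary, the input vector and the weights are hypothetical values chosen by hand. A single matrix-vector product scores each candidate Spanish word, and a softmax turns the scores into probabilities.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy data, invented purely for illustration.
target_words = ["agua", "maíz", "sol"]   # candidate Spanish output words
source_vector = [1.0, 0.0]               # made-up encoding of one source word
weights = [[2.0, 0.0],                   # one row of weights per target word
           [0.5, 1.0],
           [0.0, 2.0]]

probabilities = softmax(matvec(weights, source_vector))
best = target_words[probabilities.index(max(probabilities))]
print(best, probabilities)
```

Real translation models chain many such products and learn the weights from examples rather than setting them by hand.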

The complexity arises when calibrating the models, that is, finding specific values for each of the operations the system must perform, so that a phrase in one language is transformed into the other without errors.
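Calibration of this kind is commonly done with gradient descent. The sketch below assumes that technique and uses made-up training pairs; it is an illustration, not the project's actual training procedure. It adjusts a single value step by step until the model's outputs match the examples.

```python
# Hypothetical training pairs (input, desired output); the hidden rule is y = 3x.
examples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]

weight = 0.0          # the single value to calibrate
learning_rate = 0.05  # step size, chosen by hand for this toy problem

for _ in range(200):
    # Gradient of the mean squared error with respect to the weight.
    gradient = sum(2 * (weight * x - y) * x for x, y in examples) / len(examples)
    # Move the weight a small step in the direction that reduces the error.
    weight -= learning_rate * gradient

print(round(weight, 4))
```

A translation model repeats the same idea over millions of values at once, which is why the amount of example data matters so much.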

Fortunately, several algorithms work well, but because today’s so-called deep models have numerous modules and values to process, specialized computing equipment is needed.

Meza Ruiz explained that the systems developed so far, including those from Microsoft, are deficient because these technologies are most successful when they have a large body of data, that is, millions of examples of equivalent phrases in both languages from which the program learns to recognize them.

“For native languages, the largest corpora are close to 10,000 examples, compared with the millions available to commercial systems. We are far from having an experience similar to the one we have with a conventional translator, because we have very little data. That is part of our battle right now: to get more data and increase our examples,” stressed Zacarías Márquez.

In addition, he specified, Mexico’s native languages are predominantly oral, so the standardization of their writing is recent, and in various cases it has not yet been decided how words, concepts and even complete sentences should be written.

For example, he said, Wixárika “is made up of numerous words with morphological particles, so what for us can be a phrase is for them a single word,” a situation that is difficult for neural networks to process.

Meza Ruiz added that some loss in translation must also be expected, because in Huichol a sentence is structured according to how many people are listening and whether someone of higher rank than the speaker is present, distinctions that Spanish does not usually make. As a result, some translated texts come out incomplete.

For example, the phrase m’k’pa: pa ya p’-ta-ti-u-ti-wawi-ri-wa indicates, among other things, that the speaker witnessed the event described, something that is not marked in Spanish. The closest translation would be: she always asks us for tortillas.
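The mismatch between one long word and a whole sentence can be made concrete with a small sketch in Python. It assumes that the hyphens in the printed example mark boundaries between morphological particles; that is an assumption about the notation, not a linguistic analysis. It counts the segments of the final word and compares them with the words in the English gloss.

```python
def split_segments(word):
    """Split a hyphen-segmented word into its non-empty segments."""
    return [piece for piece in word.split("-") if piece]

# Final word of the example phrase, copied as printed in the article.
wixarika_word = "p’-ta-ti-u-ti-wawi-ri-wa"
# English gloss given in the article for the whole phrase.
gloss = "she always asks us for tortillas"

segments = split_segments(wixarika_word)
gloss_words = gloss.split()

print(len(segments), "segments in one word:", segments)
print(len(gloss_words), "words in the gloss:", gloss_words)
```

A system that treats each whitespace-separated token as a unit sees this word as one rare item, which is one reason such languages are often split into smaller pieces before training.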

Zacarías Márquez commented that the Wixárika work can be consulted at http://turing.iimas.unam.mx/wix/, and that a site for Ayuuk is in progress.

The researcher emphasized that developing this type of technology needs support in order to rescue the languages of indigenous peoples, which have traditionally been documented through linguistics or anthropology. There is also a debate about how much these communities need the tools, whether the tools benefit them and how they would use them, since they have other priorities.

“What we have detected is that the inhabitants of Mexico recognize that we must support the preservation of these languages and promote their use, and an automatic translator could help with this,” stressed Zacarías Márquez.

DZ