Elon Musk didn’t just buy Twitter to “liberate” it from woke culture and impose what he understands as freedom of expression. His plan is much more ambitious. The real impact of the acquisition could reach far beyond the social platform: Musk has confirmed that he plans to use Twitter’s data, a gigantic database, for the benefit of artificial intelligence.
Specifically, Elon Musk wants to use Twitter data to train artificial intelligence language models. Yes, like the ones behind tools such as ChatGPT.
A Twitter user commented, quoting the mogul: “In my opinion, Elon Musk is most likely using data from Twitter + reinforcement learning with human feedback, and Tesla’s Dojo supercomputer, to train language models on BasedAI.”
As we well know, Elon Musk is quite an active user on Twitter, and he often answers questions and comments from the community. This case was no exception. The executive simply replied “obv” (“obviously”). Unfortunately, he did not elaborate on his plans.
Undoubtedly, Twitter’s database can be a gold mine for training an artificial intelligence language model. The reason? It contains millions of examples of how humans communicate with each other.
No less important, the data also captures the contexts that give rise to these expressions. This would make it easier for a chatbot, for example, to respond differently depending on the topic being discussed. Commenting on a funny meme is not the same as offering serious information about a natural disaster. Without a doubt, turning this information into a source of learning could yield great results.
That said, it cannot be ignored that Twitter is a social network where toxicity abounds. Hate speech has reportedly increased since Elon Musk took over its ownership and management. Even before then, however, the social network was already known as home to many of the internet’s unpleasant people.
This raises the following question: should we really get excited about an artificial intelligence language model trained on Twitter data? We’d better hold off until we know how they plan to do it at BasedAI.
OpenAI, the company behind the GPT-3 model and ChatGPT, has also had to deal with toxic language in its training data.
In January, a Time report revealed that OpenAI had outsourced work to employees in Kenya precisely to detect toxic words or phrases in training texts. The company developed safety software whose purpose is to prevent inappropriate language from reaching the data that GPT-3 is fed. The problem, of course, is that the first round of analysis must be done manually, hence the need for human reviewers.
We suspect that Elon Musk and BasedAI will do something similar with the information from Twitter. Otherwise, we may be looking at the most toxic language model ever created.