To err is human, and apparently it is also typical of generative artificial intelligence systems such as OpenAI's ChatGPT. At the beginning of this year, the chatbot impressed everyone with its apparent ability to solve problems and speak eloquently on a wide variety of topics.
But those who have interacted in depth with the AI, and who have extensive knowledge of the topic being consulted, will have reached the obvious conclusion that ChatGPT is often a very convincing “liar”.
In the vast majority of cases the platform offers apparently robust answers and texts that are nonetheless riddled with inaccuracies, misleading information, or outright falsehoods.
This is a scenario that comes up more and more, especially when dealing with engineering or exact-science topics, and it is not our imagination: the AI really does seem to be becoming less accurate in its responses. Or at least that is what an interesting research project that thoroughly exposes the AI has found.
ChatGPT is increasingly imprecise, and that should worry those who use it blindly
ChatGPT, the artificial intelligence developed by OpenAI, has gotten worse at solving mathematical problems, according to a Stanford University study that analyzed two versions of the OpenAI chatbot: GPT-3.5 and GPT-4.
The results showed that ChatGPT's accuracy on certain mathematical tasks decreased significantly compared to earlier measurements. Similar fluctuations appeared in more elaborate tasks such as writing code and answering questions based on visual reasoning.
James Zou, a Stanford computer science professor who participated in the study, was surprised by the significant changes in ChatGPT's performance:
“When we tune a large language model to improve its performance on certain tasks, that can actually have a lot of unintended consequences, which could actually hurt the performance of this model on other tasks.” […]
“There are all kinds of interesting interdependencies in how the model responds to things that can lead to some of the worsening behaviors we see.”
The research results themselves are a clear example of this: they show that ChatGPT's capabilities were not consistent and that the chatbot was becoming progressively less reliable.
For example, when solving math problems, GPT-4 started strong in March 2023, correctly identifying prime numbers 97.6% of the time; just three months later, in June 2023, its accuracy had dropped to just 2.4%.
For its part, GPT-3.5 showed an improvement on the same task, going from 7.4% accuracy to 86.8%. That is alarming considering this is the version that is theoretically being phased out to make way for its replacement.
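To make the benchmark concrete: the primality questions the study posed have a single, mechanically verifiable answer, which is what makes the accuracy drop measurable. A minimal sketch of such a verifier, with an illustrative number chosen here (not taken from the study's own dataset), might look like this:

```python
def is_prime(n: int) -> bool:
    """Deterministic trial-division primality check."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    # Only odd divisors up to sqrt(n) need to be checked.
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

# Questions of the form "Is N prime?" can be graded automatically
# by comparing the chatbot's yes/no answer against this function.
print(is_prime(17077))   # a prime, so a correct answer is "yes"
print(is_prime(17078))   # even, so a correct answer is "no"
```

Because the ground truth is computable, any change in the model's accuracy on this task between March and June can be measured exactly rather than judged subjectively.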
The study also found that ChatGPT's responses to questions about gender or ethnic issues became increasingly evasive, with the chatbot in some cases refusing to answer and cutting the conversation short.
The moral is obvious: you should not blindly trust this platform's capabilities and responses.