This is not an alliance between royal houses, nor a battle against the Sons of the Harpy, but they are joining forces to obtain multimillion-dollar reparations for the “large-scale systematic theft” of their creations: 17 famous American writers, among them George R.R. Martin, author of the “Game of Thrones” saga, have sued OpenAI for allegedly plagiarizing their works to “train” ChatGPT.
The class action lawsuit seeks compensation “for the egregious and harmful violations” allegedly committed by the artificial intelligence (AI) company against the copyrights of Martin and of writers such as Michael Connelly, creator of the fictional detective Harry Bosch, brought to the screen by Amazon Prime Video, and of the character Mickey Haller (The Lincoln Lawyer, on Netflix); John Grisham (The Pelican Brief); and Scott Turow (Presumed Innocent), among others.
The authors are asking the U.S. District Court for the Southern District of New York to sanction OpenAI because the company copied their works en masse, “without permission or consideration.” They are also seeking a court order forcing the firm to stop infringing the plaintiffs’ rights, which would include prohibiting the use of the complainants’ works in ChatGPT.
How was the plagiarism carried out?
The answer to this question, which is essential to winning the civil lawsuit brought by the writers, was given by ChatGPT itself when asked – in January 2023 – how it had been “trained.”
The chatbot responded: “It is possible that some of the books used to train me were copyrighted. However, my training data came from various publicly available sources on the Internet, and it is likely that some of the books included in my training data set were not authorized for use,” reads the lawsuit, a copy of which was obtained by HIGH LEVEL.
The filing also includes a clear confession from ChatGPT, which the creators have used to argue and support the plagiarism they complain about: “If any copyrighted material was included in my training data, it would have been used without the knowledge or consent of the owner” of those rights, the chatbot admitted.
According to the writers, who are represented in the New York court by the law firm Cowan DeBaets Abrahams & Sheppard, OpenAI and its subsidiaries copied the works on a large scale, feeding them en masse into their large language models (LLMs).
These models are algorithms designed to generate text, emulating human responses after receiving a query or instruction (a prompt). LLMs apply deep learning techniques to large data sets.
“At the heart of these algorithms is systematic theft on a massive scale” of the books by George R.R. Martin, Michael Connelly, John Grisham, Scott Turow, and 13 other plaintiff creators, the filing states.
How much could the compensation be?
Instead of training its LLMs on public domain works, whose copyright is no longer in force, OpenAI allegedly used metadata search engines and websites that contain pirated copies of books.
According to investigations by independent experts in artificial intelligence, who are cited in the lawsuit, the files with the texts of these and other authors could have been downloaded from large repositories in which copyrighted books are illicitly housed.
As an example, the lawsuit mentions the Library Genesis, or LibGen, website, “which offers a vast repository of pirated texts” and which was held liable in 2017 in the same Southern District of New York court for willful copyright infringement.
In that ruling, compensation of $150,000 was ordered for each of the 100 copyrights infringed. The writers are requesting the same per-work amount as a penalty against OpenAI, and it would be multiplied by the number of texts used without authorization to train ChatGPT, should the creators win the trial.
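To give a sense of scale, the per-work arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name and the second work count are hypothetical, and an actual award would be set by the court rather than by a simple multiplication.

```python
# Illustrative arithmetic only, not a prediction of any award.
PER_WORK_DAMAGES_USD = 150_000  # per-work amount cited in the article


def total_damages(num_works: int, per_work: int = PER_WORK_DAMAGES_USD) -> int:
    """Multiply a per-work amount by the number of infringed works."""
    if num_works < 0:
        raise ValueError("num_works must be non-negative")
    return num_works * per_work


# The 2017 LibGen ruling as described above: 100 works at $150,000 each.
libgen_total = total_damages(100)

# Hypothetical work count, chosen only to show how the total scales.
hypothetical_total = total_damages(1_000)

print(f"LibGen ruling: ${libgen_total:,}")
print(f"Hypothetical 1,000 works: ${hypothetical_total:,}")
```

At the figures reported for the LibGen ruling, the total comes to $15 million; the total grows linearly with the number of works a court finds infringed.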
Likewise, other possible sources that OpenAI allegedly used are Z-Library, “which hosts more than 11 million books, and pirate torrent trackers like Bibliotik, which allow users to download e-books in bulk.”
In fact, the filing claims, “the FBI confiscated Z-Library’s internet domains in November 2022, just months after OpenAI stopped ‘training’ its GPT-3.5 in September 2021.”
The growth and sophistication of ChatGPT also suggest to the writers an increase “in the size of the ‘training’ data sets, raising the inference that one or more of these sites must have been used” as large and powerful sources of electronic books, regardless of whether they were pirated. The fact that content is found on the Internet does not mean that it is free of copyright.
“The defendants could have paid a reasonable license fee to use protected works,” but they decided to “evade the Copyright Act entirely to further their lucrative business endeavor,” the lawsuit concludes.
In fact, if OpenAI is found liable for copyright infringement, its viability as a company would be at risk: the penalties could run into the billions, since in addition to the writers’ lawsuit, six other lawsuits have been filed against the company this year.
Surya Palacios. Journalist and lawyer, specialist in legal and human rights analysis. She has been a reporter, radio host and editor.