Ruth Card answered the phone and heard the desperate voice of her grandson, Brandon. He told her he was in jail, without his cell phone or wallet, and pleaded for help. Ruth, 73 years old, did not hesitate. Together with her husband, she rushed to two branches of her bank to withdraw the money needed to pay the bail. She withdrew 3,000 Canadian dollars (about 2,000 euros) at the first branch. At the second, however, the manager stopped them: he told them that another customer had received a similar call, and explained that the voice on the phone was probably not her grandson's.
Ruth was the victim of a new phone-scam method that uses artificial intelligence for voice cloning. The logic remains the same: a scammer pretends to be someone the victim trusts, a family member or friend, and convinces the victim to send money to deal with an emergency. The difference is that the trap now achieves an impressive level of realism thanks to the wide variety of tools available on the internet, many of them at very low cost, that make it possible to replicate anyone's voice.
This case, described by The Washington Post, is just one of many. The American newspaper reports that it is a growing trend in the country. Already in 2022, impostor scams were the second most common type of fraud in the United States: there were close to 36,000 reports of scammed victims, according to data from the Federal Trade Commission. Of that total, 5,100 were phone scams, accounting for around $11 million in losses. It is not known how many were carried out using AI.
How does AI voice cloning work?
An entire speech can be generated from a single audio sample of a voice containing just a few sentences. Many criminals are obtaining these samples from videos posted on YouTube, podcasts, Instagram, TikTok and other networks.
The artificial intelligence software that enables cloning analyzes the traits that make a voice unique: accent, age, even gender. It then searches large databases of voices for similar ones and replicates their patterns.
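The matching step described above can be sketched as a toy similarity search. This is only a conceptual illustration, not a real cloning system: the three-dimensional "voice embeddings" below are made up for the example, while real software extracts high-dimensional speaker representations from audio before comparing them.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two feature vectors: 1.0 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical "voice embeddings". Real systems derive hundreds of dimensions
# from pitch, accent, timbre, etc.; these tiny vectors are invented for illustration.
voice_database = {
    "speaker_a": [0.9, 0.1, 0.3],
    "speaker_b": [0.2, 0.8, 0.5],
    "speaker_c": [0.85, 0.15, 0.35],
}

def closest_voice(sample_embedding, database):
    # Return the stored speaker whose embedding best matches the sample.
    return max(database, key=lambda name: cosine_similarity(sample_embedding, database[name]))

sample = [0.85, 0.16, 0.36]  # pretend this was extracted from a short audio clip
print(closest_voice(sample, voice_database))  # prints "speaker_c"
```

A real pipeline would then use the matched patterns to synthesize new speech in the target voice; the search-and-compare idea, however, is the same.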
“It’s terrifying… It’s kind of a perfect storm with all the ingredients you need to create chaos,” Hany Farid, a professor of digital forensics at the University of California, told The Washington Post. According to the expert, a 30-second audio clip would be enough to achieve convincing AI voice cloning.
Fake voices are capable of hacking bank security
It is not just frightened family members and friends who are at risk. Banks in the United States and Europe have touted voice identification as a secure way to log into an account. But a journalist from Vice managed to circumvent his bank’s security system thanks to an AI voice-cloning system. He did not even have to pay: he used ElevenLabs, a free tool.
ElevenLabs was launched at the end of last January. It debuted as an open test, without major safeguards. And the inevitable happened: within days, the voices of celebrities and political figures could be heard saying insults that, in reality, they never uttered. Even David Guetta put together a speech with a fake voice of Eminem for a track that he let circulate on social networks.
Another January launch was VALL-E, Microsoft’s voice cloning system. The company used Meta’s “LibriLight” audio library, which contains about 60,000 hours of audio in English from more than 7,000 different speakers. However, Microsoft decided not to open the code to the public, given the risks that doing so could entail.