Developers are taking advantage of multimodal artificial intelligence to build solutions for blind people. These systems can process text and images and, from that data, generate conversational responses. The technology can describe visual details of the environment in real time with greater accuracy than before. The result: greater independence for people with this condition.
Envision, for example, was launched in 2018 as a phone app that could read text in images. Since 2021 it has also been available for Google Glass. And this year, in May, the company announced the launch of Ask Envision, a virtual visual assistant based on GPT-4, the OpenAI model that powers the famous ChatGPT.
Ask Envision can recognize faces, objects and colors, and even describe scenes around the user. It can, for example, read a menu and answer questions about prices, dietary restrictions or the variety of desserts. It also includes an option for video calls with family and friends.
Richard Beardsley, one of the first people to try Ask Envision, uses the tool on his Google Glass. He told Wired that having this "hands-free" option is essential for him. Thanks to it, he can scan a text while holding the leash of his guide dog. "Having this really makes life a lot easier," Beardsley said.
The facial recognition feature of this artificial intelligence lets users know who is in the room. It also translates text in 60 languages and can recognize banknotes in more than 100 currencies.
Other artificial intelligence assistants for blind people
Envision is not the only option. Be My Eyes is another application aimed at people who are blind or have low vision that has already embraced artificial intelligence. At first, it worked only as a platform that connected volunteers with people with low vision to support them in everyday tasks: recognizing colors, checking whether the lights are on or preparing dinner.
Be My Eyes recently introduced an integration with GPT-4. Users can send images through the app to a virtual assistant. Its developers explain that a person can, for example, send a photo of the inside of their refrigerator. The artificial intelligence will respond not only by identifying what it contains, but also by proposing recipes that can be prepared with those ingredients. It can even guide the user step by step through the preparation.
Sina Bahram, a blind computer scientist and accessibility consultant for companies like Google and Microsoft, told Wired that two weeks earlier he had been walking down a New York street with a companion. At one point, the other person stopped to take a closer look at something. Bahram used Be My Eyes and learned that his companion was looking at a collection of stickers, some with cartoons and others with text. This "is something that didn't exist a year ago outside the lab... It just wasn't possible," he said.
Microsoft is now testing the beta version of the app. "Be My Eyes has played an important role in improving the way Microsoft can provide effective technical support, which is inclusive of all of our customers and their needs," said Neil Barnett, director of Inclusive Hiring and Accessibility at the technology firm. The National Federation of the Blind in the United States has also partnered with this initiative.
Microsoft also released its own app this year: Seeing AI. The company introduced it as a free tool for "narrating the world around you." It is available in multiple languages and offers similar visual recognition features.
Risks to be aware of
The risks associated with these tools are the same ones already identified for artificial intelligence in general. Danna Gurari, assistant professor of computer science at the University of Colorado at Boulder, told Wired that she has seen some assistive systems for blind people fabricate information, the same problem reported for models like ChatGPT or Bard.
Gurari hosts a workshop called "VizWiz" at the Computer Vision and Pattern Recognition conference, which brings together artificial intelligence researchers and blind technology users. In 2018 it drew only four teams. This year, more than 50 signed up.
"Most of what can be entrusted to them are only high-level recognition items, like a car, a person or a tree," said the expert. That is no small thing. "When blind people receive this information, we know from previous interviews that they prefer something to nothing."
But the biggest problem arises when users trust these tools for more sensitive decisions, for example, which medication to take. Using these language models would also expose blind people to the ethnic and gender biases already detected in artificial intelligence systems.