It’s funny how the advent of artificial intelligence seems to be arriving from the side we didn’t expect. Attempts to have AI drive autonomous vehicles or serve as the brain of robots that replace us in the most repetitive tasks (with the labor implications that entails) still seem a long way off, at least at scale. Nevertheless, the images that have traveled the internet in recent weeks, creations made by DALL-E 2 and Imagen, the two most powerful generative AIs ever created, tell us that AI does seem to be getting closer to humans, not in mechanical tasks that free up our time, but in more creative ones.
DALL-E 2 is the second version of the generative AI created by OpenAI, a company co-founded by Elon Musk, who later left its management, and which has received significant funding from Microsoft. DALL-E 2 draws on huge datasets from which it is capable of extracting and recognizing references in both text and images, producing results that leave you speechless, in a mixture of amazement and fear.
This is one of the several results that DALL-E 2 returns for the prompt “teddy bears working in a laboratory with steampunk aesthetics”.
“What seems obvious is that proposals like DALL-E 2 are going to shake entire industries. The first that comes to mind is stock images. If with just one sentence we can get dozens of results, some realistic, others in the form of illustration or fantasy, the image banks lose much of their relevance,” Javier Ideami explains to Hipertextual. Ideami is a computer engineer who has developed his entire career with one foot on the technical side and the other on the artistic, and who is currently immersed in the possibilities of generative AI with his own proposal, Geniverse.co, a kind of digital canvas that also returns images based on our prompts.
Added to this is Imagen, a similar project from Google that has also been unveiled in recent days, in its case much more focused on generating images with a realistic approach.
“I think that in a very few years, when these technologies are publicly available and integrated into all the devices we use (including mobile), they are going to be an agent of change with consequences that are very difficult to predict today,” points out Javier López, founder of Erasmusu, who in recent times has taken an interest in and investigated the possibilities offered by these new ways of generating images.
With the two of them, we will take a tour of how DALL-E 2 works and of the challenges and opportunities it poses.
“It works in a similar way to the human brain when we evoke memories”
Ideami has access to the DALL-E 2 beta, which has allowed him to see its full potential. He did not get it directly as a result of his activity with Geniverse, but through something much more mundane. “In Miami, we ran into Sam Altman, the CEO of OpenAI, told him about our initiatives, and he gave us access to the beta.”
A generative AI like DALL-E works by taking as a reference a text given to it by a human, a starting image, or sometimes both, and begins to identify among its references images that fit the request, in order to transform them later.
Ideami explains that DALL-E’s main differentiating factors are the enormous dataset it works with (the number of records from which it draws information) and its way of connecting and interweaving text and image requests.
The process by which DALL-E 2 ends up generating things as incredible as the images we have seen is really complex, but for Ideami there is something fundamental as a starting point to understand it. “The similarity with how the human brain works when it comes to remembering is a good starting point. We collect information at a given moment, and we store it. After a while, we evoke that information in the form of a memory. It will not always be the same; we modify it a little each time. Transferred to AI: starting from the information we give it, it searches its dataset and generates the image it returns to us,” he exemplifies.
Getting down to the details, the DALL-E 2 sequence works like this (a code sketch follows the list):
- Information is captured: First, the text is fed into an encoder that is trained to map it to a particular representation space. The goal is to understand as well as possible what we are asking for.
- Its huge bank of ‘memories’ is searched: Next, a model called the prior maps the text encoding to a corresponding image encoding that captures the semantic information of the message. The AI starts to match text with image.
- The image is evoked: Finally, an image decoding model stochastically generates an image that is a visual manifestation of the semantic information it understands we have given it.
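To make that flow concrete, here is a minimal Python sketch of the three stages. Every name in it (text_encoder, prior, decoder) is a hypothetical stand-in with a toy implementation, not OpenAI’s actual code; only the shape of the data flow reflects the sequence above:

```python
import hashlib
import numpy as np

# Toy stand-ins for the three learned stages described above. The real
# system uses a trained text encoder, a learned prior, and a diffusion
# decoder (GLIDE); these hypothetical functions only mimic the data flow.

EMBED_DIM = 512
rng = np.random.default_rng(0)
# A fixed random linear map stands in for the learned prior network.
PRIOR_W = rng.standard_normal((EMBED_DIM, EMBED_DIM)) / np.sqrt(EMBED_DIM)

def text_encoder(prompt: str) -> np.ndarray:
    """Stage 1: map the prompt into a text representation space."""
    seed = int.from_bytes(hashlib.sha256(prompt.encode()).digest()[:4], "big")
    return np.random.default_rng(seed).standard_normal(EMBED_DIM)

def prior(text_embedding: np.ndarray) -> np.ndarray:
    """Stage 2: map the text embedding to a matching image embedding."""
    return PRIOR_W @ text_embedding

def decoder(image_embedding: np.ndarray) -> np.ndarray:
    """Stage 3: stochastically decode the embedding into a toy 'image'.

    Fresh noise on every call is why one prompt yields several variations.
    """
    noise = rng.standard_normal(EMBED_DIM)
    return np.tanh(image_embedding + 0.3 * noise).reshape(16, 32)

prompt = "teddy bears working in a laboratory with steampunk aesthetics"
image_a = decoder(prior(text_encoder(prompt)))
image_b = decoder(prior(text_encoder(prompt)))
print(image_a.shape, np.allclose(image_a, image_b))  # (16, 32) False
```

The noise injected in the last stage is what makes the same prompt return several different variations, like the teddy-bear examples above.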
As Ideami continues to explain, another determining factor in DALL-E is how it manages to semantically join text and images to generate better results. That’s where another OpenAI model called CLIP (Contrastive Language-Image Pre-training) comes into play.
CLIP is trained on hundreds of millions of images and their associated captions, learning the relationship that a given piece of text has with an image. That is, instead of trying to predict a caption from an image, CLIP simply learns how a particular caption relates to an image. This contrastive, rather than predictive, objective allows CLIP to learn the link between the textual and visual representations of the same abstract object.
“CLIP is capable of taking a large amount of images and text, placing them in what in AI is called the same latent space, and working with them at a high level of abstraction from the start,” explains the engineer.
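For the technically curious, the contrastive objective described here can be sketched in a few lines. This is a generic reconstruction of the symmetric loss used by CLIP-style models over a toy batch of random embeddings, not OpenAI’s implementation:

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of (image, caption) pairs.

    Matching pairs sit on the diagonal of the similarity matrix; the
    loss pushes those similarities up and all mismatched pairs down.
    """
    # L2-normalize so dot products become cosine similarities.
    img_emb = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt_emb = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)

    logits = img_emb @ txt_emb.T / temperature     # (batch, batch)
    labels = np.arange(len(logits))                # pair i matches caption i

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)       # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # Average the image-to-text and text-to-image directions.
    return (cross_entropy(logits, labels) + cross_entropy(logits.T, labels)) / 2

# Toy batch: 4 images and their 4 captions, already embedded.
rng = np.random.default_rng(0)
loss = clip_contrastive_loss(rng.standard_normal((4, 512)),
                             rng.standard_normal((4, 512)))
print(f"{loss:.3f}")
```

Because both kinds of embedding land in the same latent space, distances between them become meaningful, which is exactly the property Ideami highlights.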
Finally, the decoding of that image comes into operation to give us the product we see, something the OpenAI machine does with its own diffusion model, called GLIDE, again optimized for the task.
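GLIDE’s internals aside, the general shape of a diffusion decoder can be sketched as an iterative denoising loop: start from pure noise and repeatedly subtract the noise a network predicts, conditioned on the embedding. The fake_denoiser below is a deliberately trivial, hypothetical stand-in for that trained network:

```python
import numpy as np

rng = np.random.default_rng(0)

def fake_denoiser(x, t, conditioning):
    """Stand-in for the trained network that predicts the noise in x.

    A real diffusion decoder conditions this prediction on the text or
    image embedding; here the 'noise' is simply whatever isn't the signal.
    """
    return x - conditioning

def diffusion_decode(conditioning, steps=50):
    """Generic reverse-diffusion loop: start from noise, denoise step by step."""
    x = rng.standard_normal(conditioning.shape)     # pure Gaussian noise
    for t in range(steps, 0, -1):
        predicted_noise = fake_denoiser(x, t, conditioning)
        x = x - (1.0 / steps) * predicted_noise     # small denoising step
    return x

target = rng.standard_normal(512)   # toy image embedding as conditioning
sample = diffusion_decode(target)
print(np.corrcoef(sample, target)[0, 1])  # sample drifts toward the conditioning
```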
If at this point you feel more lost than the teddy bears we left researching AI on the Moon in the 1980s, perhaps this infographic, also made by Ideami, will help you:
The stock image industry may be the first to fall to DALL-E 2
Now, what implications can such a technology have? Should creatives, designers or illustrators feel threatened?
Both Ideami and López believe that the stock photo industry is the one most on the tightrope. As for creative work, they think it could also force a reformulation, although not necessarily for the worse.
“Over time, and depending on the vertical, I think some jobs will be completely redefined or will cease to exist as we know them. I mean that in a few months, or at most two or three years, when these technologies are somewhat more mature, anyone will be able to generate a high-quality illustration or photograph without needing an illustrator or a photographer,” López argues.
It can also have a direct effect on a market and a concept as broad as intellectual property. “On the other hand, the fact that these datasets are fed with photographs and illustrations by other artists may mean that copyright laws have to be reconsidered. Then again, when humans draw they are also inspired by the works of other artists… although they carry that ‘dataset’ in their heads instead of in a digital database,” adds López.
“Having tools like this might limit creation a lot at first, but I think it can also encourage creativity itself, with models like DALL-E 2 serving as the starting point for brainstorming or for arriving at a concept. As we learn more about creativity, we understand that it is more about combining ideas than about ideas appearing out of nowhere,” maintains Ideami, who, however, has also come across some disturbing signs.
“I saw a post on Reddit from a teenager who said he wanted to study art but, after seeing what DALL-E 2 is capable of, has decided not to. That made me think that we also have to do a lot of educating about its role as a complement, not a replacement.”