Google has repeatedly demonstrated what its machine learning models, such as MUM and LaMDA, are capable of, and it continues to showcase those advances with a new artificial intelligence model called ‘Imagen’. According to Jeff Dean, head of the company’s AI division, the model promises to “unleash joint creativity between humans and computers”, and it can generate images from a short, simple text description.
‘Imagen’ is very similar to DALL-E 2, the artificial intelligence developed by OpenAI (a company co-founded by Elon Musk, among others) that also generates images from a text description. However, the two models differ in several respects, such as the level of detail and the efficiency with which the image is created.
Google, in particular, claims that its AI produces results with a much more precise level of detail than other systems. To support this, the company created a benchmark called DrawBench, which compares its model with similar ones, such as VQ-GAN+CLIP, Latent Diffusion Models and DALL-E 2, and presents the results “side by side” so that “human evaluators” can compare them and choose the most realistic. These evaluators, according to the company, concluded that the images generated by ‘Imagen’ are of higher quality and have better “image-text alignment” than those of the other models.
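To give a sense of how a side-by-side comparison of this kind can be scored, here is a minimal Python sketch. It assumes each human evaluator simply names the model whose image they prefer for a given prompt. The function name `preference_rates` and the model labels are hypothetical, and the votes are invented for illustration; they are not Google's data, and this is not necessarily how DrawBench is scored.

```python
from collections import Counter


def preference_rates(votes):
    """Return the share of votes each model received.

    `votes` is a list of model names, one entry per evaluator judgment.
    An empty list yields an empty dict.
    """
    total = len(votes)
    if total == 0:
        return {}
    counts = Counter(votes)
    return {model: n / total for model, n in counts.items()}


# Invented votes for illustration only; these are NOT real benchmark results.
toy_votes = ["model_a", "model_b", "model_a", "model_a"]
rates = preference_rates(toy_votes)
print(rates)
```

A real evaluation would collect separate judgments for image quality and for how well the image matches the text, and would report them per prompt category, but the tallying step would look much the same.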
Google’s AI is faster and more efficient than others, and it also understands more complex descriptions
‘Imagen’, Google’s AI that generates images from a short text description, is also “more compute efficient, more memory efficient and converges faster” thanks to a new architecture that Google calls Efficient U-Net, a variant of the widely used U-Net design. The results, according to the company, are photorealistic images that follow the text more closely than those of competing models, across a wide range of descriptions.
“An extremely angry bird”, “a photo of a raccoon wearing an astronaut helmet, looking out the window at night” and “a brain riding a spaceship towards the moon” are among the phrases that Google has used as examples to demonstrate what its AI model is capable of. These are some of the examples that can be found on the project’s website.
Google also claims that ‘Imagen’ can create images from descriptions that mention specific places or use convoluted wording. For example, if the user types “A Procyon lotor (raccoon) proposing to a Phascolarctos cinereus (koala) at DisneyLand”, the company’s AI should create an image that matches this description, understanding the scientific names of both animals as well as the place.
‘Imagen’ is currently an internal project and is not available to the public, as it could lead to the creation of images that contain “stereotypes and harmful representations”, the company points out.
Imagen is based on text encoders trained on uncurated web-scale data, and thus inherits the social biases and limitations of large language models. As such, there is a risk that Imagen has encoded harmful stereotypes and representations, which guides our decision not to release Imagen for public use without further safeguards.
Google.