At a time when AI is once again the focus of the tech world, Google has unveiled its own text-to-image AI generator, which creates images from a text prompt. It’s called Imagen, it was built by the Google Brain team, and if Google and its batch of sample images are to be believed, it pairs photorealistic image generation with a “deep level of language understanding.” Here’s a look at the details.
Here’s What Imagen AI Can Do!
Using it is meant to be simple. You type a description of what you want to see, and Imagen generates an image based on what it has learned from large amounts of training data.
The Imagen website showcases some use cases, and what we see is quite impressive. Imagen combines large transformer language models, which interpret the text, with diffusion models, which generate the high-quality images.
The outputs appear quite accurate and offer tough competition to other text-to-image AI models like OpenAI’s popular DALL-E (which already has a successor), VQ-GAN+CLIP, and Latent Diffusion Models. Google backs this up with data, too. It has introduced a benchmark called DrawBench for comparing such models and, according to Google, Imagen comes out ahead in those comparisons.
Google also reveals that on the COCO dataset, Imagen achieved an FID (Fréchet Inception Distance) score of 7.27, where lower is better, and that human raters found the results “on par with the reference images.”
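For readers curious about what an FID number measures, here is a minimal sketch of the underlying formula. FID compares the mean and covariance of feature vectors taken from real images against those taken from generated images, and a lower score means the two sets look statistically more alike. In practice, those features come from a pretrained Inception network; the function names below (`frechet_distance`, `fid_from_features`) and the random stand-in features are assumptions made for illustration only, and this is not Google’s evaluation code.

```python
import numpy as np


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2).

    Formula: ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * (sigma1 @ sigma2)^(1/2))
    """
    mu1 = np.asarray(mu1, dtype=float)
    mu2 = np.asarray(mu2, dtype=float)
    sigma1 = np.asarray(sigma1, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)

    mean_term = np.sum((mu1 - mu2) ** 2)
    # The trace of the matrix square root of (sigma1 @ sigma2) equals the sum
    # of the square roots of its eigenvalues, which are real and non-negative
    # for covariance matrices. Clipping guards against tiny negative values
    # caused by floating-point error.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    sqrt_trace = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))
    return float(mean_term + np.trace(sigma1) + np.trace(sigma2) - 2.0 * sqrt_trace)


def fid_from_features(real_feats, gen_feats):
    """FID-style score from two (n_samples, n_features) arrays.

    Hypothetical helper: a real FID pipeline would first extract these
    features from images with a pretrained Inception network.
    """
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    sigma_r = np.cov(real_feats, rowvar=False)
    sigma_g = np.cov(gen_feats, rowvar=False)
    return frechet_distance(mu_r, sigma_r, mu_g, sigma_g)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in "features": random vectors, NOT real image features.
    real = rng.normal(0.0, 1.0, size=(500, 4))
    fake = rng.normal(0.5, 1.0, size=(500, 4))
    print("same set vs. itself:", fid_from_features(real, real))
    print("real vs. shifted set:", fid_from_features(real, fake))
```

Comparing a set with itself gives a score close to zero, while the shifted set scores higher. That is the intuition behind “lower is better.”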
But you should know that the sample images shown off for such AI systems are usually hand-picked as the best ones, while those that go awry stay behind the curtain. So it may be too early to call Google’s AI model the best.
The AI model also has its share of caveats, which Google doesn’t shy away from highlighting. It could be used as a tool for malicious activities, like the creation of derogatory content or fake images, and hence it still isn’t available for people to try out. Plus, the AI can be prone to various social biases.
The Imagen website reads, “Imagen exhibits serious limitations when generating images depicting people. Our human evaluations found Imagen obtains significantly higher preference rates when evaluated on images that do not portray people, indicating degradation in image fidelity. The preliminary assessment also suggests Imagen encodes several social biases and stereotypes, including an overall bias towards generating images of people with lighter skin tones and a tendency for images portraying different professions to align with Western gender stereotypes.”
Therefore, it would be safe to say that Imagen still needs some refinement before it is ready for general use. Nonetheless, for the fun part, Imagen feels like a pretty good choice, and if you want to see something goofy and unreal, Imagen might be able to help. What are your thoughts on Google’s text-to-image AI? Let us know in the comments below.