Text-to-image artificial intelligence programs aren’t anything new. Indeed, existing neural networks like DALL-E have impressed us with their ability to generate simple, photorealistic images from brief yet descriptive sentences.
But this week I was introduced to Imagen. Developed by Google Research’s Brain Team, Imagen is an AI similar to DALL-E and LDM. However, Brain Team’s aim with Imagen is to generate images with a greater level of accuracy and fidelity, using that same short and descriptive sentence method to create them.
An example of such a sentence, as per the demonstrations on the Imagen website, would be “A photo of a fuzzy panda wearing a cowboy hat and black leather jacket riding a bike on top of a mountain.” That’s quite a mouthful, but the sentence is structured in such a way that the AI can identify each item as its own criterion.
The AI then analyzes each segment of the sentence as a digestible chunk of information and attempts to produce an image as closely related to that sentence as possible. And barring some uncanniness or oddities here and there, Imagen can do this with surprisingly quick and accurate results.
A little too wholesome?
If you’ve checked out Imagen or other neural networks for yourself, then you’ve probably noticed the overwhelming focus on a select few subjects. DALL-E, for example, likes to create images based on everyday household items, like clocks or toilets. Imagen, at least for now, seems to put cute animals at the forefront of its image generation capabilities. But there’s actually a very good reason for this.
Google’s Brain Team doesn’t shy away from the fact that Imagen is keeping things relatively harmless. In a rather lengthy disclaimer, the team acknowledges that neural networks can be used to generate harmful content, such as racial stereotypes, or to push toxic ideologies. Imagen even makes use of a dataset that’s known to contain such inappropriate content.
“While a subset of our training data was filtered to remove noise and undesirable content, such as pornographic imagery and toxic language,” Brain Team notes, “we also utilized LAION-400M dataset which is known to contain a wide range of inappropriate content including pornographic imagery, racist slurs, and harmful social stereotypes.
“Imagen relies on text encoders trained on uncurated web-scale data, and thus inherits the social biases and limitations of large language models.”
This is also the reason why Google’s Brain Team has no plans to release Imagen for public use, at least until it can develop further ‘safeguards’ to prevent the AI from being used for nefarious purposes. As a result, the preview on the website is limited to just a few handpicked variables.
Ultimately, it’s the right call. There have been examples in the past of AI programs being unleashed onto the online public… with extremely undesirable results. You may remember Microsoft’s Tay, an AI chatbot that launched on Twitter in 2016.
Tay was a pretty ballsy experiment on Microsoft’s part. Its intention was to see how an AI would react to and interact with real people in a social media environment. However, within hours, Tay went from a wholesome chatbot to a dispenser of anti-semitic talking points. This was despite the bot being “modeled, cleaned and filtered” according to Microsoft (thanks, The Verge).
Given the precedent set by AI like Tay, then, it’s easy to see why Imagen has been reined in. Clearly, even extensive filtering might not be enough.
Still far from perfect
While I was immensely impressed by Imagen, and had a lot of fun mixing and matching sentences to create all kinds of bizarre pictures, it’s definitely not something I’d consider to be overwhelmingly convincing. At least not for the time being.
More often than not, Imagen returned some frighteningly hilarious results. Animals, in particular, often appeared with all kinds of wacky proportions. A raccoon with a massive head, or with girthy, human-like arms gripping a bike’s handlebars, was a pretty common sight. While very funny, these peculiarities, blended with the photorealism, often made the images disturbingly uncanny.
The option to generate an oil painting was actually a good deal more convincing, and most of what Imagen was able to produce here wouldn’t look out of place in a school project. And I mean that in the nicest possible way. As it turns out, a Persian cat strumming a guitar translates far more convincingly to a painting than it does to a realistic photo.
As noted, it’s highly likely we won’t get a public release of Imagen anytime soon. Or ever, for that matter. The risks posed by AI programs and neural networks being able to generate unsavory content are still far too great. For now, though, I’m content with Imagen being a fun little curio for those looking to spend a bit of time generating funny cowboy hat-wearing animals skateboarding down a mountain.