AI image generators have created a lot of buzz recently, but they may be hard to understand. Here’s everything you need to know about them.
In 2022, we've seen the advent of some incredible text-to-image generators. The first to set off the big wave was Dall-E 2, with Stable Diffusion arriving a short while later. Since then, we've seen other tools arrive too, including Midjourney, Craiyon, and even TikTok to a certain degree. There are growing concerns about AI image-generating tools, primarily around the ethics of tools that can generate images of real people in places or situations that they were never actually in.
However, ethics aren't the only thing to consider. AI image generators are trained on millions and millions of photos and have learned to identify things by way of actual existing photos created by real people. When does it become a copyright violation? If your AI accidentally generates an image that looks very similar to an existing design, and the person who generated it goes on to share it commercially, is someone liable for any damages? If so, who? Who is even the "artist" in this case?
There are a ton of reasons to be wary of AI image generators, and these ethical and safety concerns merely scratch the surface. These tools can be used to create fake images that push a narrative, and the problem will only get worse as the tools improve. Given the incredible capabilities of these image generation tools already, it's scary to think what they'll be capable of doing very soon. However, if you just want to make pretty images and have some fun, then there's absolutely no harm in that.
Stable Diffusion is the inspiration behind this article and a tool that I've been playing around with a lot recently. It runs locally on your computer (so you're not fighting for resources with other users of some online tool) and it's one of the most powerful tools that you can currently use. Not only does it allow you to fine-tune a ton of parameters, but you can also control the entire generation process.
Stable Diffusion suffers from all of the same AI pitfalls, with the added "danger" of accessibility. Anyone with a powerful enough computer can set it up and have it running quickly. With an i7-12700KF, an RTX 3080, 32GB of RAM, and gigabit internet, I was able to set up Stable Diffusion and generate my first images within an hour. My PC is definitely on the higher end, but you can get away with running it on weaker hardware (though with less VRAM you can't generate images as large, and it'll take longer).
The best thing about Stable Diffusion is that it's entirely open source. You can implement support for it in any of your projects today if you want to, and there are already plugins such as Alpaca that you can use to integrate with Photoshop. It's not perfect yet, but it's extremely early in the development of these programs. You can also use Dream Studio if you'd like, though that costs money and is a bit restrictive versus setting it up locally.
What's more, if you set up Stable Diffusion locally, there are forks such as AUTOMATIC1111's Stable Diffusion WebUI that come with a built-in upscale tool that can increase the resolution up to four times higher. While you can generate images at higher resolutions, it is often much quicker to generate an image at a lower resolution and then upscale it. All of the images below are upscaled from smaller resolutions.
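To put rough numbers on why generating small and then upscaling is attractive, here is a short Python sketch. It assumes a 512×512 starting image and assumes "four times higher" means four times along each side. Neither figure is stated above, and the helper names are made up purely for illustration.

```python
def upscaled_size(width, height, factor):
    """Return the (width, height) after scaling each side by `factor`."""
    return width * factor, height * factor


def pixel_ratio(width, height, factor):
    """How many times more pixels the upscaled image has than the original."""
    new_width, new_height = upscaled_size(width, height, factor)
    return (new_width * new_height) // (width * height)


# Assumed starting size and upscale factor (illustrative, not from the article).
base_width, base_height = 512, 512
factor = 4

print(upscaled_size(base_width, base_height, factor))  # (2048, 2048)
print(pixel_ratio(base_width, base_height, factor))    # 16
```

Under those assumptions, the full-size image has 16 times as many pixels as the small one. Generation time doesn't necessarily scale in exact proportion to pixel count, but this gives a feel for why upscaling a small image tends to be the quicker route.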
Stable Diffusion was trained on a cluster of 4,000 Nvidia A100 GPUs running in AWS, with training taking place over a month. It has the ability to generate images of celebrities and has a built-in NSFW filter as well. You can disable this NSFW filter on local installations, which actually saves on resources by decreasing VRAM usage. As for what "Diffusion" means, it's the process of starting with pure noise and refining it over time. Each step brings the image incrementally closer to the text prompt until no noise is left. This is the same way that Dall-E 2 works.
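The "start with noise, refine step by step" idea can be shown with a toy Python sketch. This is not Stable Diffusion's actual sampler: a real model uses a trained neural network to predict the noise and has no access to the final image. Here a short list of numbers stands in for an image, and the hypothetical `toy_denoise` function cheats by knowing the target, purely to make the loop visible.

```python
import random


def toy_denoise(target, steps=50, seed=0):
    """Start from pure noise and move a little closer to `target` each step.

    `target` stands in for "the image the prompt describes". This is an
    illustrative assumption: a real diffusion model never sees the target
    and instead predicts the noise with a trained network.
    """
    rng = random.Random(seed)
    image = [rng.gauss(0.0, 1.0) for _ in target]  # pure random noise
    history = []
    for step in range(steps):
        remaining = steps - step
        # Stand-in for the network: the "predicted noise" is whatever still
        # separates the current image from the target.
        predicted_noise = [x - t for x, t in zip(image, target)]
        # Remove a fraction of that noise. On the final step remaining == 1,
        # so all of the leftover noise is removed.
        image = [x - n / remaining for x, n in zip(image, predicted_noise)]
        history.append(max(abs(x - t) for x, t in zip(image, target)))
    return image, history


target = [0.2, 0.8, 0.5, 0.1]
image, history = toy_denoise(target)
print(round(history[0], 3), round(history[len(history) // 2], 3), round(history[-1], 3))
```

The printed values are the largest remaining error early on, halfway through, and at the end. They shrink steadily toward zero, which is the "until no noise is left" behavior described above.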
Finally, another fun feature that Stable Diffusion has is "img2img". In this, you give it an image as a prompt, describe what you want the image to be, and then let it give you a proper drawing.
I gave it a template to work with and got back a pretty decent image. I'm sure with better prompts (mine is somewhat contradictory), you could get even better. Still, not bad at all for something that took me about five minutes to make.
In short, Stable Diffusion is free and easy to set up, and its biggest issue is how accessible it is. If you don't have a powerful enough PC, you'll need to pay to use it through the likes of Dream Studio.
Craiyon was previously known as DALL·E Mini, though despite the name, it has no relation to Dall-E 2. It was created to reproduce the results of OpenAI's DALL·E text-to-image model. Craiyon is available to the public and can generate surprisingly decent images, though they aren't as accurate or as high-quality as what the other tools here produce. Image resolutions max out at 256×256, and there are no upscaling tools, either.
Craiyon is completely free to use and accessible through its website. You can generate any image via any prompt, and the only catch is that the images are lower quality and that you'll need to wait two minutes or so for each batch of images generated. Craiyon started as an open-source model aimed at reproducing the results of the initial DALL·E model. The model now being used is known as DALL·E Mega, and it packs several improvements.
Craiyon, unlike the other options here, is supported by advertisement revenue. As a result, you'll see paid sponsorships and other advertisements on their website when you visit. There is also an app for Android smartphones. It's not the most sophisticated, but it's fun, easy to use, and accessible.
Dall-E 2 is a product of the OpenAI research lab and is the best-known AI image generator out there. It's a closed-off tool with limited access, but for those that can access it, some of the results that it can come up with are incredible. It was initially closed off due to concerns surrounding the ethics and safety of such a tool, though access has expanded gradually over time.
One of the biggest advantages that Dall-E 2 has is the ability to create photorealistic images that, at a glance, are indistinguishable from real photographs. It can generate paintings, images that look to have been captured on real cameras, and entirely made-up scenarios. It represented a huge jump in the capabilities of AI when it was first announced, both in its ability to make images and in its Natural Language Processing, known as NLP. This is thanks to its implementation of GPT-3, which is one of the most advanced language models out there and is also developed by OpenAI.
Just like Stable Diffusion, Dall-E 2 can take existing images and modify them based on a prompt. You can edit photos through it by asking it to add something to an image, or even ask it to remove something or to change the lighting. While it only creates square images, OpenAI announced Outpainting last month, a feature that can expand your images beyond the original frame by taking into account the context of what's already in your square image.
Dall-E 2 is available for all to try out.
Midjourney is an interesting one as it's a public platform that can generate images, though you do it through a Discord server. Not only that, but after you generate 25 images, you'll need to subscribe to the service to continue generating new ones.
While Midjourney is probably the most accessible platform here (given you can access it from any device with a Discord account), it also costs you money. However, you do get quality out of it. A user of the service, Jason Allen, created a piece that he dubbed "Théâtre D'opéra Spatial". He entered it into the Colorado State Fair art competition… and won.
Unlike these other projects, Midjourney is a proprietary artificial intelligence program. There is no source code that you can look at, and its entire purpose at this point in time is limited to usage within a Discord server. As for why it's a Discord server only, David Holz, founder of Midjourney, said the following to The Verge in an interview.
We started off testing the raw technology in September last year, and we were immediately finding really different things. We found very quickly that most people don’t know what they want. You say: “Here’s a machine you can imagine anything with it — what do you want?” And they go: “dog.” And you go “really?” and they go “pink dog.” So you give them a picture of a dog, and they go “okay” and then go do something else.
Whereas if you put them in a group, they’ll go “dog” and someone else will go “space dog” and someone else will go “Aztec space dog,” and then all of a sudden, people understand the possibilities, and you’re creating this augmented imagination — an environment where people can learn and play with this new capacity. So we found that people really like imagining together, and so we made [Midjourney] social.
Back then, you also would have trouble steering it away from the default "Midjourney" style, so to say. That's according to Holz, anyway, in the same interview.
[W]e have a default style and look, and it’s artistic and beautiful, and it’s hard to push [the model] away from that.
However, since then, the company has rolled out two new models: "test" and "testp". "test" is a general purpose model, and "testp" is focused solely on photorealism. As a result, you'll be able to get away from that default look and generate a wider variety of images if you'd like.
AI-generated art, while cool, poses a number of dangers to society at large. In an age where it can be hard to tell when the news is taken out of context or straight-up fabricated, there's a real danger when images that look and feel real can be made in a matter of minutes. For example, take a look at the photos that I generated below. One was generated using Stable Diffusion, and the other was generated with Craiyon.
Prompt: "crashed UFO at Roswell, 1947, lighting, army general investigating, studio lighting"
The above photos depict a crashed UFO at Roswell, and the first image shows what looks like a person walking on top of the crashed UFO. While the image here was generated for the purpose of showing a fake photo, it looks like it could be real. Any artifacts can be explained away by the fact that photos in 1947 would have been of poorer quality anyway, and both images could pass for real at a quick glance. You don't even need one of the best computers to do something like this, as Craiyon is a free application.
Where it gets even murkier is that you can actually specify an artist that you want the algorithm to take inspiration from. A common choice is Greg Rutkowski, who has spoken out against the usage of his name in AI-generated art. His name ranks as one of the most common prompts used in image generation. “A.I. should exclude living artists from its database,” Rutkowski told artnet in an interview, “focus on works under the public domain.” Searching Rutkowski's name will often return AI art that's been generated to look like his work but isn't actually his work.
Even worse is that AI-generated art can often highlight the biases of the human race. Craiyon even has a warning at the bottom of its home page in the FAQ, stating that "because the model was trained on unfiltered data from the Internet, it may generate images that contain harmful stereotypes." As a result, entering prompts such as "company executive" will most often return images of white men in suits. Likewise, entering "teacher" as a prompt will almost always return women in classrooms.
Given that the industry doesn't appear to be slowing (and regulation isn't catching up), we expect to see even more advancement in these areas. The fact that we've gone from the capabilities of Dall-E 2 (even if it was private) to Stable Diffusion in just a few months shows how big this industry is, and how big it can potentially be. Images that could previously have been contracted to a team of artists can now be generated in seconds, with a single artist involved in the process to make corrections. We've already seen how Midjourney can help win you an art competition, for example, though the U.S. Copyright Office currently says that you can't even copyright AI-generated images.
As Holz also stated in his interview, the current cost of training each model is around $50,000 — or more. Images also cost money as they are generated on incredibly beefy servers, especially when huge numbers of users come to generate their own images. It's going to be massively cost-prohibitive for any new players entering the space, which may in turn actually put some companies off as well. However, initial efforts such as Stable Diffusion being open source do bode well.
As a result, we'll be waiting excitedly to see the future of AI images. The space has evolved so quickly in the last year, and it seems that new advancements are being made daily. With glimpses of AI-based image manipulation even coming to our smartphones, there's a lot that could happen in the next year or two.
An Irish technology fanatic with a BSc in Computer Science. Lover of smartphones, cybersecurity, and Counter-Strike. You can contact me on Twitter as @AdamConwayIE or on Instagram as adamc.99.