Text-to-image software: a piano or a digital camera?
Some technologies make it possible for extremely talented people to create new forms of expression. Are we witnessing one?
Part of the appeal of art is that not everyone can make it. A snowman or a stick figure on paper is generally not considered art. There was a time when a canvas depicting a tree was extremely valuable because few people were capable of producing it. The same went for a large painting of a beautiful landscape. Now those things are commodities. Some pieces like that remain valuable for historical reasons, but if a random person made them today, we would not pay attention to them. There must be billions of images in the world that someone from the year 1500 would find more appealing than the Mona Lisa.
When a new technology for creating content appears, it creates a range of possibilities. If that range spreads people out so that only a few reach the top, then it creates a new form of art. Take the piano, for example. There must be tens of millions of people in the world capable of playing a simple melody. However, only a tiny fraction of them can flawlessly perform the greatest piano works ever written, over and over again, in front of audiences.
A digital camera on a phone is at the other end of the spectrum. Practically anyone can take a picture that would have required an expert photographer and expensive gear a few decades ago. Many people have the aesthetic sense to choose pictures that others will find pleasing. What makes an Instagram post popular is rarely the quality of the image; it’s generally the story behind it. A picture of a glacier doesn’t mean much if I post it, but if Alex Honnold tells you that he climbed it, then it has some meaning. You cannot tell it’s him; you cannot even tell whether the picture is real. The art here (if any) lies in a climber conveying that he did something difficult in a beautiful location.
2022 has seen an explosion of tools for generating images. Some of them make it extremely easy for a random person to turn a sentence or two into a beautiful image. The question for me is: are these tools more like a piano or like a camera phone? I suspect it’s the latter. The Midjourney pictures that were impressive a month ago now seem tired and boring to me. I see them everywhere online. They are pretty, in the same way that your picture of a tourist landmark is. There is nothing wrong with them, but they don’t stand out.
It’s possible that future incarnations of these technologies will be more like the piano. Nothing stops us from using our phones to make a feature film, yet almost nobody does, because the hard part of making a film is not the shooting itself. It’s coming up with a story worth telling in a couple of hours, and then managing to tell it in an entertaining way with sights and sounds. For now, I believe these tools are less revolutionary than they appear. Here are some predictions for the next few years; I’m curious to see how I do:
The craze will subside, and text-to-image will fade into the background like Instagram filters and FaceApp.
Most of us will use it for memes or random illustrations (blog posts, conferences, posters).
Occasionally someone will create a graphic novel or animation using these tools, and the interesting part will not be the process. The work will have to be entertaining just like any other form of animation.
There will be consulting services and mundane tools for studios and professionals that decrease production costs. For example, you could build a model for a TV show with dozens of characters, and have a writer generate storyboards or short scenes featuring those characters and sets. You’ll be able to alter shots more easily, change textures, and add or remove objects.
I cannot imagine a completely unforeseen application that will create a new industry in the next five years. A wizard that generates educational videos? Fake movies created from memories you describe?
Over the past few decades there has been huge progress in removing obstacles to expressing ideas. The flip side is that we are now flooded with those expressions everywhere online. What we cannot automate (yet) is the creativity required to come up with truly interesting insights that stand out. There are billions of TikTok / Instagram / YouTube posts, and many millions of Midjourney / DALL-E / Stable Diffusion images are made every day. What fraction of them express truly interesting or memorable ideas?