A few days ago, I was wandering through a parking lot when I noticed a very bright and boldly lettered bumper sticker: “My therapist is furry and has four legs.” Naturally I wondered what a furry and four-legged shrink might look like, so I asked ChatGPT for help. The tool generated this result:
A few thoughts here. First, this creature is terrifying, but he’s also into rare books, so we might get along. Second, I'm not sure why the patient is shirtless, shoeless, and wearing boardshorts. Finally, the monstrous therapist looks hungry, and could be wondering whether eating his patient would affect his Google ratings.
Creative Acceleration
The speed at which these tools can take you from odd idea to expression is incredible. I generated this image in the parking lot, on my phone, within a minute of reading that bumper sticker. My request was encoded, routed to the nearest cloud region for processing, then sent back as a finished image within seconds.
Back in August, while driving through Ohio, I noticed a jar of pickled beet eggs on the shelf of a gas station. I wondered, “Hey, what if a beet could lay eggs?”
And so I give you the rare but delicious beet-chicken.
How it Works
Although I tend to have fun with these tools, they are justifiably controversial, and have made quite a few artists very upset. The way they work is also strange. To learn how to create an illustration of a chicken, for instance, the models need to start with a whole bunch of pictures of the birds. These images must have labels — the word “chicken” — attached to them so the model knows to associate elements of the image with that term.
During training, the model gradually makes a mess of each image, altering the pixels a little at a time and tracking each change, until it has created a completely random mix of pixels with no discernible image hiding inside. Why make the mess? After doing this enough times, with enough images, the model starts to notice patterns in how the noise scrambles the pictures, and it learns to connect those chaotic images back to the original, clear pictures of chickens.
At that point, the model is able to reverse the process, basically starting from a blank page and transforming it into a clear picture like the one above. There’s a lot of variability here, so the better your instructions, the more control you have over the final image.
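The "making a mess" half of that process is simple enough to sketch in a few lines. This toy example is purely illustrative: the 8x8 "image" and the linear noise schedule are invented stand-ins (real models tune the schedule carefully and work on far larger images), but it shows how repeated blending drives a clear picture toward pure static:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a labeled training image: an 8x8 grid with a bright square.
image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0

# A made-up linear noise schedule (real models tune this carefully).
T = 50
betas = np.linspace(1e-4, 0.2, T)

def signal_fraction(t):
    """How much of the original image survives after t+1 noising steps."""
    return np.prod(1.0 - betas[: t + 1])

def add_noise(x, t):
    """Forward diffusion in closed form: blend the image toward pure noise."""
    alpha_bar = signal_fraction(t)
    noise = rng.standard_normal(x.shape)
    return np.sqrt(alpha_bar) * x + np.sqrt(1.0 - alpha_bar) * noise

slightly_noisy = add_noise(image, t=1)    # chicken still mostly visible
pure_static = add_noise(image, t=T - 1)   # essentially random pixels
```

Early in the schedule almost all of the original image survives; by the last step, nearly nothing does. Generation is the learned reverse of this walk: start from static and step back toward a picture.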
Rapid Evolution
The improvements made since these models were initially released are unbelievable. In April of 2023, when I was writing an essay about my shabby clothes, I asked one of the early tools to produce an image of a writer on a sidewalk wearing a corduroy jacket. Then I tested that prompt again 18 months later. Look at the difference:
My friend Jet would insist the gentleman on the left looks a lot more like me, but I prefer the one on the right, in part because I’m clearly writing some profound poetry, and might be starring in a Hallmark time travel movie.
Ideation
One final example. A while back, I asked the model to generate a Norman-Rockwell-style painting of a family sitting around a kitchen table, with the mother using a gasoline nozzle to pour milk into a pitcher. The first version was weird and sort of useless. So I tried again this week.
ChatGPT informed me that my prompt violated its content policy because Norman Rockwell’s work is copyrighted. (Which is progress!) So I requested a 1950s-style painting, and very quickly received this:
I’m worried about the little boy — although it’s good to see him refusing to conform to gender stereotypes — because something is definitely bothering him. Is it the gas nozzle? Or is he the only one who knows that his sister, who is staring at you right now, has been going around strangling all the pets in their neighborhood in the middle of the night?
Also, I’m convinced I’ve seen that smiling, one-eyed dad in an old advertisement: “Hey, kids, it's never too early to try Marlboros!”
The only major miss here is the fact that the milk isn’t flowing out of the nozzle as requested. I edited my prompt and asked the model to try again, but it spat back a different family, with the mom firing the milk straight up in the air. The AI tool doesn’t understand the laws of the world — it’s rearranging pixels — so it doesn't know that gravity would pull that liquid down.
Yet these tools have become truly exceptional in just a few years. This last image in particular actually means something to me because it’s an idea I’ve been thinking about for twenty years. When I was working at Popular Science magazine, I thought it would be interesting to create a photo shoot in which people were showering or watering the garden or pouring tea, only you’d swap out the showerhead, hose, or kettle for a gasoline nozzle to show how modern life is powered by fossil fuels. The one photographer I spoke to about it wasn’t all that intrigued by the idea. So it has just been sitting in my head for twenty years.
Now it’s here.
Distorted, creepy, a little off, but real.
That’s part of the magic of these tools. They're a new kind of paintbrush that an unskilled visual artist like me can manipulate with words. I could take this image and give it to a photographer as a rough illustration of the concept I'm trying to convey. Or maybe I’ll use it to make a really, really weird holiday card for friends, family members, and clients.
Yes, that feels right. And the inside would read:
Happy Holidays. May you have a wonderful time with family and friends. Just don't forget to lock your doors at night. She knows when you are sleeping.
-Mone
(Please click the heart below if you liked this post, so I know what’s working, and I really hope you do enjoy your holidays.)