LLM-designed packaging is here

I got a snack box on my Alaska flight from Seattle to Salt Lake City, and when I looked at the bag of chips, something seemed off. I could not immediately put my finger on it, so I started eating. And then about halfway through, it dawned on me: I’m pretty sure the packaging was designed by an LLM.

I think this packaging was designed by an LLM.

Often, LLM images have this uncanny quality. It’s hard to explain, but once you see it, you cannot “unsee” it. It’s usually a subtle detail that your brain just flags as looking “wrong”.

Even more frequently, the text in the LLM image will be wrong. Sometimes it just outright makes no sense, such as in these tie-tying instructions in the style of Hokusai. The numbers are all over the place, not even sequential, and not always legible.

"How to tie a tie in the style of Hokusai", version 1, generated by DALL-E
"How to tie a tie in the style of Hokusai", version 2, generated by DALL-E

At other times, the errors are more subtle. It might be a sneaky typo, or an unexpectedly repeated letter. Either way, for some reason LLMs really struggle with rendering text inside images.

"I O Llove you Anna" Valentine's card, generated by DALL-E at my husband's request

So when I took a closer look at the bag of chips in my hand, I realized that the text in the script font was not what it was supposed to be. It looks plausibly close, but if you read carefully, it actually says “Sea Snlt nnd Humuus” (I added the letters in red for reference), not “Sea Salt and Hummus” (as it does on the back).

"Sea Snlt nnd Humuus" on the front (red text was added by me)
"Sea salt and hummus" on the back

Now, why am I so sure this is not just human error? After all, schools in the US stopped teaching script handwriting for a while, and there are probably some designers who struggle to read script fonts. The designer who made this packaging could have had such a struggle (one they would probably be too embarrassed to admit to anyone) and simply typed the text wrong. However, I think the LLM theory makes more sense once you take a close look at the typos. Why would anyone repeatedly type “n” instead of “a”? Also, if I had to guess, when the text is first typed into whatever graphic design program is being used, it probably defaults to a generic sans-serif, and only later does the designer switch it to match the look they are going for.

Also, consider the fact that the text looks “about right” at first glance. I bet a person reviewed it quickly and approved the design without much further thought.

So why do LLMs struggle with "text in images" so much?

I don't think anyone knows the answer, but I have a pet theory based on my inconsistent ability to recognize text in my dreams.

In some dreams, I see and read text, and have no problem understanding it. Sometimes it's a page in a book, and sometimes just a line of text, but it feels pretty much the same as it does when I'm awake.

However, at other times, I see text and recognize it as text, but when I try to understand the meaning, I completely fail. One particular example I remember had a speed limit sign, and I knew it was one, but I could not for the life of my dreaming self figure out what the number on the sign was. And there are many other dreams, ones I cannot quite recall, that ended with me trying, trying, and failing to read some text, then waking up in frustration moments later.

I wonder if LLMs have the same kind of problem, where they can’t quite read text because something in their model is not lined up exactly right, like in a dream. They might be able to figure out the typeface (just like I can in most of those dreams), and get the general idea of what the text might mean based on the context, but lack the ability to parse it. And when they are asked to generate text, what they recall is not exactly right, and so it only kind of makes sense.

Whatever the cause, I think we will continue to see more AI-assisted design, and more examples of this "text glitch" making their way into final products. And honestly, I don't think there is anything wrong with using an LLM to help with the initial design if the project is on a tight timeline and has a low budget. It will probably give a better result than a person from one of those "cheap design" websites that I am not going to mention by name. But I do still think it's embarrassing to let typos sneak into the final product.

And most likely, in the near future, there will be a "split" in the design space. There will be dirt-cheap AI-made designs that are "good enough" (except when they are not), and there will be expensive human-created ones for those who care, or who want to show off their status and wealth.