Pallas Athena

What Does Looking at Imagery with Text-Based AI Tell Us About Creativity?

Backdrop

So I had an interesting morning working with AI, and I'm kind of excited about it, so I figured it was "blog-worthy". First, a bit of background. I've had long-standing interests in many things, two of which are scalable vector graphics (SVG) for illustration and, of course, artificial intelligence. Lately, I've been working on my long-neglected personal website, which has been sorely in need of an update for many years, and using the hot new LLMs (Large Language Models) to flesh out some technical details.

The fascinating thing about LLMs is that the architectural principle on which they're built is exquisitely simple. Essentially, it boils down to a probabilistic model that predicts possible continuations of a prompt-generated discourse context. Such systems appear to exhibit intelligent linguistic behavior based on nothing more than an "educated guess" as to how to continue the next fragment of a discourse. The key to understanding how these systems work is that the output is not deterministic -- it isn't generated by explicit rules expressed in program code. Instead, each continuation is sampled from a probability distribution the system has learned from its training data.
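To make that "educated guess" concrete, here's a deliberately tiny sketch in Python. The probability table is entirely made up for illustration; a real LLM computes such a distribution with a transformer over a vocabulary of tens of thousands of tokens, but the sampling step at the end is the same stochastic idea.

```python
import random

# Toy stand-in for a language model: a hand-written table mapping a
# two-token context to a probability distribution over next tokens.
# The numbers are invented purely for illustration.
NEXT_TOKEN_PROBS = {
    ("the", "cat"): {"sat": 0.6, "ran": 0.3, "sang": 0.1},
    ("cat", "sat"): {"on": 0.7, "down": 0.2, "quietly": 0.1},
}

def continue_text(tokens, steps=2):
    """Extend a prompt by sampling one token at a time from the model."""
    for _ in range(steps):
        context = tuple(tokens[-2:])          # the "discourse context"
        dist = NEXT_TOKEN_PROBS.get(context)
        if dist is None:                      # no known continuation
            break
        choices, weights = zip(*dist.items())
        tokens.append(random.choices(choices, weights=weights)[0])
    return " ".join(tokens)

print(continue_text(["the", "cat"]))  # e.g. "the cat sat on" -- varies run to run
```

Run it a few times and you'll get different continuations -- the non-determinism lives in that single sampling call.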

Let's pause for a moment and consider just what this implies. I've had a long-standing interest in the nature of consciousness going back to my adolescence. In college I made it a formal study. But after reading Plato, Aristotle, Descartes and all the rest, and Dennett's Consciousness Explained, I was left with the vague dissatisfaction that, really, very little in these works actually explains consciousness at all. It wasn't until I got to grad school and started working with the then-nascent mathematics of neural networks that I really started to understand the implications of the building blocks of consciousness.

Fast-forward to the present and the stunning rise of applied AI. Maybe I should say re-emergence. Broadly speaking, research into AI has gone through many incarnations since its inception. Here I'm referring to the application of neural-network architectures and, more recently, transformers, which have completely disrupted how we think and do business in the information age.

The fact that so many discussions around AI are bubbling up into public awareness suggests that these systems have crossed a tipping point. Simple mathematical principles, applied in the context of exponentially increasing computational power, have given rise to systems exhibiting emergent properties that, in people, we might call creativity. For the moment, anyway, let's limit the scope of the discourse to creativity.

Putting Creativity to the Test

So given that backdrop, I wanted to put the creativity of these systems to the test. Philosophers and scientists have long debated whether computers, for all their proficiency at number crunching, could ever be creative in the way that humans can. In 1950, Turing advanced the concept of a behavioral test for computer intelligence. John Searle later proposed the "Chinese Room Argument" to suggest that computers, even when it seems like they do, can't really understand language.

So I decided to conduct a little experiment of my own. One of my site updates includes an SVG self-portrait (as I said, I have a long-standing interest in using SVG for illustration). As I was working on the image, it occurred to me that what I'd drawn could use some more pizzazz. That got me thinking: everybody's using AI these days to augment their prose. But what about imagery? Yes -- I'm aware that there are numerous generative AI applications that produce imagery (shameless self-promotion -- more posts on that coming soon), but I was curious about the text-based applications I've been using lately, namely ChatGPT and Google's Gemini. Besides, the image-based generative AI systems don't operate on SVG.

So I set about to try an "experiment" with Gemini. The conversation went something like this:

Nick: Wanna try an experiment?

Gemini: Absolutely! I'm ready to experiment. Let's see what we can discover. What do you have in mind?

Nick: I'll send you my self portrait in SVG and I want you to try to make it more interesting.

Gemini: Let's do it! I'm excited to see what kind of transformations we can achieve with your self-portrait. Send it over when you're ready, and we'll explore some creative options.

Sadly, when I tried sending the SVG, it ended up breaking Gemini. The system just hung there and spun, forcing me to restart my browser.

Despite the initial failure, we can still make some interesting observations. By now it's pretty well known that the dominant LLMs on the market, while really good at generating some types of discourse (given sufficiently detailed prompts), are prone to making wild assertions while sounding extremely confident. The phenomenon has been dubbed "hallucination". When I first encountered these kinds of assertions, I thought of them (unapologetically tongue-in-cheek) as "lies". The point is that these systems don't appear quite ready to differentiate between informing and misleading (perhaps an argument for the "absence of understanding" side of the debate).

Still, I was intrigued. All this trying to get a text-based system to draw in SVG recalled an NPR story I heard a while back: a segment on "This American Life" in which David Kestenbaum interviewed a Microsoft engineer working on ChatGPT around the time of its big public release. The engineer was very excited about the insights emerging from interactions with the system. Part of the interview covered how he hit on the notion of testing whether GPT-4 could "draw". Since a text-based LLM can't really draw per se, he asked ChatGPT to draw a unicorn using TikZ (a LaTeX package used to create vector graphics -- similar in spirit to SVG).

So I fired up ChatGPT and asked it to draw a unicorn in SVG. And this is what I got...

It's actually not too different from what the engineer described as the TikZ output. The system attempted to portray a unicorn using the shapes, paths and colors available to it in the markup language it could use to generate its output. Does that imply we can say, "This is what ChatGPT 'thinks' a unicorn is"?
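For readers who haven't hand-coded SVG, here's a rough sense of what this kind of output looks like. To be clear, this is my own hand-written stand-in, not ChatGPT's actual output: a Python snippet that writes a "unicorn" built from nothing but a few geometric primitives, which is very much the style these systems produce.

```python
# Hand-written illustration (not ChatGPT's actual output) of the kind of
# primitive-shape SVG a text-based model tends to emit: ellipses for the
# body and head, a polygon for the horn, a circle for the eye.
svg = """<svg xmlns="http://www.w3.org/2000/svg" width="200" height="160">
  <ellipse cx="100" cy="110" rx="55" ry="30" fill="#eee"/>  <!-- body -->
  <ellipse cx="150" cy="70" rx="22" ry="16" fill="#eee"/>   <!-- head -->
  <polygon points="160,56 170,28 176,58" fill="gold"/>      <!-- horn -->
  <circle cx="155" cy="66" r="2" fill="#333"/>              <!-- eye -->
</svg>"""

with open("unicorn.svg", "w", encoding="utf-8") as f:
    f.write(svg)  # open the resulting file in a browser to view it
```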

My Key Insights

So what can we conclude from these little experiments? I mean, to me, the drawing's pretty lame. 'Looks more like a pig than a unicorn. And if you ask ChatGPT to get more creative, it pretty much gives you back only slight variations on the theme: same shapes, same colors, same unicorn features. I was hoping for something more like this...

Or even this ...

Still, what's interesting to me about all this is what emerges from these systems based on nothing more than variations on connectionist architectures and a few choices of activation rules, loss functions and optimizers. What's neat to me is how, given these atomic building blocks, the system at least appears to have developed an internal representation, a mental model if you will, of what a unicorn is supposed to be. The system has never explicitly been programmed or "told" how to draw a unicorn. And yet it seems to be creative enough to express its "understanding" using the languages at its disposal.
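To ground what I mean by "atomic building blocks", here's a minimal sketch (the names and numbers are mine, purely illustrative) of the three ingredients in question: a weighted-sum-plus-activation unit, a loss function, and the quantity an optimizer would try to drive down. Everything these systems do is, at bottom, a vast composition of pieces like these.

```python
import math
import random

# One artificial neuron: a weighted sum passed through a nonlinear
# activation (here, a sigmoid). This is the connectionist atom.
def neuron(inputs, weights, bias):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# One loss function: squared error between a prediction and a target.
def squared_error(prediction, target):
    return (prediction - target) ** 2

# An optimizer's whole job is to nudge the weights so this number shrinks
# across many examples; stack enough trained units and structure emerges.
weights = [random.uniform(-1.0, 1.0) for _ in range(3)]
prediction = neuron([0.5, -0.2, 0.9], weights, bias=0.1)
print(squared_error(prediction, target=1.0))
```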

So does all this amount to a definitive answer to the creativity question? For now, I'll leave it to the reader to decide. But what is certain is that the debate over the emergent properties of creativity in automated information-processing systems has never been more salient. Researchers and practitioners involved in the creation of AI systems have identified stages of AI development ranging from narrow to general to super. I've also heard a lot of (rightful) concern about the ethics surrounding the deployment of AI applications. Many artists and creative types express grave concerns over their potential displacement by creative (general?) AI systems.

All that being said, I don't think it's debatable that we are deep into the early stages of the emergence of general artificial intelligence, with everything that implies. I firmly believe that we are well down the road to understanding how to architect and create systems capable of general intelligence. But beyond that, I feel that exploring and understanding the internal representations of such systems can provide valuable knowledge and insights, leading to a deeper understanding of the nature of our own awareness. And while I completely acknowledge the need to get ahead of the curve with regard to the ethical deployment and utilization of these systems, I, for one, am keen to continue the exploration!

Resources

  1. "Greetings, People of Earth." This American Life. WBEZ Chicago, 23 June 2023.

  2. Searle, J. R. (1980). Minds, brains, and programs. Behavioral and Brain Sciences, 3(3), 417-457.

  3. Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59(236), 433-460.

Special Thanks

Immense gratitude to OpenClipart-Vectors on Pixabay for open use of the human-generated unicorn art.