There’s been a lot of excitement recently about AI tools in general, and large language models like ChatGPT in particular. I’ve been thinking a little bit about when statistical models perform well and badly, and what that might suggest about the future capabilities of AI.
All models are wrong
The best place to start is with the word “model”. It’s been extremely controversial lately, with huge arguments raging about COVID - are these models giving a scenario or a prediction? How seriously should we take the outputs? And so on. But at heart a model is a way of taking a certain amount of training data (things that we know) and using it to make guesses or predictions about the values of test data (things that we don’t know).
Obviously these guesses are never going to be perfect, because we don’t have a crystal ball and because the real world is a messy place. But we’d hope that the guesses should at least look plausible, and so we’d like to understand when this may or may not be the case.
The absolute simplest model in statistics is the linear model. You’ve seen it hundreds of times - every time someone plots a scatterplot of data on x and y axes and draws a line through it, they are making a model. The model in question says that the data points roughly lie on a line - so increasing the x value by a set amount will roughly lead to the same increase in y each time.
Passing the exam
Helen is preparing for her GCSEs, and would like to know how much to revise for each one. She spent varying amounts of time revising for each mock exam, and she plotted her results on a graph, which looked like this. (Luckily she was good at maths, and so knew how to plot the red line corresponding to the outcome of the best linear model!)
It seems like Helen’s revision has paid off. The more time she spent on a subject, the better she did (the line slopes up). Further, we can use this to make sensible guesses about how she might perform with small increases in work - for example, if she increased from 10 hours to 15 on a particular subject, then she might expect to gain 10 or so extra marks in it.
However, it should be clear there are limits to this. For example, the output of the model suggests that if Helen did more than 30 hours’ work then she might score more than 100 per cent - which is clearly ridiculous! In fact, predictions for very large and very small numbers of hours should be taken with a pinch of salt.
I know this, because I generated the data myself. I took a known straight line relationship and added noise to it to generate the points. As a result, I know that in fact the real values should lie somewhere around the blue line.
This clearly illustrates the key point: models are often very good at interpolating known data, but not so good at extrapolating from it. For the kind of values (10-20 hours) for which we had training data points, the predictions (red) of the model come close to the underlying true values (blue). But the fact that the slope of the red line is wrong leads to big errors far away from that range. As I describe in Numbercrunch, it is somewhat like trying to position a long ladder from very far away.
In fact the position is worse than that. There’s no reason to think that true real-world data will follow a straight line relationship like the numbers I generated. (In fact, in this case at least, the ceiling of 100 per cent gives us good reason to think that it won’t - instead we’d expect some kind of Law of Diminishing Returns.) If we try to fit a straight line to data which is based on a curve, then things could go very wrong far away from our training points.
Raising the stakes
Of course, Helen’s model is a very simplistic one. In this case, we only have a single numerical input - the number of hours studied - and so it is clear what it means to vary that. It’s intuitive that 14 hours is about halfway between 13 and 15 hours, for example.
However, the kind of AI models we are talking about involve a vastly more complicated set of inputs. Instead of walking along a (one-dimensional) straight line marked by hours, we might imagine ourselves in a world of billions of input dimensions.
Most of us struggle to visualise things even in three dimensions, but luckily there are mathematical tricks that can help here, based on understanding geometry and distance in this super-complicated world. In particular, it makes sense to talk about moving halfway between two points here too. If you want to picture it all, though, perhaps think of a vast cloud of points - say representing all the works of art, music and literature, with creations in a particular style tending to cluster together.
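“Halfway between two points” works the same way in any number of dimensions: average the coordinates. A minimal sketch, with tiny four-dimensional vectors standing in as hypothetical stand-ins for the billions of dimensions a real model uses:

```python
# Two made-up style "embeddings" (purely illustrative coordinates)
turtle_michelangelo = [0.9, 0.1, 0.4, 0.0]
artist_michelangelo = [0.1, 0.8, 0.2, 0.6]

def interpolate(a, b, t):
    """Return the point a fraction t of the way from a to b,
    coordinate by coordinate."""
    return [(1 - t) * ai + t * bi for ai, bi in zip(a, b)]

# "Halfway between the two Michelangelos" is just t = 0.5
halfway = interpolate(turtle_michelangelo, artist_michelangelo, 0.5)
```

The same one-line formula works whether the lists have four entries or four billion - which is why interpolation remains cheap even in these vast spaces.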
AI, AI … oh
This geometrical representation underlies the way that the current generation of AI models perform many of their dazzling party tricks. It’s striking that the kind of scenarios where they excel are often interpolation problems (“draw Ninja Turtle Michelangelo in the style of artist Michelangelo” — see the image at the top of the post, “write a summary of Partygate in the style of Jane Austen”, “show me the UK Cabinet as superheroes” and so on).
At heart, the model has a lot of training data about what Ninja Turtles look like, and a lot of training data about Renaissance art. What it can do is efficiently interpolate between these, to find something that looks halfway between the two ideas of Michelangelo. As we’ve seen from Helen’s problem, this kind of interpolation is what models excel at, and the results can be great (even if the image I created probably isn’t in this category!)
In this sense, generative AI models can efficiently explore our vast cloud of points, creating artworks that lie in a range of mashups of existing styles, based on known prompts. However, they are more likely to struggle to extrapolate, to get outside this cloud and create genuinely new things outside their training set.
Indeed, it’s worth remembering that many of the greatest works of art are those which somehow widen the envelope of existing creations and develop new styles. Think of Mary Shelley creating the genre of science fiction with Frankenstein, Beethoven writing a longer symphony with voices, or Marcel Duchamp putting a urinal in an art gallery. These kinds of steps made the cloud of possibilities bigger, and expanded it in an interesting new direction. It’s hard to see how AI models could do this, since they tend to stay safe within their existing corpus of training data.
Think of large language models for example: on some level they generate text by guessing the most likely next word, or phrase, or sentence. In that way they can produce what seems like pretty good writing. But great writers aren’t bound by the conventional rules, or by playing the odds. Think of James Joyce writing Molly Bloom’s soliloquy (a sentence of 3,687 words), or inventing the word “quark”. An AI wouldn’t do that: that’s not how existing English looks to it.
Of course, it’s not like human-made art is always blindingly original either. Think of the endless array of reboots and sequels that currently dominate the cinema box office, or a vast array of identikit novels with spaceships or wizards on the cover. It’s possible that AI can generate many of these at least as efficiently as humans can.
But because they are bound by their training data, it seems much more unlikely that an AI would ever write Ulysses, invent cubism, direct Citizen Kane or write Like a Rolling Stone. And that’s fine! Most of us won’t ever do that either! It’s setting the bar pretty high! But if we were to move into a world of interchangeable purely AI-generated art, then I think we’d lose something. We need our mavericks, our genuinely original creators, our extrapolators to keep pushing the boundaries, or we will end up trapped in the existing region of “things that we’ve seen before” forever.
And for the foreseeable future I think these genuine instigators are going to continue to be made of flesh-and-blood.
Housekeeping notice
While I’m off Twitter, I’m not going to be able to share this as easily, so please consider tweeting or sharing it!
It’s great to see more of my favourite people showing up on Substack rather than tolerate terrible Twitter Blue posts being shoved down their throats. My list of Substack recommendations continues to grow, as Notes is proving to be a great way to find more great writers providing interesting (often free) content. Why not take a look and see who else you might like to follow?
In particular, a huge welcome to COVID data legend Paul Mainwood! I strongly encourage you to sign up to his brand new newsletter, which my model tells me will be excellent.