We are all slaves to The Algorithm. We spend our lives shopping for suggested products online, streaming the TV shows from our recommendations, scrolling through personalised timelines of photos and opinions, and having phone calls about insurance quotes with a voice that we can’t be sure is even human. In all of these settings, our choices are nudged, shaped and influenced by a computer following its own rules. In fact, the person who wrote the computer program may not even know what the rules are or how they are formed. We tend to try not to think too much about this, perhaps for fear of terrifying ourselves.
As a result, many people don’t even know what The Algorithm is doing. It feels like a question of engineering or computer science, and of course to some extent it is. But it’s also a question of maths, and of geography.
Think about a map like this one of Southern England and Wales:
It represents the geographical reality of the world we live in. Of course, Earth is a curved sphere¹ and this is appearing on your flat screen, but nonetheless, on some level it captures what the country is like. Points that are closer together on the map are closer together in real life.
But more than that, because geography makes decisions for us about our lifestyles, people living in nearby locations will have somewhat similar experiences. People from Bristol and Newport are more likely to shop in the same big shopping centres, work in the same offices, go on holiday from the same airport, and eat at the same restaurants than, say, Bristolians and Londoners are.
So if I have a big list of people and their tastes, and someone comes to me and asks for a restaurant recommendation, a simple idea would be that I find out who on my list lives closest to them, and send them to that person’s favourite restaurant. Simple!
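As a sketch, this idea fits in a few lines of Python. The people, coordinates and restaurants here are entirely made up:

```python
import math

# Each hypothetical person: (x, y) map coordinates and a favourite restaurant.
people = {
    "Asha":  ((51.45, -2.59), "Thali Cafe"),
    "Ben":   ((51.58, -2.99), "The Chippy"),
    "Carol": ((51.51, -0.13), "Dishoom"),
}

def nearest_neighbour_recommendation(query, people):
    """Return the favourite restaurant of whoever lives closest to `query`."""
    best = min(people, key=lambda name: math.dist(query, people[name][0]))
    return people[best][1]

# Someone nearby gets their nearest neighbour's favourite:
print(nearest_neighbour_recommendation((51.46, -2.60), people))  # -> Thali Cafe
```

The whole method is just "measure distances, take the minimum" — which is exactly where the problems described next come from.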
Except there’s a problem. Maybe I’ve just sent an 80-year-old who is set in their ways to the pop-up bibimbap place where you order on an iPad, just because their 25-year-old next door neighbour loves it. Maybe I sent a vegan to a steakhouse. All of a sudden, I’m not popular.
The thing is that, while geographical proximity does influence our restaurant choices, it’s far from the only thing that does. If we really wanted to match people’s tastes, we wouldn’t want to just know where they live, we’d like a lot of other data as well. How spicy do they like their food? Do they prefer chains or independent places? Are they comfortable with ordering on an app? Do they care if the food is ethically sourced? What’s their budget for a meal?
This is where maths starts to creep in. I’m not going to hit you with a bunch of equations, but it’s more about how we think about the problem. Imagine we’ve collected all this data for a lot of people - say we’ve got twenty numbers for each of them. Perhaps the first two numbers are their geographical coordinates on my map, the third number is their age, and so on.
Instead of visualising people as a bunch of dots spread across our (two-dimensional) map, we could instead think about plotting these numbers on a graph using axes in twenty dimensions. Of course, that’s not really possible in the three-dimensional world we live in, but perhaps you can try to vaguely imagine what it might look like?
Just as people on our two-dimensional map tend to clump together in towns, what you might find is that the points aren’t spread out uniformly in our twenty-dimensional world, but rather form into patterns and clusters. There are good reasons for this. Our data points are not independent of one another: you might imagine that young people tend to have a smaller budget, are more experimental in their tastes, are more likely to be vegetarian, and so on.
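You can watch these clusters emerge even from a bare-bones algorithm. Here is a minimal k-means sketch on invented data with just three of our twenty features, where (as above) the features are deliberately correlated:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two invented groups of people, each row = (age, budget, spice tolerance).
# Young diners: lower budget, higher spice tolerance; older diners: the reverse.
young = rng.normal(loc=[25, 15, 8], scale=[3, 4, 1], size=(50, 3))
older = rng.normal(loc=[65, 40, 3], scale=[5, 8, 1], size=(50, 3))
points = np.vstack([young, older])

# Bare-bones k-means: assign each point to its nearest centre, then move
# each centre to the mean of its assigned points, and repeat.
centres = points[rng.choice(len(points), size=2, replace=False)]
for _ in range(20):
    labels = np.argmin(np.linalg.norm(points[:, None] - centres, axis=2), axis=1)
    centres = np.array([points[labels == k].mean(axis=0)
                        if np.any(labels == k) else centres[k]
                        for k in range(2)])

print(np.round(centres))  # one centre ends up near each invented group
```

Nothing in the code knows there were two groups; the structure of the data alone pulls the centres apart.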
If we’ve got the right data, then we could do a much better job of restaurant recommendations. If we know where the clusters are on our graph, and the taste of the people who form them, then we can suggest restaurants in exactly the same way as we did before, by looking for a near-neighbour and following their opinions. But whereas our previous idea of a neighbour was simply based on geographical proximity in two dimensions, to be close in our twenty-dimensional world you have to match in many more respects.
In fact, if these cluster effects are strong enough, then we don’t even need to know everything about someone to make a recommendation for them. That’s good, because it means that even if we don’t know someone’s salary (for example) then we can essentially infer it by putting them into a cluster with people who match the characteristics that we do know.
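A toy sketch of that inference, assuming we have already found the cluster averages (all numbers invented): match on the features we do know, then read the missing one off the best-fitting cluster.

```python
import numpy as np

# Hypothetical cluster centres learned earlier: each row is the average
# (age, meal budget in pounds, spice tolerance 0-10) of one cluster.
centres = np.array([
    [25.0, 15.0, 8.0],   # "young, adventurous" cluster (made up)
    [65.0, 40.0, 3.0],   # "older, traditional" cluster (made up)
])

def infer_missing(known, known_idx, missing_idx, centres):
    """Find the nearest cluster using only the known features,
    then return that cluster's average for the missing feature."""
    dists = np.linalg.norm(centres[:, known_idx] - known, axis=1)
    return centres[np.argmin(dists), missing_idx]

# We know someone is 28 with a 7/10 spice tolerance, but not their budget:
budget = infer_missing(np.array([28.0, 7.0]), [0, 2], 1, centres)
print(budget)  # -> 15.0, the budget of the best-matching cluster
```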
Indeed, these so-called recommender systems can work well even when it seems like they have a fraction of the data they might need. Back in the days when we called these techniques ‘machine learning’ (before people realised they could get bigger grants and more press coverage by calling them ‘artificial intelligence’) there was something called the Netflix Prize where:
Netflix provided a training data set of 100,480,507 ratings that 480,189 users gave to 17,770 movies.
You could win a million dollars by using this data to predict what films other people might like based on their ratings: someone who liked Harry Potter might be more likely to like Twilight, and so on. But what’s interesting is how sparse the reported data was, relatively speaking: if every user had rated every film, there’d have been about 8.5 billion data points. In fact there was about 1/85th of that, but regardless, the algorithms could do their job.
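The arithmetic is easy to check from the numbers in the quote:

```python
# The Netflix Prize figures from the text: how sparse was the ratings matrix?
ratings = 100_480_507
users = 480_189
movies = 17_770

possible = users * movies        # every user rating every film
fraction = ratings / possible

print(f"{possible:,} possible ratings")     # about 8.5 billion
print(f"{fraction:.2%} actually observed")  # roughly 1.2%, i.e. about 1/85
```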
In fact, this illustrates another feature of these systems. I described how The Algorithm might work given twenty possible pieces of data, but from a mathematical point of view there’s no need to stop there. The equations are essentially the same if we have two thousand, or two million numbers per person: the only real difference is the amount of storage and computation required. But if you think about Facebook or Google hoovering up what links you click on (and how fast and at what time of the day), it’s not surprising that they can build up an incredibly detailed profile of you to target their adverts.
This comes with benefits and problems. While imagining things in two million dimensions is fine for a computer, as humans we may prefer to have visualisations of the data that we can understand. Just as we mapped a three-dimensional globe onto a two-dimensional map, we can use techniques to reduce the dimension of the data, and to project it into spaces we can plot and visualise.
The fact that the data comes with redundancy and dependencies means that although it may live in two million dimensions, most points might lie close to a shape with far fewer dimensions than that. We can use mathematical techniques to find that shape, and produce the equivalent of our original map by plotting people’s positions on that shape.
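One standard technique for finding such a shape is principal component analysis. Here is a minimal sketch on invented data: twenty features per person, secretly driven by just two hidden factors, so the cloud of points lies close to a two-dimensional flat shape.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data: 200 people, 20 features each, but the features are all
# combinations of 2 hidden factors (plus a little noise), so the points
# lie near a 2-dimensional plane inside the 20-dimensional space.
hidden = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 20))
data = hidden @ mixing + 0.05 * rng.normal(size=(200, 20))

# PCA via the singular value decomposition: project onto the two
# directions that capture the most variation.
centred = data - data.mean(axis=0)
U, s, Vt = np.linalg.svd(centred, full_matrices=False)
map2d = centred @ Vt[:2].T   # everyone's position on our flat "map"

explained = (s[:2] ** 2).sum() / (s ** 2).sum()
print(f"{explained:.1%} of the variation kept in 2 dimensions")
```

Because the data really does sit near a plane, almost nothing is lost by flattening twenty dimensions down to two.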
For example, Electoral Calculus do that here - they identify seven “tribes” (or clusters) of voters, and think about them in three dimensions: economic, global and social. But the economic score is not something that we measure directly. Rather, it is formed by taking weighted averages of the answers to certain questions.
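To make the "weighted averages" idea concrete, here is an entirely invented version of such a score; the questions and weights below are mine, not Electoral Calculus’s:

```python
# Illustrative (made-up) "economic" score: a weighted average of survey
# answers, each on a -2..+2 disagree/agree scale. Weights are invented.
weights = {
    "taxes_should_be_lower":          0.5,
    "government_should_redistribute": -0.3,
    "markets_serve_people_well":      0.2,
}

def economic_score(answers):
    """Weighted average of a respondent's answers (weights are invented)."""
    total_weight = sum(abs(w) for w in weights.values())
    return sum(weights[q] * answers[q] for q in weights) / total_weight

answers = {
    "taxes_should_be_lower": 2,
    "government_should_redistribute": -1,
    "markets_serve_people_well": 1,
}
print(round(economic_score(answers), 2))  # -> 1.5 on our made-up axis
```

The point the article makes still holds: from the outside, you can see your score without ever seeing which questions and weights produced it.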
This can be very helpful to visualise the tribes, but it’s not always easy to interpret directly. As a member of the public, I don’t know what my position would be on these axes, or exactly what features of the data contribute the most to the scores. This may not seem like a big deal, but it can have undesirable consequences.
For example, last week Substack suddenly decided I was interested in hearing from antivaxxers. The top three personalised recommendations for me in the Science category were all this type of account:
The problem is, it’s probably impossible to know why. I guess that The Algorithm noticed I seem to be interested in COVID, and thought I’d appreciate being matched with a bunch of people who are all interested in the same thing (albeit in a very different way!).
This is a common feature of the way these systems work: if I click on someone’s account out of morbid curiosity, or quote tweet them to dunk on them, then The Algorithm can mistake that for a genuine desire to hear more from them and their friends forever. In general, these systems are blindly optimising mathematical functions without stopping to ask what those functions are, or whether lumping certain people together is desirable. But that’s the price we pay at the moment for the convenience that these systems offer us.
Perhaps the next generation of these systems will do better, but regardless it’s likely that they will continue to be built on these kinds of mathematical principles, which we should all have some familiarity with.
¹ Oblate spheroid, whatever.
Recommender systems in general try to solve an "explore / exploit" problem; I sometimes wonder when I get a load of odd recommendations from one if it is actually trying to explore my preferences (since a non-click is informative) more than it thinks I'd like the content.
For some reason, this reminded me of a very sharp Ian Leslie essay from nearly a decade ago about the problems with targeted digital ads:
https://www.ft.com/content/cd1722ba-8333-11e5-8e80-1574112844fd