When did you know for sure that Joe Biden wouldn’t be on the ballot in 2024? I mean, there was the debate obviously. But that was on 27th June and, despite it being immediately clear he was finished, he didn’t actually step down until 21st July. So during that weird time when it seemed like the Democrats were going to try to Weekend at Bernie’s their way through the campaign, when was it certain that he was doomed? I knew with certainty on the 12th July, when House Democratic Leader Hakeem Jeffries sent the following letter:
It was brutal, and it was obvious that Biden was toast. But if you read it carefully, you’ll notice Jeffries never calls for the President to step down. There’s a coded reference to “working in the best interests of everyday Americans”. But the key thing is what the letter didn’t say. In the ordinary run of things, one leader of his party writing to another would have made reference to Biden being the best man for the job, giving a whole-hearted endorsement, looking forward to a successful campaign and so on. So the absence of that sentiment was as good as a call for resignation.
It’s very slightly subtle, but as I said at the time: “Don't need Alan Turing to crack this code”. But this is meant to be a maths-based Substack, so what can maths (and even Turing himself!) tell us about this kind of ‘communication without communicating’?
I’m not talking about non-verbal communication itself, the cues from body language and so on. Those are undoubtedly important - and I think the reduction in them through the uncanny valley of online meetings is what makes Zoom meetings so much more exhausting and less satisfying than meeting in person. But can maths say anything about the value of words themselves, and of their absence?
It turns out that it can. As I’ve mentioned before, my job title is Professor of Information Theory, and information theory is the topic that I research in the snatched moments when I’m not sending emails or going to meetings. Compared with a lot of areas of maths, it’s not particularly old - dating back just under 80 years to the foundational work of Claude Shannon in 1948.
Shannon essentially created the field in one go, establishing the fundamental limits to the performance of communication systems, for example when sending messages which are corrupted by random errors. If I want to send the letter A over the Internet, I could look it up as a series of 0s and 1s in an ASCII table, and find that the standard binary representation is 01000001. I could break that into two chunks of four bits, add some parity bits to each using a Hamming code, and be sure that even if a bit gets flipped in each chunk during transmission then I should still be able to deduce that an A was sent.
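That pipeline is short enough to sketch in full. Here is a minimal Python version of the textbook Hamming(7,4) construction (the function names are my own, not any standard library API):

```python
def hamming74_encode(nibble):
    """Encode 4 data bits into a 7-bit codeword.
    Positions (1-indexed): p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = nibble
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(word):
    """Correct up to one flipped bit, then return the 4 data bits."""
    w = word[:]
    # Each parity check covers the 1-indexed positions with that binary digit set
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s3 = w[3] ^ w[4] ^ w[5] ^ w[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-indexed error position, 0 if clean
    if syndrome:
        w[syndrome - 1] ^= 1
    return [w[2], w[4], w[5], w[6]]

# 'A' is 01000001 in ASCII: two nibbles, 0100 and 0001
bits = [int(b) for b in format(ord("A"), "08b")]
sent = [hamming74_encode(bits[:4]), hamming74_encode(bits[4:])]

# The channel flips one bit in each codeword in transit
sent[0][2] ^= 1
sent[1][6] ^= 1

received = hamming74_decode(sent[0]) + hamming74_decode(sent[1])
recovered = chr(int("".join(str(b) for b in received), 2))
print(recovered)  # 'A'
```

Each 7-bit block can absorb a single flipped bit, which is why the byte is split into two chunks before the parity bits are added.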
In that way, Shannon proved when we could recover a whole message, letter by letter, with perfect fidelity. But looking back, he took what can seem like a narrow view of communication itself, writing (emphasis mine):
The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem.
Shannon doesn’t care what the letters are. It doesn’t matter if we are trying to send Dan Brown’s latest meisterwerk or the instructions to build a nuclear bomb. He’s focusing on the probability that the message is received with (nearly) every letter intact at the other end. But real communication isn’t always like that. Think about Hakeem Jeffries’ letter, or about Picasso’s flamingo:
Of course, it’s not photorealistic. It lacks the detail of the feathers and the colour of a real bird. But it’s unmistakably a flamingo. If you had to communicate the idea of a flamingo to someone (or to an AI classifier) so they recognise it, this picture would do just as good a job as a much fancier one. Picasso has nailed the essential flamingosity.
This starts to raise some interesting questions. If we wanted the most economical way to represent a flamingo, is this it? Is it better than just writing the word “flamingo” on a piece of paper? Of course, the latter relies on the person receiving it being able to read English, which opens up the idea that a message may be successfully received by some listeners and not by others. This is no surprise to anyone who has ever given a lecture, or had people explaining your own joke to you on Twitter. Instead of thinking of putting text into a single communication channel, maybe each person has their own? This area is often referred to as semantic information theory, and there’s a nice article here by Tsachy Weissman about related ideas.
But we can go even further. Instead of thinking about how much information is received when a message is read, we can even consider whether the act of sending a message itself carries information, whether or not we get to read it.
This idea lay at the heart of the process of traffic analysis carried out at Bletchley Park. Even when the Enigma or Lorenz ciphers couldn’t be broken, the messages themselves carried some information to the codebreakers. By studying the call signs used, the times they were sent and who they were sent to, Bletchley could build up a picture of which units were operating where and what the command structures looked like. In this way, by understanding normal and abnormal broadcasting behaviour, it was possible to learn things even from messages which couldn’t be read.
You can imagine a similar thing happening with email messages now. The timing and direction of messages reveal a lot about hierarchies within an organisation. If someone high up in the University sends me a message, I’m likely to respond much faster than to an unsolicited mail from a total stranger (sorry!). Managers are likely to cascade messages to their teams down the corporate pyramid, and so on.
The classic test case for this is the Enron dataset of emails revealed after the company’s collapse, where data scientists can reconstruct much of the corporate structure just by looking at the headers of emails. But you could imagine similar tools being used by intelligence and law enforcement agencies, and we needn’t restrict ourselves to emails.
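Here's a toy sketch of the idea on an invented header log (the names and the simple fan-out heuristic are my illustrative assumptions, not the actual Enron analyses, which use far richer features such as response times and centrality):

```python
from collections import defaultdict

# Toy header log: (sender, recipient) pairs only -- no message bodies needed.
# All names are invented for illustration.
headers = [
    ("ceo", "vp_a"), ("ceo", "vp_b"), ("ceo", "vp_c"),
    ("vp_a", "analyst_1"), ("vp_a", "analyst_2"),
    ("vp_b", "analyst_3"), ("vp_c", "analyst_4"),
    ("analyst_1", "vp_a"), ("analyst_3", "vp_b"),
    ("vp_b", "ceo"), ("vp_c", "ceo"),
]

# Who cascades mail to the most distinct people? Managers fan out;
# individual contributors mostly reply upwards to one boss.
fan_out = defaultdict(set)
for sender, recipient in headers:
    fan_out[sender].add(recipient)

ranking = sorted(fan_out, key=lambda p: len(fan_out[p]), reverse=True)
print(ranking[0])  # 'ceo' -- the widest fan-out sits at the top of the pyramid
```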
If near-simultaneous bank transfers are made to six people, four of whom are suspected terrorists, then it might be reasonable to investigate the other two and to ask whether the person making the payments is some kind of terror financier. There is a running joke that it’s possible to infer whether the USA is about to go to war by monitoring how many late-night orders are received by Washington DC pizza parlours.
In fact, it’s not so much that information can be communicated unintentionally as that you have to try really hard to avoid doing so. (I’m a co-author on a paper, with our very talented former PhD student, which studied the question of “just how hard do you have to try?”.)
Even one of the most secretive processes of all, the recent Papal Conclave, leaked information long before white smoke rose up the Vatican chimney. Forget all the zero-information tech bros blagging it on Polymarket, the real heroes of the story were the network theorists. By studying who worked together, who had ordained whom and so on, Italian data scientists had constructed a model of the Cardinals’ influence network before the Sistine Chapel doors were shut.
They studied a range of network characteristics, including
Status, measured by ‘eigenvector centrality’, rewards cardinals who are connected not only to many, but to the most influential ones.
and found that the leader by that measure was a certain Cardinal Robert Prevost. They investigated two other measures too and found other leaders, so it wasn’t a prediction as such. But still, you’d have cleaned up on the betting markets if you’d backed the eigenvector leader, so it’s worth understanding what this eigenvector centrality represents.
The idea is that we don’t just count connections, but that connections to high-value Cardinals matter the most. It may seem a little bit circular - how do you know who is high-value, if the value depends on which high-value nodes you connect to? - but it turns out that it’s just a problem in matrix algebra to figure it all out.
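That matrix-algebra fix is power iteration: repeatedly multiply a score vector by the adjacency matrix and renormalise, and it converges to the leading eigenvector, resolving the circularity. Here's a minimal sketch on an invented four-node network (the real conclave study used far richer data):

```python
import math

names = ["A", "B", "C", "D"]
# Symmetric adjacency matrix: adj[i][j] = 1 if i and j are connected.
adj = [
    [0, 1, 1, 1],  # A is connected to everyone
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],  # D's only link is to well-connected A
]

# Power iteration: multiply, renormalise, repeat until the scores settle.
scores = [1.0] * len(names)
for _ in range(100):
    new = [sum(adj[i][j] * scores[j] for j in range(len(names)))
           for i in range(len(names))]
    norm = math.sqrt(sum(x * x for x in new))
    scores = [x / norm for x in new]

ranked = sorted(zip(names, scores), key=lambda t: -t[1])
print(ranked[0][0])  # 'A': the most connections, and to the best-connected nodes
```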
And of course, it shouldn’t be a surprise that we can infer the most important people in a network using these kinds of methods, because in the last 30 years this has literally been a trillion dollar idea. Google’s PageRank algorithm is based on a variant of this eigenvector centrality method, with the same motivation. Lots of links coming in to a page can be manufactured by a spammer, but lots of links coming in from reputable websites are a good indicator of value.
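A minimal PageRank sketch on an invented link graph shows that motivation at work (the damping factor 0.85 is the value from the original PageRank paper; the graph itself is a toy assumption of mine):

```python
damping = 0.85  # probability of following a link rather than jumping at random
links = {
    "good_hub": ["quality_page"],
    "quality_page": ["good_hub"],
    "farm_1": ["spam_target"],
    "farm_2": ["spam_target"],
    "farm_3": ["spam_target"],
    "spam_target": ["good_hub"],
}
pages = list(links)
rank = {p: 1 / len(pages) for p in pages}
for _ in range(50):
    # Base score from random jumps, plus each page's rank shared among its out-links
    new = {p: (1 - damping) / len(pages) for p in pages}
    for p, outs in links.items():
        for q in outs:
            new[q] += damping * rank[p] / len(outs)
    rank = new

# quality_page has a single incoming link, but from a reputable hub;
# spam_target has three incoming links, all from worthless farms.
print(rank["quality_page"] > rank["spam_target"])  # True
```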
So, despite what Shannon said and as Biden discovered, sometimes the way that a message is sent can be as important as the message itself, and communication networks contain a lot of valuable structure if you know how to look for it.