Bad BBC stats
Like Ma Bell, i've got the ill communication
A couple of months ago, the BBC produced some great internal guidance on how to report statistics and data. One key part was on the difference between correlation and causation:
Content makers must be careful not to assume that if two sets of data go up or down at the same time then one is necessarily causing the other. This is the difference between correlation and causation. A correlation is when two sets of data tend to move together, but causation should not be implied without a high threshold of proof. In looking for explanations of correlated data it should be considered if causation was actually examined by the study and care taken to avoid relying on preconceptions, including one's own.
As I said at the time, that’s fantastic! It’s a clearly written explanation which I’d be happy to have in my own lecture notes - and honestly the editors of some scientific journals would benefit from reading it. However, in my view the lead story on the BBC website today is skirting dangerously close to confusing these issues, in a not particularly helpful way.
To me, the headline of the story “First week 'critical' to avoid children missing school later, parents told” clearly implies causation. The language “to avoid” implies that if we can somehow get the first week of school right, then the rate of absence will be lower.
However, this is backed up with data about a correlation:
Pupils in England who missed school during the first week back in September 2024 were more likely to be absent for large parts later in the year, figures suggest.
More than half (57%) of pupils who were partially absent in week one became "persistently absent" - missing at least 10% of school, according to government data first seen by the BBC.
By contrast, of pupils who fully attended the first week, 14% became persistently absent.
Essentially they are reporting different properties in two different groups. And if you think about the data a bit more, it’s clear there’s a problem with this reporting. Some children are persistently absent because they have a long-term health problem, and many of those will be absent in Week 1.
Part of the issue is that we aren’t told directly how big the two groups are. However, later in the article we are told that 18% of the school population overall were persistently absent, so we can work it out. The big clue is that 18% and 14% aren’t actually very different, which means the group who fully attended the first week must make up most of the population.
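But we can do the sum properly: if a fraction p of the population are in the first group (those partially absent in week one), then the overall 18% is a weighted average of the two groups’ rates, so we know that (see Numbercrunch for details!)

$$0.57\,p + 0.14\,(1 - p) = 0.18.$$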
Rearranging gives $p = (0.18 - 0.14)/(0.57 - 0.14) \approx 0.093$. We don’t know it that precisely, so for simplicity we’ll say that 9% of pupils were absent in the first week of school. That’s not a huge group.
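For anyone who wants to check the sum, here is a minimal Python sketch. It assumes only the three percentages quoted in the BBC story and that the overall rate is a weighted average of the two groups' rates; the variable names are mine, not anything from the article or the government data.

```python
# Figures quoted in the BBC story (all as fractions)
rate_absent_week1 = 0.57  # persistently absent, among pupils partially absent in week one
rate_full_week1 = 0.14    # persistently absent, among pupils who fully attended week one
rate_overall = 0.18       # persistently absent, across all pupils

# Assumption: the overall rate is a weighted average of the two group rates,
#   rate_overall = p * rate_absent_week1 + (1 - p) * rate_full_week1
# Solving that for p, the share of pupils who were absent in week one:
p = (rate_overall - rate_full_week1) / (rate_absent_week1 - rate_full_week1)

print(f"Share of pupils absent in week one: {p:.3f}")  # prints 0.093
```

Because the three inputs are themselves rounded to whole percentages, the third decimal place of p shouldn't be taken seriously.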
As I say, the issue is that some people are away from school for good reasons. We don’t know what proportion of children have serious health problems, and we’re honestly not going to agree on that number, because it’s going to come down to definitions and interpretation.
But a quick google gives me a proportion around 15%. That doesn’t sound like a crazy number, if you start thinking about a range of mental and physical health conditions which might require regular checkups or recuperation. So, suppose we take that as the figure.
If that’s the case, a good proportion of that 15% will be absent in Week 1. If, say, a third of them miss a day, that’s 5% of the school population. That’s a solid fraction of the 9%: just over half. Suppose our first week absentees are made up of 5% chronically ill people (say 80% of whom might go on to be persistently absent, because the worst-affected people are most likely to be in that group), and 4% of people who just got unlucky, had a cold or were late back from holiday, and will otherwise act like the rest of the 91% (so 14% of them become persistently absent). Then the persistent absence rate in that group would be something like

$$\frac{0.05 \times 0.80 + 0.04 \times 0.14}{0.09} \approx 0.51,$$

pretty close to the 57%!
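The same back-of-the-envelope mix can be written as a short Python sketch, which makes it easy to try other guesses. The 5%, 80% and 4% figures are the made-up assumptions from the paragraph above, the 14% is the BBC's figure for pupils who fully attended week one, and `mixed_rate` is just a hypothetical helper name.

```python
def mixed_rate(chronic_share, chronic_rate, other_share, other_rate):
    """Persistent-absence rate in a group made up of two subgroups.

    Shares are fractions of the whole school population; rates are the
    fraction of each subgroup that goes on to be persistently absent.
    """
    total_share = chronic_share + other_share
    return (chronic_share * chronic_rate + other_share * other_rate) / total_share


# Assumed split of the roughly 9% of pupils who were absent in week one:
#   5% chronically ill, 80% of whom become persistently absent (a guess)
#   4% who were just unlucky, behaving like everyone else (14%, from the BBC)
rate = mixed_rate(chronic_share=0.05, chronic_rate=0.80,
                  other_share=0.04, other_rate=0.14)

print(f"Persistent absence rate among week-one absentees: {rate:.1%}")  # prints 50.7%
```

Changing `chronic_rate` or the split between the two subgroups shows how sensitive the answer is to those guesses, which is rather the point.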
Obviously these calculations are somewhat back of the envelope (to say the least). The 15% and a third are kind of pulled out of thin air. But I think it’s a useful illustration of why that BBC guidance was there in the first place, and why we need to be careful with the difference between correlation and causation.
Certainly, it doesn’t seem to be helpful to be convincing parents that first week attendance is a magic bullet, or guilt-tripping ill children to come in when they shouldn’t - and I think the BBC (and the Government whose data the story relies on) should know that.
Enjoy the first week of term, however you are feeling!


Without looking at any numbers, is this not easier to explain as "those persistently absent are also more likely to not turn up in the first week of term"?
Surely even a minister should be able to grasp that?
If you want some much more detailed and well grounded analysis than what the BBC can muster on pupil absence you should look at FFT Education Datalab e.g. https://ffteducationdatalab.org.uk/2025/04/pupil-absence-in-autumn-and-spring-2024-25/