“Number soup”: Can we make it easier for readers to digest all the numbers journalists stuff into their stories?
A widely watched measure of news organizations’ median obscurant adumbration hit a year-to-date high of 11.2 in July, a new study has found, a 42% increase from June’s 7.9 measure and the biggest single-month jump since a 7.1 percentage-point change in April 2019, but still well below the anticipated range of between one in six and 32.1.
Okay — I made all that up. No one measures news organizations’ median (or average!) “obscurant adumbration,” because that’s not a thing.
But did that paragraph remind you of any ledes you’ve read (or written)? A long sentence clotted with clauses, thick with too many numbers and percentages and comparisons? One that intermingles a “percent increase” with a “percentage-point change”? Or the hard-number precision of “32.1” with the unstable do-math-in-your-head of “one in six”? Even if you somehow know what “obscurant adumbration” is, is a “year-to-date high of 11.2”…good, bad, catastrophic? And it’s 11.2 whats, exactly? Points, units, megatons, gigabytes?
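The “percent increase” vs. “percentage-point change” mix-up is worth pausing on, because the two framings can differ enormously for the same underlying numbers. A quick sketch with made-up figures (a rate moving from 10% to 14%):

```python
# Hypothetical figures: some rate moves from 10% to 14%.
old_rate = 10.0
new_rate = 14.0

# "Percentage-point change": the simple difference between the two rates.
point_change = new_rate - old_rate                         # 4.0 points

# "Percent increase": that same difference, relative to the starting value.
percent_increase = (new_rate - old_rate) / old_rate * 100  # 40.0%

print(f"{point_change:.1f} percentage points, but a {percent_increase:.0f}% increase")
```

Same two numbers, and yet “up 4 points” and “up 40%” land very differently on a reader; a lede that mixes the two framings invites exactly this confusion.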
That may be an extreme example, but we’ve all seen sentences like this, where even the most attentive reader is unlikely to reach its concluding period better informed than when she began it. A new paper out in the journal Journalism Practice terms the overriding faith in numbers that journalists sometimes fall prey to “numerism”:
The appeal of numbers is especially compelling to bureaucratic officials who lack the mandate of a popular election, or divine right. Arbitrariness and bias are the most usual grounds upon which such officials are criticized. A decision made by the numbers (or by explicit rules of some other sort) has at least the appearance of being fair and impersonal. Scientific objectivity thus provides an answer to a moral demand for impartiality and fairness. Quantification is a way of making decisions without seeming to decide. Objectivity lends authority to officials who have very little of their own.
“Numerism” is a pretty good name, but I prefer the term the paper’s authors picked for its title: “Number Soup: Case Studies of Quantitatively Dense News.”
Those authors — Jena Barchas-Lichtenstein, John Voiklis, Bennett Attaway, Laura Santhanam, Patti Parson, Uduak Grace Thomas, Isabella Isaacs-Thomas, Shivani Ishwar, and John Fraser — are an interesting mix. Six work at Knology, a “collective of scientists, writers, and educators dedicated to studying and untangling complex social issues,” which works with news organizations to better understand the effects journalism can have on its audiences. The other three work at PBS NewsHour, one of those partner news organizations and the flagship newscast of American public television.
They wanted to figure out what characterizes the number-dense stories that reach readers and whether there are better ways to present them. As they put it in their abstract:
The dense clauses were often grammatically complex and assumed familiarity with sophisticated concepts. They were rarely associated with explanations of data collection methods. Meanwhile, the dense news reports were all about economy or health topics, chiefly brief updates on an ongoing event (e.g., stock market fluctuations; COVID-19 cases).
We suggest that journalists can support public understanding by:
Providing more detail about research methods;
Writing shorter, clearer sentences;
Providing context behind statistics;
Being transparent about uncertainty; and
Indicating where consensus lies.
We also encourage news organizations to consider structural changes like rethinking their relationship with newswires and working closely with statisticians.
The researchers gathered a corpus of 230 U.S. news stories covering four subject areas — economics, health, science, and politics — in late February 2020, meaning they caught the early days of COVID as well as that year’s presidential primaries. Most were text stories, though they did use the transcripts of some news videos. The economics, health, and science stories averaged around 650 words and 30 distinct clauses; politics stories were a bit longer (860 words, 45 clauses).
They then used the wonderfully named tool Dedoose to analyze each story and its constituent clauses for quantitative density and the sorts of literacy required to comprehend them. The researchers coded the stories and clauses for these qualities; the more codes assigned to a given clause, the more quantitatively dense it was.
(Some examples of the codes used: Proportion or Percentage; Variability, Concentration, and Variation; Risk and Probability; Magnitude and Scale; and Sampling, Representativeness, and Generalizability. Much more detail in the paper.)
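To make that density measure concrete, here’s a toy sketch (mine, not the paper’s — the example clauses are invented, though the code names come from the list above): tag each clause with the codes it triggers, then rank clauses by how many tags they carry.

```python
# Toy sketch of the density measure (not Dedoose itself): each clause gets a
# set of quantitative-literacy codes, and a clause's density is simply the
# number of codes assigned to it. The clauses below are invented examples.
clause_codes = {
    "Wages rose 4.5% at the top but just 1% at the median": {
        "Proportion or Percentage",
        "Magnitude and Scale",
        "Variability, Concentration, and Variation",
    },
    "One in six respondents reported symptoms": {
        "Proportion or Percentage",
        "Risk and Probability",
    },
    "The market closed higher on Tuesday": set(),
}

# Rank clauses from most to least quantitatively dense.
by_density = sorted(clause_codes, key=lambda c: len(clause_codes[c]), reverse=True)
for clause in by_density:
    print(len(clause_codes[clause]), clause)
```

Even this crude version captures the paper’s point: density is a property of individual clauses, not just whole stories, which is why an otherwise readable piece can still hide a number-soup sentence.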
So, what did they find?
Quantitative density is unevenly distributed. Stories that weren’t overstuffed with numbers often had individual clauses or paragraphs that were. Nearly half of the densest stories were about the economy, followed closely by health stories; politics and science stories had many fewer.
Ledes can be the site of the greatest number density. The inverted-pyramid ideal led some journalists to overstuff their ledes with data, especially on wire stories. Here’s one example from the AP:
An online reading level calculator put that lede at the “college graduate” level of difficulty.
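The article doesn’t say which calculator was used; the Flesch-Kincaid grade level is one common such formula, and a rough version (with a deliberately naive syllable counter of my own) is easy to sketch:

```python
import re

def flesch_kincaid_grade(text: str) -> float:
    """Rough Flesch-Kincaid grade level:
    0.39*(words/sentence) + 11.8*(syllables/word) - 15.59."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    n_words = max(1, len(words))
    # Naive syllable estimate: runs of consecutive vowels in each word.
    syllables = sum(max(1, len(re.findall(r"[aeiouyAEIOUY]+", w))) for w in words)
    return 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59

simple = "The cat sat on the mat."
dense = ("A widely watched measure of median obscurant adumbration hit a "
         "year-to-date high in July, a forty-two percent increase from June "
         "and the biggest single-month jump since April of twenty nineteen.")
print(round(flesch_kincaid_grade(simple), 1), round(flesch_kincaid_grade(dense), 1))
```

Real calculators count syllables more carefully, but the shape of the formula makes the point: long sentences and long words both push the grade level up, and number-soup ledes tend to have plenty of each.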
A lot of numbers can correlate to a lot of grammatical complexity. Clauses tacked on clauses tacked on clauses. An example:
Economics coverage is at particular risk of too much density. As I said above, economics stories were by some margin the ones most likely to be overstuffed in a potentially confusing way. You might well be able to understand this on a quick first reading:
But will most readers? “This clause asks readers to understand the percentage change over time (comparison) in median (central tendency) wages, as well as the variability in change over time at different parts of the wage spectrum,” the authors write. “Consider how much easier it is to understand the following version: Workers earning the highest salaries saw a 4.5% increase, while the typical worker saw an increase of just 1%.”
Coverage of the economy is particularly important, given that public perceptions of its strength or weakness are perhaps the most clearly established influence on voting behavior — and those perceptions are highly influenced by both news coverage and ideology.
The authors are not advocating that complex, number-dense stories be dumbed down; they’re advocating that they be written in clearer and more accessible ways.
Across story placement, they share several traits. They are often grammatically complex, with multiple clauses. Even before accounting for content, this complexity means they are relatively difficult to understand.
Many of them assume familiarity with sophisticated quantitative measures like economic indicators and epidemiological concepts. Audiences who lack this prerequisite knowledge may find this type of writing inaccessible, particularly because these sentences and the stories that contain them rarely take the time to fully explain research methods. In particular, references to official statistics typically left data collection methods unquestioned and unexplained. Without an understanding of the underlying methods (e.g., how BLS calculates unemployment), news users may not be prepared to make meaning of changes and trends, particularly at times of social disruption.
In combination, all of these traits seem to suggest these journalists are speaking to a more sophisticated target audience and may be leaving typical news users behind.
The authors lay out suggestions for improvement, as described in the abstract above: shorter and clearer sentences, more context behind the statistics, and more transparency about methods and uncertainty.
You can find the full paper here, and a thread by author Jena Barchas-Lichtenstein here.