Journalism should be reproducible

The idea came to me a month ago: “journalism should be reproducible.” After a conversation with Miles this weekend, I’d like to explore it further.

First point: let’s approach journalism as the science for civic participation. Give journalism the goal of helping us improve our standards of living, create a more just society, and so on. Make those goals measurable in various ways, and we can track our progress toward them.

Science, according to Wikipedia, “builds and organizes knowledge in the form of testable explanations and predictions about the world.” A report in a scientific journal has an abstract, methodology, presentation of the data, discussion and conclusion. News articles typically have only the first and last of these: the abstract and the conclusion. They’re missing two critical pieces: the presentation of the data and the methodology used to collect it. Without those, no one can check the work, and reproducibility is a vital aspect of the scientific method (related: Jonah Lehrer has a fascinating article on this topic in the New Yorker).

Journalism has no equivalent. As a profession in existential doldrums, we say we speak truth to power while, at the same time, questioning Anderson Cooper’s ethics when he calls out the lies of the state. This is broken. Jay Rosen likes to say “here’s where I’m coming from.” That statement should go further: “here’s my conclusion, and here’s the data and methodology to back it up.”

For this month’s Carnival of Journalism, David Cohn asks: considering your unique position, what can be done to increase the number of news sources? Last month, I did a time-consuming content analysis of the two Locals; I think that month’s question warranted it. This month’s question rests on a false assumption: that improving journalism requires growth in the number of news sources. Look at Google News right now.

We have a metric ton of news sources producing a metric ton of content: 2,535 stories about Bahrain, 1,002 stories about Iran, and 392 stories about Ben Ali’s grave condition. I’d be willing to bet there is only a countable number of facts contained in all of those stories, and that collecting those facts took less than 5% of the total resources used.

Everyone can publish, but the question remains whether what they’re publishing is useful. Instead of increasing the number of news sources, we should focus on producing durable data and the tools for remixing it. Data can be the average wait time at the restaurant I’m eating at this evening, or the cost and freshness of produce at the various bodegas in my neighborhood. Or, if I’m searching for office space, whether the building I’m looking at actually recycles as it claims to. Functional databases account for two-thirds of the Texas Tribune’s traffic. When information is managed at the data level, it can more easily be reused in different contexts with different tools (a long-form Instapaper article vs. an app with location-based push notifications). Expose the original data and the methodology used to reach the conclusion, and journalism can be reproducible.
