Measuring the unmeasured with data

This essay was written as a chapter of Digital Investigative Journalism: Data-Driven Investigation, New Visual Analytics Tools, and Innovative Methodologies in International Reporting, to be published by Palgrave Macmillan under the coordination of Florian Stalph and Oliver Hahn of the University of Passau.

In 1887, Nellie Bly, an investigative journalist, decided to report on the living conditions in insane asylums in New York City. She went undercover for ten days in an institution at Blackwell’s Island (now Roosevelt Island, between Manhattan and Queens), feigning insanity herself. There, she observed the personnel, talked to inmates and, upon getting out, wrote a book about her experience, Ten Days in a Mad-House.1 The book and the media campaign organized by Bly’s employer, the New York World, convinced the city of New York to increase funding for asylum wards. In the words of the introduction to the book’s edition at the Disability History Museum, it “provided charity commissioners with the ammunition needed to convince city bureaucrats to provide more funding”.

Nearly one hundred and thirty years later, in 2016, another investigative powerhouse did a story on psych wards in the United States. BuzzFeed’s Rosalind Adams showed how a private corporation made a business of locking in patients who had just come in for a free mental health assessment, just to bill their insurers.2

On top of numerous interviews, Adams used a series of data sets to show that the situation she was describing did not concern just one institution or one region, but the whole of the country, and that the failings were caused by the business practices of a company, in this case UHS. By contrast, 19th-century Bly never once used any statistic to make her case. More strikingly, in the heated exchanges that followed the publication of Bly’s investigation, her adversaries did not bring in data either, preferring to focus on details of her stay at the asylum.3

Why did two investigative journalists act so differently? Why do today’s journalists need data to make a case, when a human story was enough a century ago? The 19th century did not lack data. The American bureaucracy was already vast in 1887 and Nellie Bly could relatively easily have accessed figures such as the total number of inmates, institutions and doctors who were part of the problem she was exposing. Some investigative journalists of the same period actually used data, when no other investigative technique was available. This is how E.D. Morel researched the slavery system set up by the Belgians in the Congo, for instance, in his book Red Rubber.4

There are many reasons for this change, but the most important one is the new relationship that social elites developed with numbers. Starting in the second half of the 20th century, governments found a new faith in measures. The dreams of a planned society, whether in Soviet countries or in the West, permeated governments and corporations. While government planning has gone out of fashion, the search for guidance through numbers remains, bolstered by the advent and promises of the computer and, since the early 2010s, of artificial intelligence.

This essay explores how journalists and newsrooms use data to measure the unmeasured, thereby remaining relevant in an era of “governance through numbers”.5

Turning a story into an issue

For an investigative project to have as much impact as Nellie Bly’s investigation into mad houses, the journalist needs to provide a measure of the issue at play and be able to follow up on this measure through time. Absent such measurements, officials from the bodies responsible for regulating the issue might simply argue that the story being described is an anecdote that does not represent the overall situation, or that the report is a fabrication. On top of this, they might provide an avalanche of figures showing how good a job they are doing. Simply providing figures - any figures - enhances the credibility of a report.6 Had the city of New York known of these communication techniques at the time, it could easily have fought back against Bly’s story.

In this environment, some journalists have adapted and become aggregators and curators of databases as well as providers of data-based reports. The process they follow involves defining the issue to be measured by creating a methodology, then collecting and structuring data. Such efforts can last several years and have profound impacts.

Together with a consortium of journalists in Europe, we created a database of events during which people had died in their attempt to reach or stay in Europe. The project, called The Migrants’ Files, ran from 2013 to 2016. Each incident was sourced, located on a map and linked to other information, such as the number of people who died, the number of people who went missing and the reason for their death. The Migrants’ Files was not the first project to list fatalities on the borders of Fortress Europe; it built upon previous work by a non-governmental organization, United Against Racism, that had collected such information since 1993 in an unstructured format, unusable for statistical purposes. By structuring the information on the issue, The Migrants’ Files could provide monthly updates on the number of men and women who died, map the locations of the incidents and perform other analyses, such as computing the mortality rate by route (crossing into Europe from Libya is about a thousand times more dangerous than crossing from Turkey to Greece).

At the time of publication, this approach to the issue of migration was relatively new. Until then, policymakers and journalists had preferred a human interest approach, where the unique tragedy of each mass death event was treated individually. (The United Nations High Commissioner for Refugees started collecting structured data on the issue in 2006 but did not release it in any other format than a yearly press release, which might explain why it did not gain traction.7) The data provided by The Migrants’ Files then served as the foundation for the International Organization for Migration’s (IOM) report on the issue in late 2014, which was followed by a systematic, real-time and sustained effort to provide data on the issue. Once it became clear that IOM would continue its data gathering operation, The Migrants’ Files was discontinued. The typical headline moved from “How tragic this shipwreck is” (event-based discourse) to “Death toll in the Mediterranean increases over last year” (the issue ‘deaths in the Mediterranean’ can be measured and reported upon). Without measurement, no one can assess the effect of specific policies on an issue, and therefore no one can hold the powerful accountable for their deeds regarding the issue.

In Spain, the Civio Foundation, a non-profit group of journalists, activists and developers, started the Indultómetro in 2012. They collected data from the official gazette on all governmental pardons. While the data was already publicly available and individual cases were commented upon, structuring the data and providing measurements of the issue showed the magnitude of the pardons, as well as how pardons were exchanged for political favors.8 The long-lasting nature of the project earned it credibility among other journalists, and therefore coverage, all the while amassing more data that made the possible analyses more comprehensive.

In Germany in 2015, the newsroom of Zeit gathered data on arson attacks against homes hosting men and women fleeing war. They showed not only the magnitude of the issue, which so far had been reported as a series of isolated incidents; they also showed that the crimes were almost never investigated by the police.9 While Zeit did not continue updating its data set, another newsroom, taz, took on the effort and provided new data for 2016, following a similar methodology.10

While measurements by journalists can create new newsworthy issues, they can also change how existing ones are treated by officials. In the United States, both The Guardian (The Counted, 2015-16) and The Washington Post (Fatal Force, 2015-ongoing) created and maintained databases of people killed by the police. The projects, which recorded about twice the official number of deaths, led to a change in how the Department of Justice monitored the issue.11 The Iraq Body Count, which has compiled data on violent deaths of civilians in Iraq since 2003, mentions in its rationale the need to shift to a “human-centered” vision of the conflict (as opposed to a “military-centered” one in this case), making explicit how data can change how an issue is perceived.12

Born out of necessity

Some of the measurement projects by journalists and citizens are conscious attempts to provide new data on a given issue. Safecast, for instance, a project started in 2011 to measure the radiation levels around Fukushima, Japan, set out to provide data free from bias or from “perceived bias”.13 While the project was not launched by a newsroom, its aim of providing the wider public with accurate information to help them reach better decisions aligns perfectly with the goals of journalism, according to more than one definition of the craft.

Not all measurement projects arise from a conscious desire to provide data. Homicide Watch, for instance, tracked all homicides in Washington, DC, between 2009 and 2014, and stored the information in a structured format. While the data was ultimately used to provide statistics and shed new light on the issue, the project was launched as a place for people to share information on homicides. The journalist who started the project, Laura Amico, saw that her community had a need for such a place and created it.14

Some projects that aim to provide data on an issue fail to find a use. We ran the project Rentswatch between 2015 and 2017, which scraped several thousand classified advertisements for apartments to rent in Europe and provided an average rent price for a given location, precise to the street level. Our hope was that the data could be used to identify the most rapidly gentrifying neighborhoods and analyze the effects of rent control policies in different countries. The issue we wanted to bring to light was the notion of a European housing market, as opposed to 28 national ones, based upon the hunch that many renters live across borders, either as immigrants or cross-border commuters. Despite creating what was probably the largest database of rent prices accessible to journalists, complete with an API (an application programming interface, which helps users access the data easily), the project saw few reusers and no use in investigations (the data was used for entertainment or service purposes). The relative failure of the project shows that a data collection effort on an issue which does not yet exist as such fails to have any impact if it does not convince fellow journalists of its necessity.

A counterweight to official data

Public institutions tasked with measurements always implement and contribute to a certain vision of the world. Sweden, for instance, is famous for having the longest-running, systematic census in the world, which started in 1749. In the 18th century, the census collected only the age and social status of the population, because the government was interested in how fast the poorer ranks of society grew. The population was divided into productive and non-productive elements, the latter including beggars and Sami people. Sami people were then, in 1850, moved from “unproductive” to “non-Swede”, together with Roma and Jews. Only in 1993 were the Sami allowed to have a say in how they should be measured. By leaving measurements to the authorities alone, a society implicitly accepts being ruled by its government’s prejudices. Measuring independently is fully part of the idea of keeping power accountable.15

The Counted and Fatal Force, the databases on people who died of police violence in the United States, and Safecast clearly stated their intention to provide another measurement of an issue already measured by officials. In Argentina, the newspaper La Nación regularly tries to provide alternative measurements of inflation, given that the government’s data is untrustworthy. They first aggregated several sources in 2012,16 then moved on to measuring the number of 500-peso bills in circulation to estimate the increase in the monetary base and derive inflation from there.17 (Neither attempt was totally successful.)

Investigations can also provide a counterweight to official narratives simply by turning a story into a measured issue, as explained above. Drone Wars, a project by The Bureau of Investigative Journalism, a British non-profit, was started in 2015 and provides as much structured data as possible on drone strikes carried out by the American and British governments. Analyzing the data they gather, the Bureau can offer insights into the wars carried out by these regimes, which sometimes differ markedly from the official discourse. They could show, for instance, that while operations against the Taliban officially ended in 2014, drone strikes against the group have continued apace since then.18

Sometimes journalists simply request access to the official data itself to show that the authorities did not interpret it correctly. An investigation by the German non-profit Correctiv, for instance, showed that the police of Vienna, Austria, vastly over-reported thefts and drug-related crime in their press releases, while ignoring rape and hate crimes.19 Reuters Investigates showed that the United States health authorities did not adequately measure the number of deaths caused by drug-resistant bacteria.20


Measurement projects rarely require any kind of hardware (exceptions to this are the Cicada Tracker, which monitored temperature with custom thermometers, and Safecast, which relied on Geiger counters). They usually aggregate information that is available in news reports or on social media and structure it in a database, following a precise methodology. The cost of such projects lies only in the time spent gathering and structuring information, as well as, when needed, the time spent setting up a front-facing, interactive interface.

A quick calculation shows that a data collection project run by a journalist who costs 75,000 a year and spends 30 minutes on it every day ends up costing less than 5,000 a year (30 minutes being one sixteenth of an eight-hour working day). Add another 5,000 in yearly costs for a front-end interface and the total yearly cost (not counting the writing of news articles with the data) reaches 10,000. This estimate fits the order of magnitude of the costs of The Migrants’ Files (15,000€ over two years) and Rentswatch (30,000€ over two years).
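
The arithmetic behind this estimate can be checked with a short sketch. It assumes an eight-hour working day, which the estimate does not spell out, and all function and variable names are illustrative rather than taken from any of the projects.

```python
# Back-of-the-envelope cost of a journalistic measurement project.
# Assumption (not stated in the essay): a working day lasts 8 hours.
HOURS_PER_WORKING_DAY = 8


def yearly_collection_cost(yearly_staff_cost, minutes_per_day,
                           hours_per_day=HOURS_PER_WORKING_DAY):
    """Share of a journalist's yearly cost spent on data collection."""
    share_of_day = (minutes_per_day / 60) / hours_per_day
    return yearly_staff_cost * share_of_day


# Figures from the text: 75,000 a year, 30 minutes per day, 5,000 for the front end.
collection = yearly_collection_cost(75_000, 30)
front_end = 5_000
total = collection + front_end

# Yearly costs of the two projects mentioned in the text (euros over two years).
migrants_files_per_year = 15_000 / 2
rentswatch_per_year = 30_000 / 2

print(f"Data collection: {collection:,.0f} a year")
print(f"Total with front end: {total:,.0f} a year")
print(f"The Migrants' Files: {migrants_files_per_year:,.0f} a year")
print(f"Rentswatch: {rentswatch_per_year:,.0f} a year")
```

Under this assumption, data collection comes out slightly below 5,000 a year and the total slightly below 10,000, which sits between the yearly costs of the two projects.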

Opening data and making it available lets others reuse it and expand the original project’s reach. The Drone Wars dataset, for instance, was used by agency Pitch Interactive in 2013 to create an interactive application, Out of Sight, Out of Mind, which was widely shared. Data from The Migrants’ Files was reused by developer Moriz Büsing to create a visualization, 15 years, which was much more powerful than the original one. Considering that many of the newsrooms that create measurement projects are non-profits and that their funders care about “impact”, such reuse of data is key to their performance indicators.

ProPublica, a non-profit newsroom which has won several Pulitzer Prizes, set up an online shop where it sells the data sets derived from its measurement operations. Over the course of two years, it grossed $200,000, or 1% of its operating income.21 Despite this relative success, no other newsroom has followed suit and sold the data it collected.

Complementing academia

The economics of journalistic measurement are fairly insignificant (both the costs and potential revenues are small). More important is its contribution to the production of factual knowledge. Some projects explicitly aim at providing academics with data for them to use. The Hindustan Times, for instance, launched a Hate Tracker that monitors instances of hate crimes in India. One of their stated goals is to “provide solid, irrefutable evidence to researchers”.22

Even if journalists do not see themselves as competitors to academia, or even as followers of the scientific method, the fact remains that by defining a methodology, making it public and collecting data, journalists engage in science insofar as they create knowledge in a way that can be replicated and falsified.

As academia has proved less able to fulfill its mission as a producer of factual knowledge,23 journalists came to fill the void. Some of the products published by journalists are tailored to look like academic articles, making clear that the level of seriousness the newsroom aims at is equal to or greater than that of universities. ProPublica, for instance, published a white paper on the data they collected on the risks of elective surgery which mimics the style of an academic article in every way.24

The findings of journalistic measurement projects can also balance those of academics. ProPublica (again) proved in 2016, using data it curated, that doctors who received payments from a pharmaceutical firm were more likely to prescribe drugs from that firm than those who did not.25 Two academic studies came to the same conclusions, but they were published after the ProPublica story.26

Journalists regularly take it upon themselves to measure issues systematically, thereby creating exclusive content, reporting events in totally new ways (by making them issues instead of individual stories) and providing a counterweight to official discourse. This not only proves relatively cost-effective (compared to other forms of journalism such as special correspondents), it also makes major contributions to the intellectual pursuit of truth, complementing or even surpassing academia in this endeavor. From this laudatory assessment, and taking into account my own biases, one must wonder why newsrooms do not produce more measurements. The answer flows logically from the previous sentence: very few of them are tasked with finding the truth. It is only logical that non-profit operations such as ProPublica, Civio or The Bureau of Investigative Journalism are at the forefront of journalistic measurements: their mission statements make clear that their job is to find and publish the truth. Other newsrooms have a variety of other missions, from gathering attention in order to sell it to advertisers to providing an oligarch with political influence. For them, it makes no business sense to engage in long-term data collection projects.




1. Bly, Nellie (1887). Ten Days in a Mad-house: Or, Nellie Bly’s Experience on Blackwell’s Island: Feigning Insanity in Order to Reveal Asylum Horrors. New York: N.L. Munro.

2. Adams, Rosalind (2016). ‘Locked On The Psych Ward.’ BuzzFeed.

3. Bly, Nellie (1887). Untruth in Every Line, Nellie Bly contradicts a recent article in the Sun. New York World.

4. Morel, E. D., and Harry H. Johnston (1907). Red Rubber: the story of the rubber slave trade which flourished on the Congo in the Year of Grace 1907. London: Fisher Unwin.

5. Supiot, Alain (2015). La Gouvernance par les nombres. Paris: Fayard.

6. Tal, Aner, and Brian Wansink (2014). ‘Blinded with Science: Trivial Graphs and Formulas Increase Ad Persuasiveness and Belief in Product Efficacy.’ Public Understanding of Science 25.1: 117-25.

7. United Nations High Commissioner for Refugees (2012). ‘Mediterranean Takes Record as Most Deadly Stretch of Water for Refugees and Migrants in 2011.’ UNHCR.

8. Bengoa, Aitor (2015). ‘La Concesión De Indultos Cae Un 84% Pese a Que Crecen Las Peticiones.’ El País.

9. Blickle, Paul; Biermann, Kai; Faigle, Philip; Geisler, Astrid; Hamann, Götz; Jacobsen, Lenz; Kemper, Anna; Klingst, Martin; Polke-Majewski, Karsten; Schirmer, Stefan; Soltau, Hannes; Stahnke, Julian; Staud, Toralf; Steffen, Tilman and Venohr, Sascha (2015). Es brennt in Deutschland. ZEIT Online.

10. Stöckel, Christina; Sona, Zoe and Bednarczyk, Svenja (2017). Es brennt in Deutschland. taz.

11. Swaine, Jon, and Ciara McCarthy (2016). ‘Killings by US Police Logged at Twice the Previous Rate under New Federal Program.’ The Guardian.

12. Iraq Body Count. Knowledge of war’s casualties promotes a human-centred approach to conflict.

13. Ewald, David (2011). ‘Open Dialogue.’ Safecast.

14. Amico, Laura (2014). Website that kept watch on D.C. homicides shuts down.

15. Kayser-Bril, Nicolas (2016). ‘Free your data’ is over. Now, we need data to be free.

16. Marshall, Sarah (2012). How Argentina’s La Nacion is opening data without FOI.

17. La Nación (2017). Tracking the monetary base in pesos 2003-2017, and how this speaks about inflation.

18. Purkiss, Jessica (2017). US military operations against Taliban dramatically escalate. The Bureau of Investigative Journalism.

19. Kanya, Evelyn and Siebenhofer, Alexandra (2015). Gefühlte Kriminalität. Correctiv.

20. McNeill, Ryan, Nelson, Deborah J. and Abutaleb, Yasmeen (2016). The Uncounted: The deadly epidemic America is ignoring. Reuters.

21. Bilton, Ricardo (2016). ProPublica’s Data Store, which has pulled in $200K, is now selling datasets for other news orgs. Nieman Lab.

22. Hindustan Times (2017). About the Hate Tracker, Hindustan Times.

23. Kayser-Bril, Nicolas (2017). The collapse of academia.

24. Pierce, Olga and Allen, Marshall (2015). Assessing surgeon-level risk of patient harm during elective surgery for public reporting. ProPublica.

25. Ornstein, Charles; Grochowski Jones, Ryann and Tigas, Mike (2016). Now There’s Proof: Docs Who Get Company Cash Tend to Prescribe More Brand-Name Meds. ProPublica.

26. Ornstein, Charles (2016). Feed Me, Pharma: More Evidence That Industry Meals Are Linked to Costlier Prescribing. ProPublica.

