Datajournalism


Ever since the term entered the European media space in 2009, students and fellow journalists have regularly asked me to define datajournalism. Let me try to do that here.

Datajournalism [1] means doing journalism with data. Data, in turn, is simply a piece of information from which to derive facts (in Latin, datum is what is given, as opposed to factum, the knowledge one obtains from data) [2]. A fact cannot be wrong, or it ceases to be a fact. A piece of data, on the other hand, remains data whether it is true or false.

Structured data

It could be argued that any kind of journalism uses data, so “doing journalism with data” is not enough to define datajournalism. What is specific to datajournalism is that it works with structured data. Structure, in this sense, means that a computer can process the data in chunks. To oversimplify, think of structured data as an Excel table and of unstructured data as a piece of text.

Let me take an example. Imagine that you cover car accidents in your town. You can store your archive as a collection of texts, or you can store it in a table, with a column for the date of each accident, one for the latitude, one for the longitude, one for the number of people injured and so on. Using a table, it is easy to make calculations such as the total number of injured people per year. You will be able to spot trends or to make a map and see if some areas are more dangerous than others. This point was made very well by Adrian Holovaty in his 2006 article A Fundamental Way Newspaper Sites Need To Change.
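The accident table described above can be sketched in a few lines of Python. This is only an illustration: the column names, the coordinates and the figures are all invented, and the function name injured_per_year is hypothetical. It shows how a per-year total falls out of a table almost for free, which is not the case with an archive of free text.

```python
import csv
import io
from collections import defaultdict

# Hypothetical accident records: every value below is invented for illustration.
ACCIDENTS_CSV = """date,latitude,longitude,injured
2013-03-14,48.8566,2.3522,2
2013-11-02,48.8606,2.3376,1
2014-01-20,48.8530,2.3499,3
2014-07-08,48.8584,2.2945,0
"""


def injured_per_year(csv_text):
    """Sum the 'injured' column for each year found in the 'date' column."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        year = int(row["date"][:4])  # dates are assumed to be in YYYY-MM-DD form
        totals[year] += int(row["injured"])
    return dict(totals)


totals = injured_per_year(ACCIDENTS_CSV)
for year in sorted(totals):
    print(year, totals[year])
```

The same rows, with their latitude and longitude columns, could be handed to a mapping tool to see where accidents cluster.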

Structured data allows for much more than text. It allows for automated text generation, currently known as “robot journalism”. It allows for interactive maps, charts or any other kind of content. Structured data can be displayed on any computer-run device in any form. The possibilities are endless.

Publishing on a computer-run device implies knowledge of computer programming. Working with numerical data implies knowledge of statistics. Doing so in a competitive marketplace implies taking great care of the user experience and requires skills in graphic design. No person on earth has all these skills. Doing datajournalism therefore means working in a team, which must be coordinated by a project manager.

Is data journalism?

After I explain this, people usually ask “Wait, is this journalism?” Indeed, project management is not a skill usually associated with journalism. To answer this question, we must find out what journalism is.

The quickest way to find an answer is to look at organizations that represent journalists. Their definitions fall into two groups. Some define journalism by the tasks journalists carry out: “research, verify, contextualize, hierarchize, comment and publish quality information”, says the French SNJ [3]. The German DJV has a similar definition [4]. “It cannot be mixed with communication”, SNJ adds, but fails to define communication. Under this definition, datajournalism qualifies as journalism, as does a financial report by any corporation (provided the information it publishes is of a certain “quality”, i.e. not purposefully distorted) or your local police station’s Twitter feed.

Others define journalism from the point of view of the desired outcome. The Munich Charter, a declaration by several European journalists’ unions in 1971, considers journalism to be what lets “the public know facts and opinions”. Datajournalism qualifies under this definition, too. But this definition is even more open than the previous one, as anyone publishing facts and opinions would qualify, including the author of a Nazi rant on Facebook.

The most useful definition of journalism was given by the European Court of Human Rights (ECHR), Europe’s highest court on fundamental rights. For the court, journalism is the act of “disclosing to the public information, opinions or ideas”.

Summing up these three definitions, it appears that pretty much anything published anywhere is journalism, provided it is not a lie and some sort of editing is done. The data feed from an API does not qualify as journalism, but an interface that shows the information does. In practice, however, the institutions offering these definitions do not abide by them – by a wide margin. In 2009, for instance, the International Federation of Journalists (IFJ) published an appeal arguing that the fundamental difference between bloggers and journalists was the latter’s “responsibility” [5]. That this responsibility was not defined and that IFJ now defends bloggers is not the point. The point is that their definition of journalism was so poor that it could be twisted to exclude (or include) anyone.

The ECHR fares no better. Even though its definition of journalism encompasses anyone publishing information, its jurisprudence consistently favors journalists working in a newsroom over the rest of the population. More recently, as I have already argued, it took the view that datajournalism was merely data-processing [6]. Definitions that are not respected by the very people who write them are of little use.

Journalism as a process and a status

In its most commonly accepted usage, journalism is what is done by journalists. Journalists, in turn, are defined by their employer. They are men and women who work in or for a newsroom. Is datajournalism the work of newsroom journalists working with structured data? Many people would like to think so.

This definition turns work done by The Guardian and Cuba’s Granma [7] into journalism by virtue of its production process. It makes little sense. More importantly, it requires a definition of a newsroom and of a media outlet. How many people are required before a newsroom appears? One? Five? One hundred? How much content needs to be produced daily? At a time when any person or group connected to the internet can create a media organization, the concept of a newsroom as the only place where journalism is produced is obsolete.

Another way to define journalism comes from peer recognition. Journalism is the production of people whom other journalists consider their peers. The appeal of this solution lies in its simplicity, as it eschews the need for a definition by the tasks carried out in favor of a definition by status.

In Europe, most journalists are defined by a mix of the first and second definitions. Committees of professional journalists assess the work of prospective journalists based on a set of criteria (the prospective journalist must have written for a big-name outlet, for instance) and arbitrarily [8] decide whether or not to grant them the coveted press card.

As long as the demand for professional journalists is expanding, the system of peer-vetting might work well. But basic economic reasoning suggests that if the demand for journalists decreases, current journalists have an incentive to reduce the supply and will refuse entry to newcomers. In times of diminishing demand for journalists, in which we have found ourselves since the mid-2000s, ownership of a press card says little about one’s work. Instead, it shows one’s ability to join a protective group. It is pointless to think of datajournalism as the work the owner of a press card does with structured data.

Journalism as a service

The most interesting attempts to redefine journalism in times of shrinking demand have come from Jay Rosen and Jeff Jarvis. For them, journalism is the act of informing in the public interest. It is not defined by intent, processes or status but by the service it renders its audience. In this sense, work from structured data that is done in the public interest qualifies as datajournalism, whether it is done by a blogger or a newsroom journalist.

In this understanding, data-driven interactives done outside of news organizations qualify as journalism. Pieces done or bankrolled by NGOs, corporations or local administrations can also qualify. Of course, such publishers do have an agenda. But the pursuit of this agenda can push them to publish information in the public interest. News organizations have agendas, too, in the form of maximizing revenues from advertising, pleasing a patron or rewarding a paying audience. All publishers face conflicts of interest. Whether a conflict of interest is bigger when it involves the owner of a news organization or the mission statement of an NGO is probably impossible to tell [9].

Defining journalism as the act of informing in the public interest has just one caveat. What is the public interest? The Financial Times and Russia Today probably have very different definitions. More importantly for the European perspective, public service broadcasters exist precisely to “serve the public interest”, as the BBC Charter puts it. Does every single piece of content produced by the BBC qualify as journalism? Probably not [10]. We could simply dismiss the mission statements of European public service broadcasters as irrelevant and argue that their conception of public interest is wrong or badly enforced. However, the BBC and others are rigorously audited. Doing a better job than all the auditors of public service broadcasters would probably be beyond the means of any organization.

“Journalism” is obsolete

In this brief attempt at defining datajournalism, I showed that the traditional definition of journalism (content produced by a newsroom) is obsolete. Other, even newer definitions fail to provide a clear distinction between journalistic and non-journalistic content. These definitions fail because the concept of journalism was never intended to be used without the concept of the journal [11]. As journals and their newsrooms cease to be useful concepts to understand the way information is produced and consumed, journalism, too, ceases to convey much meaning [12]. The only valid definition of journalism is tautological: any action of publishing information called journalism is journalism. Therefore, datajournalism must be journalism.

Reaching such a candid conclusion has serious implications. Any legal distinction between journalists and non-journalists must be removed, as it harms, for no reason at all, people whom judges do not recognize as journalists. In 2013, the trial of Chelsea Manning was a clear example of this risk, which Yochai Benkler highlighted when he tried to convince the judge that Wikileaks was a news organization and that Manning should be treated as its source [13]. (Whether his arguments affected the judge’s decision is not known.)

Because the concept of journalism does not allow for a distinction between types of content, the only concept that applies to the current environment is that of information. Journalism is one way of producing information but there are many others, from public announcements to advocacy. The institutional arrangements supporting information flows in a democratic society must be rethought, from journalism schools, as I argued before, to unions and legislation. It is very unfortunate that no law-making body in Europe has understood the issue, let alone started work to solve it.

Notes

1. Datajournalism is used interchangeably with data-driven journalism and database journalism here. Several Wikipedia entries exist for each topic and I’m partly responsible for this mess. Sorry.

2. On the etymology of data, I highly recommend the first chapter of Raw Data Is An Oxymoron.

3. See Charte d’éthique professionnelle des journalistes.

4. See Berufsbild Journalistin-Journalist, p.3.

5. See Journalisme éthique : la campagne de la FIJ.

6. See ECHR ruling might have a chilling effect on data journalism.

7. Yes, Granma’s employees do some interactives, like this voting competition for Cuba’s natural wonders.

8. As far as I know, there is precious little you can do if you are refused the status of journalist by such committees, apart from requesting an internal review. If you know of a different system, please submit an issue to this article. Having been given a press card in both the United Kingdom and France, I can attest to the arbitrariness of the process.

9. However, it seems clear that the conflict of interest an oil firm might have when publishing information is larger than the one a newsroom, even one controlled by an oligarch, might have. I am not aware of any study that assesses the strength of conflicts of interest among publishers. If you know of one, please submit an issue.

10. I am no user of the BBC, but it features several times in Wikipedia’s List of television series considered the worst.

11. Look at the etymology of the term or, perhaps more convincingly, at its use in Google Ngram viewer.

12. If journalism is meaningless, why do I keep using it to define my work? One reason is that “information management” does not sound as good. Another is that there are several concepts associated with journalism with which I identify, such as holding power to account, investigating, telling stories and providing new information.

13. See the transcript of Yochai Benkler’s hearing.