European and German data sources

I gave this lecture to a group of mostly non-European journalists new to data-driven reporting in late 2017.

European administrations, and Germany’s in particular, measure many things. After all, the very word statistic comes from German, where it means “the science of the state” (Statistik). Across Europe, hundreds of statistical agencies compile thousands of measurements, from soil pollution to happiness levels. Learning to navigate this maze is worth it. Many story leads sleep in these data sets, and many others can provide useful context for current or past news items.

The go-to sources

The gateway to European data is called Eurostat. The agency is run directly by the European Commission, the executive arm of the European Union, and does two things: it aggregates data from the Union’s 28 member states and it publishes data collected by European institutions. When working on topics for which the European Union is directly responsible, such as international trade or deportations of asylum seekers under the Schengen Treaty, Eurostat has the best and most up-to-date data.

Eurostat also runs several surveys, such as the Eurobarometer and the Urban Audit. Both surveys have been running for years (some questions of the Eurobarometer have been asked since 1973!), which makes them useful for detecting long-term trends in Europe.

When it comes to national data, Eurostat is not as useful. Although the agency is supposed to harmonize statistical methods across the Union, some states refuse to apply the common guidelines. When Eurostat decided to add prostitution and illegal drug sales to the gross domestic product, a measure of the size of the economy, some member states, France among them, refused to follow.1

A bigger concern lies in Eurostat’s relative lack of power. When a member state sends fraudulent data, Eurostat can refuse to validate it but cannot impose sanctions on the responsible national statistics office. The problem applies mostly to Greece, whose budget and deficit statistics were false throughout the 1990s and early 2000s,2 but over which Eurostat had little leverage.

Apart from Greece, European national administrations have fairly good statistics by international standards, for two reasons. Having good statistics is one of the “chapters” that countries had to fulfill before joining the European Union.3 And the Union has invested millions to overhaul the statistical apparatus of newer member states and some neighboring countries.

The German national statistics office is known by its short name, DeStatis. Because of the federal structure of the German state, many German data sets remain in the 14 regional statistics offices. DeStatis tries to aggregate regional data on a single portal, while national data sets are stored in a database called Genesis. Some data sets can be found in both. Other data sets that can be of interest to foreign journalists are stored elsewhere. Data from the general elections, for instance, are stored at the federal returning office. For local elections, the data are at the regional returning offices.

This seemingly complex data structure is not unique to Germany. All statistical offices, from Eurostat to the French INSEE, have byzantine websites that appear optimized to avoid indexing by search engines, making it hard to find any given data series. They use a variety of formats, from Microsoft Excel to comma-separated values to exotic formats such as Beyond 20/20®, which can only be read by software that runs on Windows (98, ME, NT, 2000, XP or Vista). Luckily, many of them have English-speaking staff ready to answer the phone and help journalists, including by preparing custom-made exports of data.

Industry-specific sources

National statistics offices only deal with aggregates. After all, their goal is to help their government make policy. For finely grained data on industry-specific issues, other sources exist, and we mostly have the European Union’s love of monitoring to thank for them.

A central database of public tenders in Europe is available at ted.europa.eu. It contains all public tenders in the Union above a certain amount, which varies by sector. Activists use this database to run Red Flags, an automated system that looks for potential story leads in public tenders, such as suspiciously fast procedures or a lack of evaluation criteria.
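The kind of check such a system runs can be sketched in a few lines of Python. This is a minimal illustration and not the actual Red Flags code: the field names (`published`, `deadline`, `award_criteria`), the 15-day threshold and the two sample records are all assumptions made up for the example, and real TED notices follow a different, much richer schema.

```python
from datetime import date

# Hypothetical threshold: fewer days than this between publication and
# deadline counts as "suspiciously fast". Adjust it to the sector.
MIN_DAYS_TO_BID = 15


def red_flags(tender):
    """Return a list of warning labels for one tender record (a dict)."""
    flags = []
    days_open = (tender["deadline"] - tender["published"]).days
    if days_open < MIN_DAYS_TO_BID:
        flags.append("short deadline")
    if not tender.get("award_criteria"):
        flags.append("no evaluation criteria")
    return flags


# Made-up records, for illustration only.
tenders = [
    {"id": "A", "published": date(2017, 3, 1), "deadline": date(2017, 4, 15),
     "award_criteria": ["price", "quality"]},
    {"id": "B", "published": date(2017, 3, 1), "deadline": date(2017, 3, 6),
     "award_criteria": []},
]

# Keep only the tenders that raise at least one flag.
flagged = {t["id"]: red_flags(t) for t in tenders if red_flags(t)}
print(flagged)
```

A flag is only a lead, not a finding: a short deadline can have a legitimate reason, so each hit still needs reporting.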

TED is not the only comprehensive, industry-specific database propped up by the European Union. Entsoe.eu offers live data on power generation throughout the Union. The website of the wind power industry, Wind Europe, uses this data to provide daily visualizations of the electricity mix by country. Eurocontrol, which is technically not an agency of the European Union, provides live data on air traffic on its website, such as delays and incidents. The European Central Bank has a Data Warehouse with good data on macro-economic indicators. The list could go on.

Such data sources are made by the industry or the industry regulator, for the industry, and are therefore particularly user-unfriendly. It takes time to understand how the interfaces work, what the data means and what can be done with it. Once these hurdles are overcome, they provide rich information for a journalist working on a specific beat.

Many industry-specific data repositories are public but some are not, such as the SPIRS database of dangerous industrial sites. In such cases, Regulation 1049/2001 of the European Union allows anyone to request a public document from European institutions. Activists based in Madrid run AsktheEU.eu, a website that simplifies such requests. Officials of the EU rarely respond happily to such requests (be ready to receive the Excel file you asked for in the mail, printed over 200 A4 pages), but, unlike their colleagues in national administrations, they do respond.

Layers of open data

In the early 2010s, European politicians, sometimes pushed by activists, decided to “open” data. Most of them had no idea what it implied and did so purely to imitate a policy that Barack Obama championed early in his first term. In an ironic turn of events, the open data movement came on the heels of a movement towards paid data in the late 1990s and early 2000s, which incentivized public administrations to develop new revenue streams, including the sale of their data.4

Unsurprisingly, few administrations played along and those that did did so reluctantly. Ten years after the start of the open data movement, the situation is patchy. Many data portals are already dead. The open data portal of Baden-Württemberg, one of the wealthiest German regions, can only be found at the Internet Archive. After a laborious birth in 2013, the website was shut down in 2016 (it should reopen in 2018).5 (Baden-Württemberg is not alone: Open data Seine-Maritime, in France, was also unavailable at the time of writing.) Other portals zombie around unmaintained, thereby polluting searches for data with thousands of outdated files, broken links and dead ends.

On the other hand, many open data initiatives are still dynamic. The open data portals of Poland, France or Berlin list hundreds of resources, including application programming interfaces (APIs) that allow for automated data retrieval. These administrations realized that opening and sharing data helped them do their job, just as industry-specific data portals help both companies and regulators.
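To give an idea of what automated retrieval looks like, here is a minimal sketch in Python. It assumes a portal built on the open-source CKAN software, whose API offers a `package_search` action. Not every portal uses CKAN, so check the documentation of the one you are exploring. The address `opendata.example.org` is a placeholder, and the function names are mine.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder address: replace it with the portal you are exploring.
PORTAL = "https://opendata.example.org"


def search_url(query, rows=10, portal=PORTAL):
    """Build a CKAN-style dataset search URL."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal}/api/3/action/package_search?{params}"


def list_files(response):
    """Extract (dataset title, format, file URL) from a parsed search response."""
    found = []
    for dataset in response["result"]["results"]:
        for resource in dataset.get("resources", []):
            found.append(
                (dataset["title"], resource.get("format", ""), resource["url"])
            )
    return found


def search(query, rows=10, portal=PORTAL):
    """Query the portal over the network and return the matching files."""
    with urlopen(search_url(query, rows, portal)) as reply:
        return list_files(json.load(reply))
```

Calling `search("bridges", portal="https://...")` against a real CKAN portal would return a list of files to download. Splitting the URL building and the response parsing from the network call makes it possible to try the logic on a saved response first.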

Journalists need to spend time poking around open data portals at different levels, and assessing their reliability, before these can become useful sources. In much of Europe, and in most of Germany, the administration is organized in four layers (city, region, state and European Union), but some territories can have six, seven or more layers, each with specific responsibilities and a dedicated open data portal.

Dark corners

Statistics offices, industry-wide services and administrations provide hundreds of data sources, which, even if they are very rarely user-friendly, tend to yield very interesting data sets after a few hours spent exploring and asking around. Some corners of European life, though, remain hidden. Want a list of the businesses owned by an organization in Austria? Access to the German land registry? Tough luck.

During the process of accession to the European Union, most post-socialist countries had to open up data (and countries currently on a path towards accession followed suit). This is why the privatization portal of Serbia, the land registry of the Czech Republic or the public servant assets database of Romania are trend-setters in transparency. In Germany, the only data you will find about the severely deficient privatization program of the 1990s6 is a PDF file published by the… German old-age pension fund. Access to the commerce registry is expensive and only possible on a per-company basis. Excerpts from the land registry can only be requested by paper mail, and each request must be justified.

Almost all European countries have legislation ensuring access to public documents but, apart from European institutions, administrations very rarely respect it. In Germany, journalists had to wait seven years and go to court to access data on the state of German bridges (the data showed that bridges are falling apart faster than they are repaired).7 In France, a news magazine is currently taking two ministries to court to force them to respect local open data legislation.8 Administrations do not only passively refuse to communicate data that ought to be open by law. They sometimes counter-attack. In 2011, a German newspaper published leaked documents related to the war in Afghanistan. The administration successfully sued it, not on national security grounds but on copyright grounds.9 With such resistance, not all data will be easily opened up in Germany and Europe.

Notes

1. Read Sizing Up Black Markets and Red-Light Districts for G.D.P. at The New York Times and France refuses EU order to include drugs, prostitution in GDP figures at Radio France Internationale.

2. Read for instance this piece from 2004 at The Independent: Greece admits deficit figures were fudged to secure euro entry.

3. Chapter 18 in the negotiations.

4. I wrote about this in The Power of Open Data in June 2016.

5. It was supposed to be revived in September, 2017, according to an article in Süd-Kurier. On December 8, 2017, the interior ministry of BW said in a tweet it would be relaunched in the second quarter of 2018.

6. Read for instance Der Deutsche Goldrausch, by Dirk Laabs.

7. Read Wettlauf gegen den Verfall at Die Welt.

8. Read Open Data « par défaut » : Next INpact traîne deux ministères devant le Conseil d’État.

9. Read Urheberrecht: WAZ muss Afghanistan-Papiere depublizieren at NetzPolitik.

