The seminar with the optimistic headline «Give us our data» was organised by the Infomedia department at the University of Bergen. The department has initiated and funded a fact-finding project on Norwegian government data this autumn, hoping that the project report and the seminar can help move the topic higher up on the political and business agendas.
A whole catalogue of interesting facts and opinions about government data: that’s what I’m taking away from the seminar (of course, since I helped organise it, this is a completely subjective and biased view!).
Open data as a topic is unusual in that it brings together people with very different roles and backgrounds, from computer scientists via public sector specialists to journalists, business entrepreneurs and innovative civil servants. The presentations and debates at the seminar kept returning to the same questions, but from different angles: Why should more government data be made public? What obstacles are in the way and how can they be overcome? What can we do with the data?
These are my notes from the seminar presentations, supplemented with slides from speakers. See also other reports and remarks (in Norwegian): Bente Kalsnes’ post on Origobloggen has sparked a lively debate, and a blog comment from Anders Waage Nilsen summarizes the day very efficiently.
Denmark: Demand-driven approach
Cathrine Lippert from Denmark’s National IT and Telecom Agency reported on the agency’s initiatives to improve access to government data. They include a project competition for innovative services (winners to be announced at a conference on February 4), an innovation programme directed towards the private sector, and a data source catalogue on the social platform digitaliser.dk. An open data desk is also planned, which can provide assistance, define guidelines and highlight good practices.
Download presentation (pdf).
The agency tries to advance its agenda by appealing to and bringing together interested groups in both the private and public sector. Lippert said the agency believes it can accomplish more with this demand-driven approach, which mobilises the grassroots. A top-down approach is hard, as open data does not carry the same political weight in Denmark as it currently does in Britain and the US.
Britain: Data and innovative journalism at The Guardian
The British newspaper is in the vanguard of using data in journalism. Simon Rogers, editor of the Datablog, explained how The Guardian works toward the «mutualisation of data». Data is shared with users by publishing the material behind stories on the Google Docs platform, which is simple and user-friendly. A Flickr group has been set up to collect users’ own visualizations of data.
Increasingly, the role of journalists will be to guide the public through the vast forest of data; to be curators of information, Rogers said.
The Guardian’s «crowdsourcing» of the review of parliament members’ expense files is already famous. More than 23,000 users took part in reviewing the files. The editors learned from the experiment that when you ask users for help, you need to define manageable tasks and you should give the users something back for their efforts. When a new batch of data was released, the editors gave more specific tasks and the job was done in one and a half days.
Last week, The Guardian launched its own gateway to public data portals. In the future, they also want to give people visualization tools, Rogers explained.
Hidden data and how to find them
Web developer Harald Groven at the Norwegian Centre for ICT in Education focused his presentation on how vast amounts of highly interesting public sector data are kept under lock and key. In the analogue era, publishing medium- or low-level aggregates of data was practically impossible because there wasn’t enough paper. That constraint no longer applies, but the same practices remain, Groven said. Legal constraints are part of the reason why Statistics Norway and other institutions do not release more fine-grained data.
A Norwegian government data portal should concentrate on making available anonymized, low-level aggregated statistics, which are data sources that are largely unknown today, Groven recommended. He illustrated the proposition with examples from his own work developing services aimed at giving young people a better basis for making decisions about what to study. One type of data needed for one of the services, salary levels in different occupations, was difficult to obtain at a sufficiently detailed level.
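To make the idea of anonymized, low-level aggregates a little more concrete, here is a minimal Python sketch. It is not something Groven presented. The record layout, the sample figures and the minimum cell size of five are all assumptions made for illustration. The sketch groups individual salary records by region and occupation and withholds any group that is too small to publish safely.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical threshold: groups with fewer people than this are suppressed.
MIN_CELL_SIZE = 5


def aggregate_salaries(records, min_cell_size=MIN_CELL_SIZE):
    """Aggregate (region, occupation, salary) records into per-group means.

    Groups smaller than min_cell_size are kept in the output but with their
    values withheld, so no individual salary can be inferred from them.
    """
    groups = defaultdict(list)
    for region, occupation, salary in records:
        groups[(region, occupation)].append(salary)

    result = {}
    for key, salaries in groups.items():
        if len(salaries) >= min_cell_size:
            result[key] = {
                "count": len(salaries),
                "mean_salary": round(mean(salaries)),
            }
        else:
            # Suppressed cell: too few individuals to publish.
            result[key] = {"count": None, "mean_salary": None}
    return result


# Invented sample data, for illustration only.
sample_records = [
    ("Bergen", "nurse", 380000),
    ("Bergen", "nurse", 390000),
    ("Bergen", "nurse", 400000),
    ("Bergen", "nurse", 410000),
    ("Bergen", "nurse", 420000),
    ("Bergen", "pilot", 700000),
    ("Bergen", "pilot", 900000),
]

if __name__ == "__main__":
    for key, cell in aggregate_salaries(sample_records).items():
        print(key, cell)
```

The choice of threshold is a policy question as much as a technical one: a lower threshold gives more detailed statistics, a higher one gives stronger protection for individuals.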
A news journalist’s perspective: TV 2
Journalists often find that public sector agencies want to control the presentation of data, Gaute Tjemsland of Norwegian TV 2’s news website said in his presentation. When TV 2 asked for the data from national school tests, the ministry responded by sending PDF documents, before finally giving in and releasing the spreadsheets it had had all along. The stated reason was that the ministry didn’t want the media to produce school rankings, i.e. present the data in their own way.
For journalists, the ideal situation is to get structured data, as detailed as possible, and as fast as possible, Tjemsland commented. He had encountered three main obstacles: public sector agencies want to retain control over information; they are afraid of losing revenue; or, in many cases, they are not aware that their data can be valuable to others. The last obstacle is probably the most important, Tjemsland said.
The TV 2 editor proposed benchmarking the openness of government institutions. By defining variables to measure transparency, more pressure can be applied to have government data released. The media need to do their part by demanding information and should take a leading role in the debate about open data.
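What such a benchmark might look like was not spelled out at the seminar. The following Python sketch is one possible reading, with every variable, weight and agency invented for illustration. It scores each agency on four assumed indicators (whether it lists its datasets, whether it offers machine-readable formats, whether the data is free of charge, and how quickly it answers data requests) and combines them into a score from 0 to 100.

```python
from dataclasses import dataclass


@dataclass
class AgencyAssessment:
    """Hypothetical transparency indicators for one agency."""
    name: str
    publishes_dataset_list: bool
    offers_machine_readable_formats: bool
    data_free_of_charge: bool
    median_response_days: float


# Assumed weights; they sum to 1.0 and are not taken from any real benchmark.
WEIGHTS = {
    "catalogue": 0.3,
    "formats": 0.3,
    "free": 0.2,
    "timeliness": 0.2,
}


def openness_score(a: AgencyAssessment, max_response_days: float = 30.0) -> float:
    """Return a 0-100 openness score from the weighted indicators."""
    # Timeliness falls linearly from 1.0 (same-day answer) to 0.0
    # (max_response_days or slower).
    timeliness = max(0.0, 1.0 - a.median_response_days / max_response_days)
    score = (
        WEIGHTS["catalogue"] * a.publishes_dataset_list
        + WEIGHTS["formats"] * a.offers_machine_readable_formats
        + WEIGHTS["free"] * a.data_free_of_charge
        + WEIGHTS["timeliness"] * timeliness
    )
    return round(100 * score, 1)


def rank_agencies(assessments):
    """Sort agencies from most to least open."""
    return sorted(assessments, key=openness_score, reverse=True)


# Invented agencies, for illustration only.
examples = [
    AgencyAssessment("Agency B", False, False, True, 30),
    AgencyAssessment("Agency A", True, True, True, 0),
]

if __name__ == "__main__":
    for agency in rank_agencies(examples):
        print(agency.name, openness_score(agency))
```

Any real benchmark would need indicators that can be measured consistently across agencies, and the weights would themselves be a matter for debate.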
The need for a data.gov.no
In my own presentation, I emphasized four main findings from the project at the Infomedia department — based on a survey among state agencies, an evaluation of state agency websites and interviews with civil servants at the local and regional level.
First, we found that there is a scarcity of information about which data sources actually exist. Very few agencies provide substantial information about their own datasets. Second, a central datastore, a data.gov, doesn’t exist; therefore we created a simple «store» of our own using a Google spreadsheet. With the help of a small community, around 130 data sources have been registered there so far. Third, our survey and interviews convinced us of the great potential that exists in making more data available. Among other results, six out of ten agencies said they plan to make more data available during the next year. Finally, I highlighted how knowledge of open data issues varies widely across sectors and agencies. This probably reflects the low profile that the topic still has politically and in the public sphere.
In our project report we make ten proposals for making more data available in Norway. In the presentation I emphasized four of them: Create datastores at the state, regional and local levels; define principles and guidelines; give special attention to privacy issues; and define and fund pilot projects to kick-start the process.