Finding the keys to government data: seminar report

Many approaches, same interest: The Bergen seminar on open government data brought together journalists, academics, civil servants and business innovators.

The sem­i­nar with the opti­mistic head­line “Give us our data” was organ­ised by the Info­me­dia depart­ment at the Uni­ver­si­ty of Bergen. The depart­ment has ini­ti­at­ed and fund­ed a fact-find­ing project on Nor­we­gian gov­ern­ment data this autumn, hop­ing that the project report and the sem­i­nar can help move the top­ic high­er up on the polit­i­cal and busi­ness agendas.

A whole cat­a­logue of inter­est­ing facts and opin­ions about gov­ern­ment data — that’s what I’m tak­ing away from the sem­i­nar (of course, as I helped organ­ise it this is a com­plete­ly sub­jec­tive and biased view!). 

Open data as a topic is unusual in that it brings together people with very different roles and backgrounds, from computer scientists and public sector specialists to journalists, business entrepreneurs and innovative civil servants. The presentations and debates at the seminar always zoomed in on the same questions, but from different angles: Why should more government data be made public? What obstacles are in the way, and how can they be overcome? What can we do with the data?

These are my notes from the sem­i­nar pre­sen­ta­tions, sup­ple­ment­ed with slides from speak­ers. See also oth­er reports and remarks (in Nor­we­gian): Bente Kalsnes’ post on Origob­loggen has sparked a live­ly debate, and a blog com­ment from Anders Waage Nilsen sum­ma­rizes the day very efficiently.

Denmark: Demand-driven approach

Cathrine Lippert from Denmark's National IT and Telecom Agency reported on the agency's initiatives to improve access to government data. They include a project competition for innovative services (winners to be announced at a conference on February 4), an innovation programme directed towards the private sector, and a data source catalogue on a social platform. Also planned is an open data desk which can provide assistance, define guidelines and highlight good practices.


The agency tries to advance its agenda by appealing to and bringing together interested groups in both the private and public sectors. Lippert said the agency believes it can accomplish more with this demand-driven approach, which mobilises the grassroots. A top-down approach is hard, as open data does not have the same political weight in Denmark as it currently has in Britain and the US.

Britain: Data and innovative journalism at The Guardian

The British news­pa­per is at the van­guard of using data in jour­nal­ism. Simon Rogers, edi­tor of the Dat­a­blog, explained how The Guardian works toward the “mutu­al­i­sa­tion of data”. Data is shared with users by pub­lish­ing the data mate­r­i­al behind sto­ries on the Google Doc­u­ment plat­form — sim­ple and user-friend­ly. A Flickr group has been set up to col­lect users’ own visu­al­iza­tions of data.

Increas­ing­ly, the role of jour­nal­ists will be to guide the pub­lic through the vast for­est of data; to be cura­tors of infor­ma­tion, Rogers said. 

The Guardian’s “crowdsourcing” of research into the expense files of members of parliament is already famous. More than 23,000 users took part in reviewing the files. The editors learned from the experiment that when you ask users for help, you need to define manageable tasks, and you should give the users something back for their efforts. When a new batch of data was released, the editors gave more specific tasks and the job was done in one and a half days.

Last week, The Guardian launched its own gate­way to pub­lic data por­tals. In the future, they also want to give peo­ple visu­al­iza­tion tools, Rogers explained.

Hidden data and how to find them

Web developer Harald Groven at the Norwegian Centre for ICT in Education focused his presentation on how vast amounts of highly interesting public sector data are kept under lock and key. In the analogue era, publishing medium- or low-level aggregates of data was practically impossible: there wasn't enough paper. That constraint is no longer relevant, but the same practices remain, Groven said. Legal constraints are part of the reason why Statistics Norway and other institutions do not release more fine-grained data.

A Norwegian government data portal should concentrate on making available anonymized, low-level aggregated statistics, which are data sources that are largely unknown today, Groven recommended. He illustrated the proposition with examples from his own work developing services aimed at giving young people a better basis for deciding what to study. One type of data needed for one of the services, salary levels in different occupations, was difficult to obtain at a sufficiently detailed level.

A news journalist’s perspective: TV 2

Jour­nal­ists often expe­ri­ence that pub­lic sec­tor agen­cies want to con­trol the pre­sen­ta­tion of data, Gaute Tjem­s­land of Nor­we­gian TV 2’s news web­site said in his pre­sen­ta­tion. When TV 2 want­ed the data from nation­al school tests, the min­istry respond­ed by send­ing pdf doc­u­ments, before final­ly cav­ing in and releas­ing the spread­sheets that they had had all along. The rea­son was explic­it­ly that they did­n’t want the media to pro­duce school rank­ings — i.e. present the data in their own way.

For jour­nal­ists, the ide­al sit­u­a­tion is to get struc­tured data, as detailed as pos­si­ble, and as fast as pos­si­ble, Tjem­s­land com­ment­ed. He had encoun­tered three main obsta­cles. Pub­lic sec­tor agen­cies want to retain con­trol over infor­ma­tion; they are afraid of los­ing rev­enue; or in many cas­es they are not aware that their data can be valu­able to oth­ers. The last obsta­cle is prob­a­bly the most impor­tant, Tjem­s­land said.

The TV 2 edi­tor pro­posed bench­mark­ing the open­ness of gov­ern­ment insti­tu­tions. By defin­ing vari­ables to mea­sure trans­paren­cy, more pres­sure can be applied to have gov­ern­ment data released. The media need to do their part by demand­ing infor­ma­tion and should take a lead­ing role in the debate about open data.

The need for a

In my own pre­sen­ta­tion, I empha­sized four main find­ings from the project at the Info­me­dia depart­ment — based on a sur­vey among state agen­cies, an eval­u­a­tion of state agency web­sites and inter­views with civ­il ser­vants at the local and region­al level. 

First, we found that there is a scarcity of information about which data sources actually exist. Very few agencies provide substantial information about their own datasets. Second, a central datastore doesn't exist; therefore we created a simple “store” of our own using a Google spreadsheet. With the help of a small community, around 130 data sources have been registered there so far. Third, our survey and interviews convinced us of the great potential that exists in making more data available. Among other results, six out of ten agencies said they plan to make more data available during the next year. Finally, I highlighted how knowledge of open data issues varies widely across sectors and agencies. This probably reflects the low profile that the topic still has politically and in the public sphere.

In our project report we make ten pro­pos­als for mak­ing more data avail­able in Nor­way. In the pre­sen­ta­tion I empha­sized four of them: Cre­ate data­s­tores at the state, region­al and local lev­els; define prin­ci­ples and guide­lines; give spe­cial atten­tion to pri­va­cy issues; and define and fund pilot projects to kick-start the process.


