Attention journalists: Federal government makes data site more searchable
By Fred Vallance-Jones, J-Source data journalism editor
The federal government is poised to roll out an upgraded version of its open data site.
The site, first introduced in 2011, will be significantly improved, with better search abilities and more datasets, said a Treasury Board Secretariat official who spoke at a recent computing conference in Ottawa.
According to Sylvain Latour, director of the Open Government Secretariat, the initial and relatively modest open data site saw 1.6 million visitors in its first year.
The site is one of a growing number of open data portals being set up by municipalities and senior governments across Canada. Open data refers to governments making previously internal datasets available for further use by the public, data journalists and entrepreneurs, usually via an open data web portal. Generally, the data is released with few if any restrictions on how it may be reused or redistributed, allowing end users to do almost anything they like with it.
The lead for the City of Ottawa’s open data initiative, Robert Giggey, told the same gathering that since that city made bus schedules and near-real-time bus arrival data available, developers have created 15 or so different public transit applications. Ottawa has also recently revamped its data portal.
The federal effort was initially criticized for being populated mostly with vast numbers of geographic and other “harmless” datasets, but the number of datasets has grown to nearly 300,000. Some, such as vehicle recalls from Transport Canada and drug recalls and adverse drug reactions databases from Health Canada, will be of substantial interest to data journalists. There is also a lot of statistical content, such as that from Immigration Canada, that can provide useful context and background.
Latour from the Treasury Board said officials have been working to bring more departments on board. Pursuant to an upcoming Treasury Board directive, departments will be expected to share their data unless they have good reasons not to do so. Personal privacy and sensitive government interests will be protected, as is the case now.
The conference, the High Performance Computing Symposium, is an annual event that brings together researchers, people from the data processing sector and vendors. This year’s focus was on “big data,” which means datasets large enough that they can’t be handled using standard desktop tools.
One of the most prominent individuals to speak was Brett Goldstein, chief data officer for the City of Chicago.
Chicago has been a leader in open data, and before taking his current job, Goldstein worked in the city’s police department, using huge quantities of incident data to try to predict where violent crimes would happen, before they happened. The mayor then called on him to be the first chief data officer for any U.S. city, a job in which he oversees the city's extensive open data initiative.
One of the largest datasets Chicago has provided is its crime incident data, going back to 2001. Updated weekly to include everything but incidents from the most recent seven days, the file was 1.2 gigabytes in text format when downloaded in mid-June, with details on more than five million individual crimes. It's not big by “big data” standards, but it's big enough. This data was the basis for the Chicago Tribune’s Crime in Chicago interactive, which allows residents to explore crime trends, as well as drill down to crimes in their own neighbourhoods.
The Tribune app is an example of the kind of high-end data application that usually requires significant programming skills. But much of the data available on the open data sites can be downloaded and used by data journalists who have mastered the use of spreadsheet, database management or mapping applications. With open data sites popping up across the country, journalists are quickly moving from a time when obtaining government data meant a long discussion with government officials, and maybe a long freedom-of-information battle, to one where data is ubiquitous and the challenge will be to find what's newsworthy.
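A file like Chicago's crime extract, at more than five million rows, exceeds the roughly one million rows a spreadsheet such as Excel can open, but a first pass at it doesn't have to involve heavy programming. The Python sketch below is one possible approach, not the Tribune's method: it reads a CSV file one row at a time and tallies the values in a single column. The function name is made up for illustration, and any file name or column header you pass in is an assumption to check against the actual download.

```python
import csv
from collections import Counter


def count_by_column(path, column):
    """Stream a CSV file row by row and tally the values in one column.

    Only one row is held in memory at a time, so the file can be far
    larger than a spreadsheet would open.
    """
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # Missing or empty cells are grouped under "(blank)".
            value = (row.get(column) or "").strip()
            counts[value or "(blank)"] += 1
    return counts
```

Calling `count_by_column("chicago_crimes.csv", "Year")`, with a hypothetical file name and column header, would return a count of incidents for each year, small enough to paste into a spreadsheet for charting.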
Certainly some datasets will still only be available via formal processes, but the overall effect is that we are already drowning in data, with much more to come. It will partly be the effectiveness of these data portals in making data “discoverable,” through robust search and filtering functions, that will keep these sites useful as more and more datasets are added. At the same time, journalists willing to take the time to dig into these storehouses will find a great deal of material to help find and report stories, and to build visualizations that can drive traffic on news websites.
Some other open data sites across Canada: