Time-consuming, crazy-making and a vital tool for investigative reporting: Anna Mehler Paperny, senior producer of globalnews.ca's investigative data desk, explains the ins and outs of data-driven journalism and why more journalists should be crunching numbers.
By Anna Mehler Paperny
Why bother with data-driven journalism?
Because there’s more data out there than ever before and not nearly enough people asking it questions.
Because trend stories aren’t good enough. Nor is telling readers something they already know.
Because tweeting, posting to YouTube and selectively releasing datasets do not a transparent government make.
Because your role as reporter is more microscope than conduit.
Data journalism becomes more accessible (especially to the numerically reluctant, like me) if you think of it simply as stories backed by quantifiable evidence. And we start the way you’d start any story—with a question.
This gives you a place to start, and helps you ferret out what data you’ll need. The next question: Who has that info?
In Canada, chances are you’ll be chasing some level of government or government-funded entity. Make friends with Statistics Canada—become overly familiar with its CANSIM tables and equally intimate with the lovely folks working there (quickly, before their ranks become totally decimated). In the case of Statistics Canada's National Household Survey, non-response rates for every neighbourhood in the country were right in the spreadsheet downloaded off their website, if you poked around long enough.
Sometimes, just finding the right data can be an odyssey of its own. My colleague Leslie Young spent the better part of a year shuttling back and forth from one department to another until she got the database she needed, which had been available for purchase the whole time. A large proportion of the information we get comes from access-to-information requests at the municipal, provincial and federal levels. I could go on about ATIPs for ages (and, if you catch me on a cranky day, I will).
Once you have the data, interview the information. It isn't enough to take a dataset at face value, any more than you would an interview subject. Who are you? Where are you from? What are you doing here? What are you telling me?
Knowing who collected a set of information, how they got it and to what end can tell you a lot. Survey or census? Voluntary or mandatory? Self-reported or externally gathered? Is it clean? Is it consistent? Error-laden data, while a massive pain, can be a story in itself.
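A minimal sketch of what a first "interview" with a spreadsheet might look like in Python, using only the standard library. The sample rows and column names below are invented for illustration and the `interview` helper is hypothetical; a real dataset would need its own questions. It simply counts rows, blank cells per column and exact duplicate rows.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a real download; names and values are invented.
SAMPLE = """incident_id,company,volume_m3,cause
1,Acme Oil,12.5,Corrosion
2,Acme Oil,,corrosion
3,Borealis Energy,3.0,Operator error
3,Borealis Energy,3.0,Operator error
"""

def interview(csv_text):
    """Return the row count, blank cells per column and number of exact duplicate rows."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    # Count empty or missing values in each column.
    blanks = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None or value.strip() == "":
                blanks[column] += 1

    # A row seen n times contributes n - 1 duplicates.
    seen = Counter(tuple(row.items()) for row in rows)
    duplicates = sum(count - 1 for count in seen.values())

    return {"rows": len(rows), "blanks": dict(blanks), "duplicates": duplicates}

report = interview(SAMPLE)
print(report)
```

A blank volume or a repeated incident number is not necessarily an error, but each one is a question worth putting to whoever collected the data.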
In sifting through thousands of Alberta spills, we became intimately familiar with the provincial regulator’s own nomenclatural inconsistencies. And the very reason we dove into National Household Survey data was to map its shortcomings.
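To give a sense of what untangling inconsistent naming can involve, here is a rough sketch, not the method we used. The company names are made up and both helper functions are hypothetical. It lowercases each label, strips punctuation and extra spaces, then lists the spellings that collapse into the same name.

```python
import re
from collections import defaultdict

def normalize(label):
    """Lowercase a label, drop punctuation and collapse runs of whitespace."""
    cleaned = re.sub(r"[^\w\s]", "", label.casefold())
    return " ".join(cleaned.split())

def group_variants(labels):
    """Map each normalized name to its distinct spellings, keeping only names with more than one."""
    groups = defaultdict(set)
    for label in labels:
        groups[normalize(label)].add(label)
    return {key: sorted(spellings) for key, spellings in groups.items() if len(spellings) > 1}

# Invented examples of how one company might be entered three different ways.
labels = ["Acme Oil Ltd.", "ACME OIL LTD", "acme oil  ltd", "Borealis Energy"]
variants = group_variants(labels)
print(variants)
```

Simple rules like these will miss abbreviations and misspellings, so the grouped output is a list for a human to check rather than a final answer.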
Once you know what you’re dealing with, data-wise, the story starts taking shape. Look for patterns, trends, anomalies. Expect to correct your hypotheses—or adjust them, at least. And then you chase. The voices and pathos you need for a data-driven story are the same as for any other. If info’s your story’s backbone, you still need humans at its heart.
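One crude way to surface an anomaly worth chasing is to compare each year's count with the typical year. The sketch below uses invented numbers, a hypothetical `flag_outliers` helper and an arbitrary threshold of twice the median; it is a starting point for questions, not a statistical test.

```python
from statistics import median

def flag_outliers(counts, ratio=2.0):
    """Return the keys whose count is at least `ratio` times the median count."""
    typical = median(counts.values())
    return sorted(key for key, value in counts.items() if value >= ratio * typical)

# Invented yearly counts, for illustration only.
incidents_per_year = {2008: 40, 2009: 44, 2010: 41, 2011: 95, 2012: 43}
flagged = flag_outliers(incidents_per_year)
print(flagged)
```

A flagged year might reflect a real spike, a change in reporting rules or a data-entry error. Finding out which is the reporting.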
So how do you tell a data-driven story? In a best-case scenario, you have the luxury of both time and resources. In real life, that’s never the case: You don’t have the skills or the people or the money or the time to tackle the project that matches your ambition. That’s OK. Fight for the resources you need, then work with what you got.
The way you tell your story should be dictated by the story you’re trying to tell. Sounds like a no-brainer, but it’s too easy to start with a tool, visualization or interactive that looks cool, without regard for how that translates what you’re trying to say.
Put yourself in the mind of someone coming at this cold, which is tough to do after being immersed in a topic for weeks or months. What do you see first? What are you wondering? What would convey complex information in a way that’s intuitive and straightforward without oversimplifying?
Keep asking, “So what?”
In the case of Crude Awakening, we had multiple articles that stood on their own, augmented in places by photo and video. A pair of interactives let readers find spills anywhere in the province, by volume, company and cause, and a time-lapse animation showed what 37 years of spills looks like.
For our National Household Survey investigation, we paired interviews on the impact of this shoddy population data with tables and interactive maps drawn from Patrick Cain’s deep dive into who was least likely to fill out the survey, broken down by various demographic characteristics.
And you follow up. This is tough to do for journos used to filing and moving on. But you find the same issues reemerging. People galvanized by your data do some digging. The story takes on a life of its own.
It ain’t easy, necessarily. It can be mind-meltingly frustrating. But it’s an incredible amount of (incredibly nerdy) fun.
Anna Mehler Paperny is senior producer of globalnews.ca's investigative data desk. She has reported from Haiti, Guantanamo Bay, Vancouver, Toronto, New York City and spotty connections across most of China and Ontario. She has worked for the Kingston Whig-Standard, the Edmonton Journal and the Globe and Mail. She is a graduate of the Queen’s Journal. On Twitter she's @amp6.