Bridging the Gap between Data Science and Journalism

An interview with Chris Wiggins from The New York Times.

There’s a lot to be excited about at The New York Times these days: new digital properties, a shift towards unbundled apps, and better hygiene? Scientific hygiene, that is. According to Christopher Wiggins, associate professor of applied mathematics at Columbia University and Chief Data Scientist at The New York Times, “every single field eventually becomes computational” and “that spirit is now happening in journalism.” To Wiggins, good scientific hygiene requires two things: sharing data sets, and writing code that is reusable and documented well enough that you can explain it to others, “including yourself in six months when you’ve totally forgotten about it.” Added Wiggins, “if you believe in science” then you must also believe in reproducibility, and “reproducibility breaks down if you only share four pages of prose.”

All of the buzz around data at the Times might feel out of place to some, or like the latest and greatest fad to others, but Wiggins assured the audience that the Times is getting serious about data in a big way. “Over the last few years, The New York Times has made a real investment in bespoke tracking solutions, making data science possible,” explained Wiggins. “All of the plumbing had to be done first before you could start doing interesting predictive analytics.”

One of the most visible byproducts of the Times’ dive into data is The Upshot – a new property that “presents news, analysis and data visualization about politics and policy.” Not only do The Upshot’s stories rely heavily on data to illuminate pressing issues, but they also invite readers to check out the code and the data sets behind the stories, placing a huge emphasis on open source statistical tools and replicability. According to Wiggins, this focus on transparency is a big deal: “There’s a huge difference between interacting with spatial temporal data (and the way you can interrogate the data) and simply working with counts. The Upshot is doing a great job with that.”

In fact, Wiggins is bullish on the data revolution taking place in newsrooms throughout the industry – pointing to Nate Silver’s FiveThirtyEight and Vox as early leaders in this new world. “It’s a very exciting time for journalism,” declared Wiggins. “Each new product opens up new opportunities and new questions to ask. And normal citizens are really interrogating the data, interrogating the code, and interrogating the interpretations.”

With so much focus on data and even machine learning in the newsroom, some in the audience were left wondering whether the value of the journalist is diminishing before our very eyes. As Wiggins tells it, the opposite is true: “We’re dealing with complex, messy data sets where you can easily fool yourself. The input of domain knowledge is the difference between ‘computer assisted reporting’ and data journalism, and data journalism is rightfully getting a lot of attention.”
