A New Approach to Data Science Blogging

Blogging in Data Science is broken. We have these spiffy new notebooks that let us capture our analysis workflow in a reproducible way, but no great way to share them. There are ways to share them, but they take too much effort (yes, I'm looking at you, Pelican).

An Easier Way


I love it when things I'm only tangentially interested in just work. I love that I can blog on PostHaven by just sending a nicely formatted email with inline pictures. I love that GitHub automatically renders IPython notebooks when you view them on the site. I don't love hacking on HTML/CSS so that my blog renders my IPython notebooks correctly after I've converted them to HTML using Pelican.

Coincidentally, have you ever noticed that data scientists tend to work at two different levels? One is among their peers: working with math and stats, talking about how to use SQL to get some data, automating intelligent quantitative analyses. The other is with management: talking about cost/benefit trade-offs, finding new business opportunities and cost savings, and performing risk analyses. Why do we try to have one blog meet everyone's needs?

A Format For Each Use Case


Starting today I'm going to break my blog into two levels. The most public-facing one (here) will be written at a high level. It is meant to be consumable by quantitatively minded business people, while leaving data scientists/analysts informed and ready to drill down into more detail.

"But how did you come to these conclusions?"

"Prove that this works."

The second level is reproducible research. Each article will link to a research notebook on GitHub. The notebook can be downloaded and run locally, so you can follow along at home if you're so inclined.
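For readers who want to script the "download" step, here is a minimal sketch in Python. It assumes the usual GitHub convention that a file viewed at a github.com/<user>/<repo>/blob/<branch>/<path> page can be fetched from raw.githubusercontent.com/<user>/<repo>/<branch>/<path>. The helper name raw_notebook_url and the notebook file name example.ipynb are hypothetical, chosen only for illustration, and the sketch assumes the branch name contains no slashes.

```python
from urllib.parse import urlparse


def raw_notebook_url(blob_url: str) -> str:
    """Turn a GitHub 'blob' page URL into the matching raw-file URL.

    Hypothetical helper: assumes the branch name has no '/' in it.
    """
    parts = urlparse(blob_url)
    if parts.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {blob_url}")
    # Expected path shape: /<user>/<repo>/blob/<branch>/<path to file>
    segments = parts.path.strip("/").split("/")
    if len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    user, repo, _, branch, *file_path = segments
    return "https://raw.githubusercontent.com/" + "/".join(
        [user, repo, branch, *file_path]
    )


# 'example.ipynb' is a made-up file name, used only for illustration.
example = "https://github.com/jcbozonier/research/blob/master/notebooks/example.ipynb"
print(raw_notebook_url(example))
```

The resulting URL could then be saved to disk (for instance with urllib.request.urlretrieve) and the file opened in a local notebook server.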

What I like about this is that it follows a pattern I've seen evolve in my professional work. I have two write-ups for every analysis: one is the notebook where I actually did my analysis, and the other is the email where I share a summary of what I found and some general details of how I found it.

(EDIT: My current notebooks are here: https://github.com/jcbozonier/research/tree/master/notebooks)

Call for Feedback


If you've solved these problems in a different way, I'd love more ideas. Comment or tweet or email or... something. Just let me know. Thanks!