josephofiowa / ds-blogging

Why you should blog as a data scientist

Title: Data Science Blogging
Duration: 2hr
Creator: Joseph Nelson (DC)

Twitter GitHub Personal Site

Data Science Blogging

Learning Objectives

Students will be able to:

  • Identify reasons why data science blogging is a strong career step (and fun!)
  • Determine what makes a good data science blog
  • Discover avenues students can pursue to produce a data science blog
  • Discuss Bootstrap and gently "mention" HTML and CSS

Why Blog?

There are three reasons.

Blogging of any kind is a great way to increase exposure. This is especially true in technical fields, where blogs become less of a "nice to have" and more of a gateway. Conjecture: blogging is to tech as Twitter is to journalism. In journalism, Twitter is where writers source ideas, interact with readers, break stories, and gain credibility (as a result of the aforementioned). In technology, blogs are where prominent developers (VCs too!) share how-tos, find other like-minded hackers, introduce new methods, and prove legitimacy.

For GA's specific education model, learning-by-doing is complemented exceptionally well by proving-by-showing. Traditional education gives you a piece of paper that the labor market has deemed a passable proxy (at times, erroneously) for an individual's talent. GA education invites students to prove themselves through demonstrated work products. This model works exceptionally well in technology. The world's most used product (with 1.7B users) was built by someone without a college degree. The same is true for the world's richest individual. Anecdotes aside, tech is an area where meritocracy rules. And that rules. You have the power, independent of any institution, to prove your value. Nice.

It's fun! Blogging shouldn't and doesn't need to be a burden. Your blog is yours. Its topics, by definition, reveal your interests. This makes it easy to continue to write. As an added bonus, blogging keeps you sharp on your skills, applied to a topic of YOUR choice.

What Are Examples?

There's a wide array of what constitutes a data science blog. I will not provide a comprehensive list; I'll provide examples of resources I enjoy. This subjective list covers the places I turn to for entertainment, advice, and education.

Intro

First and foremost (and, sure, this is a copout example): Google is your friend. Searching for "How to do X" yields tremendous code-along examples. This is where you'll find well-established collections.

Analytics Vidhya is where I often find great code-along examples, like this one on how to web scrape in Python.

Moreover, check out the list of data science blogs from KDnuggets (a machine learning blog). With over 90 to choose from, you have a wealth of examples to enjoy.

Data Journalism (Official)

My favorite data journalism blog is FiveThirtyEight. They combine smart visualizations, data collection, and engaging writing on politics, economics, health, and lifestyle. Here are a few of my favorite FiveThirtyEight classics: Swing State Interactive, Voting Elasticity, Messi is impossible.

A second favorite is The Upshot, the NYT's replacement for FiveThirtyEight after ESPN acquired the latter. They have, to date, produced the only useful 3D visualization I have seen in journalism.

The LA Times, NPR, and Chicago Tribune are ramping up their data journalism efforts and are not to be overlooked.

Individuals

A number of individuals maintain excellent blog presences.

Sebastian Raschka's blog is full of educational, approachable pieces. A now-defunct blog I recommend in the same vein is echen.me.

My favorite data science personal blog, and one we should all seek to emulate, is Max Woolf's. Let's check out what he does well in the following:

We'll walk through each of these independently and come together to discuss what we think Max does well.

Note that your audience is just as significant as your topic when choosing how to blog. If your blog has a high number of code snippets, it will immediately be less approachable (or unapproachable) for non-technical audiences. Because of this, I like the way Max provides both a Jupyter Notebook and a written write-up for his posts.

Tech Company blogs

Your favorite large tech company likely maintains a blog documenting its engineering efforts. Facebook's blog is tremendous. IBM, Microsoft, Adobe, and Google are all well-known for investing heavily in research efforts. These types of blogs enable you to stay current on cutting-edge research.

Here's my favorite recent FB blog post from Yann LeCun's team.

DSI

Ben Shaver

Ben (DSI6) regularly publishes exceptional posts on Medium.

Let's take a look

Ritika

Ritika's (DSI3) Medium blog and personal site http://www.datawrites.co/

Ritika does a great job of providing qualitative context to her projects and then linking to clean Jupyter Notebooks available on her Github. She also has a link to her Medium at the top, so that's a good place to look for blog post scope.

Mike

Mike Sanders's (DSI4) personal site + blog all in one + template

Mike actually started with a very basic (and relatively unexciting!) blog and refined it into his current gorgeous theme.

Charlie (site only, not blog)

Charlie Rice (DSI2) built this personal site.

Charlie did a great job of continuing to improve his site with new features and projects.

How Do I Do It?

This video is the best advice.

Your data science blog does not need to be a Corvette. It just needs to exist. As you get better, your blog will too. You will be able to use more and more impressive techniques, and your writing will reflect this. Perhaps you'll even build out an email list. All that matters is that you begin.

Your blog should follow agile development principles. If you don't believe me, check out a DSI1 student's first toe-dipping: Cody Laminack. Please also check out one of my favorite blog posts of all time from Josh Patchus, Lead Data Scientist at Cava Grill.

There are many avenues you can take. Let's discuss three.

I would recommend starting with a service that does everything for you except the content: Medium. Medium is the go-to service for tech blogs. WordPress is heavier than necessary for this.

Medium does a few things for us that make it an optimal choice:

  • Content looks nice! Without us working hard!
  • It exposes us to an audience immediately via hashtags and follows!
  • It sources ideas for us for the same reason as above!

Here's an example Medium post I wrote about snow days and the federal government.

Medium does not clearly support importing code snippets. This is an admitted disadvantage.

As Medium is increasingly pressured to monetize its platform, the UX has suffered. Thus, a potential second option is Ghost. Ghost provides a sleek Medium alternative, including well-laid-out content and analytics on all reads. You control your own content and use Ghost as a CMS for managing it. The tradeoff, however, is that you must pay to host your Ghost blog, either on something like a DigitalOcean droplet or by paying Ghost to host it for you. (I am migrating my content from Medium to Ghost.) Here's a good walkthrough on how you can, too.

A third option (and one you should strive towards rather than bind yourself to completing!) is using a Python-based tool like Pelican. Pelican is a static site generator written in Python. Here's a walkthrough of how to set up a Pelican blog using GitHub Pages. Be sure to check out the themes Pelican offers.
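To give a feel for how light a Pelican setup can be, here is a minimal sketch of a settings file. The setting names are standard Pelican settings, but every value (author, site name, timezone) is a placeholder I made up for illustration, and the walkthrough linked above may configure things differently.

```python
# pelicanconf.py: a hypothetical minimal configuration.
# All values below are placeholders, not taken from any real blog.

AUTHOR = "Your Name"               # shown as the default post author
SITENAME = "My Data Science Blog"  # title displayed by the theme
SITEURL = ""                       # left empty for local development

PATH = "content"                   # directory holding your Markdown or reST posts
TIMEZONE = "America/New_York"      # used when rendering post dates
DEFAULT_LANG = "en"

DEFAULT_PAGINATION = 10            # number of posts per index page
```

With a file like this in place, running `pelican content` should build the site into an output folder, and `pelican --listen` serves it locally for a preview. Treat this as a starting point under the assumptions above, not a finished setup.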

Be cautious: do not bind yourself to needing to engage in Python web development right now. You may make the perfect the enemy of the good!

More blogs!

  • Y-hat is a great place to turn for advice and to see new implementations
  • Multi-threaded is StitchFix's engineering blog
  • Techblog from Trunk Club is also a great engineering blog
  • Read Sebastian's post on how to data science blog

Part Two

Part two is here
