dswalter / tobedangerous

Just enough about data science to be dangerous.

Being an effective data scientist requires a reasonably clear understanding of a dizzying number of topics. This repo is designed to ease the transition into the field. It's not a full curriculum, hence the name: you won't know enough about data science to be great at it, just "enough to be dangerous."

I've tried to vary the format based on what would be most helpful for each topic. Basic machine learning concepts make sense in a Jupyter notebook, so that's what we'll use for them. A discussion that doesn't require code works better as a plain markdown post.

We'll see how this evolves. If you haven't had much experience with programming, I'd recommend starting with "basics of computing." Whether you're working in data analysis, theoretical statistics, applied machine learning, or something else, you will be using a computer and, in fact, doing several different kinds of programming.

These are things I wish I had known ahead of time. The things in the computing folder aren't really programming basics. They're the things you pick up around the edges of a computer science undergrad curriculum; that is to say, they're not usually taught, they're absorbed: how (and why) to work from the command line, what a Jupyter Notebook is (and why it matters), and how to go about doing version control.

Knowing these things will hopefully help you avoid a bunch of headaches.
