stevedoor / fullstackds

Many flavors of data science. Why pick one?

---
title: Fullstack Data Science
author: Steve Soloway
description: >
  This document is an outline for what will form the skeleton of a 'learn by doing' reference guide
  for the current or aspiring data scientist. On some level, each bulleted item references a topic
  that the author has found useful (or assumes will be useful in the future) while working as a Data Scientist. Each
  bulleted item/topic will eventually be accompanied by some combination of freely accessible articles, tutorials,
  book chapters, visualizations, applications, MOOCs, etc. There will also be some meta tutorials and projects
  designed by the author with the goal of showing a practical application of said topics.
---

[TOC]

Full stack DS

This document is an outline for what will form the skeleton of a 'learn by doing' reference guide for the current or aspiring data scientist. On some level, each bulleted item references a topic that the author has found useful (or assumes will be useful in the future) while working as a Data Scientist.

Each bulleted item/topic will eventually be accompanied by some combination of freely accessible articles, tutorials, book chapters, visualizations, applications, MOOCs, etc.

There will also be some meta tutorials and projects designed by the author with the goal of showing a practical application of said topics.


Outline of Topics

Operating Systems and IaaS

Goals

  • Bridge the gap between Windows and Linux

  • Overview and discussion of the command line

  • Get used to scripting to make life easier at the OS level, which also motivates why programming languages exist in the first place. This can help draw a distinction between Python and R as well.

    • Understanding the basics of how a computer works will make you a better programmer, just as understanding how a car works would probably make you a better race car driver
  • Learn Linux basics

  • This provides a natural intro to cloud services and IaaS, which we will touch upon
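
As a small illustration of the "scripting to make life easier at the OS level" goal, here is a minimal sketch in Python that runs unchanged on Windows and Linux. It counts the files in a folder by extension. The function name `count_file_types` and the `DATA_DIR` environment variable are hypothetical names chosen for this example, not part of any existing project.

```python
import os
from collections import Counter
from pathlib import Path


def count_file_types(folder: str) -> Counter:
    """Count files under `folder` (recursively) by lowercased extension.

    Files without an extension are counted under the empty string.
    """
    counts = Counter()
    for path in Path(folder).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower()] += 1
    return counts


if __name__ == "__main__":
    # DATA_DIR is a hypothetical environment variable used for illustration;
    # if it is not set, fall back to the current working directory.
    folder = os.environ.get("DATA_DIR", ".")
    for ext, n in count_file_types(folder).most_common():
        print(f"{ext or '(no extension)'}: {n}")
```

Reading the folder from an environment variable rather than hard-coding a path is one way to keep the same script usable on different machines and operating systems.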


    Hardware

    Software

    Linux

    Programming Languages

    Programming Applications

    Databases

    The Internet

    Language of the web

    APIs & SDKs

    Data (types and formats)

    Security

    Cloud Services: IaaS

    Working in the Cloud

  • Windows:

    • Command Line Overview: one step lower level than the GUI

      • executing programs, batch scripting, scheduling jobs with the Windows Task Scheduler, default text encoding
    • Environment variables are your best friend

    • Folder Structure and file types

    • Linux:

      • What is Linux?
        • bash
        • what does open source actually mean?
      • Linux Essentials:
        • bash commands
        • folder structure
        • environment variables
        • package managers
        • bash scripting
        • scheduling cron jobs
        • UTF-8 encoding and file types
    • Windows & Linux - When it makes sense to use one or the other

      • Runtime threading and cores, RAM considerations
      • Single vs. Multithreading
      • Deployment & Automation
  • Cloud Services

  • IaaS Project:

    • GitHub Pages - show how easy it is to create and host a website using GitHub Pages. We will build on this later with Python notebooks and R Markdown files, but this will put everything we have learned into practical use and set us up to talk about Cloud Services and Programming Languages.
      • Hacking together a webpage with Jekyll
        • What is actually happening here behind the scenes

  • Reproducibility

  • Ultimately we would like our analyses to be consumable by as many people as possible.

    • Language of the web - mostly terminology and a brief overview. As data scientists we will usually interact with web frameworks through simpler APIs, languages, and tools such as Markdown, YAML, Jekyll, IPython notebooks, etc.
      • HTML
      • CSS, etc.
      • JavaScript - server-side vs. client-side; front-end vs. back-end
      • Developer Tools in Chrome
      • Very high level: how does a modern application work?
  • Ultimately we do not want to worry about whether we're working on the most updated version of our project.

  • Git via GitHub and Bitbucket

    • Go deep into Git and why it is essential for both personal and collaborative development
    • How to use GitHub's website as a search engine for ideas and projects, and as a way to plug into any space you're interested in.
      • Building an automated feed to monitor GitHub trends will be the first project
  • Dropbox/OneDrive/SharePoint - when to use and not to use

  • Ultimately we do not want to have to worry about what types of systems our programs will work on.

    • Microservices and Containers
      • Development using Docker
      • PaaS to deploy microservices, automating some of the IaaS steps we've learned
  • Programming Languages

  • Cloud Services - IaaS

  • Cloud Services - PaaS

  • Cloud Services - Databases

  • Resources - where to look when you don't know the answer, the cost-benefit analysis of spending time searching for the right answer, and how to sift through information while researching the answer you need

    • Read the Docs: Learning to read the Documentation for any language or API
    • Googling Effectively
    • StackOverflow
  • Communication

  • Security

  • Data Wrangling

  • Databases

  • Neural Networks & Deep Learning

    • Cloud Services APIs
    • Keras and TensorFlow cloud service APIs
  • Resources: Learning the Mathematical Concepts

    • Probability as throwing darts at a circle
    • Watching videos on YouTube
  • Resources: Staying attuned to the bleeding edge

  • Mathematical Concepts

    • Probability and Normal Distribution as darts on a dart board
      • Geometry of a circle provides most of the intuition we need for measure theory and probability
      • Entropy & Information Theory
      • Mean and Variance
      • Approximation, Optimization and Integration: Taylor Series, Geometric Series, MVT, Jensen's Inequality, Complex Numbers
      • Exponential function and Natural Logarithm, Big-O Notation
      • Sorting
    • Regularization
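
The "probability as darts on a dart board" idea from the outline above can be made concrete with a short simulation. This is a minimal sketch, assuming darts land uniformly at random on a square that just encloses a circular board of radius 1. The function name `estimate_pi` and its parameters are chosen for this example only. Because the chance of hitting the board equals the ratio of the two areas, pi / 4, counting hits gives an estimate of pi.

```python
import random


def estimate_pi(num_darts: int, seed: int = 0) -> float:
    """Estimate pi by throwing darts uniformly at the square [-1, 1] x [-1, 1].

    A dart "hits" when it lands inside the unit circle centered at the origin.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    hits = 0
    for _ in range(num_darts):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            hits += 1
    # P(hit) = area of circle / area of square = pi / 4
    return 4.0 * hits / num_darts


if __name__ == "__main__":
    # The estimate should get closer to pi as the number of darts grows.
    for n in (100, 10_000, 100_000):
        print(n, estimate_pi(n))
```

The same picture, probability as a ratio of areas, is what carries over to the measure-theoretic view mentioned in the outline.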
