jz1584 / Dealing-with-Data-Spring2016

First update of all Panos files plus a new README

Norman White - Dealing with Data Spring 2016

The class notes are based on course materials from Panos Ipeirotis.
They have been modified to change some of the examples and data sets,
as well as to fit into a full-term format, but otherwise are mostly his materials.
Thank you, Panos!

Prerequisite: Setting up Linux on Amazon EC2

Each student in the class will receive their own Linux system that has most of the required software, as well as a data directory containing sample data sets. The basic machine is free, but you can enlarge it for a small fee if you take on large projects. You will need to give Amazon a credit card to get started, but it won't be charged if you stick with the basic machine.

The Basics: SSH, command-line, CURL

In class: Find a Web API using Mashape and issue requests using CURL

Relational Databases

Entity-Relationship Model

  • Entities, Primary Keys, and Attributes
  • Relations
  • Cardinality: One-to-One, One-to-Many, Many-to-Many
In class: Artist-Gallery-Painting example

From ER Diagram to SQL Tables

  • Translating ER Diagrams to Tables
  • SQL Statements for Creating Tables
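
As a rough sketch of the translation step, the Artist-Gallery-Painting example might map to tables like the ones below. The table names, column names, and cardinalities (each painting has one artist; paintings and galleries are many-to-many) are assumptions made for illustration, not the schema used in class. Python's built-in sqlite3 module stands in for the course database server so the sketch runs anywhere.

```python
import sqlite3

# In-memory SQLite database; the course database server may differ.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: all table and column names are illustrative only.
conn.executescript("""
CREATE TABLE artist (
    artist_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE gallery (
    gallery_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
-- One-to-many: each painting references exactly one artist.
CREATE TABLE painting (
    painting_id INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    artist_id   INTEGER NOT NULL REFERENCES artist(artist_id)
);
-- Many-to-many: a junction table pairs paintings with galleries.
CREATE TABLE exhibited_in (
    painting_id INTEGER NOT NULL REFERENCES painting(painting_id),
    gallery_id  INTEGER NOT NULL REFERENCES gallery(gallery_id),
    PRIMARY KEY (painting_id, gallery_id)
);
""")

# List the tables that were created.
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

Each entity becomes a table with its own primary key, a one-to-many relation becomes a foreign key column on the "many" side, and a many-to-many relation becomes a separate junction table.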

Querying a Database Using SQL

  • USE, DESCRIBE queries
  • Selection queries: *, column, column AS, DISTINCT, ORDER BY, LIMIT
  • Where clauses: Boolean conditions, IN, BETWEEN, LIKE
  • Aggregation queries: GROUP BY, SUM, AVG, MAX, MIN, ROLLUP
  • Join queries: INNER JOIN, OUTER JOIN
  • Subqueries and Views
In-class Exercise: Compare Tastes Across Demographic Segments
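
A few of the query types listed above can be sketched with a tiny, made-up data set. The person and rating tables and their rows are hypothetical and are not the class data. The sketch uses Python's built-in sqlite3 module rather than the course database server, so server-specific statements such as USE and DESCRIBE are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables and rows, for illustration only.
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT, segment TEXT);
CREATE TABLE rating (person_id INTEGER, item TEXT, stars INTEGER);
INSERT INTO person VALUES (1, 'Ana', 'student'), (2, 'Ben', 'student'), (3, 'Cy', 'staff');
INSERT INTO rating VALUES (1, 'jazz', 5), (2, 'jazz', 3), (3, 'jazz', 2), (3, 'rock', 4);
""")

# Selection with a WHERE clause (BETWEEN), ORDER BY, and LIMIT.
top = conn.execute(
    "SELECT item, stars FROM rating WHERE stars BETWEEN 3 AND 5 "
    "ORDER BY stars DESC, item LIMIT 2"
).fetchall()

# INNER JOIN plus GROUP BY: average rating per segment and item.
avg_by_segment = conn.execute("""
    SELECT p.segment, r.item, AVG(r.stars) AS avg_stars
    FROM person AS p
    INNER JOIN rating AS r ON r.person_id = p.person_id
    GROUP BY p.segment, r.item
    ORDER BY p.segment, r.item
""").fetchall()

print(top)
print(avg_by_segment)
```

Grouping by segment after the join is one way to line up tastes side by side across groups of people.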

Additional Resources

Introduction to Python

Primitive Data Types

  • Strings
  • Integers, Floats, and Math operators
  • Booleans

Complex Data Structures

  • Lists
  • Sets
  • Tuples
  • Dictionaries
  • Nested data structures
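
The structures listed above can be compared side by side in a short sketch. All names and values here are made up for illustration.

```python
# A list keeps insertion order and allows duplicates.
languages = ["python", "sql", "python"]

# A set keeps only the unique members.
unique_languages = set(languages)

# A tuple is an immutable, fixed-length sequence.
point = (40.73, -73.99)

# A dictionary maps keys to values.
enrollment = {"python": 2, "sql": 1}

# Nested structure: a list of dictionaries, each holding a list.
students = [
    {"name": "Ana", "skills": ["python", "sql"]},
    {"name": "Ben", "skills": ["python"]},
]

# Index step by step: first student, then the "skills" list, then item 1.
second_skill_of_first = students[0]["skills"][1]
print(unique_languages, enrollment["python"], second_skill_of_first)
```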

Control Statements

  • Conditional statements (if-then-else)
  • Loops (for loops, list comprehensions)
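
A minimal sketch of the control statements listed above, using made-up temperature readings. The thresholds are arbitrary and chosen only to exercise each branch.

```python
temperatures_f = [28, 45, 71, 90]  # hypothetical readings in Fahrenheit

# A for loop with an if / elif / else chain.
labels = []
for temp in temperatures_f:
    if temp < 32:
        labels.append("freezing")
    elif temp < 75:
        labels.append("mild")
    else:
        labels.append("hot")

# A list comprehension: filter and transform in a single expression.
celsius = [round((t - 32) * 5 / 9, 1) for t in temperatures_f if t >= 32]

print(labels)
print(celsius)
```

The comprehension is equivalent to a for loop that appends to a list, with the trailing if acting as a filter.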

Beyond the Basics

  • Functions
  • Libraries
  • Files
In-class Exercise: Find Similar Company Names

Additional Resources

Regular Expressions

  • Atoms
  • Anchoring expressions
  • Repetition and Grouping operators
In-class Exercise: Extract Email from Web Page
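
The sketch below shows how atoms, repetition, grouping, and anchors fit together on a small, made-up HTML snippet. The pattern is deliberately simple and is an assumption for illustration; it will not cover every valid email address, and it is not necessarily the pattern used in class.

```python
import re

# Hypothetical page content.
html = """
<p>Contact <a href="mailto:jane.doe@example.edu">Jane</a> or
write to support@example.com. Not an address: user@ or @example.</p>
"""

# Atoms: character classes such as [\w.+-]
# Repetition: + means "one or more"
# Grouping: (?:\.[\w-]+)+ repeats "a dot followed by a label"
EMAIL_PATTERN = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"

# Unanchored search: find every match anywhere in the text.
emails = re.findall(EMAIL_PATTERN, html)


def is_email(text):
    """Anchored check: ^ and $ force the whole string to match."""
    return re.search("^" + EMAIL_PATTERN + "$", text) is not None


print(emails)
print(is_email("support@example.com"), is_email("write to support@example.com"))
```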

Web APIs, Crawling & XPath

  • Python and Web APIs
  • Beyond the Basics: Parameters and Headers
  • (advanced) Using OAuth for authentication
  • XPath
  • Crawling Websites
In-class Exercise: Retrieve Buzzfeed articles
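
As a rough sketch of two of the topics above, the code below builds an API request with query parameters and headers, then runs an XPath-style query over a small document. The endpoint, parameter names, header values, and page markup are all hypothetical. The request is only constructed, not sent, so the sketch runs offline. It uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath and requires well-formed markup; real web pages usually call for a dedicated HTML parser.

```python
from urllib.parse import urlencode
from urllib.request import Request
import xml.etree.ElementTree as ET

# Hypothetical endpoint, parameters, and headers.
base_url = "https://api.example.com/v1/articles"
params = {"q": "data science", "page": 2}
url = f"{base_url}?{urlencode(params)}"
request = Request(url, headers={"Accept": "application/json", "X-Api-Key": "YOUR_KEY"})
# urllib.request.urlopen(request) would send it; omitted to stay offline.

# Hypothetical, well-formed page fragment.
page = """
<html><body>
  <div class="story"><a href="/a/1">First headline</a></div>
  <div class="story"><a href="/a/2">Second headline</a></div>
  <div class="ad"><a href="/promo">Sponsored</a></div>
</body></html>
"""
root = ET.fromstring(page)

# Select <a> children of any <div> whose class attribute is "story".
links = root.findall(".//div[@class='story']/a")
headlines = [(a.text, a.get("href")) for a in links]

print(url)
print(headlines)
```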

Python and Databases

  • Interacting with a database using Python
  • Inserting data in a database using Python
  • Retrieving data from a database using Python
In-class Exercise: Retrieve live weather or Citibike data and insert it into a database
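
The insert-then-retrieve pattern can be sketched as follows. The readings are hard-coded, hypothetical values standing in for a live API response, the table and column names are made up, and Python's built-in sqlite3 module stands in for the course database server.

```python
import sqlite3

# Hypothetical readings standing in for a live API response.
readings = [
    {"station": "W 52 St", "bikes": 7},
    {"station": "Broadway", "bikes": 0},
    {"station": "E 17 St", "bikes": 12},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station_status (station TEXT PRIMARY KEY, bikes INTEGER)")

# Named placeholders keep the values out of the SQL string itself.
with conn:  # commits on success, rolls back if an exception is raised
    conn.executemany(
        "INSERT INTO station_status (station, bikes) VALUES (:station, :bikes)",
        readings,
    )

# Retrieve rows with a positional placeholder.
available = conn.execute(
    "SELECT station, bikes FROM station_status WHERE bikes > ? ORDER BY bikes DESC",
    (0,),
).fetchall()
conn.close()

print(available)
```

Passing values through placeholders instead of string formatting lets the database driver handle quoting, which matters when the data comes from an external source.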

Processing Data using Python Pandas

Data Plotting and Visualization

Text Mining and Natural Language Processing
