Chipdelmal / dataPy_CADi

Materials for the "Data Wrangling" CADi workshop @ "Tecnológico de Monterrey"

dataPy CADi

This repository contains the materials for "Data acquisition, wrangling and exploratory analysis in Python", a three-day intensive CADi ("Cursos de Actualización en las Disciplinas") for faculty members at the "Tecnológico de Monterrey" institute.

The course covers the parsing and handling of data from different social sources, as well as the use of current frameworks for data-driven analyses.

For other data-analysis topics, please take a look at the dataViz_CADi repository, which contains exercises on data visualization in R, Python and Mathematica.



Contents

This workshop was created with flexibility in mind. As such, modules are fairly independent and can be followed in a different order than the one suggested here. For a topic-oriented breakdown of the contents, please have a look at the sitemap.

Day 01 (8h)

  1. Introduction: Objectives, scope, requirements and expectations.
  2. Python 101: Introduction to the programming language (description, core types, collections and functions).
  3. Python Environments: Using Anaconda and virtualenv for development.
  4. PyPI: Installing, browsing, and handling Python packages.
  5. IDEs: Using IDLE, Jupyter, Spyder, nteract, and Atom to write and launch our code.
  6. Git: Version control using GitHub for code development, sharing and collaboration.

Day 02 (8h)

  1. Data Wrangling: Primer: Data science and how data wrangling fits into it.
  2. Data Wrangling: Part 1: Using pandas and matplotlib.
  3. Data Sources: Twitter: Interfacing with the API to get trends, tweets, and tags.
  4. Intermediate Python: Dealing with files, serialization and simple cases of parallel computing.
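
As a rough illustration of the topics named in the "Intermediate Python" module (files, serialization, simple parallelism), here is a minimal sketch that uses only the standard library. The sample records, the `count_words` helper and the `records.json` file name are hypothetical and are not taken from the course notebooks.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical records, standing in for data pulled from a social source.
records = [
    {"id": 1, "text": "data wrangling in python"},
    {"id": 2, "text": "pandas and matplotlib"},
    {"id": 3, "text": "parallel computing made simple"},
]


def count_words(record):
    """Return the record id together with the number of words in its text."""
    return record["id"], len(record["text"].split())


# Serialize the records to a JSON file, then read the file back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "records.json"
    path.write_text(json.dumps(records), encoding="utf-8")
    loaded = json.loads(path.read_text(encoding="utf-8"))

# Map the helper over the records with a small thread pool.
# Executor.map returns results in the same order as the inputs.
with ThreadPoolExecutor(max_workers=2) as pool:
    word_counts = dict(pool.map(count_words, loaded))

print(word_counts)
```

Threads suit I/O-bound work such as reading files or calling web APIs. For CPU-bound work, `concurrent.futures.ProcessPoolExecutor` offers the same interface and is the usual alternative.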

Day 03 (8h)

  1. Data Wrangling: Part 2: Using scikit-learn to parse, manipulate, and pre-analyze data.
  2. Data Sources: Google Trends: Retrieving trends from Google searches.
  3. Twitter: Tweets and text sentiment analysis.
  4. Python pkg: Creating and installing a custom Python package.
  5. Advanced Python: Higher-level topics (garbage collection, lambda functions).
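
The two topics listed under "Advanced Python" can be sketched briefly with the standard library. The example below is an illustrative assumption about what such a module might show, not an excerpt from the course: the `trends` data and the `Node` class are hypothetical, and the garbage-collection behaviour described is that of CPython.

```python
import gc
import weakref

# Lambda functions: small anonymous functions, handy as sort keys.
trends = [("python", 120), ("pandas", 95), ("jupyter", 143)]
by_volume = sorted(trends, key=lambda pair: pair[1], reverse=True)


class Node:
    """Hypothetical object used to build a reference cycle."""

    def __init__(self):
        self.other = None


# Garbage collection: two objects that reference each other form a cycle
# that reference counting alone cannot free.
a, b = Node(), Node()
a.other, b.other = b, a
probe = weakref.ref(a)  # weak reference: observes `a` without keeping it alive

del a, b      # drop our names; the objects still reference each other
gc.collect()  # run the cyclic collector, which reclaims the cycle
cycle_freed = probe() is None

print(by_volume[0], cycle_freed)
```

The weak reference is only there to make the collector's effect observable. In everyday code the collector runs automatically and `gc.collect()` is rarely called by hand.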

Extras

  1. A Story to Tell: Data-driven storytelling.
  2. Data Sources: Part 3: Obtaining data from Web Scraping (beautifulsoup), RSS (XML), Dropbox API.
  3. GeoData: How to work with geographic datasets with geopandas and osmnx.
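
Of the data sources listed in the extras, RSS is the easiest to sketch without third-party packages, since a feed is plain XML. The example below is a minimal sketch under that assumption: the feed content and the `parse_items` helper are hypothetical, and a real feed would first have to be downloaded.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 document; a real feed would be downloaded first.
RSS_SAMPLE = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First post</title><link>https://example.com/1</link></item>
    <item><title>Second post</title><link>https://example.com/2</link></item>
  </channel>
</rss>"""


def parse_items(rss_text):
    """Return one dict per <item> element with its title and link."""
    root = ET.fromstring(rss_text)
    return [
        {"title": item.findtext("title"), "link": item.findtext("link")}
        for item in root.iter("item")
    ]


items = parse_items(RSS_SAMPLE)
for entry in items:
    print(entry["title"], entry["link"])
```

A list of dictionaries like `items` can be handed directly to `pandas.DataFrame` for further wrangling.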


Resources

Tools and Packages

  • anaconda: Data-science platform and package manager for Python and R.
  • atom: Versatile IDE for R, Python, Markdown, Javascript, amongst others.
  • matplotlib: Python's most popular package to plot data.
  • numpy: Highly efficient array manipulation in Python.
  • pandas: Popular dataframe manipulation in Python.
  • plotly: A good alternative for interactive plots in Python (similar to Shiny in R).
  • onlinegdb: Online Python interpreter (originally developed for C and C++).
  • repl.it: Online Python IDE and interpreter (also supports many other languages).
  • scikit-learn: Data analysis and machine learning library for Python.
  • sympy: Symbolic calculus in Python.
  • Google Earth Studio: Useful to create geographic visualizations (currently under beta program).
  • Scrapy: Web-scraping framework for Python.
  • BeautifulSoup: An approachable library for parsing HTML and XML when scraping web pages.
  • Spacy: Advanced natural language analysis library.
  • NLTK: Natural language toolkit for Python.
  • Seaborn: Documentation for the seaborn statistical visualization package.
  • xlrd: Excel data reader.

Online

Books


Alumni

Faculty

Rick Leigh Swenson Durie • Humberto Cárdenas Anaya • Norma Amanda Elías Solís • Rubén Darío Santiago Acosta • Faustino Yescas Martinez • Raúl Gómez Castillo • Luis Angel Trejo Rodríguez • Jorge Adolfo Ramírez Uresti • Ariel Ortíz Ramírez • Lucio López Cavazos • Pedro Oscar Pérez Murueta • María del Consuelo Serrato Arias • Alfredo Santana Díaz • Roberto Martínez Román • José Luis Gómez Muñoz • Jesús Cuauhtémoc Téllez Gaytán • Manuel Sotelo Duarte • Jorge Sastré Hernández • Ricardo Mendez Hernandez • Luis Enrique Villagómez Guerrero • Francisco Javier Rojas Correa • András Takács • Oriam Renan De Gyves López • Oscar Antonio Osorio Pérez • Miguel Angel Medina Pérez • Yocanxóchitl Perfecto Avalos • Jesús Arturo Escobedo Cabello • Hector Javier Medel Cobaxin


Contact: [ sanchez.hmsc@berkeley.edu | chipdelmal@gmail.com ]
My main projects: [ MGDrivE & MoNeT ]
My personal website: [ chipdelmal.github.io ]


About

License: GNU General Public License v3.0


Languages

Jupyter Notebook 59.6% • Python 40.4%