zampolli75 / ISDS3105_fall18

This is the material for ISDS3105

Required Text: Wickham, H., & Grolemund, G. (2016). R for Data Science.

Required Software: R, RStudio Desktop, Git (Windows | Mac)

Sign up for the Slack channel.

Disclaimer: Why this course has little to do with web development

This course used to be an online web development class, which the ISDS department turned into an introduction to R. Because changing a course name is a rather laborious process, we kept the old name. If you are looking for the old web development class, please register for the online section of ISDS3105.

Introduction

Since its first release in 1995, R's functionality has been extended well beyond that of statistical software, and it has become one of the most popular software environments for data analysis. This course will prepare students to manage a data analysis project using R and its Integrated Development Environment (IDE), RStudio. Students will gain familiarity with the most popular R packages for streamlining each step of the data analysis workflow: data gathering and retrieval, dataset wrangling and manipulation, and effective presentation of the results. In particular, we will focus on developing applications for online interactive reporting (e.g., dashboards and interactive reports).

Course Objectives

Upon successful completion of this course, you will be able to:

  1. Manage projects effectively using IDE resources (projects) and version control (Git/GitHub)
  2. Understand the fundamentals of R programming
  3. Scrape data from the web (using rvest)
  4. Wrangle and manipulate datasets (using dplyr/tidyr)
  5. Design and create charts for data visualization (using ggplot2)
  6. Present results efficiently and effectively using dynamic and interactive reporting techniques (RMarkdown)
  7. Query remote relational databases (MySQL) and other source systems (using dplyr)
  8. Run t-tests and regressions (using ggplot2, plotly, infer)

Expectations

The course is designed for beginners, and we expect no prior knowledge of R or of any other programming language. However, each class builds incrementally on content presented in earlier sessions, so we recommend that you come "ready to play" from day 1. Prior knowledge of databases, web programming, basic statistics, or other programming languages is a plus but is not critical to succeeding in the course.

Learning R is like a contact sport: the more you practice, the better you become at it. Attendance is a good way to push yourself to code every week, and I strongly encourage it, although it is not compulsory. With a few glorious exceptions, there is generally a strong correlation between low attendance and low performance.

Evaluation

We adopt the standard LSU +/- grading scale without any forced curve. The breakdown of the final grade is:

Midterm 30%: The midterm exam will focus on RMarkdown, data visualization (ggplot2), dataset normalization (tidyr), and data manipulation using dplyr (calculating descriptive statistics).

Group project 20%: The group project is a data analysis project to assess your ability to import, transform, manipulate, visualize, and analyze data. The final output will be a report (interactive or static) to enhance the understanding of a research question.

Final Exam 25%: The final exam will be comprehensive and will focus on both practical skills and theoretical aspects.

Assignments 20%: Students are required to submit several assignments (no fewer than four). Assignments will strictly cover topics discussed in class and are crucial to internalizing the material. Assignments must be uploaded to your private GitHub repository, and a link to the file must be submitted via Moodle.

Professionalism 5%: We will maintain a high standard of professionalism at all times. Beyond the obvious (not disrupting the class by being late, browsing the web on your computer during class, talking to the people nearby, etc.), professional conduct includes the ability to be a value-adding contributor to our learning community.
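The breakdown above amounts to a weighted average of the five components. The following is a minimal R sketch of that arithmetic, assuming every component is scored on a 0-100 scale; the scores are hypothetical and only illustrate the calculation. The mapping from the resulting percentage to an LSU +/- letter grade is not covered here.

```r
# Weights from the Evaluation section (they should sum to 1)
weights <- c(midterm = 0.30, project = 0.20, final = 0.25,
             assignments = 0.20, professionalism = 0.05)
stopifnot(isTRUE(all.equal(sum(weights), 1)))

# Hypothetical component scores on a 0-100 scale (illustration only)
scores <- c(midterm = 85, project = 90, final = 80,
            assignments = 95, professionalism = 100)

# Weighted average: match weights to scores by name, multiply, then add up
final_grade <- sum(weights[names(scores)] * scores)
final_grade
#> [1] 87.5
```

Matching the weights by name instead of by position keeps the result correct even if the components are listed in a different order.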

Calendar

| Date | Topic | Assignment due | Readings |
|---|---|---|---|
| Tuesday, August 21 | Introduction and set-up | | |
| Thursday, August 23 | Git/GitHub | Assignment1 (install RStudio, Git/GitHub) | Do you have a moment to talk about version control? |
| Tuesday, August 28 | Base R - data structures 1 | | |
| Thursday, August 30 | Base R - functions | | |
| Tuesday, September 4 | Base R - data structures 2 | Assignment2 | |
| Thursday, September 6 | DataViz ggplot2 | | |
| Tuesday, September 11 | DataViz ggplot2 | | |
| Thursday, September 13 | Tidy data | | |
| Tuesday, September 18 | dplyr | | |
| Thursday, September 20 | dplyr - connecting and querying DB | Assignment3 | |
| Tuesday, September 25 | dplyr - connecting and querying DB | | |
| Thursday, September 27 | Midterm | | |
| Tuesday, October 2 | Geospatial Viz | | |
| Thursday, October 4 | Fall Holiday | | |
| Tuesday, October 9 | Manipulating Dates | | |
| Thursday, October 11 | Open Data API | Assignment4 (GeoViz) | |
| Tuesday, October 16 | OpenData | | |
| Thursday, October 18 | Iteration with purrr | | |
| Tuesday, October 23 | Web scraping | | |
| Thursday, October 25 | Regression | Assignment5 (OpenData map) | |
| Tuesday, October 30 | Parametrized Reports | | |
| Thursday, November 1 | Dashboards with flexdashboard | | |
| Tuesday, November 6 | | | |
| Thursday, November 8 | Dashboards with flexdashboard | | |
| Tuesday, November 13 | Supervised lab - group project | | |
| Thursday, November 15 | Supervised lab - group project | | |
| Tuesday, November 20 | Case study - online reviews | Assignment6 | |
| Thursday, November 22 | Thanksgiving | | |
| Tuesday, November 27 | Presentations | | |
| Thursday, November 29 | Presentations | | |
| Wednesday, December 5 | Final exam, 5:30-7:30 PM | | |

Other interesting readings

ModernDive: An Introduction to Statistical and Data Sciences via R

R in Action

Beyond Spreadsheets with R

The Art of R Programming
