gastonstat / stat33b

Introduction to Advanced Programming in R

STAT 33B - Introduction to Advanced Programming in R

  • Description: The course is designed primarily for those who are already familiar with programming in another language (e.g. Python, C, Java), and want to understand how R works, and for those who already know the basics of R programming and want to gain a more in-depth understanding of the language in order to improve their coding. The focus is on the underlying paradigms in R, such as atomic and non-atomic vectors, functional programming, environments, and object systems (time permitting). The goal of this course is to better understand programming principles in R and to write better R code that capitalizes on the language's design. Topics include (not necessarily covered in the following order):

    • Data types and data structures in R (e.g. vectors, arrays, lists, data frames)
    • 🔨 Tools for data manipulation
    • 📊 Tools for data visualization
    • 📥 Data input/output
    • 🔀 Control flow structures (e.g. conditionals, iterations)
    • 📝 Writing simple functions
    • 📢 Function calls
    • 📡 Argument matching
    • ➕ The formula language of R (time permitting)
    • 🎁 Building simple R packages
  • Instructor: Gaston Sanchez

  • Lecture: 1 hour of lecture per week

  • Lab: 1 hour of laboratory per week

  • Assignments: biweekly HW assignments

  • Exams: one midterm exam and one final test

  • Texts and Notes:

  • LMS: the specific learning resources of a given semester are shared in the Learning Management System (LMS) approved by campus authorities (e.g. bCourses, Canvas)

  • Policies:


1. Introduction

📇 ABOUT:

We begin with the usual review of the course policies, logistics, overall expectations, topics in a nutshell, etc. At the computational level, you'll get introduced to RStudio and R, as well as Markdown and its use in dynamic computational documents (e.g. Rmd and qmd files).


📖 READING:


✏️ TOPICS:

  • Introduction
    • About the course
    • First contact with R and RStudio
    • Markdown syntax

2. Data Types and Vectors

📇 ABOUT:

This week we describe data types and their implementation in R vectors (the most fundamental data object in R).


📖 READING:


✏️ TOPICS:

  • Data Types
    • atomic types (e.g. logical, integer, double, character)
    • coercion
    • vectorization
    • recycling
    • subsetting
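
The topics above can be illustrated with a minimal base R sketch. The vectors and their values are made up for illustration and are not taken from the course materials.

```r
# Atomic types: typeof() reports how a vector is stored
counts <- c(1L, 2L, 3L)       # integer
weights <- c(1.5, 2.5, 3.5)   # double

# Coercion: mixing types promotes everything to the most flexible type
mixed <- c(TRUE, 2L, 3.5, "four")
mixed_type <- typeof(mixed)   # character

# Vectorization: arithmetic is applied element by element
total <- counts + weights

# Recycling: the shorter vector is reused to match the longer one
recycled <- c(1, 2, 3, 4) * c(10, 100)

# Subsetting by position, by logical condition, and by name
scores <- c(a = 90, b = 75, c = 60)
first_two <- scores[1:2]
passing <- scores[scores >= 70]
by_name <- scores["c"]
```

Note that recycling is silent when the longer length is a multiple of the shorter one, which is convenient but can hide mistakes.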

3. Other Atomic Objects

📇 ABOUT:

We continue describing more atomic objects such as arrays (N-dimensional objects) and matrices (2-dimensional arrays).


📖 READING:


✏️ TOPICS:

  • More atomic objects
    • Creation of simple matrices with matrix()
    • How R internally stores matrices
    • Why matrices are atomic objects
    • In what sense a matrix is a 2-dimensional object
    • Matrix subsetting (subscripting, indexing)
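
As a rough sketch of these points, the following base R code shows that a matrix is an atomic vector with a `dim` attribute, filled column by column. The object names are illustrative.

```r
# Create a 2 x 3 matrix; values are filled column by column
m <- matrix(1:6, nrow = 2, ncol = 3)

# A matrix is still an atomic vector: one type, plus a dim attribute
m_type <- typeof(m)       # integer
m_dim <- dim(m)           # 2 rows, 3 columns

# Adding a dim attribute to a plain vector gives the same object
v <- 1:6
dim(v) <- c(2L, 3L)
same_object <- identical(v, m)

# Subsetting by row and column, by whole column, and by linear position
cell <- m[2, 3]           # row 2, column 3
second_col <- m[, 2]      # all rows of column 2
fifth <- m[5]             # fifth stored value, i.e. m[1, 3]
```

Because the values are stored column-major, the single index in `m[5]` counts down each column in turn.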

4. Non-atomic Objects

📇 ABOUT:

This week we move on to non-atomic objects such as lists and data frames, and we review how to manipulate (subset) these kinds of objects.


📖 READING:


✏️ TOPICS:

  • Lists
    • Manipulation of lists and data frames
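
A short base R sketch of list and data frame subsetting follows. The `student` list and the `grades` data frame are hypothetical examples, not course data.

```r
# A list can hold elements of different types and lengths
student <- list(name = "Ana", scores = c(90, 85), enrolled = TRUE)

# Single brackets return a smaller list; double brackets return the element
as_list <- student["scores"]
as_vector <- student[["scores"]]
same_as_dollar <- identical(student$scores, as_vector)

# A data frame is a list of equal-length columns
grades <- data.frame(id = 1:3, grade = c("A", "B", "A"))
is_a_list <- is.list(grades)

# Subset rows with a logical condition, and extract a column as a vector
top_students <- grades[grades$grade == "A", ]
ids <- grades[["id"]]

# Add a new column by assignment
grades$passed <- grades$grade %in% c("A", "B")
```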

5. Importing and Exporting Resources

📇 ABOUT:

This week we cover the concepts and functions behind input/output (I/O), that is, importing and exporting operations. For example:

  • how to import a data table
  • how to export a data table
  • how to export a graphic to an image file
  • how to export function outputs to external files

📖 READING:


✏️ TOPICS:

  • Imports/Exports
    • read.table() and derived functions: e.g. read.csv(), read.delim()
    • Importing text with readLines()
    • Importing code with source()
    • Exporting output to external files with sink()
    • Exporting images with png(), jpeg(), pdf(), etc
    • Mechanism used by R for reading-in data
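
The functions listed above can be tried out with temporary files, as in this minimal sketch. The data values and file contents are made up, and `pdf()` is used for the graphics example because it is available on every R installation.

```r
# A small data frame to export (made-up values)
measurements <- data.frame(x = 1:3, y = c(2.5, 3.5, 4.5))

# Export a data table, then import it again
csv_file <- tempfile(fileext = ".csv")
write.csv(measurements, csv_file, row.names = FALSE)
imported <- read.csv(csv_file)

# Import the same file as raw lines of text
raw_lines <- readLines(csv_file)

# Import code: write a one-line script, then run it with source()
script_file <- tempfile(fileext = ".R")
writeLines("answer <- 6 * 7", script_file)
source(script_file, local = TRUE)

# Export printed output to a text file with sink()
log_file <- tempfile(fileext = ".txt")
sink(log_file)
print(summary(imported$y))
sink()   # restore output to the console

# Export a plot: open a graphics device, draw, close the device
pdf_file <- tempfile(fileext = ".pdf")
pdf(pdf_file)
plot(imported$x, imported$y)
invisible(dev.off())
```

The same open, draw, close pattern applies to `png()` and `jpeg()`.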

6. First Contact with tidyverse tools (part 1)

📇 ABOUT:

In addition to learning about the "classic" way to work with data frames, we will briefly touch on an "alternative" approach for working with tables provided by the tidy data framework and the ecosystem of packages known as the "tidyverse": https://www.tidyverse.org

This week we start with the tidyverse package "dplyr". Simply put, "dplyr" provides functions for manipulating data tables (e.g. data frames and other 2-dimensional objects) with a modern, consistent syntax.


📖 READING:


✏️ TOPICS:

  • dplyr verbs
    • slice()
    • filter()
    • select()
    • arrange()
    • group_by()
    • summarise()
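
The verbs above can be sketched on the built-in `mtcars` data set. This assumes the dplyr package is installed; the guard only skips the example when it is not, and the choice of columns and thresholds is purely illustrative.

```r
# Assumption: dplyr is installed; otherwise the example is skipped
if (requireNamespace("dplyr", quietly = TRUE)) {
  suppressPackageStartupMessages(library(dplyr))

  # slice(): pick rows by position
  first_three <- slice(mtcars, 1:3)

  # filter(), select(), arrange(): rows by condition, columns by name, sorting
  efficient <- mtcars %>%
    filter(mpg > 25) %>%
    select(mpg, cyl, wt) %>%
    arrange(desc(mpg))

  # group_by() and summarise(): one summary row per group
  by_cyl <- mtcars %>%
    group_by(cyl) %>%
    summarise(avg_mpg = mean(mpg), n_cars = n())
}
```

Each verb takes a data frame as its first argument and returns a data frame, which is what makes chaining with the pipe possible.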

7. First Contact with tidyverse tools (part 2)

📇 ABOUT:

Last week we discussed the basics of "dplyr". This week we move on to "ggplot2" which is another tidyverse package that allows you to create nice graphics, also following the tidy data framework.


📖 READING:


✏️ TOPICS:

  • ggplot2 basics
    • the grammar of graphics
    • geometric objects and visual attributes
    • building a graphic with layers
    • supporting graphical elements

8. Conditional "if-else" statements

📇 ABOUT:

This week we introduce the notion of R expressions, and we provide the syntax used by R to handle if-else statements and related conditional constructs.


📖 READING:


✏️ TOPICS:

  • If-else statements
    • R compound expressions and the use of curly braces { ... }
    • Anatomy of an if-else statement in R
    • Vectorized if-else function ifelse()
    • switch() construct
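
The constructs above can be compared side by side in a small base R sketch. All values and the helper `describe_op()` are hypothetical.

```r
# A compound expression in braces evaluates to its last expression
area <- {
  width <- 2
  height <- 3
  width * height
}

# if-else is itself an expression, so its value can be assigned
x <- 7
parity <- if (x %% 2 == 0) {
  "even"
} else {
  "odd"
}

# if () tests a single condition; ifelse() works on a whole vector
temps <- c(12, 25, 31)
labels <- ifelse(temps > 20, "warm", "cool")

# switch() selects a branch by name; the unnamed last value is the default
describe_op <- function(op) {
  switch(op,
    add = "sum",
    mul = "product",
    "unknown"
  )
}
```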

9. Iterations

📇 ABOUT:

This week we introduce the syntax used by R to handle iteration constructs such as for() loops, while() loops, repeat loops, and the apply family of functions.


📖 READING:


✏️ TOPICS:

  • Loops
    • Anatomy of a for() loop in R
    • Anatomy of a while() loop in R
    • Anatomy of a repeat loop in R
    • break statement to stop a loop
    • next statement to skip an iteration
    • apply() family functions
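
A minimal sketch of each construct, using small made-up computations:

```r
# for loop: fill a preallocated vector
squares <- numeric(5)
for (i in seq_along(squares)) {
  squares[i] <- i^2
}

# while loop: keep halving until the value drops below 1
value <- 20
steps <- 0
while (value >= 1) {
  value <- value / 2
  steps <- steps + 1
}

# repeat loop: runs until break; next skips to the following iteration
odd_sum <- 0
k <- 0
repeat {
  k <- k + 1
  if (k > 9) break          # stop the loop
  if (k %% 2 == 0) next     # skip even numbers
  odd_sum <- odd_sum + k
}

# apply family: the same squares without an explicit loop
squares_sapply <- sapply(1:5, function(i) i^2)

# apply() works over the rows (1) or columns (2) of a matrix
row_sums <- apply(matrix(1:6, nrow = 2), 1, sum)
```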

10. Functions

📇 ABOUT:

This week we review the syntax used by R for writing functions. We also take a look at auxiliary functions such as return(), stop(), and warning().


📖 READING:


✏️ TOPICS:

  • Functions
    • Main parts of a function (i.e. anatomy of a function)
    • Examples for creating a function
    • Difference between positional arguments, and named arguments
    • Binary operator functions
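
The following sketch puts these pieces together. The function `rescale()` and the operator `%+%` are hypothetical examples written for this illustration.

```r
# Anatomy: name, arguments (with defaults), body, and returned value
rescale <- function(x, lower = 0, upper = 1) {
  if (!is.numeric(x)) {
    stop("x must be numeric")            # signal an error and exit
  }
  rng <- range(x)
  if (rng[1] == rng[2]) {
    warning("x is constant; returning the lower bound")
    return(rep(lower, length(x)))        # early exit with return()
  }
  # The last evaluated expression is the function's value
  lower + (x - rng[1]) / (rng[2] - rng[1]) * (upper - lower)
}

# Positional matching: arguments are matched by their order
unit_scale <- rescale(c(2, 4, 6))

# Named matching: arguments can be given in any order
ten_scale <- rescale(upper = 10, x = c(2, 4, 6))

# A user-defined binary operator is a function whose name is wrapped in %
`%+%` <- function(a, b) paste(a, b)
joined <- "stat" %+% "33b"
```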

11. Functions and Scoping

📇 ABOUT:

This week we review more technical aspects of functions in R. Specifically, we will focus on the scoping mechanisms used by R to find the value of a variable.


📖 READING:

  • Slides

✏️ TOPICS:

  • Environments
    • What is an environment?
    • Creating environments
    • Types of environments
    • The search list
  • Scoping principles
    • Name masking
    • Functions vs Variables
    • Fresh start
    • Dynamic lookup
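
Each of these ideas can be observed with a few lines of base R. This is an illustrative sketch; all function and variable names are made up.

```r
# Creating an environment and storing a value in it
e <- new.env()
assign("rate", 10, envir = e)
rate_in_e <- get("rate", envir = e)

# The search list: where R looks for names after the global environment
search_path <- search()

# Name masking: a local name hides one defined outside the function
x <- 1
masked <- function() {
  x <- 2
  x
}

# Functions vs variables: in a call, R skips non-function objects
fn_vs_var <- function() {
  mean <- 10
  mean(c(mean, 20))
}

# Fresh start: each call gets a new environment, so nothing is remembered
fresh <- function() {
  if (!exists("calls", inherits = FALSE)) calls <- 0
  calls + 1
}

# Dynamic lookup: values are looked up when the function runs
z <- 1
double_z <- function() z * 2
before <- double_z()
z <- 10
after <- double_z()
```

Calling `masked()` leaves the outer `x` unchanged, since the assignment happens in the function's own environment.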

12. Performance and Profiling

📇 ABOUT:

This week we review more underlying principles that have to do with performance in R.


📖 READING:

  • Slides

✏️ TOPICS:

  • R's behavior
    • R's motto
    • Copy-on-modify policy
    • What things make R slow
  • Performance
    • Measuring performance in a "quick-and-dirty" way with system.time()
    • Profiling code with Rprof() and "profvis"
    • Alternative way to measure performance with "microbenchmark"
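
A quick sketch of two of these points, using only base R: copy-on-modify semantics, and a rough `system.time()` comparison between growing a vector in a loop and a vectorized computation. The size `n` is arbitrary and timings will vary by machine.

```r
# Copy-on-modify: modifying a copy leaves the original untouched
a <- c(1, 2, 3)
b <- a         # both names refer to the same values for now
b[1] <- 99     # the values are copied here, so only b changes

# Growing a vector inside a loop forces repeated copying
n <- 1e4
grow_time <- system.time({
  grown <- c()
  for (i in 1:n) {
    grown <- c(grown, i^2)
  }
})

# The vectorized version computes the same values in one step
vec_time <- system.time({
  vectorized <- (1:n)^2
})

# Elapsed seconds for each approach
elapsed <- c(grow = grow_time[["elapsed"]], vectorized = vec_time[["elapsed"]])
```

`system.time()` is coarse for code this fast, which is where repeated measurements with "microbenchmark" become useful.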

13. Packaging (part 1)

📇 ABOUT:

This week we review the anatomy of an R package.


📖 READING:

  • Slides

✏️ TOPICS:

  • Anatomy of an R package
    • DESCRIPTION file
    • NAMESPACE file
    • R/ folder
    • man/ folder
    • Roxygen comments and Rd files
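
To make the link between Roxygen comments and Rd files concrete, here is a sketch of a documented function as it might appear in a file inside the R/ folder. The function `fahrenheit_to_celsius()` is a hypothetical example; the tags shown (`@param`, `@return`, `@export`, `@examples`) are standard roxygen2 tags.

```r
#' Convert Fahrenheit to Celsius
#'
#' @param temp_f Numeric vector of temperatures in degrees Fahrenheit.
#' @return Numeric vector of temperatures in degrees Celsius.
#' @export
#' @examples
#' fahrenheit_to_celsius(212)
fahrenheit_to_celsius <- function(temp_f) {
  (temp_f - 32) * 5 / 9
}
```

When the package is documented with roxygen2, comments like these are turned into an Rd file in the man/ folder, and the `@export` tag adds the function to the NAMESPACE file.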

14. Packaging (part 2)

📇 ABOUT:

This week we go through the first steps for creating a simple R package.


📖 READING:

  • Slides

✏️ TOPICS:

  • Building an R package
    • devtools functions
    • Create Rd files with document()
    • Check content of Rd files with check_man()
    • Build a bundle with build()
    • Install a package locally with install()
