corytu / r-notes

My personal notes on the R language

Repository from GitHub: https://github.com/corytu/r-notes

R Notes

General articles

Getting help

  • ? or help gives the documentation of a specific function.
  • ?? or help.search searches the help system for the given keyword (or regex pattern).

Logic

  • xor indicates elementwise exclusive OR.
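A quick illustration of xor on logical vectors. The vectors a and b are made up for the example; the second line checks xor against its textbook definition.

```r
# xor() compares two logical vectors element by element
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)
xor(a, b)                                  # FALSE  TRUE  TRUE FALSE

# Same result as "either one, but not both"
identical(xor(a, b), (a | b) & !(a & b))   # TRUE
```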

Files management

  • setwd & getwd: Set and get the working directory.
  • list.files & list.dirs: List all files or directories in the given path.
  • file.exists, file.copy, file.rename, file.remove: System-level file manipulation.
  • download.file: Download files from the Internet in an R session.
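A small sketch of the file functions above. It works inside tempdir() so the real working directory is left untouched; the file names notes.txt, notes_copy.txt and notes_old.txt are made up for the example, and download.file is left out because it needs a network connection.

```r
# Work inside R's per-session temporary directory
old_wd <- setwd(tempdir())   # setwd() invisibly returns the previous directory
getwd()                      # now the temporary directory

writeLines(c("first line", "second line"), "notes.txt")  # create a small file
file.exists("notes.txt")                         # TRUE
file.copy("notes.txt", "notes_copy.txt")         # TRUE if the copy succeeded
file.rename("notes_copy.txt", "notes_old.txt")   # TRUE if the rename succeeded
list.files(pattern = "^notes")                   # both notes files
file.remove("notes.txt", "notes_old.txt")        # TRUE TRUE
file.exists("notes.txt")                         # FALSE

setwd(old_wd)                # restore the original working directory
```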

Data cleaning and manipulation

  • anyNA, complete.cases, is.na, and na.omit are useful when finding or excluding NAs.
  • order returns the permutation that sorts its argument(s), so it can reorder a data frame by one or more of its columns. For example, airquality[order(airquality$Month),] and airquality[order(airquality$Day),] order that data frame by Month and by Day respectively. Multiple arguments are allowed in order; later ones break ties in earlier ones.
  • transform transforms columns in a data frame.
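The following sketch applies these functions to the built-in airquality data set. Sorting by descending Temp within each month and the TempC column name are choices made up for the example.

```r
# Order by Month, then by descending Temp within each month
by_month_temp <- airquality[order(airquality$Month, -airquality$Temp), ]
head(by_month_temp, 3)

# Finding and excluding NAs
anyNA(airquality)                  # TRUE: Ozone and Solar.R contain NAs
sum(!complete.cases(airquality))   # number of rows with at least one NA
aq <- na.omit(airquality)          # drop those rows
nrow(aq)

# transform() evaluates its arguments inside the data frame
aq <- transform(aq, TempC = round((Temp - 32) * 5 / 9, 1))
head(aq[, c("Temp", "TempC")], 3)
```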

Data visualization

R programming

  • Arguments after an ellipsis can only be matched by their full names (never by position or by partial name), so they are usually given default values.
  • The arguments can be passed by order or by specified names. When specifying names, they can be either bare names or character strings. For instance, mean(x = 1:3) is equivalent to mean("x" = 1:3).
  • return exits the function immediately with the value of its argument; any lines after it in that function are not evaluated.
  • Generating messages for function users:
    • message is used for generating a diagnostic message
    • warning and stop are for generating warnings and fatal errors respectively.
    • stopifnot: "If any of the expressions in ... are not all TRUE, stop is called, producing an error message indicating the first of the elements of ... which were not true."
  • missing can be used to test whether a value was specified as an argument to a function, even when that argument has a default. For instance, with test <- function(y = 1) {if (missing(y)) {print(y)}}, test() prints 1 while test(2) prints nothing.
  • on.exit records the expression given as its argument as needing to be executed when the current function exits (either naturally or as the result of an error).
  • exists tests whether the named object exists in the specified environment.
  • readline reads a line from the terminal (in interactive use).
  • :: calls a function from a package without attaching the package. For example, calling reshape2::melt has the same effect as calling melt after library(reshape2) or require(reshape2), except that reshape2 is not added to the search path.
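A sketch tying several of the points above together. The functions f and scale_by are hypothetical, written only to show argument matching after an ellipsis, character argument names, stopifnot, missing, on.exit, warning and return.

```r
# Arguments after `...` can only be matched by their full name
f <- function(..., sep = "-") paste(..., sep = sep)
f("a", "b")             # "a-b"
f("a", "b", sep = "+")  # "a+b"
f("a", "b", "+")        # "a-b-+": the unnamed "+" is absorbed by `...`

# Argument names may be written as bare names or as strings
identical(mean(x = 1:3), mean("x" = 1:3))  # TRUE

# Hypothetical helper combining stopifnot(), on.exit(), missing() and return()
scale_by <- function(x, factor) {
  stopifnot(is.numeric(x))                 # error if x is not numeric
  on.exit(message("scale_by() finished"))  # runs when the function exits, even on error
  if (missing(factor)) {
    message("`factor` was not supplied; using 1")
    factor <- 1
  }
  if (factor == 0) warning("scaling by zero")
  return(x * factor)                       # nothing after this line would run
}
scale_by(1:3, 2)   # 2 4 6
scale_by(1:3)      # two messages, then 1 2 3
res <- tryCatch(scale_by("a"), error = function(e) conditionMessage(e))
res                # "is.numeric(x) is not TRUE"

exists("scale_by")                # TRUE
exists("surely_not_defined_xyz")  # FALSE unless such an object exists
```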

String processing

  • String processing in R (in Mandarin)
  • grep, grepl, regexpr, gregexpr and regexec search for matches to argument pattern within each element of a character vector: they differ in the format of and amount of detail in the results.
  • sub and gsub perform replacement of the first and all matches respectively.
  • sprintf returns a character vector containing a formatted combination of text and variable values.
  • substr extracts or replaces substrings in a character vector.
  • strsplit splits the elements of a character vector x into substrings according to the matches to substring split within them.
  • tolower and toupper convert upper-case characters in a character vector to lower-case, or vice versa. Non-alphabetic characters are left unchanged.
  • nchar takes a character vector as an argument and returns a vector whose elements contain the sizes of the corresponding elements of x.
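A short tour of the functions above on a made-up character vector x:

```r
x <- c("apple pie", "banana split", "cherry tart")

grepl("an", x)                # FALSE  TRUE FALSE
grep("an", x)                 # 2 (indices of the matching elements)
grep("an", x, value = TRUE)   # "banana split"
regexpr("a", x)               # position of the first "a" in each element (-1 if none)

sub("a", "A", x)              # replace the first match only: "bAnana split"
gsub("a", "A", x)             # replace every match: "bAnAnA split"

sprintf("%s has %d characters", x, nchar(x))
substr(x, 1, 3)               # "app" "ban" "che"
strsplit(x, " ")              # a list of character vectors
toupper(x)
nchar(x)                      # 9 12 11
```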

Functions that do loops or parallel operations

  • split divides the data in the vector x into the groups defined by f.
  • apply, sapply, lapply, tapply, and mapply (the "apply" family). mapply deserves an example since it is the most complicated.
  • by is an object-oriented wrapper for tapply applied to data frames.
  • Reduce uses a binary function to successively combine the elements of a given vector and a possibly given initial value.
  • do.call constructs and executes a function call from a name or a function and a list of arguments to be passed to it, while call only constructs the function call.
    to_bind <- list(data.frame(A = 1:2, B = 3:4), data.frame(A = 7:9, B = 5:7))
    do.call(rbind, to_bind)
    #   A B
    # 1 1 3
    # 2 2 4
    # 3 7 5
    # 4 8 6
    # 5 9 7
  • replicate is a wrapper for the common use of sapply for repeated evaluation of an expression (which will usually involve random number generation).
  • Vectorize creates a function wrapper that vectorizes the action of its argument FUN.
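A few small examples of the functions above. The airquality data set is built in; is_short and is_short_v are hypothetical functions made up for the Vectorize example.

```r
# mapply: call FUN with the 1st elements of each argument, then the 2nd, ...
mapply(rep, times = 1:3, x = 3:1)         # list: 3 / 2 2 / 1 1 1 (lengths differ)
mapply(function(a, b) a + b, 1:3, 4:6)    # 5 7 9

# split + sapply gives the same means as tapply
sapply(split(airquality$Temp, airquality$Month), mean)
tapply(airquality$Temp, airquality$Month, mean)

# Reduce: successive binary combination, optionally keeping intermediate results
Reduce(`+`, 1:4, accumulate = TRUE)       # 1 3 6 10

# replicate: repeated evaluation of an expression
set.seed(1)
replicate(3, mean(rnorm(10)))             # three sample means

# Vectorize: let a scalar-only function accept a vector
is_short <- function(s) if (nchar(s) < 4) "short" else "long"
is_short_v <- Vectorize(is_short)
is_short_v(c("ab", "abcdef"))             # c(ab = "short", abcdef = "long")
```

The names in the last result come from mapply, which Vectorize uses internally and which names its output after a character first argument.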

Other useful functions

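  • class returns the class of an object, which determines the (S3) methods that generic functions dispatch to. Compare this with mode, which returns the object's basic mode, such as "numeric" or "list".
  • str compactly displays the internal structure of an R object.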
  • append adds elements to a vector.
  • diff returns suitably lagged and iterated differences, e.g. diff(1:5).
  • identical tests two objects for being exactly equal.
  • system.time returns the CPU (and other) times that expr used. Compare this with Sys.time, which returns the current date and time.
  • unlist simplifies a list to a vector containing all the atomic components that occur in it.
  • unname removes the names or dimnames attribute of an R object.
  • search gives a list of attached packages (see library), and R objects, usually data frames.
  • rle computes the lengths and values of runs of equal values in a vector.
  • sequence can be regarded as the vectorized version of seq_len.
    x <- c(rep(1:4, times = 1:4), 1, 1)
    sequence(rle(x)$lengths)
    # 1 1 2 1 2 3 1 2 3 4 1 2
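More examples for the list above. The vector x and the nested list are made up for the example; airquality is used only to contrast class with mode.

```r
x <- c(a = 3, b = 1, c = 4, d = 1, e = 5)

class(airquality); mode(airquality)       # "data.frame" vs. "list"
str(x)

append(1:3, 99, after = 1)                # 1 99 2 3
diff(c(1, 4, 9, 16))                      # 3 5 7
diff(c(1, 4, 9, 16), differences = 2)     # 2 2
unname(x)                                 # 3 1 4 1 5 without names
identical(unname(x), c(3, 1, 4, 1, 5))    # TRUE
unlist(list(a = 1, b = list(c = 2, d = 3)))  # named vector: a, b.c, b.d

r <- rle(c(1, 1, 2, 2, 2, 1))
r$lengths                                 # 2 3 1
r$values                                  # 1 2 1
inverse.rle(r)                            # back to 1 1 2 2 2 1

system.time(for (i in 1:1e5) sqrt(i))     # user, system and elapsed times
```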

Packages

  • car
    Short for "Companion to Applied Regression". Two of the useful functions are Anova and Manova, which can calculate type-II or type-III ANOVA and MANOVA respectively.
  • caret
    Short for "Classification And REgression Training". A package that integrates many machine learning packages under one interface. In addition, it helps with data preprocessing and cross-validation, and provides evaluation tools such as confusionMatrix.
  • cowplot
    Combines multiple ggplots into one figure and labels each of them.
  • dendextend
    Extended functions for built-in dendrograms in R.
  • dplyr
    Some other ways to manipulate or cleanse data.
  • e1071
    Provides an R interface to LIBSVM (through its svm function).
  • ggmap
    Spatial visualization with ggplot2.
  • ggplot2
    A popular plotting system in R.
  • googleVis
    R interface to Google's chart tools, allowing users to create interactive charts based on data frames.
  • gridExtra
    "Miscellaneous Functions for 'Grid' Graphics." A tutorial can be found here.
  • leaflet
    Useful for adding markers and (interactive) polygons on the map.
  • lme4
    Package for fitting (generalized) linear mixed-effects models. Also see regression on repeated measurements for discussions on this topic.
  • magrittr
    The "pipe-like" operator %>% passes a value or object on to the next expression or function call.
  • mice
    Short for "Multivariate Imputation by Chained Equations". Tutorials on ways of dealing with missing data, including but not limited to MICE, can be found on this webpage (in Mandarin). See also my understanding of MICE and Tutorial on 5 Powerful R Packages used for imputing missing values.
  • MCMCglmm
    A package for fitting Bayesian mixed models in R. More introduction and tutorial here.
  • plotly
    A powerful package to build interactive plots. Its plot_ly function creates various types of plots, and ggplotly makes most ggplot2 objects interactive.
  • rattle
    A wonderful GUI for machine learning analyses. The author emphasizes that it logs the R code behind each GUI action, and that the log can be exported as a starting point for further argument tuning. Programming is still encouraged.
  • reshape2
    Melts data into long format or casts it into wide format. An example is provided here.
  • shiny
    Builds interactive interfaces that present data to others, even those who don't know R. Its tutorial is well worth reading.
