A guide to some of the most useful R packages that we know about, organized by their role in data science.
Each data science project is different, but each follows the same general steps. You:
- Import your data into R
- Tidy it
- Understand your data by iteratively
  - visualizing
  - transforming and
  - modeling your data
- Infer how your understanding applies to other data sets (including future data, i.e. predictions)
- Communicate your results to an audience, or
- Automate your analysis for easy reuse
- Program the whole way through, since you do each of these things on a computer
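The steps above can be sketched in a few lines of code. The following is only a minimal illustration, not part of the original list: it uses base R and the built-in `mtcars` data set, and the names `csv_path`, `cars`, `wt_kg`, `plot_path`, `new_car` and `pred` are chosen here for the example.

```r
# Import: round-trip a built-in data set through a CSV file
# to mimic reading external data
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = TRUE)
cars <- read.csv(csv_path, row.names = 1)

# Tidy: store the transmission code (0/1) as a labelled factor
cars$am <- factor(cars$am, levels = c(0, 1),
                  labels = c("automatic", "manual"))

# Transform: derive weight in kilograms (wt is recorded in 1000 lbs)
cars$wt_kg <- cars$wt * 1000 * 0.45359237

# Visualize: write a scatterplot to a temporary PDF file
plot_path <- tempfile(fileext = ".pdf")
pdf(plot_path)
plot(cars$wt_kg, cars$mpg,
     xlab = "Weight (kg)", ylab = "Miles per gallon")
dev.off()

# Model: fit a linear regression of fuel economy on weight and transmission
fit <- lm(mpg ~ wt_kg + am, data = cars)

# Infer: predict fuel economy for a hypothetical car not in the data
new_car <- data.frame(
  wt_kg = 1400,
  am = factor("manual", levels = levels(cars$am))
)
pred <- predict(fit, newdata = new_car)
```

Each step here relies on base R only so that the sketch runs without installing anything; in practice a dedicated package usually makes the same step easier.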
Below we list the most useful R packages that we know of for each step.
These packages help you import data into R and save data.
These packages help you wrangle your data into a form that is easy to analyze in R.
These packages help you visualize your data.
- ggplot2 with extensions
- lattice
- rgl
- extrafont
- ggvis
- ggstat
- gggeom
- manipulate
- htmlwidgets
- rCharts
- quantmod
These packages help you transform your data into new types of data.
These packages help you build models and make inferences. Often the same packages will focus on both topics.
- stats
- mgcv
- lme4
- broom
- caret
- glmnet
- mosaic
- gbm
- xgboost
- randomForest
- ranger
- h2o
- kernlab
- nlme
- ROCR
- pROC
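As a rough illustration of how one package can serve both modeling and inference, the sketch below uses only the stats package (first in the list above) with the built-in `mtcars` data. The split size, the seed, and the choice of predictors are assumptions made for this example, not recommendations.

```r
# Hold out 8 of the 32 cars to check predictions on unseen data
set.seed(1)
test_rows <- sample(nrow(mtcars), 8)
train <- mtcars[-test_rows, ]
test <- mtcars[test_rows, ]

# Model: linear regression of fuel economy on weight and horsepower
fit <- lm(mpg ~ wt + hp, data = train)

# Inference: 95% confidence intervals for the coefficients
ci <- confint(fit, level = 0.95)

# Prediction: apply the fitted model to the held-out rows
pred <- predict(fit, newdata = test)

# Root mean squared error on the held-out rows
rmse <- sqrt(mean((test$mpg - pred)^2))
```

`confint()` addresses the inference side (how uncertain the estimates are), while `predict()` and the held-out error address how well the model carries over to new data.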
These packages help you communicate the results of data science to your audiences.
These packages help you create data science products that automate your analyses.
- shiny
- rsconnect
- plumber
- countdown
- rstudioapi
These packages make it easier to program with the R language.
- RStudio Desktop IDE
- RStudio Server Open Source
- RStudio Server Professional
- devtools
- magrittr
- packrat
- testthat
- roxygen2
- purrr
- profvis
- Rcpp
- R6
- htmltools
- snow
- Rth
- MKL by Microsoft/Revolution Analytics
- MRS by Microsoft/Revolution Analytics
These packages contain data sets to use as training data or toy examples.
- datasets
- babynames
- neiss
- yrbss
- nycflights13
- hflights
- USAboundaries
- rworldmap
- usdanutrients
- fueleconomy
- nasaweather
- mexico-mortality
- data-movies
- pop-flows
- data-housing-crisis
- gun-sales
- stationaRy
- ggenealogy
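As a small sketch of how a data package is used, the code below explores the datasets package, which ships with R. The object names `available` and `na_counts` are made up for this example.

```r
library(datasets)

# List the data sets the package provides, with their titles
available <- data(package = "datasets")$results[, c("Item", "Title")]
head(available)

# Inspect the structure of one data set before using it
str(airquality)

# Count missing values per column, a common first check on toy data
na_counts <- colSums(is.na(airquality))
na_counts
```

Packages from the list that do not ship with R, and that are available on CRAN, would first need to be installed with `install.packages()` and loaded with `library()`.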
What makes an R Package useful? A useful R package should perform a useful task, and it should do it well. Here are some criteria that we used to make the list.
- The code in the package runs fast, with few errors.
- The code in the package has an intuitive syntax that is easy to remember.
- The package plays well with other packages; you do not need to munge your data into new forms to use the package.
- The package is widely used and recommended by its users.
- The package has a development website, or a series of vignettes, that makes the package easy to learn.
You can learn more about packages in R with the CRAN task views.