Notes on using R for statistics
- Basics: basic commands and utilities
- Packages: useful packages and how to use them
- Linear Regression: how to do a linear regression
- Logistic Regression: how to do a logistic regression
- Regression Trees: how to do a regression tree
- Natural Language Processing: analysing text
- Clustering: cluster analysis
Good for predicting a continuous, numeric outcome. Simple, and works on both small and large datasets. Assumes a linear relationship between the predictors and the outcome.
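The description above fits linear regression, which base R provides through `lm()`. A minimal sketch, assuming the built-in `mtcars` dataset and an illustrative choice of predictors (`wt`, `hp`); the `new_car` values are made up for the example:

```r
# Fit fuel economy (mpg) as a linear function of weight (wt) and horsepower (hp).
fit_lm <- lm(mpg ~ wt + hp, data = mtcars)

# Coefficients, standard errors and R-squared.
print(summary(fit_lm))

# Predict mpg for a hypothetical car (illustrative values only).
new_car <- data.frame(wt = 3.0, hp = 120)
pred_lm <- predict(fit_lm, newdata = new_car)
print(pred_lm)
```

Calling `plot(fit_lm)` afterwards shows the residual diagnostics, which is one way to check whether the linearity assumption is reasonable.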
Good for predicting a binary outcome (one of two categories). Produces a probability for each outcome. Assumes a linear relationship between the predictors and the log-odds of the outcome.
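The description above fits logistic regression, available in base R as `glm()` with `family = binomial`. A minimal sketch, assuming the built-in `mtcars` dataset, with transmission type (`am`, 0 or 1) as an illustrative binary outcome and a 0.5 cut-off chosen only for the example:

```r
# Model the probability of a manual transmission (am = 1) from car weight.
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial)
print(summary(fit_glm))

# type = "response" returns probabilities rather than log-odds.
prob <- predict(fit_glm, type = "response")

# Turn probabilities into classes with an illustrative 0.5 threshold.
pred_class <- ifelse(prob > 0.5, 1, 0)

# Confusion matrix: actual vs predicted class.
print(table(actual = mtcars$am, predicted = pred_class))
```

The threshold is a design choice: lowering it catches more positives at the cost of more false alarms.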
Good for predicting a categorical or continuous outcome. Can handle data without a linear relationship and is easy to explain. May not work well on small datasets.
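The description above fits a CART-style tree. A minimal sketch, assuming the `rpart` package is available (it is a recommended package that ships with most R installations) and using the built-in `mtcars` dataset with an illustrative set of predictors:

```r
library(rpart)

# method = "anova" grows a regression tree for a continuous outcome;
# method = "class" would grow a classification tree instead.
fit_tree <- rpart(mpg ~ wt + hp + cyl, data = mtcars, method = "anova")

# Print the splits as readable rules, which is what makes trees easy to explain.
print(fit_tree)

# Predicted mpg for each car in the training data.
pred_tree <- predict(fit_tree, newdata = mtcars)
print(head(pred_tree))
```

With only 32 rows the tree stays very shallow, which illustrates why small datasets can be a poor fit for this method.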
Similar to CART but can improve accuracy. Needs more setup and is not as easy to interpret.
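The notes do not name the method here. One common technique that matches the description is a random forest, so the sketch below assumes that reading. It also assumes the `randomForest` package, which is not part of base R, and uses the built-in `mtcars` dataset with illustrative predictors and settings:

```r
# Only run the example if the (assumed) randomForest package is installed.
if (requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(1)  # forests are random, so fix the seed for repeatable results

  # Grow 500 trees, each on a bootstrap sample of the data.
  fit_rf <- randomForest::randomForest(mpg ~ wt + hp + cyl, data = mtcars,
                                       ntree = 500, importance = TRUE)
  print(fit_rf)

  # Variable importance partly makes up for the loss of a single readable tree.
  print(randomForest::importance(fit_rf))
} else {
  message("randomForest is not installed; run install.packages('randomForest')")
}
```

The extra setup mentioned above shows up as parameters such as `ntree`, and the result is an average over many trees rather than one set of rules.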
Good for finding groups of similar data points. No need to know the number of clusters beforehand, and easy to visualise. Difficult to use on large datasets.
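The description above fits hierarchical clustering, which base R provides through `dist()` and `hclust()`. A minimal sketch, assuming the built-in `mtcars` dataset; the chosen columns, the Ward linkage and the cut at three groups are illustrative:

```r
# Scale the variables so that no single unit dominates the distances.
x_hc <- scale(mtcars[, c("mpg", "wt", "hp")])

# Pairwise distances: n * (n - 1) / 2 values, which is why large datasets are hard.
d <- dist(x_hc, method = "euclidean")

# Build the cluster tree; no number of clusters is needed at this point.
hc <- hclust(d, method = "ward.D2")

# The dendrogram is the usual way to visualise the result.
plot(hc)

# Choose the number of groups afterwards by cutting the tree.
groups <- cutree(hc, k = 3)
print(table(groups))
```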
Similar to hierarchical clustering, but you need to know the number of clusters beforehand.
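The notes do not name this method. The description matches k-means, which base R provides as `kmeans()`, so the sketch below assumes that reading. It uses the built-in `mtcars` dataset, with illustrative columns and an illustrative choice of three clusters:

```r
set.seed(42)  # starting centres are random, so fix the seed

# Scale the variables so that each contributes equally to the distances.
x_km <- scale(mtcars[, c("mpg", "wt", "hp")])

# The number of clusters (centers) must be supplied up front.
# nstart = 25 tries 25 random starts and keeps the best solution.
km <- kmeans(x_km, centers = 3, nstart = 25)

print(km$size)            # number of cars in each cluster
print(km$centers)         # cluster centres on the scaled variables
print(table(km$cluster))  # cluster assignment counts
```

One way to pick the number of clusters is to cut a hierarchical clustering first and reuse that count here.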