lixinbella1993 / college-data-analysis

college education-data classification logistic-regression lda knn-classification cross-validation

College-data-analysis

This exercise uses the College data set from An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. It contains a number of variables for 777 different universities and colleges in the US.

The variables are:

• Private : Public/private indicator

• Apps : Number of applications received

• Accept : Number of applicants accepted

• Enroll : Number of new students enrolled

• Top10perc : Percent of new students from top 10% of high school class

• Top25perc : Percent of new students from top 25% of high school class

• F.Undergrad : Number of full-time undergraduates

• P.Undergrad : Number of part-time undergraduates

• Outstate : Out-of-state tuition

• Room.Board : Room and board costs

• Books : Estimated book costs

• Personal : Estimated personal spending

• PhD : Percent of faculty with Ph.D.’s

• Terminal : Percent of faculty with terminal degree

• S.F.Ratio : Student/faculty ratio

• perc.alumni : Percent of alumni who donate

• Expend : Instructional expenditure per student

• Grad.Rate : Graduation rate
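The variables listed above can be inspected directly in R. The following is a minimal sketch, assuming the data is loaded from the `ISLR` package (the companion package to the book); the repository itself may read the data from a file instead.

```r
# Assumption: the ISLR package is installed, e.g. install.packages("ISLR").
library(ISLR)

data(College)            # loads a data frame named College
dim(College)             # 777 rows (schools) and 18 columns (variables)
str(College)             # Private is a factor ("No"/"Yes"); the rest are numeric
table(College$Private)   # number of public ("No") and private ("Yes") schools
head(rownames(College))  # school names are stored as row names
```

Because school names are stored as row names rather than as a column, every column except `Private` can be used as a numeric predictor without further cleaning.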

This exercise aims to

1. Produce a comparative analysis of private and public colleges, e.g., tuition, acceptance rate, graduation rate, and instructional expenditure as a percentage of tuition.

2. Demonstrate how to use statistical methods such as logistic regression, LDA, QDA, and KNN to predict whether a school is public or private from the available data.

3. Combine the available data with school rating data (to be scraped from US News with Python) and use it to predict the ratings. Again, different models will be tried and compared to find the best one.
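The first two aims can be sketched in a few lines of R. This is an illustrative sketch rather than the repository's actual analysis: the derived columns `AcceptRate` and `ExpendRatio`, the five predictors, the 70/30 split, and `k = 5` are all assumptions chosen for the example, and the data is assumed to come from the `ISLR` package.

```r
library(ISLR)   # assumed source of the College data
library(MASS)   # lda(), qda()
library(class)  # knn()

data(College)

# Aim 1: compare public and private schools on a few summary measures.
# AcceptRate and ExpendRatio are hypothetical derived columns.
college <- transform(
  College,
  AcceptRate  = Accept / Apps,     # share of applicants accepted
  ExpendRatio = Expend / Outstate  # instructional spending relative to tuition
)
sector_means <- aggregate(
  cbind(Outstate, AcceptRate, Grad.Rate, ExpendRatio) ~ Private,
  data = college, FUN = mean
)
print(sector_means)

# Aim 2: predict Private from a small, illustrative set of predictors.
predictors <- c("Outstate", "F.Undergrad", "S.F.Ratio", "perc.alumni", "Expend")
model_formula <- reformulate(predictors, response = "Private")

set.seed(1)  # make the random split reproducible
n <- nrow(College)
train_idx <- sample(n, size = round(0.7 * n))
train <- College[train_idx, ]
test  <- College[-train_idx, ]

# Logistic regression: predicted probability of Private == "Yes"
glm_fit  <- glm(model_formula, data = train, family = binomial)
glm_prob <- predict(glm_fit, newdata = test, type = "response")
glm_pred <- ifelse(glm_prob > 0.5, "Yes", "No")

# Linear and quadratic discriminant analysis
lda_pred <- predict(lda(model_formula, data = train), newdata = test)$class
qda_pred <- predict(qda(model_formula, data = train), newdata = test)$class

# KNN is distance based, so standardize using training means and SDs only
x_train <- scale(train[, predictors])
x_test  <- scale(test[, predictors],
                 center = attr(x_train, "scaled:center"),
                 scale  = attr(x_train, "scaled:scale"))
knn_pred <- knn(x_train, x_test, cl = train$Private, k = 5)

# Share of test schools classified correctly by each method
accuracy <- c(
  logistic = mean(glm_pred == test$Private),
  lda      = mean(lda_pred == test$Private),
  qda      = mean(qda_pred == test$Private),
  knn      = mean(knn_pred == test$Private)
)
print(round(accuracy, 3))
```

A single hold-out split gives a noisy estimate of accuracy; repeating the comparison with k-fold cross-validation would give a more stable ranking of the four methods.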

Languages

Language:R 100.0%