fahimabrar / Predicting_baseball_player-s_salary

Predicting Baseball Player's Salary

This project is university coursework.

  • The data of baseball players is available here
  • The project has the following subsections:
    • Data quality analysis
    • Data cleaning
    • Exploratory data analysis
    • Building a model for players' salaries (linear regression is used)
    • Predicting whether a player will hit or not hit (logistic regression is used)
    • References
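
The two modelling steps listed above can be sketched roughly as follows. This is a minimal illustration on simulated data, not the coursework's actual code: the column names (`career_length`, `hits`, `salary`, `hit_flag`), the choice of predictors, and the log transform of salary are all assumptions made for the example.

```r
# Simulated stand-in data; every column name here is hypothetical
# and does not come from the coursework dataset.
set.seed(42)
n <- 200
players <- data.frame(
  career_length = runif(n, 1, 20),
  hits          = rpois(n, 100)
)

# Salary is generated on the log scale, so a log-linear model fits it well.
players$salary <- exp(10 + 0.08 * players$career_length +
                        0.005 * players$hits + rnorm(n, sd = 0.3))

# Simulated binary outcome: 1 if the player hit, 0 otherwise.
players$hit_flag <- rbinom(n, 1, plogis(-2 + 0.02 * players$hits))

# Linear regression for salary (log-transformed response is an assumption).
salary_fit <- lm(log(salary) ~ career_length + hits, data = players)

# Logistic regression for the hit / not hit outcome.
hit_fit <- glm(hit_flag ~ hits + career_length,
               data = players, family = binomial)

# Predictions for one hypothetical new player.
new_player <- data.frame(career_length = 5, hits = 120)

# Back-transform from the log scale to the salary scale.
predicted_salary <- exp(predict(salary_fit, newdata = new_player))

# type = "response" returns a probability rather than log-odds.
predicted_hit_prob <- predict(hit_fit, newdata = new_player,
                              type = "response")
```

Calling `summary(salary_fit)` and `summary(hit_fit)` prints the coefficient tables, which is usually the first thing to inspect before moving on to diagnostics.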

Language

  • R

Packages Used

  • ggplot2 for data visualization
  • dplyr for data manipulation
  • stringr for string manipulation
  • naniar for visualizing missing values in a graph
  • validate for data quality checking
  • gridExtra for plotting in grid
  • tidyr for reshaping data so that multiple histograms can be plotted in a single plot
  • purrr

Issues Found

  • Heteroscedasticity
  • Multicollinearity
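
As an illustration of how these two issues can be detected, here is a minimal sketch in base R on simulated data. It is not the coursework's own diagnostic code: the variables `x1`, `x2`, `x3`, `y` and the helper `vif_manual` are hypothetical, and the checks shown (a hand-computed variance inflation factor and the studentized Breusch-Pagan statistic) are just one common choice.

```r
# Simulated data built to exhibit both problems (all names are hypothetical).
set.seed(1)
n  <- 300
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # almost a copy of x1: multicollinearity
x3 <- rnorm(n)

# The error standard deviation grows with x3: heteroscedasticity.
y <- 1 + 2 * x1 + 0.5 * x3 + rnorm(n, sd = exp(0.5 * x3))

dat <- data.frame(y, x1, x2, x3)
fit <- lm(y ~ x1 + x2 + x3, data = dat)

# Variance inflation factor: VIF_j = 1 / (1 - R^2_j), where R^2_j comes
# from regressing predictor j on all the other predictors.
vif_manual <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}
vifs <- vif_manual(dat, c("x1", "x2", "x3"))

# Studentized Breusch-Pagan test: regress the squared residuals on the
# predictors; n * R^2 is compared with a chi-square distribution whose
# degrees of freedom equal the number of predictors.
aux       <- lm(resid(fit)^2 ~ x1 + x2 + x3, data = dat)
bp_stat   <- n * summary(aux)$r.squared
bp_pvalue <- pchisq(bp_stat, df = 3, lower.tail = FALSE)
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic collinearity, and a small Breusch-Pagan p-value as evidence against constant error variance. Typical remedies include dropping or combining correlated predictors and transforming the response.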

Getting Help with Solving the Issues

  1. https://www.theanalysisfactor.com/outliers-to-drop-or-not-to-drop/#:~:text=You%20may%20run%20the%20analysis,any%20significance%20from%20your%20analysis.

  2. https://cooldata.wordpress.com/2010/03/04/why-transform-the-dependent-variable/

  3. https://statisticsbyjim.com/regression/heteroscedasticity-regression/

  4. https://stackoverflow.com/questions/40572124/plot-lm-error-operator-is-invalid-for-atomic-vectors

  5. https://www.researchgate.net/post/Help_with_Logistic_Regression_In_rglmfit_fitted_probabilities_numerically_0_or_1_occurred_glmfit_algorithm_did_not_converge

  6. Senaviratna, N.A.M.R. and Cooray, T.M.J.A., 2019. Diagnosing Multicollinearity of Logistic Regression Model. Asian Journal of Probability and Statistics, pp.1-9.
