Kaggle Competition - Predict the category of crimes that occurred in the city by the bay (PySpark Project)
1 Insights and Visualizations
- Crime rate in each District (Crime percentage per District)
- Crimes over years (Crime rate per District)
- Weekends vs Weekdays
- Peak crime hours (Most Crime Hours/Days)
- Cases with actions (1)
- Cases with actions (2) with details
- Cases danger level
- The Most Dangerous places (in San Francisco) 'Map Based' (sampled from 1000 crimes)
- The Most Dangerous places (in San Francisco) 'Street Based'
2 Machine Learning
- Categorizing String Data
- Creating Feature vector and Normalizing values
- Classifiers: (1) Logistic Regression, (2) Decision Tree, (3) Random Forest, (4) Naive Bayes, (5) One-vs-Rest (LogisticRegression)
- Parameter Tuning (Cross Validation & ParamGridBuilder)
3 Testing
- Preprocessing
- DecisionTreeClassifier with defined parameters
- Returning string labels