khaledxmust / San-Francisco-Crime-Classification

Kaggle Competition - Predict the category of crimes that occurred in the city by the bay

San-Francisco-Crime-Classification

Kaggle Competition - Predict the category of crimes that occurred in the city by the bay (PySpark Project)

1 Insights and Visualizations

  1. Crime rate in each district (percentage of all crimes per district)
  2. Crimes over the years (crime rate per district)
  3. Weekends vs. weekdays
  4. Peak crime hours (the hours and days with the most crimes)
  5. Cases with actions (1)
  6. Cases with actions (2), with details
  7. Case danger level
  8. The most dangerous places in San Francisco (map-based, sampled from 1,000 crimes)
  9. The most dangerous places in San Francisco (street-based)
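
In PySpark, statistics like these would typically come from a `groupBy(...).count()` over the relevant column. As a hedged illustration of the logic only, the plain-Python sketch below computes three of them: the percentage of crimes per district, weekend vs. weekday counts, and the peak hour. The column names `Dates`, `DayOfWeek` and `PdDistrict` are assumed from the Kaggle training file, and the four sample rows are made up for the example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows (not real records); column names assumed from the
# Kaggle training file.
rows = [
    {"Dates": "2015-05-13 23:53:00", "DayOfWeek": "Wednesday", "PdDistrict": "NORTHERN"},
    {"Dates": "2015-05-13 23:33:00", "DayOfWeek": "Wednesday", "PdDistrict": "NORTHERN"},
    {"Dates": "2015-05-16 23:10:00", "DayOfWeek": "Saturday", "PdDistrict": "MISSION"},
    {"Dates": "2015-05-17 18:00:00", "DayOfWeek": "Sunday", "PdDistrict": "SOUTHERN"},
]

WEEKEND = {"Saturday", "Sunday"}


def district_percentages(rows):
    """Share of all crimes (in percent) recorded in each district."""
    counts = Counter(r["PdDistrict"] for r in rows)
    total = sum(counts.values())
    return {d: 100.0 * c / total for d, c in counts.items()}


def weekend_vs_weekday(rows):
    """Number of crimes on weekend days versus weekdays."""
    weekend = sum(1 for r in rows if r["DayOfWeek"] in WEEKEND)
    return {"weekend": weekend, "weekday": len(rows) - weekend}


def peak_hour(rows):
    """Hour of day (0-23) with the most recorded crimes."""
    hours = Counter(
        datetime.strptime(r["Dates"], "%Y-%m-%d %H:%M:%S").hour for r in rows
    )
    return hours.most_common(1)[0][0]


print(district_percentages(rows))
print(weekend_vs_weekday(rows))
print(peak_hour(rows))
```

The same counting approach extends to the other insights (e.g. counting by address for the street-based ranking).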

2 Machine Learning

  1. Categorizing string data
  2. Creating the feature vector and normalizing values
  3. Classifiers: (1) Logistic Regression, (2) Decision Tree, (3) Random Forest, (4) Naive Bayes, (5) One-vs-Rest (with Logistic Regression)
  4. Parameter tuning (cross-validation and ParamGridBuilder)
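
These steps most likely rely on `pyspark.ml` components such as `StringIndexer`, `VectorAssembler`, a scaler, and `ParamGridBuilder` with cross-validation; apart from `ParamGridBuilder`, that mapping is an assumption. The plain-Python sketch below reproduces the underlying ideas on toy data: frequency-ordered string indexing, min-max scaling to [0, 1], a parameter grid built as a Cartesian product, and k-fold index splits. Min-max scaling is only one possible normalization, and the `maxDepth`/`maxBins` values are illustrative, not the project's actual grid.

```python
from collections import Counter
from itertools import product


def fit_string_index(values):
    """Map each distinct string to an integer, most frequent first.

    Ties are broken alphabetically (an assumption made here for determinism).
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {v: i for i, v in enumerate(ordered)}


def min_max_scale(column):
    """Rescale a numeric column to [0, 1]; a constant column becomes all zeros."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [0.0 if span == 0 else (x - lo) / span for x in column]


def param_grid(**options):
    """Every combination of the given parameter values (a Cartesian product)."""
    keys = list(options)
    return [dict(zip(keys, combo)) for combo in product(*(options[k] for k in keys))]


def k_fold_indices(n, k):
    """Yield (train, validation) index lists for k-fold cross-validation.

    Row i goes to fold i % k; each fold serves once as the validation set.
    """
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, val in enumerate(folds):
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, val


# Usage on toy data (values are illustrative only).
districts = ["MISSION", "NORTHERN", "MISSION", "BAYVIEW"]
index = fit_string_index(districts)
print(index, [index[d] for d in districts])
print(min_max_scale([-122.42, -122.40, -122.38]))
print(param_grid(maxDepth=[5, 10], maxBins=[32, 64]))
for train, val in k_fold_indices(6, 3):
    print(train, val)
```

Each grid entry would be trained on every `train` split and scored on the matching `val` split; the entry with the best average score wins.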

3 Testing

  1. Preprocessing
  2. DecisionTreeClassifier with explicitly defined parameters
  3. Converting predictions back to string labels
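
At test time the indices learned on the training data have to be reused, and the classifier's numeric predictions mapped back to category names. In Spark ML, `IndexToString` serves the second purpose, and `StringIndexer`'s `handleInvalid="keep"` option sends unseen values to an extra index; whether the project uses either is an assumption. The sketch below mimics both steps in plain Python; `district_index` and `label_index` are hypothetical hand-written mappings, not ones learned by the project.

```python
# Hypothetical mappings "learned" on the training set; in the project they
# would come from the fitted indexers, not be written by hand.
district_index = {"SOUTHERN": 0, "MISSION": 1, "NORTHERN": 2}
label_index = {"LARCENY/THEFT": 0, "OTHER OFFENSES": 1, "NON-CRIMINAL": 2}


def transform(values, index, unknown):
    """Encode test values with the training-time index.

    Values never seen during training get the `unknown` code instead of
    raising an error (similar in spirit to handleInvalid="keep").
    """
    return [index.get(v, unknown) for v in values]


def index_to_string(predictions, index):
    """Turn numeric predictions (Spark ML emits them as floats) back into labels."""
    inverse = {i: label for label, i in index.items()}
    return [inverse[int(p)] for p in predictions]


test_districts = ["MISSION", "TREASURE ISLAND"]  # second value unseen in training
print(transform(test_districts, district_index, unknown=len(district_index)))
print(index_to_string([0.0, 2.0], label_index))
```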

About

License: MIT License

Languages

HTML 67.8%, Jupyter Notebook 32.2%