bobbydyr / Spark-Databricks-SFcrime-Analysis

Spark, Spark SQL, DataFrame, Data Clean, Visualization, Clustering, Time Series.

San Francisco crime data analysis and modeling

  • Spark, Spark SQL, DataFrame, Data Clean, Visualization, Clustering, Time Series.

Link -> https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/223283973976222/947956520278877/6166950463045644/latest.html

Goal:

  • People care about their safety more than ever. Against that background, I try to understand the crime that happened in San Francisco over the past 16 years, using more than 2 million recorded incidents.
  1. Understand how crime in San Francisco varies with time and area.
  2. Suggest to the police department how to distribute officers.
  3. Give policy makers suggestions that help with the policy-making process.

Executive Summary:

  1. Southern, Mission, and Northern are the three most dangerous districts. More police should be assigned there.
  2. More crime occurs in January, March, and October, and generally more in summer. Visitors should stay alert.
  3. Within a day, noon, evening, and midnight see more crime than other hours, and there is generally more crime in the afternoon. Police should watch these time periods.
  4. Sex offenses show a clear upward trend while their resolution rate is decreasing. Policy makers should look for new policies that raise the resolution rate and examine why sex offenses keep increasing.
  5. Drug crime shows a clear downward trend, which is a good sign. It might be a result of new drug policy.
  6. Theft keeps increasing, but fewer and fewer cases are resolved by the courts. Policy makers should examine what went wrong.
  7. See steps 6 and 7 for the detailed analysis by crime category.
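
Several of the findings above rest on a per-category resolution rate tracked over time. The notebook computes this with Spark; the snippet below is only a minimal plain-Python sketch of the same idea, so it runs without a cluster. It assumes each incident carries a category, a year, and a resolution string, and that an unresolved incident is marked `NONE`. The function name `resolution_rate` and the sample rows are hypothetical and made up for illustration.

```python
from collections import defaultdict

def resolution_rate(records):
    """Return {(category, year): share of incidents whose resolution is not 'NONE'}."""
    totals = defaultdict(int)
    resolved = defaultdict(int)
    for category, year, resolution in records:
        key = (category, year)
        totals[key] += 1
        # Assumption: the literal string "NONE" marks an unresolved incident.
        if resolution.strip().upper() != "NONE":
            resolved[key] += 1
    return {key: resolved[key] / totals[key] for key in totals}

# Made-up rows for illustration only, not real data.
sample = [
    ("LARCENY/THEFT", 2015, "NONE"),
    ("LARCENY/THEFT", 2015, "ARREST, BOOKED"),
    ("LARCENY/THEFT", 2016, "NONE"),
    ("LARCENY/THEFT", 2016, "NONE"),
    ("DRUG/NARCOTIC", 2015, "ARREST, BOOKED"),
]
rates = resolution_rate(sample)
```

Plotting `rates` by year for a single category gives the kind of trend line the summary refers to.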

Steps:

  • Step 0: Load and register DataFrame
  • Step 1: Visualize the total number of crimes for each category.
  • Step 2: Visualize the total number of crimes for each district.
  • Step 3: Analyze downtown crime over the years.
  • Step 4: Visualize total crime for each month.
  • Step 5: Compare 3 specific days.
  • Step 6: Deeper analysis for each category of crime.
  • Step 7: Analysis of resolution rate over time for each category of crime.
  • Step 8: Simple spatial analysis with K-means clustering.
  • Step 9: Time Series: ARIMA prediction for the future month.
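
To make the clustering step concrete, here is a minimal plain-Python sketch of K-means (Lloyd's algorithm) on 2-D points. It is not the notebook's code, which runs on Spark. It assumes each incident is reduced to a (longitude, latitude) pair and treats those pairs as plain Euclidean coordinates, which is only a rough approximation at city scale. The function name `kmeans`, the seeding from the first k points, and the sample coordinates are all hypothetical choices made for illustration.

```python
import math

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm on 2-D points; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(
                    (sum(x for x, _ in members) / len(members),
                     sum(y for _, y in members) / len(members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, assignment

# Made-up coordinates for illustration only, not real incident locations.
points = [(-122.41, 37.78), (-122.42, 37.77), (-122.50, 37.75), (-122.49, 37.76)]
centroids, assignment = kmeans(points, k=2)
```

Seeding from the first k points keeps the sketch deterministic; a real run would more likely use random or k-means++ initialization and compare several values of k.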

License: MIT License