My learning outcomes from, and follow-up to, a well-taught Coursera guided project by Ari Anastassiou.
We were given taxi rank location data for the North American region and had to identify the key clusters of taxi ranks, so that service stations could be built to serve all taxis operating in each area.
Task 1: Exploratory Data Analysis
Task 2: Visualizing Geographical Data
Task 3: Clustering Strength / Performance Metric
Task 4: K-Means Clustering
Task 5: DBSCAN
Task 6: HDBSCAN
Task 7: Addressing Outliers
- Visualization
- Machine Learning
- Clustering
- Data Analysis
- Map Building
Understanding the problem and data provided through basic data analysis and visualizations.
- Checking for duplicate and empty data cells
- Removing the redundant data
- Finally plotting the cleaned data
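The cleaning steps above can be sketched with pandas. This is a minimal illustration on a made-up sample, not the project's actual dataset: the column names `LON`, `LAT` and `NAME` and all the values are assumptions.

```python
import pandas as pd

# Hypothetical sample standing in for the taxi rank data.
# Column names and values are assumptions for illustration only.
df = pd.DataFrame({
    "LON": [-73.98, -73.98, -73.95, None, -74.01],
    "LAT": [40.75, 40.75, 40.78, 40.70, None],
    "NAME": ["Rank A", "Rank A", "Rank B", "Rank C", "Rank D"],
})

# Count repeated coordinate pairs and missing values per column.
n_duplicates = int(df.duplicated(subset=["LON", "LAT"], keep="first").sum())
n_missing = df.isna().sum()

# Drop rows with missing coordinates, then drop repeated coordinates.
clean = (
    df.dropna(subset=["LON", "LAT"])
      .drop_duplicates(subset=["LON", "LAT"], keep="first")
      .reset_index(drop=True)
)

print(f"duplicates: {n_duplicates}, rows after cleaning: {len(clean)}")
```

The cleaned frame can then be passed to a scatter plot of `LON` against `LAT` for a first look at the spatial layout.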
Exploring the data further with interactive visualizations.
- Plotting the data on a world map using the coordinates provided
Evaluating the strength of a clustering algorithm.
- Calculating the silhouette score
- Plotting the graph for various blobs
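For a single point, the silhouette value is (b - a) / max(a, b), where a is the mean distance to the other points in its own cluster and b is the mean distance to the points in the nearest other cluster; the score is the average over all points and lies between -1 and 1. A minimal sketch with scikit-learn on synthetic blobs follows. The blob parameters are assumptions chosen only to contrast tight and overlapping clusters.

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Two synthetic datasets with the same centers but different spread.
# All parameters here are illustrative assumptions.
X_tight, y_tight = make_blobs(
    n_samples=300, centers=3, cluster_std=0.5, random_state=0
)
X_loose, y_loose = make_blobs(
    n_samples=300, centers=3, cluster_std=3.0, random_state=0
)

# Score the ground-truth labelling of each dataset.
score_tight = silhouette_score(X_tight, y_tight)
score_loose = silhouette_score(X_loose, y_loose)

print(f"tight blobs: {score_tight:.3f}, loose blobs: {score_loose:.3f}")
```

Tighter, better separated blobs should score closer to 1, while heavily overlapping blobs score near 0.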
Learning the theory behind the k-means clustering algorithm and applying it to our data.
- Visualizing K-means on sample data
- Calculating the best silhouette score for our data
- Plotting the data according to the clusters found by the algorithm
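Picking the number of clusters by silhouette score can be sketched as a simple search over candidate values of k. The data below is synthetic with four well separated centers, and the search range of 2 to 8 is an assumption; the project's own range and data will differ.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with four well separated centers (illustrative assumption).
X, _ = make_blobs(
    n_samples=400,
    centers=[(-5, -5), (-5, 5), (5, -5), (5, 5)],
    cluster_std=0.6,
    random_state=42,
)

best_k, best_score = None, -1.0
for k in range(2, 9):
    # Fit K-means for this k and score the resulting labelling.
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)
    if score > best_score:
        best_k, best_score = k, score

print(f"best k: {best_k}, silhouette: {best_score:.3f}")
```

Note that K-means always assigns every point to a cluster, so outlying points pull centroids toward them; the density-based methods handle such points differently.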
Gaining theoretical and practical knowledge of Density-Based Spatial Clustering of Applications with Noise (DBSCAN).
- Calculating the best silhouette score for our data
- Plotting the data on the map for the density-based approach
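A minimal DBSCAN sketch with scikit-learn is shown below. DBSCAN labels points it cannot attach to any dense region as noise (label `-1`), so the silhouette score here is computed on the clustered points only. The synthetic data and the `eps` and `min_samples` values are assumptions, not the settings used in the project.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Three dense synthetic blobs plus scattered background points
# (all values are illustrative assumptions).
rng = np.random.default_rng(0)
blobs, _ = make_blobs(
    n_samples=300,
    centers=[(-5, -5), (0, 5), (5, -5)],
    cluster_std=0.4,
    random_state=0,
)
background = rng.uniform(-8, 8, size=(20, 2))
X = np.vstack([blobs, background])

# eps is the neighborhood radius; min_samples is the number of points
# (including the point itself) needed within eps to form a core point.
labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)

is_noise = labels == -1
n_clusters = len(set(labels[~is_noise]))
n_noise = int(is_noise.sum())

# Score only the points that were assigned to a cluster.
score = silhouette_score(X[~is_noise], labels[~is_noise])

print(f"clusters: {n_clusters}, noise points: {n_noise}, silhouette: {score:.3f}")
```

Because a single `eps` applies to the whole dataset, DBSCAN can struggle when clusters have very different densities.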
Gaining theoretical and practical knowledge of Hierarchical DBSCAN (HDBSCAN), which alleviates some constraints of classical DBSCAN.
- Calculating the best silhouette score for our data
- Plotting the data on the map for the density-based approach
Addressing the outliers identified by the density-based models.
- Using a k-nearest neighbors classifier to assign the outliers to clusters and calculating the resulting silhouette score
- Comparing the hybrid and K-means approaches
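The hybrid idea can be sketched as follows: cluster with a density-based model, train a k-nearest neighbors classifier on the points that received a cluster label, then use it to assign the noise points to their nearest cluster. This sketch uses DBSCAN as the density-based step for simplicity; the synthetic data, `eps`, `min_samples` and `n_neighbors=1` are all assumptions, not the project's settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic blobs plus scattered background points (illustrative assumption).
rng = np.random.default_rng(0)
blobs, _ = make_blobs(
    n_samples=300,
    centers=[(-5, -5), (0, 5), (5, -5)],
    cluster_std=0.4,
    random_state=0,
)
background = rng.uniform(-8, 8, size=(20, 2))
X = np.vstack([blobs, background])

# Step 1: density-based clustering; label -1 marks noise.
labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)
is_noise = labels == -1

# Step 2: train a classifier on the clustered points and use it
# to assign each noise point to the cluster of its nearest neighbor.
hybrid = labels.copy()
if is_noise.any():
    knn = KNeighborsClassifier(n_neighbors=1)
    knn.fit(X[~is_noise], labels[~is_noise])
    hybrid[is_noise] = knn.predict(X[is_noise])

# Step 3: compare against K-means with the same number of clusters.
n_clusters = len(set(hybrid))
kmeans_labels = KMeans(
    n_clusters=n_clusters, n_init=10, random_state=0
).fit_predict(X)

hybrid_score = silhouette_score(X, hybrid)
kmeans_score = silhouette_score(X, kmeans_labels)

print(f"hybrid: {hybrid_score:.3f}, k-means: {kmeans_score:.3f}")
```

After this step every point has a cluster, so both approaches are scored on the full dataset and the comparison is like for like.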
After completing this project, I can carry out the basic data manipulation that any data processing task requires and explore data thoroughly through a range of visualizations. I also gained deeper insight into how various clustering algorithms differ from one another and how to evaluate their strength on different datasets. Lastly, this project showed me how some real-world problems can be solved using these techniques.