an100 / oa_lesson_3_clustering

Lesson 3 in Oracle Academy's Data Science Bootcamp: Unsupervised Learning and Clustering

Clustering and Unsupervised Learning

Datasets and example code for Lesson 3 in Oracle Academy's Data Science Bootcamp. Raw data for the Hadoop-based canopy clustering is in cluster_sample.xml. Raw data for clustering in R is in cluster_set_clean.dat. Commands for the Hadoop and Hive clustering example are in canopy_commands.txt. Commands for the R clustering example are in clustering.R.

For this lesson:

  1. Copy cluster_sample.xml into HDFS
  2. Follow the steps in canopy_commands.txt
     • Create a dataset for canopy clustering
     • Create a dataset for k-means clustering in R
  3. Compile and run the canopy clusterer
  4. Load the R dataset into R and experiment with clustering in R
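
For readers who want a feel for what the canopy clusterer does before running it on Hadoop, here is a minimal single-machine sketch of canopy clustering in plain Java. It is not the repository's Hadoop implementation: the class name `CanopySketch`, the thresholds `t1` and `t2`, the Euclidean distance, and the sample points are all assumptions chosen for illustration. The idea is that each canopy takes one remaining point as its center, collects every remaining point within the loose threshold `t1`, and removes points within the tight threshold `t2` from further consideration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative sketch only; names and thresholds are hypothetical,
// not taken from the lesson's Hadoop code.
public class CanopySketch {

    // Euclidean distance between two points of equal dimension.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Groups points into canopies using a loose threshold t1 and a tight threshold t2 (t1 > t2).
    static List<List<double[]>> canopies(List<double[]> points, double t1, double t2) {
        if (t1 <= t2) {
            throw new IllegalArgumentException("t1 must be greater than t2");
        }
        List<double[]> remaining = new ArrayList<>(points);
        List<List<double[]>> result = new ArrayList<>();
        while (!remaining.isEmpty()) {
            // Take the next remaining point as the canopy center.
            double[] center = remaining.remove(0);
            List<double[]> canopy = new ArrayList<>();
            canopy.add(center);
            Iterator<double[]> it = remaining.iterator();
            while (it.hasNext()) {
                double[] p = it.next();
                double d = distance(center, p);
                if (d < t1) {
                    canopy.add(p);   // loosely close: belongs to this canopy
                }
                if (d < t2) {
                    it.remove();     // tightly close: cannot start or join later canopies
                }
            }
            result.add(canopy);
        }
        return result;
    }

    // Small made-up dataset with two well-separated groups.
    static List<double[]> samplePoints() {
        List<double[]> pts = new ArrayList<>();
        pts.add(new double[] {0.0, 0.0});
        pts.add(new double[] {1.0, 0.0});
        pts.add(new double[] {0.0, 1.0});
        pts.add(new double[] {10.0, 10.0});
        pts.add(new double[] {11.0, 10.0});
        return pts;
    }

    public static void main(String[] args) {
        List<List<double[]>> result = canopies(samplePoints(), 3.0, 1.5);
        for (int i = 0; i < result.size(); i++) {
            System.out.println("Canopy " + i + ": " + result.get(i).size() + " points");
        }
    }
}
```

Canopy clustering is often used as a cheap first pass: the number of canopies and their centers can serve as a starting point for k-means, which may be why the lesson pairs the two. Points that fall between `t2` and `t1` of a center can appear in more than one canopy, which is expected behavior rather than a bug.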
