udemirezen / gap-statistic

An implementation of the gap statistic algorithm to compute the number of clusters in a set of numerical data.

Home Page: http://blog.echen.me/2011/03/19/counting-clusters/

About

An implementation of the gap statistic algorithm from Tibshirani, Walther, and Hastie's "Estimating the number of clusters in a data set via the gap statistic". A description of the algorithm can be found at http://blog.echen.me/2011/03/19/counting-clusters/.
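
For readers who want to see the idea in code, here is a minimal sketch of the gap statistic as described in the paper. It is not this repository's `gap_statistic()` function: the name `gap_sketch` and the parameters `K_max` and `B` are hypothetical, and the sketch assumes k-means clustering with reference data drawn uniformly over each column's observed range. It compares the log within-cluster dispersion of the data against its average over `B` reference data sets, then picks the smallest k with Gap(k) >= Gap(k + 1) - s(k + 1).

```r
# Illustrative sketch only; all names here (gap_sketch, K_max, B) are hypothetical.
# Assumes K_max >= 2 and more distinct rows in `data` than K_max.
gap_sketch = function(data, K_max = 6, B = 20) {
  data = as.matrix(data)

  # log of the pooled within-cluster sum of squares W_k, taken from k-means
  log_wk = function(x, k) {
    log(kmeans(x, centers = k, nstart = 10, iter.max = 50)$tot.withinss)
  }

  # Reference data: each column drawn uniformly over that column's observed range
  reference = function() {
    apply(data, 2, function(column) runif(length(column), min(column), max(column)))
  }

  observed = sapply(1:K_max, function(k) log_wk(data, k))

  # B x K_max matrix of log(W_k) values computed on the reference data sets
  expected = t(replicate(B, {
    ref = reference()
    sapply(1:K_max, function(k) log_wk(ref, k))
  }))

  gap = colMeans(expected) - observed

  # s_k = sd_k * sqrt(1 + 1/B), where sd_k uses a 1/B divisor
  s = apply(expected, 2, function(v) sqrt(mean((v - mean(v))^2))) * sqrt(1 + 1 / B)

  # Smallest k with Gap(k) >= Gap(k + 1) - s(k + 1); NA if no k qualifies
  ok = which(gap[-K_max] >= gap[-1] - s[-1])
  list(gap = gap, s = s, k_hat = if (length(ok) > 0) ok[1] else NA)
}
```

Calling `gap_sketch(data)$k_hat` on a numeric matrix returns the estimated number of clusters under these assumptions. Results vary between runs because both k-means and the reference data are random; call `set.seed()` first for reproducibility.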

Examples

	# Single cluster in 5 dimensions
	data = cbind(rnorm(20), rnorm(20), rnorm(20), rnorm(20), rnorm(20))

	png("examples/1_cluster_5d_gaps.png")
	gap_statistic(data)
	dev.off()

[Figure: gap statistic plot, single cluster in 5 dimensions]

	# Three clusters in 2 dimensions
	x = c(rnorm(20, mean = 0), rnorm(20, mean = 3), rnorm(20, mean = 5))
	y = c(rnorm(20, mean = 0), rnorm(20, mean = 5), rnorm(20, mean = 0))
	data = cbind(x, y)

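	library(ggplot2)  # provides qplot()
	png("examples/3_clusters_2d.png")
	print(qplot(x, y))  # explicit print() so the plot is also drawn when the script is sourced
	dev.off()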

[Figure: scatter plot, 3 clusters in 2 dimensions]

	png("examples/3_clusters_2d_gaps.png")
	gap_statistic(data)
	dev.off()

[Figure: gap statistic plot, 3 clusters in 2 dimensions]

	# Four clusters in 3 dimensions
	x = c(rnorm(20, mean = 0), rnorm(20, mean = 3), rnorm(20, mean = 5), rnorm(20, mean = -10))
	y = rnorm(80, mean = 0)
	z = c(rnorm(40, mean = -5), rnorm(40, mean = 0))
	data = cbind(x, y, z)

	library(scatterplot3d)  # provides scatterplot3d()
	png("examples/4_clusters_3d.png")
	scatterplot3d(x, y, z)
	dev.off()

[Figure: 3D scatter plot, 4 clusters in 3 dimensions]

	png("examples/4_clusters_3d_gaps.png")
	gap_statistic(data)
	dev.off()

[Figure: gap statistic plot, 4 clusters in 3 dimensions]
