u-u-h / seriation

Infrastructure for Ordering using Seriation - R Package

seriation - Infrastructure for Ordering Objects Using Seriation - R package

This package provides the infrastructure for seriation, with implementations of several seriation/sequencing techniques to reorder matrices, dissimilarity matrices, and dendrograms (see below for a full list). It also provides (optimally) reordered heatmaps, color images, and clustering visualizations such as dissimilarity plots and visual assessment of cluster tendency plots (VAT and iVAT).
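As a rough illustration of the visualizations mentioned above, the following sketch calls hmap(), dissplot(), VAT() and iVAT() on the iris measurements. The k-means labels passed to dissplot() are only an assumption made for this example; any cluster labeling could be used instead.

```r
library(seriation)

set.seed(1234)                      # k-means starts from random centers
x <- as.matrix(iris[-5])            # the four numeric measurements
d <- dist(x)

# heatmap with rows and columns reordered by seriation
hmap(x)

# dissimilarity plot: objects are grouped by cluster label and seriated
labels <- kmeans(x, centers = 3)$cluster   # assumed labeling, for illustration only
dissplot(d, labels = labels)

# visual assessment of cluster tendency, plain and improved variant
VAT(d)
iVAT(d)
```

Dark blocks along the diagonal of these plots suggest groups of mutually similar objects.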

Installation

  • Stable CRAN version: install from within R with install.packages("seriation").
  • Current development version: download the package from AppVeyor or install via install_github("mhahsler/seriation") (requires the R package devtools).

Example

## load library and read data
R> library(seriation)
R> data("iris")
R> x <- as.matrix(iris[-5])
R> x <- x[sample(1:nrow(x)),]

## calculate distances and use default seriation
R> d <- dist(x)
R> order <- seriate(d)
R> order
object of class ‘ser_permutation’, ‘list’
contains permutation vectors for 1-mode data

  vector length seriation method
1           150             ARSA

## compare quality
R> rbind(
+ random = criterion(d),
+ reordered = criterion(d, order)
+ )
          AR_events AR_deviations       RGAR Gradient_raw Gradient_weighted Path_length
random       550620    948833.712 0.49938328          741         -1759.954   392.77766
reordered     54846      9426.094 0.04974243       992214       1772123.418    83.95758
            Inertia Least_squares       ME Moore_stress Neumann_stress     2SUM      LS
random    214602194      78852819 291618.0    927570.00     461133.357 29954845 5669489
reordered 356945979      76487641 402332.1     13593.32       5274.093 17810802 4486900
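The example above only prints the permutation object and its quality measures. A minimal sketch of how the resulting order might then be extracted and applied, using get_order(), permute() and pimage() from the package, could look like this. The seed and the variable names are chosen here for illustration only.

```r
library(seriation)

set.seed(1234)                      # make the random shuffle reproducible
x <- as.matrix(iris[-5])
x <- x[sample(nrow(x)), ]
d <- dist(x)

o <- seriate(d)                     # default seriation method

# integer vector with the seriated order of the 150 objects
idx <- get_order(o)

# apply the permutation to the dist object; the result is again a dist
d_reordered <- permute(d, o)

# image plots of the dissimilarity matrix before and after reordering
pimage(d, main = "random")
pimage(d, o, main = "reordered")
```

In the reordered image, small dissimilarities should concentrate near the diagonal.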

Available Seriation Methods

For dissimilarity data:

  • Branch-and-bound to minimize the unweighted/weighted column gradient
  • DendSer - Dendrogram seriation heuristic to optimize various criteria
  • GA - Genetic algorithm with warm start to optimize various criteria
  • HC - Hierarchical clustering (single link, avg. link, complete link)
  • GW - Hierarchical clustering reordered by Gruvaeus and Wainer heuristic
  • OLO - Hierarchical clustering with optimal leaf ordering
  • Identity permutation
  • MDS - Multidimensional scaling (metric, non-metric, angle)
  • SA - Simulated annealing to minimize anti-Robinson events
  • TSP - Traveling salesperson solver to minimize Hamiltonian path length
  • R2E - Rank-two ellipse seriation
  • Random permutation
  • Spectral seriation (unnormalized, normalized)
  • SPIN - Sorting points into neighborhoods (neighborhood algorithm, side-to-side algorithm)
  • VAT - Visual assessment of clustering tendency ordering
  • QAP - Quadratic assignment problem heuristic (2-SUM, linear seriation, inertia, banded anti-Robinson form)
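A method from the list above is selected through the method argument of seriate(). The sketch below compares a few of them by Hamiltonian path length. It assumes the method names "Random", "HC", "GW", "OLO" and "Spectral" are registered in the installed version; list_seriation_methods("dist") shows what is actually available.

```r
library(seriation)

set.seed(1234)
x <- as.matrix(iris[-5])
d <- dist(x[sample(nrow(x)), ])

# names of the methods registered for dissimilarity data
list_seriation_methods("dist")

# seriate the same dissimilarities with several methods
methods <- c("Random", "HC", "GW", "OLO", "Spectral")
orders <- lapply(methods, function(m) seriate(d, method = m))
names(orders) <- methods

# path length of each ordering (smaller is better)
pl <- vapply(
  orders,
  function(o) unname(criterion(d, o, method = "Path_length")),
  numeric(1)
)
pl
```

Other criteria from the comparison table, such as "AR_events" or "Gradient_raw", can be passed as the method argument of criterion() in the same way.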

For matrices:

  • BEA - Bond Energy Algorithm to maximize the measure of effectiveness (ME)
  • Identity permutation
  • PCA - First principal component or angle on the projection on the first two principal components
  • Random permutation
  • TSP - Traveling salesperson solver to maximize ME
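For matrices, seriate() returns one permutation for the rows and one for the columns. A minimal sketch using the PCA method from the list above, with the measure of effectiveness (ME) as the criterion, might look as follows. The variable names are illustrative.

```r
library(seriation)

set.seed(1234)
x <- as.matrix(iris[-5])
x <- x[sample(nrow(x)), ]

# two-mode seriation: the result holds a row and a column permutation
o <- seriate(x, method = "PCA")

row_idx <- get_order(o, 1)          # order of the 150 rows
col_idx <- get_order(o, 2)          # order of the 4 columns

# reorder rows and columns of the matrix in one step
x_reordered <- permute(x, o)

# measure of effectiveness before and after reordering
rbind(
  original  = criterion(x, method = "ME"),
  reordered = criterion(x, o, method = "ME")
)
```

The reordered matrix can be inspected visually with pimage(x, o).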

Further Information

Maintainer: Michael Hahsler
