Anirban166 / binsegRcpp

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

binsegRcpp Efficient implementation of the binary segmentation heuristic algorithm for changepoint detection, using C++ std::multiset. Also contains functions for comparing empirical time complexity to best/worst case.

testshttps://github.com/tdhock/binsegRcpp/workflows/R-CMD-check/badge.svg
coveragehttps://codecov.io/gh/tdhock/binsegRcpp/branch/master/graph/badge.svg

Installation

install.packages("binsegRcpp")
##OR
if(require("remotes"))install.packages("remotes")
remotes::install_github("tdhock/binsegRcpp")

Usage

The main function is binseg for which you must at least specify the first two arguments:

  • distribution.str specifies the loss function to minimize.
  • data.vec is a numeric vector of data to segment.
 > x <- c(0.1, 0, 1, 1.1, 0.1, 0)
 > (models.dt <- binsegRcpp::binseg("mean_norm", x))
 binary segmentation model:
    segments   end          loss validation.loss
	<int> <int>         <num>           <num>
 1:        1     6  1.348333e+00               0
 2:        2     4  1.015000e+00               0
 3:        3     2  1.500000e-02               0
 4:        4     3  1.000000e-02               0
 5:        5     5  5.000000e-03               0
 6:        6     1 -3.339343e-16               0

The result above summarizes the data that are computed during the binary segmentation algorithm. It has a special class with dedicated methods:

> class(models.dt)
[1] "binsegRcpp" "list"      
> methods(class="binsegRcpp")
[1] coef  plot  print
see '?methods' for accessing help and source code

The coef methods returns a data table of segment means:

 > coef(models.dt, segments=2:3)
    segments start   end start.pos end.pos  mean
	<int> <int> <int>     <num>   <num> <num>
 1:        2     1     4       0.5     4.5  0.55
 2:        2     5     6       4.5     6.5  0.05
 3:        3     1     2       0.5     2.5  0.05
 4:        3     3     4       2.5     4.5  1.05
 5:        3     5     6       4.5     6.5  0.05

Demo of poisson loss and non-uniform weights:

> data.vec <- c(3,4,10,20)
> (fit1 <- binsegRcpp::binseg("poisson", data.vec, weight.vec=c(1,1,1,10)))
binary segmentation model:
   segments   end      loss validation.loss
      <int> <int>     <num>           <num>
1:        1     4 -393.8437               0
2:        2     3 -411.6347               0
3:        3     2 -413.9416               0
4:        4     1 -414.0133               0

Demo of change in mean and variance for normal distribution:

> sim <- function(mu,sigma)rnorm(10000,mu,sigma)
> set.seed(1)
> data.vec <- c(sim(5,1), sim(0, 5))
> fit <- binsegRcpp::binseg("meanvar_norm", data.vec)
> coef(fit, 2L)
   segments start   end start.pos end.pos        mean       var
      <int> <int> <int>     <num>   <num>       <num>     <num>
1:        2     1 10000       0.5 10000.5  4.99346296  1.024763
2:        2 10001 20000   10000.5 20000.5 -0.02095033 24.538556

Related work

Other implementations of binary segmentation include changepoint::cpt.mean(method=”BinSeg”) (quadratic storage in max number of segments), BinSeg::BinSegModel() (same linear storage as binsegRcpp), and ruptures.Binseg() (unknown storage). Figures comparing the timings.

This version uses the Rcpp/.Call interface whereas the binseg package uses the .C interface.

See branches for variations of the interface to use as test cases in RcppDeepState development.

About


Languages

Language:R 49.9%Language:C++ 49.2%Language:Shell 0.7%Language:Emacs Lisp 0.1%Language:C 0.1%