The SMuRFS algorithm


SMuRFS


Date: 1/25/2017

Authors: Joshua Mayer, Raziur Rahman, Souparno Ghosh, Randip Pal

Platform: R Version 3.3.2

Required packages: partykit, Formula, strucchange, matrixStats, coin

Maintainer: Joshua Mayer joshua.mayer@ttu.edu

Description: Sequential removal of insignificant features.

Usage

SMuRFS(formula, data, ntree = 500, mtry, alpha = 0.05, prop.test = .632, response.position)

Inputs

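formula: An object of class formula giving the regression relationship between the responses and the covariates. Multiple responses are combined with +, as in the formula V1001 + V1002 + V1003 ~ . used in the Examples section.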

data: A data frame. Variable names must match those used in the formula. Rows with missing data are removed.

ntree: An integer greater than or equal to 1. The number of trees grown for the SMuRFS algorithm. Default is 500.

mtry: An integer greater than or equal to 1. The number of variables sampled for each tree.

alpha: A number between 0 and 1. The significance level used for feature removal. Default is 0.05.

prop.test: A number between 0 and 1. The size of the test set for the secondary test, as a proportion of the data. Default is 0.632.

response.position: The column indices of the responses in data. These could in principle be determined automatically with the Formula package, but that approach breaks down in high dimensions.
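In high dimensions it can help to build the formula and response.position from the response names rather than typing them by hand. The base-R sketch below does this. The variable names responses and fml are illustrative only and are not part of the package. The toy data frame has the same shape as the one in the Examples section: 1000 covariates followed by 3 responses.

```r
# Illustrative only: the response names used in the Examples section.
responses <- c("V1001", "V1002", "V1003")

# Toy data frame shaped like the example: 1000 covariates, then 3 responses.
# as.data.frame() names the columns of an unnamed matrix V1, V2, ...
set.seed(1)
dat <- as.data.frame(cbind(matrix(rnorm(10 * 1000), nrow = 10),
                           matrix(rnorm(10 * 3), nrow = 10)))

# Multi-response formula: responses joined with "+", all other columns as predictors.
fml <- as.formula(paste(paste(responses, collapse = " + "), "~ ."))

# Column indices of the responses, looked up by name instead of hard-coded.
response.position <- match(responses, names(dat))
response.position
# [1] 1001 1002 1003
```

Here fml and response.position correspond to the formula and response.position arguments in the example call.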

Details

SMuRFS implements Sequential Multi-Response Feature Selection. In each iteration, the function:

1. Selects a subset of mtry features from those still under consideration, together with a bootstrap sample of size n (the number of observations).
2. Grows a tree from those features and that bootstrap sample using the conditional inference framework (Hothorn et al., 2006).
3. Selects the features that are significant at any node of the tree.
4. Tests the features that were not selected on a test set drawn as a subset of the data.
5. Removes from consideration the features that also fail this second test.

After ntree iterations, the features that survive are the selected features.
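The loop described above is sketched below in base R. This is a minimal sketch under loud assumptions, not the authors' implementation.

- The conditional inference tree (the framework implemented by the required partykit package) is swapped out for a simple stand-in test. That test runs a Pearson correlation test of each feature against each response and applies a Bonferroni adjustment over the responses.
- The function names smurfs_sketch and feature_pvalue are hypothetical.
- Rounding prop.test * n to the nearest integer is an assumed choice.

The sketch only illustrates the sequential sample, test, retest and remove structure.

```r
# Hypothetical stand-in for "significant at some node of a conditional inference tree":
# smallest per-response correlation-test p-value, Bonferroni-adjusted over responses.
feature_pvalue <- function(x, Y) {
  p <- apply(Y, 2, function(y) cor.test(x, y)$p.value)
  min(1, ncol(Y) * min(p))
}

# Hypothetical sketch of the SMuRFS outer loop (not the package's code).
smurfs_sketch <- function(X, Y, ntree = 500, mtry, alpha = 0.05, prop.test = 0.632) {
  n <- nrow(X)
  alive <- colnames(X)                         # features still under consideration
  for (b in seq_len(ntree)) {
    if (length(alive) == 0) break
    # Sample up to mtry surviving features and a bootstrap sample of size n
    feats <- sample(alive, min(mtry, length(alive)))
    boot <- sample.int(n, n, replace = TRUE)
    # First test, on the bootstrap sample (stand-in for the tree)
    p.boot <- vapply(feats, function(f)
      feature_pvalue(X[boot, f], Y[boot, , drop = FALSE]), numeric(1))
    failed <- feats[p.boot > alpha]
    # Second test, on a random test subset of size round(prop.test * n) (assumed rounding)
    test <- sample.int(n, round(prop.test * n))
    p.test <- vapply(failed, function(f)
      feature_pvalue(X[test, f], Y[test, , drop = FALSE]), numeric(1))
    # Features failing both tests are removed from consideration
    alive <- setdiff(alive, failed[p.test > alpha])
  }
  alive                                        # surviving (selected) features
}

# Usage on simulated data: x1 and x2 drive all three responses; x3..x30 are noise.
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 30), nrow = n, dimnames = list(NULL, paste0("x", 1:30)))
signal <- 2 * X[, "x1"] + 2 * X[, "x2"]
Y <- cbind(y1 = signal + rnorm(n), y2 = signal + rnorm(n), y3 = signal + rnorm(n))
selected <- smurfs_sketch(X, Y, ntree = 200, mtry = 5)
```

In this sketch, a feature that is never sampled simply survives. That matches "the features that survive are the selected features", but how the real function handles that case is not stated here.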

Value

A list of the covariates that survive, i.e. the selected features.

Examples

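library(MASS)    # mvrnorm()
library(Matrix)  # bdiag()
# SMuRFS() itself must already be loaded, e.g. by sourcing it from this repository.
set.seed(100)
# 1000 covariates: the first 50 have nonzero coefficients, the remaining 950 are noise
beta <- c(runif(50, 1, 3), rep(0, 950))
# 3 x 3 response covariance: unit variances, correlation 0.7 between responses
sigma.y <- matrix(c(1, 0.7, 0.7, 0.7, 1, 0.7, 0.7, 0.7, 1), nrow = 3, byrow = FALSE)
# Compound-symmetric n x n matrix: 1 on the diagonal, 0.7 elsewhere
omega <- function(n) {
  my.mat <- matrix(0.7, n, n)
  diag(my.mat) <- rep(1, n)
  return(my.mat)
}
# Block-diagonal covariate covariance: the 50 informative covariates are correlated,
# the 950 noise covariates are independent (converted to a dense matrix for mvrnorm)
sigma.x <- as.matrix(bdiag(omega(50), diag(1, 950)))
set.seed(100)
xx <- mvrnorm(200, rep(0, 1000), sigma.x)   # 200 observations
means <- xx %*% beta
set.seed(100)
# The three responses of each observation share the mean means[i] and are correlated via sigma.y
yy <- t(sapply(1:200, function(i) mvrnorm(n = 1, mu = rep(means[i, ], 3), Sigma = sigma.y)))
# Columns V1-V1000 are the covariates; V1001-V1003 are the responses
dat <- as.data.frame(cbind(xx, yy))
set.seed(100)
var.select <- SMuRFS(formula = V1001 + V1002 + V1003 ~ ., data = dat, ntree = 500, mtry = 8,
                     alpha = 0.05, prop.test = .632, response.position = c(1001, 1002, 1003))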
