Beliavsky/StatLib

Unofficial mirror of http://lib.stat.cmu.edu/apstat/

StatLib---Applied Statistics algorithms

The Royal Statistical Society has been publishing algorithms in its journal Applied Statistics since 1968. As of 1989, there are over 250 of them. Most are in Fortran, though a few were in Algol, and some recent ones have been in Pascal. The book - Applied Statistics Algorithms by Griffiths, P. and Hill, I.D., Ellis Horwood: Chichester (1985) contains translations of several algorithms from Algol to Fortran. A few of the other algorithms have been supplied in Fortran translations, though a few are in Algol or Pascal. The index which follows indicates which algorithms are not in Fortran.

The full source code is published in the journal. Those available here have been transcribed manually, mainly within CSIRO Division of Mathematics & Statistics, or have been supplied directly by the authors or by the RSS Algorithms Editor. In some cases, later corrections or improvements published in Applied Statistics, have been incorporated.

It is the policy of the editors of the algorithms section of Applied Statistics that algorithms do not use double precision. Many of the algorithms here have been converted to double precision, though users should be careful to check what precision is used. Many of the algorithms require the use of other algorithms, particularly functions for the gamma function and the normal distribution function. Where such functions or subroutines are required, an appropriate comment has been added to the algorithm.

In a few cases, alternative algorithms from other sources have been added. For instance, three algorithms are included in the file for as66 for calculating the area under the normal curve, and an alternative random number generator is provided in the file for as183. The Applied Statistics algorithm for Nelder-Mead simplex minimization (AS 47) does not include the fitting of a quadratic surface; the CSIRO/ Rothamsted implementation which does this is included with AS 47.

Users must consult the original journal articles for details of the calling arguments; they are not included with these algorithms.

*** Warning. In some cases, there are different arguments (usually more) in the Griffiths & Hill versions of these algorithms. Users should check this.

It has been assumed that the user will be using a Fortran-77 compiler, so functions such as alog and amin1 have been converted to their generic forms (log and min). This simplifies conversion of code between single and double precision. Also all constants in the code, such as 1.0, 0.d0, etc., have been replaced with one, zero, etc., and these are defined in either data or parameter statements. Many of the algorithms have been entered in lower case, which is not acceptable in Fortran, though most compilers accept it.

Some of the algorithms need machine-dependent constants. The user should check this. In most cases, these have been set for compilers which allow a range of floating-point numbers from about 10**(-37) to 10**(+37), though most modern compilers allow a much wider range in double precision.

No guarantee is given that the algorithms have been entered correctly, or that they perform as claimed in the journal.

To obtain an algorithm, send an E-mail request of the form:

send index from apstat send 207 from apstat to statlib@lib.stat.cmu.edu

The Royal Statistical Society holds the copyright to these routines, but has given its permission for their distribution provided that no fee is charged.

The full collection of Applied Statistics algorithms is very large. Please only request those algorithms which you need. Requesting large numbers of algorithms places a great strain on the StatLib system and the underlying mail networks.

As of the end of 1997 the Applied Statistics journal does NOT accept algorithms

Listing of available Applied Statistics algorithms

No.	Brief description (volume number/year in brackets)
3	Student's t-distribution. (17/1968) See also AS 27.
5	The non-central t-distribution. (17/1968) See also AS 243.
6	Cholesky decomposition of a symmetric +ve definite matrix. (17/1968)
7	Inversion of a symmetric matrix stored in triangular form. (17/1968)
13	Minimum spanning tree. (18/1969) See also AS 40.
14	Printing the minimum spanning tree. (18/1969)
15	Single-linkage cluster analysis. (18/1969)
22	Calculate treatment effects in a complete factorial experiment for any numbers of levels of factors using the extended Yate's method. (19/1970)
27	Upper tail area under Student's t-distribution. (19/1970) See also AS 3.
30	Half-normal plotting. (19/1970)
32	The incomplete gamma integral. (19/1970) See also AS 239.
34	Update inverse of symmetric banded matrix. (19/1970)
38	Calculate R-squared for all possible regression subsets using the Gauss-Jordan method. (20/1971) See also AS 268.
40	Update a minimum spanning tree. (20/1971)
41	Updates corrected sums of squares and products matrices. (20/1971) See also AS 240.
45	Histogram plotting. (20/1971)
46	Gram-Schmidt orthogonalization. (20/1971)
47	Nelder & Mead simplex method of unconstrained minimization without requiring derivatives. Does not include the quadratic surface fitting part of the algorithm. A CSIRO/Rothamsted version of the algorithm, which does include fitting a quadratic surface, is also included. (20/1971) (Updated, 20/Dec/93)
51	Log-linear fit for contingency tables. (21/1972) See also AS 160.
52	Calculation of sums of powers (up to 4) of deviations from the mean. (21/1972)
53	Wishart variate generator. (21/1972)
57	Printing multi-dimensional tables. (22/1973)
58	Allocates observations to clusters to minimize within-cluster sum of squares. (22/1973) See also AS 136.
60	Eigenvalues/vectors of a symmetric matrix. (22/1973)
62	Distribution of the Mann-Whitney U statistic. (22/1973)
63	Incomplete beta function. (22/1973) See also TOMS algorithm 708. TOMS algorithms are available from netlib.
64	Inverse of the incomplete beta function ratio. (22/1973) The file here is actually the Griffiths & Hill version of AS 109.
65	Expands structure formula to a list of binary integers. This is actually remark R82 which replaces the original AS65. (39/1990)
66	The normal distribution function. Two other algorithms (not from Applied Statistics) have also been included. (22/1973)
75	Algorithms for least-squares calculation using square-root free planar rotations (Morven Gentleman's package). (23/1974)
76	An integral useful in calculating noncentral t and bivariate normal probabilities. (23/1974)
77	Calculate exact null distribution of the largest root of a beta matrix. (23/1974)
78	The mediancentre (i.e. the median in a multi-dimensional space). (23/1974) See also AS 143.
83	Complex discrete fast Fourier transform. (24/1975)
84	Measures of multivariate skewness and kurtosis. (24/1975)
88	Generate all nCr combinations by simulating nested Fortran DO-loops. (Jane Gentleman's routines). (24/1975) See also AS 172.
89	Tail probabilities for Spearman's rho. (24/1975)
91	Percentage points of the chi-squared distribution. (24/1975)
93	Calculates frequency distribution for the Ansari-Bradley test statistic. (25/1976) A routine has been added to return the distribution function.
95	Maximum likelihood estimation of scale and location parameters from grouped data. User's distribution function. (25/1976)
96	Finding `nice' scales for graphs. (25/1976)
97	Real discrete fast Fourier transform. Series length must be a power of 2. (25/1976) See also AS 117 and AS 176.
99	Fitting Johnson curves by moments. (25/1976)
100	Normal-Johnson and Johnson-Normal transformations. (25/1976)
103	Psi or digamma function. (25/1976)
107	Calculate operating characteristics and average sampling number for a general class of sequential sampling plans. (26/1977)
108	Multiple linear regression minimizing the sum of absolute errors. (26/1977) See also AS 238 (in Pascal).
109	Inverse of the incomplete beta function. (26/1977)
110	LP-Norm fit of straight line by extension of Schlossmacher. Updated 14.06.2013 (26/1977)
111	Percentage points of the normal distribution. (26/1977) See also AS 241.
114	Compute the numerator of certain ordinal measures of association (Kendall's tau, Somer's d, Goodman and Kruskal's gamma) when the data are ordered categories. (26/1977)
116	Calculate the tetrachoric correlation and its standard errors. (26/1977)
117	Fast Fourier transform for series of any length using the CHIRP algorithm. (26/1977)
121	Trigamma function. (27/1978)
123	Distribution function of mixtures of beta distributions. (27/1978)
125	Maximum likelihood estimation for censored exponential survival data with covariates. (27/1978)
126	Distribution function of the range for the normal distribution. (27/1978)
127	Generation of random orthogonal matrices. (27/1978)
128	Computes approximate covariance matrix for normal order statistics. (27/1978)
132	Simple regression minimizing the sum of absolute deviations. (27/1978)
133	Finding the global maximum or minimum of a function of 1 variable. (27/1978)
134	Generate random beta variates for alpha &lt 1 and beta &gt 1. (28/1979)
135	Min-Max (L-infinity) estimates for linear multiple regression. (28/1979)
136	A K-means clustering algorithm. (28/1979)
138	Maximum likelihood estimates of the mean and standard deviation of the normal distribution with censored or confined observations. (28/1979)
139	Maximum likelihood estimation in a linear model from confined and censored normal data. (28/1979)
140	Clustering the nodes of a directed graph. (28/1979)
141	Inversion of a symmetric matrix ignoring a specified row/column. (28/1979)
142	Exact tests of significance in binary regression. (28/1979)
143	Calculates the median centre. (28/1979)
145	Exact distribution of the largest multinomial frequency. (28/1979)
147	Incomplete gamma function. (29/1980) See also AS 239.
148	Removal of bias in the jackknife procedure. (This is actually ASR 62) (29/1980)
149	Amalgamation of means in the case of simple ordering ('Up-and-Down Blocks' algorithm for isotonic regression). (29/1980)
150	Computes estimate of spectrum of a point process using a centered moving average of the periodogram of the counting process. (29/1980)
151	Smoothed spectral estimate for bivariate point processes. (29/1980)
152	Cumulative hypergeometric probabilities. (This is actually AS R77) (29/1980, revised in 38/1989)
153	Distribution of weighted sum of squares of normal variables. (29/1980) Pan's procedure for the tail probabilities of the Durbin-Watson statistic.
154	Exact maximum likelihood estimation of autoregressive-moving average models by Kalman filtering. (29/1980) See also AS 182.
155	Distribution function of a linear combination of non-central chi- squared random variables. (29/1980) See also AS 204. This is a Fortran translation supplied by the author.
157	The runs-up and runs-down tests. (30/1981)
158	Calculation of probabilities for inferences under simple order restrictions. (30/1981) See also AS 198.
159	Generate random 2-way table with given marginal totals. (30/1981)
160	Partial and marginal association in multi-dimensional contingency tables. (30/1981)
161	Critical regions of an unconditional non-randomized test of homogeneity in 2 x 2 contingency tables. (30/1981)
162	Multivariate Conditional Logistic Analysis of Stratum-matched Case-control Studies. (30/1981) Includes a Fortran version of CACM algorithm 382 for generating all combinations of M out of N items.
163	A Givens Algorithm for Moving from one Linear Model to another without Going back to the Data. (30/1981)
164	Least squares subject to linear constraints. (30/1981)
165	Discriminant analysis of categorical data. (30/1981)
166	Calculates the entanglement matrix for a specified design. (30/1981)
167	Calculates efficiencies of estimation and their generalized inverse. (30/1981)
168	Calculates 'neat' values for plotting scales. (30/1981)
169	Produces scatter plots. (30/1981)
170	Computation of probability and non-centrality parameter of a non- central chi-square distribution. (30/1981)
171	Fisher's exact variance test for the Poisson distribution. (31/1982)
172	Generates indices for simulated nested DO-loops. Actually converts (either way) between a single index and a vector of subscripts. (31/1982)
173	Generates design matrix for balanced factorial experiments. (31/1982)
174	Multivariate rank sum test and median test. (31/1982)
175	Cramer-Wold factorization of self-reciprocal polynomials as the product of two polynomials. (31/1982)
176	Kernel density estimation using the fast Fourier transform. (31/1982) Also contains an alternative set of routines for density estimation.
177	Expected values of normal order statistics. (31/1982)
178	Gauss-Jordan sweep operator with multi-collinearity detection. (31/1982)
179	Enumeration of all permutations of multi-sets with fixed repetition numbers. (31/1982)
180	Linear rank estimate of the standard deviation after symmetric trimming. (31/1982)
181	Withdrawn. See R94 below
182	Finite-sample prediction from ARIMA processes. (31/1982) Uses AS 154.
183	The Wichmann & Hill random number generator. An alternative is also provided. (31/1982)
184	Non-central studentized maximum and related multiple-t probabilities. (31/1982)
185	Backward elimination procedure to find best-fitting log-linear models for contingency tables. (31/1982) Uses AS 51.
186	Discrete Fast Fourier Transform with data permutation. Series length must be a power of 2. (31/1982)
187	Derivatives of the incomplete gamma integral. (31/1982)
188	Estimation of the order of dependence in sequences. (32/1983)
189	Maximum likelihood estimation for the beta binomial distribution. (32/1983)
190	Distribution function & its inverse, for the studentized range. (32/1983)
191	Approximate likelihood calculation for ARMA and seasonal ARMA models. Includes a routine for the Banachiewicz (or modified Cholesky) factorization A = LDL'. (32/1983)
192	Calculate approximate percentage points using Pearson curves and the first three or four moments. (32/1983)
193	The Knuth spectral test for congruential random number generators. (32/1983)
194	Test of goodness of fit of ARMA models. (32/1983)
195	Multivariate normal probabilities for a rectangular region in N- dimensional space. A version which calls IMSL routines is also included. (33/1984) See AS 251 for a special case.
196	Conditional multivariate logistic analysis of stratified case-control studies. (33/1984). Edited for errors by Bill Bardsley (bill.bardsley@man.ac.uk) on 18/08/2002.
197	Likelihood function for an ARMA process. (33/1984)
198	Calculation of level probabilities for order-restricted inference. (33/1984)
199	Branch and bound algorithm to find the subset which maximizes a quadratic form. (33/1984)
200	Approximate the sum of squares of normal score. (33/1984)
201	Combine predictions about a statistic based on the orderings of a set of means with an F-test of differences between the means. (33/1984)
202	Data-based nonparametric hazard estimation. (33/1984)
203	Maximum likelihood estimation of mixtures of distributions (normal, exponential, Poisson and binomial). (33/1984) See also AS221. This is a translation from Algol into Fortran.
204	Distribution of a sum of non-central chi-squared variables. (33/1984) Translation from Algol into Pascal.
205	Enumeration of all R x C contingency tables with given row and column totals, and calculation of hypergeometric probability for each table. (33/1984)
206	Isotonic regression in two independent variables. (33/1984)
207	Fit a generalized log-linear model. (33/1984)
208	Fit multivariate logistic model by the method of moments. (34/1985)
209	The distribution function of skewness and kurtosis. (34/1985)
210	Fit 5-parameter Johnson SB curves by moments. (34/1985)
211	The F-G diagonalization algorithm. (34/1985) Modifications in ASR 71 & ASR 74 have been added to the file.
212	Fitting the exponential curve by least squares. (34/1985)
213	Generation of correlation matrices with specified eigenvalues. (34/1985)
214	Calculation of Monte Carlo confidence intervals. (34/1985)
215	Maximum likelihood estimation of the parameters of the generalized extreme-value distribution. (34/1985). Updated on [15/Nov/99]
216	Maximum likelihood fitting when model contains a linear part and auxiliary parameters. (34/1985)
217	Computation of the Dip statistic to test for unimodality. (34/1985). Bug fixed by Ferenc Mechler (fmechler@med.cornell.edu). See README file for details. File changed to incorporate this patch provided by Yong Lu [5/Sep/02] [4/Aug/05]
218	Elements of the Fisher information matrix for the smallest extreme- value distribution, also allows for censored data. (35/1986)
219	Height balanced trees. (35/1986)
220	Operating characteristics of James-Stein and Efron-Morris estimation. (35/1986)
221	Maximum likelihood estimation of a mixing distribution. (35/1986)
222	Resistant smoothing using the fast Fourier transform. (36/1987)
223	Optimum ridge parameter selection. (36/1987)
224	Combining component designs to form a design with several orthogonal blocking factors. (36/1987)
225	Minimizing linear inequality-constrained Mahalanobis distances. (36/1987)
226	Cumulative probabilities for the non-central beta distribution. (36/1987)
227	Generate all possible N-bit binary codes. (36/1987)
228	Finding I-projections (maximizing cross-entropy) subject to a finite set of linear inequality constraints. (36/1987)
229	Quantile regression. (36/1987)
230	Distribution of customers in M/G/m queues using an approximation due to Hokstad. (36/1987)
231	Distribution of a noncentral chi-squared variable with non-negative degrees of freedom. (36/1987) In Pascal.
232	Computation of population and sample correlation and partial correlation matrices in MARMA(p,q) time series. (37/1988)
233	Branch and bound algorithm for feature subset selection. (37/1988)
234	Approximating the percentage points of simple linear rank statistics with Cornish-Fisher expansions. (37/1988)
235	Tally frequencies of distinct real values. (37/1988)
236	Recursive enumeration of RxC tables for exact likelihood evaluation. (37/1988) In Pascal.
237	The corner method for identifying autoregressive moving average models. (37/1988)
238	Recursive procedure for the L1-norm fitting of a straight line. (37/1988) In Pascal.
239	Incomplete gamma function. (37/1988)
240	Updating the inverse of corrected sums of squares and products. (37/1988)
241	Inverse normal - more accurate than AS 111. (37/1988)
242	The exact likelihood of a vector autoregressive moving average model. (38/1989)
243	Cumulative distribution function of the non-central t-distribution. (38/1989)
244	Decomposability and collapsibility for log-linear models. (38/1989) In Pascal.
245	Log of the gamma function. (38/1989) The file includes a slower but more accurate algorithm which is not an AS algorithm.
246	Repeated measurements analysis of variance with unknown autoregressive parameter. (38/1989)
247	Updating the sufficient configurations for fitting ZPA (zero partial association) models to multi-dimensional contingency tables. (38/1989) In Pascal.
248	Empirical distribution function goodness-of-fit tests for the uniform, normal and exponential distributions. (38/1989)
249	Mean and covariance estimation for the truncated multivariate normal distribution. (38/1989)
250	Test the equality of dispersion matrices using methods due to Puri & Sen (distribution-free) and Anderson (normal distribution). (38/1989)
251	Multivariate normal probability integrals with product correlation structure. (38/1989)
252	Generating classes for log-linear models. (39/1990)
253	Maximum likelihood estimation of the RC(M) association model. (39/1990)
254	Fit mixture of normal distributions to grouped & truncated data. (39/1990)
255	Fitting two-way tables by means for rows, columns and cross-term. (39/1990) In Algol.
256	Distribution of a quadratic form in normal variables. (39/1990) In Pascal.
257	Isotonic regression for umbrella orderings. (39/1990)
258	Average run lengths for cumulative sum schemes. (39/1990)
259	Extending confidence intervals by the Robbins-Monro search process. (39/1990)
260	Distribution function of the square (R^2) of the sample multiple- correlation coefficient. (40/1991)
261	Quantiles of the distribution of square (R^2) of the sample multiple- correlation coefficient. (40/1991)
262	The Wei-Lachin two-sample test, the generalized Wilcoxon test and the log-rank test for incomplete multivariate observations with censoring. (40/1991)
263	Construction of irredundant test sets. (40/1991)
264	Printing of bit patterns. (40/1991)
265	Waiting time distribution for the G/G/1 queue using the Fast Fourier transform. (40/1991)
266	Maximum likelihood estimation of the parameters of the Dirichlet distribution. (40/1991)
267	Calculate lower bound for the probability of correct selection of a subset containing the best populations from a set of populations. (40/1991)
268	Calculates statistics for all possible subset regressions using an orthogonal decomposition. (40/1991)
269	Calculates the Cornish-Fisher adjustment to distribution functions (from the normal distribution function) using higher cumulants. (41/1992)
270	Maximum likelihood fitting of a 'key' density function, e.g. normal distribution, multiplied by a series in Hermite polynomials. (41/1992)
271	Optimal (smallest mis-classification probability) joint classification procedure. (41/1992) See also AS 276.
272	Produce character Box-plots using supplied quartiles. (41/1992)
273	Compare the least-squares fit of two subsets of regression variables using Spj0tvoll's method. (41/1992). Updated [04/01/03].
274	Least squares algorithms to supplement those of Gentleman in AS 75. (41/1992)
274-90	Algorithm 274, translated into Fortran 90.
275	Non-central chi-squared distribution function, number of degrees of freedom must be positive but not necessarily integer. (41/1992)
276	Optimal combinatoric classification for two normal classes. (41/1992) See also AS 271.
277	The Oja bivariate median. (41/1992)
278	The distribution of quadratic forms of multivariate generalized Student variables. (41/1992)
279	P-values for the generalized/alternative Durbin-Watson statistic with arbitrary lag using a Cholesky transformation method. (42/1993)
280	The power function for Fisher's exact test. (42/1993)
281	Scaling and rounding regression coefficients to force them to be integers. (42/1993)
282	High breakdown regression and multivariate estimation. (42/1993)
283	Rapid computation of the permutation paired and grouped t-tests. (42/1993) In ANSI Standard C.
284	Null distribution of a statistic for testing sphericity and additivity: a Jacobi polynomial expansion. (42/1993)
285	Calculate multivariate normal probabilities using Monte Carlo methods for a general region defined by a user-supplied function. (42/1993)
286	Estimation of parameters when there are errors in both the predictor variables and the dependent variable using least squares with a general model (includes implicit non-linear regression). (42/1993)
287	Adaptive rejection sampling (to generate random variables) from log-concave density functions. (42/1993)
288	P-value calculation for the generalized two-sample Smirnov tests. The tests are conditional on ties in the pooled sample. (43/1994)
289	Convolves hypergeometric distributions generated by several 2x2 tables (43/1994)
290	Generate a grid of variance ratios for contour plotting of confidence regions for a pair of parameters in non-linear regression. (43/1994)
291	Calculates overlap capability of subsequence SSEQ of length L. (43/1994)
292	Computes Fisher information matrix elements for time (type I) or failure (type II) censored units from the smallest extreme value (sev), largest extreme value (lev), normal or logistic distribution. (43/1994)
293	Convolves conditional distributions generated by several 2xK tables (43/1994)
294	Pascal program for ....?
295	Heuristic algorithm to pick N rows of X out of NCAND to maximize the determinant of X'X, using the Fedorov exchange algorithm. (43/1994)
R94	calculates Shapiro-Wilk normality test and P-value for sample sizes 3 <= n <= 5000 . Handles censored or uncensored data. Corrects AS 181, which was found to be inaccurate for n > 50. (Uses function POLY in AS 181) by Patrick Royston (44/1995)
R96	Interval for trapezoidal rule integration to limit error using moment generating function bound for weighted sum of chi-squared(1) random variables.
297	Nonparametric regression using ultraspherical polynomials.
298	Hybrid minimization routine using simulated annealing and any local minimizer supplied by the user.
300	Efficient algorithm for determining a simulated percentile point with a fixed degree of accuracy.
301	Computes the logarithms of F1, P(H) and P(H').
302	Subroutine to compute a multiple isotonic regression for umbrella ordering with known peak.
303	Generation of ordered multinomial frequencies. Given integers N (N >= 0) and K (K >= 1), generates all possible integer arrays NI of size K, whose elements are non-negative, non-decreasing and sum to N.
304	Fisher's non-parametric randomization test for two small independent random samples.
305	Compute the ARL of a CUSUM mean chart with linear drift.
306	Calculation of a BIB product operation on any two matrices.
307	Calculation of the simplicial depth and the halfspace depth
310	Computes the cumulative distribution function of a non-central beta random variable.
314	Inverts matrix with contents subject to modulo arithmetic.
317	Maximum Likelihood Estimation and Goodness-of-Fit Tests for Mixtures of Distributions. Submitted by Alan Miller (amiller@bigpond.net.au) [31/Oct/02]
319	A program to implement a quasi-newton method. Using new algorithm Varmet July 1994
319	Fortran 90 version of 319.

N.B. This library was maintained by Alan Miller, formerly of CSIRO Division of Mathematics and Statistics, Clayton, Victoria 3169.

Fortran 90 translations of some of the above algorithms can be found at https://jblevins.org/mirror/amiller/#apstat.

Recent algorithms are provided by "T.R.Hopkins" <T.R.Hopkins@ukc.ac.uk>. The StatLib collection of Applied Statistics Algorithms is maintained by Pantelis Vlachos, vlachos@stat.cmu.edu.

Credit where credit is due

If you use an algorithm, dataset, or other information from StatLib, please acknowledge both StatLib and the original contributor of the material.

Notes

Original site: Last modified: Thu Aug 4 09:57:59 EDT 2005 by Pantelis Vlachos

Beliavsky / StatLib

StatLib---Applied Statistics algorithms

Listing of available Applied Statistics algorithms

Credit where credit is due

Notes

See also

About

Languages