There are 9 repositories under the high-dimensional-data topic.
Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors
A Python toolbox for gaining geometric insights into high-dimensional data
Fast Best-Subset Selection Library
A collection of small-sample, high-dimensional microarray data sets to assess machine-learning algorithms and models.
High-dimensional medians (medoid, geometric median, etc.). Fast implementations in Python.
Poisson pseudo-likelihood regression with multiple levels of fixed effects
A general purpose Snakemake workflow and MrBiomics module to perform unsupervised analyses (dimensionality reduction & cluster analysis) and visualizations of high-dimensional data.
A Toolkit for Interactive Statistical Data Visualization
Deep distance-based outlier detection published in KDD18: Learning representations specifically for distance-based outlier detection. Few-shot outlier detection
Implementation of NEWMA: a new method for scalable model-free online change-point detection
A Python package for hubness analysis and high-dimensional data mining
Statistical quality evaluation of dimensionality reduction algorithms
The DPA package is the scikit-learn compatible implementation of the Density Peaks Advanced clustering algorithm. The algorithm provides robust and visual information about the clusters, their statistical reliability and their hierarchical organization.
An interactive 3D web viewer that displays up to a million data points on one screen. Provides interaction for viewing high-dimensional data that has been previously embedded in 3D or 2D. Based on graphosaurus.js and three.js. For a Linux release of a complete embedding+visualization pipeline, please visit https://github.com/sonjageorgievska/Embed-Dive.
A fast high dimensional near neighbor search algorithm based on group testing and locality sensitive hashing
Statistics for high-dimensional data (homogeneity, sphericity, independence, spherical uniformity)
CorBinian: A toolbox for modelling and simulating high-dimensional binary and count-data with correlations
Python library to perform Locality-Sensitive Hashing for faster nearest neighbors search in high dimensional data
Hubness analysis and removal functions
t-viSNE: Interactive Assessment and Interpretation of t-SNE Projections
Feature Selection by Optimized LASSO algorithm
MATLAB code for Unsupervised Feature Selection with Multi-Subspace Randomization and Collaboration (SRCFS) (KBS 2019)
Fortran bindings to the FLANN library for performing fast approximate nearest neighbor searches in high dimensional spaces.
A Python package of cooperative co-evolutionary algorithms for feature selection in high-dimensional data.
Sparse and Regularized Discriminant Analysis in R
A simple library for t-SNE animation and a zoom-in feature to apply t-SNE in that region
🧲 Multi-step adaptive estimation for reducing false positive selection in sparse regressions
Code for “MEL: Efficient Multi-Task Evolutionary Learning for High-Dimensional Feature Selection” [IEEE Transactions on Knowledge and Data Engineering (TKDE 24)]
PyTorch-based radio-interferometric imaging reconstruction package with scalable Bayesian uncertainty quantification relying on data-driven (learned) priors
[SIGMOD 2026] DARTH: Declarative Recall Through Early Termination for Approximate Nearest Neighbor Search.
An advanced version of K-Means using Particle swarm optimization for clustering of high dimensional data sets, which converges faster to the optimal solution.