lanl / DnMFk

A C++ framework of Distributed Non-Negative Matrix Factorization implementation to find Latent Dimensionality in Big Data


Distributed Non-Negative Matrix Factorization with Model Determination (DnMFkCPP)

The holistic analysis and understanding of latent (that is, not directly observable) variables and patterns buried in large datasets is crucial for data-driven science, decision making, and emergency response. Such exploratory analyses require unsupervised learning methods for data mining and for extracting latent features. Non-negative matrix factorization (NMF) is one of the prominent methods for extracting interpretable latent features, with applications in dimensionality reduction, data mining, and blind source separation. NMF is based on compute-intensive, non-convex constrained minimization, which for large datasets requires fast, distributed algorithms. In practice, identifying the latent features is both difficult and significant for pattern recognition and latent feature analysis, especially for large dense matrices. This software suite introduces a distributed NMF algorithm coupled with distributed custom clustering, followed by a stability analysis on dense data, to determine the number of latent variables. We call it DnMFkCPP.
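To make the factorization step concrete, the following is a minimal serial sketch of NMF, which approximates a non-negative matrix X by a product W H of two non-negative factors. It uses the classic Lee-Seung multiplicative updates for the Frobenius-norm objective, which is one common NMF algorithm and not necessarily the one DnMFkCPP runs. The `Matrix` struct and every function name here are hypothetical helpers written for illustration; they are not part of this repository's API, and the distributed execution, clustering, and stability analysis are not shown.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical dense row-major matrix, a simple stand-in for a real linear algebra library.
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c, double v = 0.0)
        : rows(r), cols(c), data(r * c, v) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Plain triple-loop matrix product C = A * B.
Matrix multiply(const Matrix& a, const Matrix& b) {
    assert(a.cols == b.rows);
    Matrix c(a.rows, b.cols);
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t k = 0; k < a.cols; ++k)
            for (std::size_t j = 0; j < b.cols; ++j)
                c(i, j) += a(i, k) * b(k, j);
    return c;
}

Matrix transpose(const Matrix& a) {
    Matrix t(a.cols, a.rows);
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t j = 0; j < a.cols; ++j)
            t(j, i) = a(i, j);
    return t;
}

// Frobenius norm of the residual X - W * H.
double frobenius_error(const Matrix& x, const Matrix& w, const Matrix& h) {
    Matrix wh = multiply(w, h);
    double sum = 0.0;
    for (std::size_t i = 0; i < x.data.size(); ++i) {
        double d = x.data[i] - wh.data[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Matrix with entries drawn uniformly from [0.1, 1.0), so all entries are positive.
Matrix random_nonnegative(std::size_t r, std::size_t c, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(0.1, 1.0);
    Matrix m(r, c);
    for (double& v : m.data) v = dist(gen);
    return m;
}

// Lee-Seung multiplicative updates for min ||X - W H||_F with W, H >= 0.
// W and H are updated in place and stay non-negative if they start non-negative.
void nmf_multiplicative(const Matrix& x, Matrix& w, Matrix& h, int iterations) {
    const double eps = 1e-12;  // guards against division by zero
    for (int it = 0; it < iterations; ++it) {
        // H <- H .* (W^T X) ./ (W^T W H)
        Matrix wt = transpose(w);
        Matrix num_h = multiply(wt, x);
        Matrix den_h = multiply(multiply(wt, w), h);
        for (std::size_t i = 0; i < h.data.size(); ++i)
            h.data[i] *= num_h.data[i] / (den_h.data[i] + eps);

        // W <- W .* (X H^T) ./ (W H H^T)
        Matrix ht = transpose(h);
        Matrix num_w = multiply(x, ht);
        Matrix den_w = multiply(w, multiply(h, ht));
        for (std::size_t i = 0; i < w.data.size(); ++i)
            w.data[i] *= num_w.data[i] / (den_w.data[i] + eps);
    }
}
```

As a usage example, one could build a test matrix `x = multiply(w0, h0)` from two random non-negative factors with inner dimension k, initialize `w` and `h` with `random_nonnegative`, and call `nmf_multiplicative(x, w, h, 200)`. The value returned by `frobenius_error` afterwards should be lower than before the call. In this sketch k is chosen by hand; determining it from the data is the part of the method that the clustering and stability analysis address.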

  • Chennupati, G., Vangara, R., Skau, E., Djidjev, H., & Alexandrov, B. (2020). Distributed non-negative matrix factorization with determination of the number of latent features. The Journal of Supercomputing, 76(9), 7458–7488. https://doi.org/10.1007/s11227-020-03181-6

  • Bhattarai, M., Chennupati, G., Skau, E., Vangara, R., Djidjev, H., & Alexandrov, B. S. (2020). Distributed Non-Negative Tensor Train Decomposition. 2020 IEEE High Performance Extreme Computing Conference (HPEC), 1–10. https://doi.org/10.1109/HPEC43674.2020.9286234

  • Nebgen, B., Vangara, R., Hombrados-Herrera, M. A., Kuksova, S., & Alexandrov, B. (2020). A neural network for determination of latent dimensionality in Nonnegative Matrix Factorization. Machine Learning: Science and Technology. https://doi.org/10.1088/2632-2153/aba372

Contributors

Build Procedure and Experiments

  • To build Distributed Non-negative Matrix Factorization k, refer to the README under the distnmfk directory.
  • For instructions on running experiments with Distributed Non-negative Matrix Factorization k, refer to the README Experiments under the distnmfk/experiments directory.

Acknowledgements

This study was funded by the U.S. Department of Energy National Nuclear Security Administration under Contract No. DE-AC52-06NA25396 through Los Alamos National Laboratory's Laboratory Directed Research and Development (LDRD) grant 20190020DR.

This software is an extension to the distributed NMF repository planc by Ramakrishnan Kannan et al. The license and readme information can be found in planc-master. Please find the following references:

  • Ramakrishnan Kannan, Grey Ballard, and Haesun Park. 2016. A high-performance parallel algorithm for nonnegative matrix factorization. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '16). ACM, New York, NY, USA, Article 9, 11 pages. DOI: http://dx.doi.org/10.1145/2851141.2851152
  • James P. Fairbanks, Ramakrishnan Kannan, Haesun Park, David A. Bader, Behavioral clusters in dynamic graphs, Parallel Computing, Volume 47, August 2015, Pages 38-50, ISSN 0167-8191. DOI: http://dx.doi.org/10.1016/j.parco.2015.03.002.
  • Kannan, Ramakrishnan. "SCALABLE AND DISTRIBUTED CONSTRAINED LOW RANK APPROXIMATIONS." (Doctoral Dissertation) (2016). https://smartech.gatech.edu/handle/1853/54962
  • Ramakrishnan Kannan, Grey Ballard, Haesun Park: MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization. IEEE Trans. Knowl. Data Eng. 30(3): 544-558 (2018). DOI: https://doi.org/10.1109/TKDE.2017.2767592
  • Oguz Kaya, Ramakrishnan Kannan, Grey Ballard: Partitioning and Communication Strategies for Sparse Non-negative Matrix Factorization. ICPP 2018: 90:1-90:10. DOI: https://doi.org/10.1145/3225058.3225127
  • Grey Ballard, Koby Hayashi, Ramakrishnan Kannan: Parallel Nonnegative CP Decomposition of Dense Tensors. 25th IEEE International Conference on High Performance Computing (HiPC) 2018. DOI: https://doi.org/10.1109/HiPC.2018.00012

LANL C Number

LANL C number: C20028.

The Copyright and Licensing information is found in license.dat

Copyright

© (or copyright) 2020. Triad National Security, LLC. All rights reserved. This program was produced under U.S. Government contract 89233218CNA000001 for Los Alamos National Laboratory (LANL), which is operated by Triad National Security, LLC for the U.S. Department of Energy/National Nuclear Security Administration. All rights in the program are reserved by Triad National Security, LLC, and the U.S. Department of Energy/National Nuclear Security Administration. The Government is granted for itself and others acting on its behalf a nonexclusive, paid-up, irrevocable worldwide license in this material to reproduce, prepare derivative works, distribute copies to the public, perform publicly and display publicly, and to permit others to do so.

License

This program is open source under the BSD-3 License. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
