kurhula / awesome-high-performance-computing

A curated list of awesome high performance computing resources

Table of Contents

General Info

A Few Upcoming Supercomputers

Most Recent List of the Top500 Supercomputers

History

Trends

Software

Popular HPC Programming Libraries/APIs/Tools/Standards/Simulators

  • alpaka - The alpaka library is a header-only C++17 abstraction library for accelerator development
  • async-rdma - A framework for writing RDMA applications with high-level abstraction and asynchronous APIs
  • CAF - An Open Source Implementation of the Actor Model in C++
  • Chapel - A Programming Language for Productive Parallel Computing on Large-scale Systems
  • Charm++ - Parallel Programming with Migratable Objects
  • Cilk Plus - C/C++ Extension for Data and Task Parallelism
  • Codon - high-performance Python compiler that compiles Python code to native machine code without any runtime overhead
  • CUDA - High performance NVIDIA GPU acceleration
  • dask - Dask provides advanced parallelism for analytics, enabling performance at scale for the tools you love
  • DeepSpeed - An easy-to-use deep learning optimization software suite that enables unprecedented scale and speed for Deep Learning Training and Inference
  • DeterminedAI - Distributed deep learning
  • FastFlow - High-performance Parallel Patterns in C++
  • Galois - A C++ Library to Ease Parallel Programming with Irregular Parallelism
  • Halide - A language for fast, portable computation on images and tensors
  • Heteroflow - Concurrent CPU-GPU Task Programming using Modern C++
  • highway - Performance portable SIMD intrinsics
  • HIP - A C++ runtime API and kernel language for AMD and NVIDIA GPUs
  • HPC-X - NVIDIA's HPC software toolkit, which includes MPI and SHMEM/PGAS communication libraries
  • HPX - A C++ Standard Library for Concurrency and Parallelism
  • Horovod - Distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet
  • ISPC - An open-source compiler for high-performance SIMD programming on the CPU and GPU
  • Intel ISPC - SPMD compiler
  • Intel TBB - Threading Building Blocks
  • joblib - Data-flow programming for performance (Python)
  • Kompute - The general purpose GPU compute framework for cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends)
  • Kokkos - A C++ Programming Model for Writing Performance Portable Applications on HPC platforms
  • Kubeflow MPI Operator - MPI Operator for Kubeflow
  • Legate - NVIDIA's Legion-based replacement for NumPy
  • Legion - Distributed heterogeneous programming library
  • MAGMA - Next generation linear algebra (LA) GPU accelerated libraries
  • Merlin - A distributed task queuing system, designed to allow complex HPC workflows to scale to large numbers of simulations
  • Microsoft MPI - Microsoft's implementation of MPI
  • MOGSLib - User defined schedulers
  • mpi4jax - Zero-copy MPI for JAX arrays
  • mpi4py - Python bindings for MPI
  • MPI - Open MPI implementation of the Message Passing Interface
  • MPI - MPICH implementation of the Message Passing Interface
  • MPI Standardization Forum - Forum for MPI standardization
  • MVAPICH - Implementation of MPI
  • NCCL - The NVIDIA Collective Communication Library for multi-GPU and multi-node communication
  • cuNumeric - GPU-accelerated drop-in replacement for NumPy
  • stdpar - GPU accelerated C++ from NVIDIA
  • numba - A JIT compiler that translates a subset of Python into fast machine code
  • oneAPI - A unified, multiarchitecture, multi-vendor programming model
  • OpenACC - "OpenMP for GPUs"
  • OpenCilk - MIT continuation of Cilk Plus
  • OpenMP - Multi-platform Shared-memory Parallel Programming in C/C++ and Fortran
  • PVM - Parallel Virtual Machine: A predecessor to MPI for distributed computing
  • PMIx - Standard for process management
  • Pollux - Message Passing Cloud orchestrator
  • Pyfi - Distributed flow and computation system
  • RAJA - Architecture and programming model portability for HPC applications
  • RaftLib - A C++ Library for Enabling Stream and Dataflow Parallel Computation
  • ray - Scale AI and Python workloads from reinforcement learning to deep learning
  • ROCm - First open-source software development platform for HPC/Hyperscale-class GPU computing
  • RS MPI - Rust bindings for MPI
  • Scalix - Data parallel computing framework
  • SimGrid - Simulate cluster/HPC environments
  • SkelCL - A Skeleton Library for Heterogeneous Systems
  • STAPL - Standard Template Adaptive Parallel Programming Library in C++
  • STLab - High-level Constructs for Implementing Multicore Algorithms with Minimized Contention
  • SYCL - C++ Abstraction layer for heterogeneous devices
  • Taichi - Parallel programming language for high-performance numerical computations in Python
  • Taskflow - A Modern C++ Parallel Task Programming Library
  • The Open Community Runtime - Specification for Asynchronous Many Task systems
  • Transwarp - A Header-only C++ Library for Task Concurrency
  • Tuplex - Blazing fast Python data science
  • UCX - Optimized, production-proven communication framework

Cluster Hardware Discovery Tools

  • cpuid - A software instruction available on Intel, AMD, and other processors that can be used to determine processor type and features.
  • cpuid instruction note - A detailed note on the CPUID instruction used for processor identification.
  • cpufetch - A simple yet fancy CPU architecture fetching tool.
  • gpufetch - A tool similar to cpufetch, but for fetching GPU architecture.
  • intel cpuinfo - Intel tool providing information about the characteristics of Intel CPUs.
  • LIKWID - A suite of command-line tools for inspecting node topology, reading hardware performance counters, and pinning threads.
  • LIKWID.jl - Julia wrapper for LIKWID.
  • openmpi hwloc - Portable Hardware Locality (hwloc) software project.
  • PRK - Parallel Research Kernels - A collection of kernels for parallel programming research.

Cluster Management/Tools/Schedulers/Stacks

  • BeeGFS - A parallel file system designed for performance-critical environments.
  • Bluebanquise - An open-source cluster management tool.
  • Bright Cluster Manager - Software for deploying and managing HPC and AI server clusters.
  • Ceph - An open-source distributed storage system.
  • DeepOps - Nvidia's GPU infrastructure and automation tools for Kubernetes and Slurm clusters.
  • E4S - The Extreme Scale HPC Scientific Stack - A collection of open-source software packages for HPC environments.
  • Easybuild - A package manager for HPC/supercomputers.
  • Flux framework - A framework for high-performance computing clusters.
  • fpsync - A tool for fast parallel data transfer using fpart and rsync.
  • GPFS - A high-performance parallel file system developed by IBM.
  • Guix - A package manager for HPC/supercomputers.
  • Intel DAOS - A software-defined scale-out object store for HPC applications.
  • LSF - A batch system for HPC and distributed computing environments.
  • Lmod - A Lua-based module system for software environment management on HPC systems.
  • Lustre Parallel File System - A high-performance distributed filesystem for large-scale cluster computing.
  • moosefs - A fault-tolerant, highly available, distributed file system.
  • NetApp - Intelligent data infrastructure for various workloads.
  • OpenHPC - A community-led set of HPC components.
  • OpenOnDemand - A web portal for accessing supercomputing resources.
  • OpenPBS - A software for workload management and job scheduling.
  • OpenXdMod - A tool for collecting and reporting usage metrics for high-performance computing resources.
  • RADIUSS - Rapid Application Development via an Institutional Universal Software Stack.
  • rocks - An open-source Linux cluster distribution.
  • Ruse - A tool for measuring the time and memory used by a process and its subprocesses on HPC clusters.
  • SGE - A resource management software for large clusters of computers.
  • Slurm - A cluster management and job scheduling system for Linux clusters.
  • Spack - A package manager for HPC/supercomputers.
  • sstack - A tool to install multiple software stacks such as Spack, EasyBuild, and Conda.
  • Starfish - Unstructured data management and metadata solution for files and objects.
  • Warewulf - An operating system provisioning system and cluster management tool.
  • xCAT - A distributed computing management and provisioning tool.
  • XDMoD - An open-source tool for collecting and reporting usage metrics for high-performance computing resources.
  • Globus Connect - A fast data transfer tool between supercomputers.

HPC-specific Operating Systems

  • Kitten - A lightweight kernel designed for high-performance computing. It focuses on providing low noise and predictable performance for HPC applications.
  • McKernel - A hybrid kernel that combines Linux and a lightweight kernel designed to provide high performance for HPC applications.
  • mOS - A specialized operating system for high-performance computing, designed to support large-scale, manycore processors.

Development/Workflow/Monitoring Tools for HPC

  • Apache Airflow - A platform to programmatically author, schedule, and monitor workflows.
  • Apptainer (formerly Singularity) - Container platform designed for scientific and high-performance computing (HPC) environments.
  • arbiter2 - Monitors and protects interactive nodes with cgroups.
  • Charliecloud - Lightweight container solution for high-performance computing (HPC).
  • Docker - A set of platform as a service products that use OS-level virtualization to deliver software in packages called containers.
  • genv - GPU Environment Management for managing and scheduling GPU resources.
  • Grafana - Open-source platform for monitoring and observability, visualizing metrics.
  • gRPC - A high-performance, open-source universal RPC framework.
  • HPC Rocket - Allows submitting Slurm jobs in Continuous Integration (CI) pipelines.
  • HTCondor - An open-source high-throughput computing software framework.
  • Jacamar-ci - CI/CD tool designed for HPC and scientific computing workflows.
  • Kubernetes - An open-source system for automating deployment, scaling, and management of containerized applications.
  • nextflow - A workflow framework to deploy data-driven computational pipelines.
  • perun - Energy monitor for HPC systems, focusing on performance and energy efficiency.
  • Prefect - A workflow management system, designed for modern infrastructure and powered by the open-source Prefect Core workflow engine.
  • Prometheus - An open-source monitoring system with a dimensional data model, flexible query language, efficient time series database and modern alerting approach.
  • redun - Workflow engine that emphasizes simplicity, reliability, and scalability.
  • remora - Tool for monitoring and reporting the performance of batch jobs on HPC systems.
  • ruptime - A utility for monitoring the status of computational jobs and systems.
  • Slurmvision slurm dashboard - A dashboard for monitoring and managing Slurm jobs.
  • slurm docker cluster - A Slurm cluster implemented using Docker containers, for development and testing.
  • snakemake - A workflow management system that reduces the complexity of creating reproducible and scalable data analyses.
  • Stui slurm dashboard for the terminal - A terminal-based UI for managing and monitoring Slurm clusters.
  • Vaex - A Python library for lazy Out-of-Core DataFrames (similar to Pandas), to visualize and explore big tabular datasets.

Debugging Tools for HPC

  • ddt - A powerful debugger designed for developers to solve complex problems on multi-threaded and multi-process environments in HPC.
  • marmot MPI checker - A tool for detecting and reporting issues in MPI (Message Passing Interface) applications.
  • python debugging tools - A collection of tools for debugging Python applications, including pdb and other utilities.
  • seer modern gui for gdb - A graphical user interface for GDB, aiming to improve the debugging experience with modern features and visuals.
  • Summary of C/C++ debugging tools - An overview of various debugging tools available for C/C++ applications, focusing on HPC environments.
  • totalview - A comprehensive source code analysis and debugging tool designed for complex software running on HPC systems, supporting a wide range of languages and architectures.

Performance/Benchmark Tools for HPC

IO/Visualization Tools for HPC

  • ADIOS2 - The Adaptable IO System version 2, designed for flexible and efficient I/O for scientific data, supporting a wide range of HPC simulations.
  • Amira - A powerful, multifaceted 3D software platform for visualizing, manipulating, and understanding life science and biomedical data from all types of sources.
  • hdf5 - The Hierarchical Data Format version 5 (HDF5) is an open-source file format that supports large, complex, heterogeneous data.
  • paraview - An open-source, multi-platform data analysis and visualization application.
  • Scientific Visualization Wiki - A comprehensive guide to the field of scientific visualization, detailing techniques, tools, and applications.
  • the yt project - An open-source, Python-based package for analyzing and visualizing volumetric data.
  • vedo - A lightweight and powerful Python module for scientific analysis and visualization of 3D objects and point clouds based on VTK.
  • visit - An Open Source, interactive, scalable, visualization, animation and analysis tool.

General Purpose Scientific Computing Libraries for HPC

Misc.

Wikis

Hardware

Interconnects/Topology

CPU

GPU

TPU/Tensor Cores

Many integrated core processor (MIC)

Cloud

Vendors

Articles/Papers

Custom/FPGA/ASIC/APU

Certification

Student Opportunities / Workshops

Other/Wikis

People

Resources

Books/Manuals

Courses

Tutorials/Guides/Articles

Review Papers/Articles

News

Podcasts

Video Presentations/Courses/Channels

Presentation Slides

Building Clusters/Virtual Clusters

Forums

Careers

Membership Clubs

Blogs

Journals

Conferences

Communities/Chat Groups

Twitters

Consulting

Interview Preparation

Organizations

Interesting r/HPC posts

Misc. Wikis

Misc. Papers/Articles

Misc. Repos

Misc. Theses

Misc.

Games

Other Curated Lists

Acknowledgements

This repo started from the great curated list https://github.com/taskflow/awesome-parallel-computing
