Repositories under the distributed-deep-learning topic:
Accelerate local LLM inference and fine-tuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, etc.) on Intel CPU and GPU (e.g., a local PC with an iGPU, or a discrete GPU such as Arc, Flex, or Max). A PyTorch LLM library that seamlessly integrates with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, ModelScope, etc.
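For context, a minimal sketch of how this kind of library is typically used, assuming ipex-llm's drop-in transformers-style API with 4-bit weight loading (the model id and prompt below are placeholders, not from the repo):

```python
# Hedged sketch: load an LLM with ipex-llm's low-bit weight quantization.
# Assumes the ipex_llm.transformers drop-in API; model id is a placeholder.
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_id)

# load_in_4bit quantizes weights to INT4 for faster CPU/iGPU inference
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)

inputs = tokenizer("What is distributed deep learning?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```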
BigDL: Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray
Distributed Keras engine: make Keras faster with only one line of code.
Learn applied deep learning from zero to deployment using TensorFlow 1.8+
A Portable C Library for Distributed CNN Inference on IoT Edge Clusters
sensAI: ConvNets Decomposition via Class Parallelism for Fast Inference on Live Data
Distributed training of DNNs • C++/MPI Proxies (GPT-2, GPT-3, CosmoFlow, DLRM)
RocketML Deep Neural Networks
SHADE: Enable Fundamental Cacheability for Distributed Deep Learning Training
Ok-Topk is a scheme for distributed training with sparse gradients. It integrates a novel sparse allreduce algorithm (less than 6k communication volume, which is asymptotically optimal) with the decentralized parallel Stochastic Gradient Descent (SGD) optimizer, and its convergence is proven both theoretically and empirically.
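To illustrate the underlying idea, here is a generic top-k gradient sparsification sketch. This is NOT the paper's O(k) sparse allreduce; it falls back to a dense allreduce for simplicity, and assumes torch.distributed is already initialized (e.g., via torchrun):

```python
# Generic top-k gradient sparsification sketch (the idea Ok-Topk builds on),
# not the paper's sparse allreduce algorithm.
import torch
import torch.distributed as dist

def topk_allreduce(grad: torch.Tensor, k: int) -> torch.Tensor:
    """Each rank keeps its k largest-magnitude gradient entries; the sparse
    contributions are then summed across ranks (via a dense allreduce here,
    for simplicity) and averaged."""
    flat = grad.flatten()
    _, idx = torch.topk(flat.abs(), k)
    sparse = torch.zeros_like(flat)
    sparse[idx] = flat[idx]
    dist.all_reduce(sparse, op=dist.ReduceOp.SUM)
    return (sparse / dist.get_world_size()).view_as(grad)
```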
TensorFlow (1.8+) Datasets, Feature Columns, Estimators and Distributed Training using Google Cloud Machine Learning Engine
WAGMA-SGD is a decentralized asynchronous SGD scheme based on wait-avoiding group model averaging. Synchronization is relaxed by making the collectives externally triggerable, i.e., a collective can be initiated without requiring that all processes enter it. It partially reduces the data within non-overlapping groups of processes, improving parallel scalability.
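A synchronous stand-in for the group-averaging step, to show the structure only: parameters are averaged within a small process group rather than globally. The actual scheme makes this collective wait-avoiding (externally triggerable), and the group layout below is purely illustrative:

```python
# Synchronous sketch of group model averaging; WAGMA-SGD relaxes the
# blocking collective used here. Assumes torch.distributed is initialized.
import torch
import torch.distributed as dist

def average_within_group(model: torch.nn.Module, group) -> None:
    """Average parameters across only the ranks in one group."""
    size = dist.get_world_size(group=group)
    for p in model.parameters():
        dist.all_reduce(p.data, op=dist.ReduceOp.SUM, group=group)
        p.data /= size

# Illustrative layout: split 8 ranks into two non-overlapping groups of 4
# (dist.new_group must be called collectively by all processes).
# groups = [dist.new_group([0, 1, 2, 3]), dist.new_group([4, 5, 6, 7])]
```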
Scalable NLP model fine-tuning and batch inference with Ray and Anyscale
Java based Convolutional Neural Network package running on Apache Spark framework
This repository contains implementations of a wide variety of deep learning projects across computer vision, NLP, federated learning, and distributed learning, including both university projects and projects built out of personal interest in deep learning.
Distributed deep learning framework based on PyTorch, Numba, NCCL, and ZeroMQ.
Algorithm internship at SHUKUN Technology Co., Ltd (2020/12–2021/5): multi-GPU, multi-node training for deep learning models using Horovod and the NVIDIA Clara Train SDK, with configuration tutorials and performance testing.
Collection of resources for automatic deployment of distributed deep learning jobs on a Kubernetes cluster
PyTorch Examples for Beginners
Horovod tutorial for PyTorch using NVIDIA-Docker.
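The core Horovod/PyTorch setup such a tutorial covers looks roughly like this (a sketch assuming a CUDA-capable node; the model and learning rate are placeholders):

```python
import torch
import horovod.torch as hvd

hvd.init()                               # one process per GPU
torch.cuda.set_device(hvd.local_rank())  # pin each process to its GPU

model = torch.nn.Linear(10, 1).cuda()    # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

# Wrap the optimizer: gradients are averaged across workers via allreduce.
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters())

# Start all workers from identical weights and optimizer state.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)
```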
Distributed TensorFlow, Keras, and BigDL on Apache Spark.
A blockchain-based neural architecture search project.
Distributed Deep Learning experiments with the BigDL framework over Databricks
Yelp review classification using a CNN model with Horovod on an HPC cluster.
Simultaneous Multi-Party Learning Framework
Comparison of distributed machine learning techniques applied to openly available datasets
Implemented training strategies to alleviate bottlenecks and improve training speed while maintaining the quality of our GANs.
An implementation of a distributed ResNet model for classifying the CIFAR-10 and MNIST datasets.
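The usual pattern for this kind of distributed ResNet training is PyTorch DistributedDataParallel; a sketch of that pattern (not necessarily this repository's approach), launched with `torchrun --nproc_per_node=<gpus> train.py`:

```python
# DDP sketch: shard CIFAR-10 across ranks and replicate a ResNet per GPU.
import torch
import torch.distributed as dist
import torchvision
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model = torchvision.models.resnet18(num_classes=10).cuda()  # CIFAR-10 classes
model = DDP(model, device_ids=[local_rank])

# DistributedSampler gives each rank a distinct shard of the dataset.
dataset = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True,
    transform=torchvision.transforms.ToTensor())
sampler = torch.utils.data.distributed.DistributedSampler(dataset)
loader = torch.utils.data.DataLoader(dataset, batch_size=128, sampler=sampler)
```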