model-quantization

Repositories listed under the model-quantization topic:

Efficient-ML / Awesome-Model-Quantization
A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research and is continuously being improved. PRs adding works (papers, repositories) missing from the repo are welcome.
awesome binarized-neural-networks binary-network deep-learning efficient-deep-learning lightweight-neural-network model-acceleration model-compression model-quantization quantization
2266
horseee / Awesome-Efficient-LLM
A curated list for Efficient Large Language Models
compression efficient-llm knowledge-distillation language-model llm llm-compression model-quantization pruning-algorithms
Language:Python 1893
datawhalechina / awesome-compression
A beginner-friendly introductory tutorial on model compression. PDF download: https://github.com/datawhalechina/awesome-compression/releases
knowledge-distillation model-compression model-pruning quantization compression kd model-quantization prune neural-architecture-search low-rank-matrix-decomposition svd
336
inferflow / inferflow
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
llama2 llamacpp llm-inference model-quantization multi-gpu-inference mixture-of-experts moe gemma falcon minicpm mistral bloom deepseek internlm phi-2 baichuan2 mixtral m2m100 qwen
Language:C++ 249
Efficient-ML / Awesome-Efficient-AIGC
A list of papers, docs, and code about efficient AIGC. This repo aims to provide information for efficient AIGC research, covering both language and vision, and is continuously being improved. PRs adding works (papers, repositories) missing from the repo are welcome.
aigc awesome diffusion-models distillation efficient-deep-learning generative-model large-language-models llm model-compression model-quantization pruning
199
sayakpaul / Adventures-in-TensorFlow-Lite
This repository contains notebooks that show the usage of TensorFlow Lite for quantizing deep neural networks.
tensorflow-2 tensorflow-lite on-device-ml model-quantization post-training-quantization quantization-aware-training pruning model-optimization tf-lite-model inference tf-hub
Language:Jupyter Notebook 173
RodolfoFerro / psychopathology-fer-assistant
[WINNER! 🏆] Psychopathology FER Assistant. Because mental health matters. My project submission for #TFWorld TF 2.0 Challenge at Devpost.
python raspberry-pi google-colab model-quantization tensorflow tflite firebase-realtime-database flask dash dash-bootstrap-components kaggle-api kaggle-dataset assistant-app
Language:Jupyter Notebook 78
htqin / BiBench
[ICML 2023] This project is the official implementation of our accepted ICML 2023 paper BiBench: Benchmarking and Analyzing Network Binarization.
benchmark binarization binarized-neural-networks icml-2023 model-compression binary-neural-networks model-quantization
Language:Python 55
htqin / QuantSR
[NeurIPS 2023 Spotlight] This project is the official implementation of our accepted NeurIPS 2023 (spotlight) paper QuantSR: Accurate Low-bit Quantization for Efficient Image Super-Resolution.
model-quantization quantized-neural-networks super-resolution
Language:Python 51
nbasyl / OFQ
The official implementation of the ICML 2023 paper OFQ-ViT
icml icml2023 model-compression model-compression-papers model-quantization vision-transformer vision-transformers quantization-awar
Language:Python 33
seonglae / llama2gptq
Chat with LLaMa 2, with responses that cite reference documents retrieved from a vector database. Runs locally using a model with GPTQ 4-bit quantization.
langchain quantization transformers model-quantization rye streamlit-chat cuda chatai chatbot question-answering chatgpt gpt llama-2 llama2
Language:Python 30
HaoranREN / TensorFlow_Model_Quantization
A tutorial of model quantization using TensorFlow
model-quantization tensorflow tensorflow-lite inference-efficiency tflite machine-learning quantization-aware-training
Language:Python 12
wlfeng0509 / Awesome-Diffusion-Quantization
A list of papers, docs, and code about diffusion quantization. This repo collects various quantization methods for diffusion models. PRs adding works (papers, repositories) missing from the repo are welcome.
awesome diffusion-models model-acceleration model-compression model-quantization
9
dcarpintero / ai-engineering
AI Engineering: Annotated NBs to dive into Self-Attention, In-Context Learning, RAG, Knowledge-Graphs, Fine-Tuning, Model Optimization, and many more.
bert chunking embeddings fine-tuning generative-ai huggingface-transformers in-context-learning knowledge-graph langchain large-language-models llama3-1 model-quantization retrieval-augmented-generation self-attention transformer weights-and-biases ai-engineering
Language:Jupyter Notebook 6
frickyinn / BiDense
PyTorch implementation of "BiDense: Binarization for Dense Prediction", a binary neural network for dense prediction tasks.
model-compression model-quantization
Language:Python 6
medoidai / model-quantization-blog-notebooks
Notebook from "A Hands-On Walkthrough on Model Quantization" blog post.
artificial-intelligence deep-learning machine-learning model-quantization
Language:Jupyter Notebook 4
SRDdev / Model-Quantization
Quantization is a technique to reduce the computational and memory costs of running inference by representing the weights and activations with low-precision data types like 8-bit integer (int8) instead of the usual 32-bit floating point (float32).
ml model-quantization quantization
Language:Jupyter Notebook 4
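The int8-versus-float32 idea in the description above can be illustrated with a minimal sketch. It assumes symmetric per-tensor quantization with a single shared scale, which is only one of several common schemes and is not taken from the SRDdev repository; the function names `quantize_int8` and `dequantize_int8` and the sample weights are hypothetical.

```python
# Minimal sketch (assumption: symmetric per-tensor int8 quantization).
# Function names and sample values are illustrative, not from any listed repo.

def quantize_int8(values):
    """Map floats to integer codes in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero input, where any scale works.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.031, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

# The rounding error per value is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each code fits in one byte instead of the four bytes of a float32, at the cost of a rounding error of at most half the scale per value.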
BjornMelin / local-llm-workbench
🧠 A comprehensive toolkit for benchmarking, optimizing, and deploying local Large Language Models. Includes performance testing tools, optimized configurations for CPU/GPU/hybrid setups, and detailed guides to maximize LLM performance on your hardware.
context-window-scaling cpu-inference cuda gpu-acceleration hybrid-inference inference-optimization llama-cpp llm-benchmarking llm-deployment local-llm model-management model-quantization ollama-optimization wsl-ai-setup
Language:Shell 3
dwain-barnes / LLM-GGUF-Auto-Converter
Automated Jupyter notebook solution for batch converting Large Language Models to GGUF format with multiple quantization options. Built on llama.cpp with HuggingFace integration.
auto-converter batch-processing cuda gguf huggingface jupyter-notebook llama-cpp llm model-quantization
Language:Jupyter Notebook 3
first-coding / VIT
This project distills a ViT model into a compact CNN, reducing its size to 1.24MB with minimal accuracy loss. ONNXRuntime with CUDA boosts inference speed, while FastAPI and Docker simplify deployment.
docker fastapi image-classification knowledge-distillation model-quantization onnx onnxruntime python vision-transformer
Language:Python 2
nnilayy / Spresense
arduino classification embedded-machine-learning model-pruning model-quantization spresense tensorflowlite-for-microcontrollers novel-segmentation-model camera-capture-rate-optimization esp32-websocket-server
Language:C++ 1
SIYAKS-ARES / survival-with-llms
The Ark Project: Selecting the perfect AI model to reboot civilization from a 64GB USB drive. Comprehensive analysis of open-source LLMs under extreme constraints, with final recommendation: Meta Llama 3.1 70B Instruct (Q6_K GGUF). Includes interactive tools, detailed comparisons, and complete implementation guide for offline deployment.
cpu-inference gguf llama-cpp llm meta-llama mixtral model-quantization offline-ai open-source-ai qwen survival-technology civilization-reboot
Language:HTML 1
dslisleedh / NCNet-flax
Unofficial implementation of NCNet using flax and jax
flax jax super-resolution model-quantization
Language:Python 0
satyampurwar / large-language-models
Unlocking the Power of Generative AI: In-Context Learning, Instruction Fine-Tuning, Reinforcement Learning Fine-Tuning, Retrieval Augmented Generation and LangGraph Workflows for AI Agents.
bert few-shot-prompting flan-t5 generative-ai large-language-models low-rank-adaptation peft-fine-tuning-llm prompt-engineering reinforcement-learning-from-human-feedback model-quantization reinforcement-learning-from-ai-feedback instruction-fine-tuning faiss-vector-database huggingface-embeddings langchain rag-pipeline retrieval-augmented-generation ai-agents langgraph rag-chain
Language:Jupyter Notebook 0
xhay-p / ttPG
Torch and Transformers Playground: Learn and Code Deep Learning using PyTorch and HuggingFace Transformers.
deep-learning model-fine-tuning model-quantization natural-language-processing natural-language-understanding nlp pytorch transformers
Language:Jupyter Notebook 0
aashu-0 / FineTuning_GPT2
PyTorch implementation of GPT-2 that loads pretrained weights and enables instruction fine-tuning on the Stanford Alpaca dataset.
deep-learning finetuning-llms gpt-2 llms model-quantization scratch-implementation
Language:Jupyter Notebook
Chenguiti6444 / Vehicle_Detection_and_Classification_using_Deep_Learning
Fine-tuning Pretrained Deep Learning Models to Classify Low Quality Images of Land Vehicles.
classification cnn computer-vision model-quantization python tensorflow
Language:Jupyter Notebook
harshmorya / Assignment__HB1--1
This project explores generating high-quality images using depth maps and conditioning techniques like Canny edges, leveraging Stable Diffusion and ControlNet models. It focuses on optimizing image generation with different aspect ratios, inference steps to balance speed and quality.
ai-art-generator canny-edge-detection controlnet depth-map image-generation image-synthesis latent-diffusion-models model-quantization optimization pytorch stable-diffusion unet
Language:Python
santidrj / model-quantization-aggregation
Replication package for the paper "Aggregating empirical evidence from data strategies studies: a case on model quantization" published in the 19th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM).
green-ai model-quantization research-synthesis software-engineering structured-synthesis-method
Language:Python