Renato Negrinho's starred repositories
devops-exercises
Linux, Jenkins, AWS, SRE, Prometheus, Docker, Python, Ansible, Git, Kubernetes, Terraform, OpenStack, SQL, NoSQL, Azure, GCP, DNS, Elastic, Network, Virtualization. DevOps Interview Questions
gpt-engineer
Specify what you want it to build; the AI asks for clarification and then builds it.
alpaca-lora
Instruct-tune LLaMA on consumer hardware
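As a sketch of the technique alpaca-lora popularized, here is LoRA instruct-tuning via Hugging Face peft; the base checkpoint and hyperparameters below are illustrative assumptions, not the repo's exact recipe.

```python
# Minimal LoRA fine-tuning sketch (assumes `transformers`, `peft`, and
# `bitsandbytes` are installed). Names and hyperparameters are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "huggyllama/llama-7b"  # hypothetical choice of base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, load_in_8bit=True, device_map="auto")

# LoRA adds small trainable low-rank matrices to the attention projections,
# so only a tiny fraction of parameters is updated on consumer hardware.
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of the base model
```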
gemma_pytorch
The official PyTorch implementation of Google's Gemma models
AITemplate
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code, specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
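A rough sketch of the compile flow, modeled on AITemplate's published examples; a CUDA/HIP toolchain is required, and the frontend names here should be treated as assumptions to check against the repo's docs.

```python
# Sketch of AITemplate's graph-compile flow; follows the project's examples,
# but treat names/signatures as assumptions and verify against the docs.
from aitemplate.compiler import compile_model
from aitemplate.frontend import nn, Tensor
from aitemplate.testing import detect_target

class TinyMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(512, 512)

    def forward(self, x):
        return self.fc(x)

# Symbolic FP16 input; AITemplate traces the graph and emits CUDA/HIP C++.
x = Tensor(shape=[8, 512], dtype="float16", name="x", is_input=True)
y = TinyMLP()(x)
y._attrs["is_output"] = True
y._attrs["name"] = "y"

# Generates, compiles, and loads the kernels for the detected GPU target.
module = compile_model(y, detect_target(), "./ait_tmp", "tiny_mlp")
```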
neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
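A minimal post-training INT8 quantization sketch in the Neural Compressor 2.x style; the toy model and random calibration data are stand-ins for a real workload.

```python
# Post-training quantization sketch with Intel Neural Compressor (2.x-style API);
# the tiny model and random calibration data are illustrative placeholders.
import torch
from torch.utils.data import DataLoader, TensorDataset
from neural_compressor import PostTrainingQuantConfig, quantization

float_model = torch.nn.Sequential(
    torch.nn.Linear(64, 64), torch.nn.ReLU(), torch.nn.Linear(64, 10)
)
calib_data = TensorDataset(torch.randn(128, 64), torch.randint(0, 10, (128,)))
calib_loader = DataLoader(calib_data, batch_size=16)

# fit() runs calibration over the dataloader and returns a quantized model.
q_model = quantization.fit(
    model=float_model,
    conf=PostTrainingQuantConfig(),  # defaults to INT8 post-training quantization
    calib_dataloader=calib_loader,
)
q_model.save("./quantized_model")
```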
intel-extension-for-transformers
⚡ Build your chatbot within minutes on your favorite device; offers SOTA compression techniques for LLMs; runs LLMs efficiently on Intel platforms ⚡
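A hedged sketch of the drop-in usage pattern from the project's README; the checkpoint name is an example.

```python
# Weight-only 4-bit loading via ITREX's drop-in replacement for the
# transformers API; the model name is an example checkpoint.
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "Intel/neural-chat-7b-v3-1"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
inputs = tokenizer("Once upon a time,", return_tensors="pt").input_ids

# load_in_4bit triggers weight-only quantization for efficient CPU inference.
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)
print(tokenizer.decode(model.generate(inputs, max_new_tokens=32)[0]))
```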
DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
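A minimal sketch of the non-persistent pipeline mode from the MII README; the model name is an example and a supported GPU is assumed.

```python
# Non-persistent MII pipeline: load a model and serve batched generation
# in-process, with DeepSpeed inference kernels underneath.
import mii

pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")  # example model
responses = pipe(["DeepSpeed is", "Seattle is"], max_new_tokens=64)
for r in responses:
    print(r)
```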
oneAPI-samples
Samples for Intel® oneAPI Toolkits
intel-extension-for-tensorflow
Intel® Extension for TensorFlow*
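A quick sanity-check sketch, assuming the extension is installed (e.g. via `pip install intel-extension-for-tensorflow[xpu]`); it registers as a TensorFlow plugin, so Intel devices surface as XPU with no code changes.

```python
# With the extension installed, Intel GPUs appear as "XPU" devices and
# standard TensorFlow code runs on them without source changes.
import tensorflow as tf

print(tf.config.list_physical_devices("XPU"))  # empty list if no Intel GPU/plugin
```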
optimum-intel
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
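A short sketch of the drop-in OpenVINO path from the Optimum Intel README; the checkpoint is an example.

```python
# Export a transformers model to OpenVINO on the fly and generate with the
# drop-in OVModelForCausalLM class; "gpt2" is an example checkpoint.
from transformers import AutoTokenizer
from optimum.intel import OVModelForCausalLM

model_id = "gpt2"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id, export=True)  # converts to OpenVINO IR

inputs = tokenizer("The Intel extension", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```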
awesome-llm-human-preference-datasets
A curated list of human preference datasets for LLM fine-tuning, RLHF, and evaluation.
neural-speed
An innovative library for efficient LLM inference via low-bit quantization
SpeculativeDecodingPapers
📰 Must-read papers and blogs on Speculative Decoding ⚡️
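For a concrete feel of the idea, here is a hedged sketch using transformers' assisted generation, one practical implementation of speculative decoding: a small draft model proposes tokens and the large model verifies them in a single forward pass. Model names are illustrative.

```python
# Speculative decoding via transformers' assisted generation: the draft model
# proposes several tokens; the target model accepts or rejects them in one pass.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2-large")
target = AutoModelForCausalLM.from_pretrained("gpt2-large")  # large "verifier"
draft = AutoModelForCausalLM.from_pretrained("gpt2")         # small "drafter" (shares tokenizer)

inputs = tokenizer("Speculative decoding works by", return_tensors="pt")
out = target.generate(**inputs, assistant_model=draft, max_new_tokens=40)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```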
llm-on-ray
Pretrain, fine-tune, and serve LLMs on Intel platforms with Ray