LLM-PowerHouse: A Curated Guide for Large Language Models with Custom Training and Inferencing

Welcome to LLM-PowerHouse, your ultimate resource for unleashing the full potential of Large Language Models (LLMs) with custom training and inferencing. This GitHub repository is a comprehensive and curated guide designed to empower developers, researchers, and enthusiasts to harness the true capabilities of LLMs and build intelligent applications that push the boundaries of natural language understanding.

In-Depth Articles

NLP

Article Resources
LLMs Overview 🔗
NLP Embeddings 🔗
Sampling 🔗
Tokenization 🔗
Transformer 🔗

Models

Article Resources
Generative Pre-trained Transformer (GPT) 🔗

Training

Article Resources
Activation Function 🔗
Fine Tuning Models 🔗
Enhancing Model Compression: Inference and Training Optimization Strategies 🔗
Model Summary 🔗
Splitting Datasets 🔗
Train Loss > Val Loss 🔗
Parameter Efficient Fine-Tuning 🔗
Gradient Descent and Backprop 🔗
Overfitting And Underfitting 🔗
Gradient Accumulation and Checkpointing 🔗
Flash Attention 🔗

Enhancing Model Compression: Inference and Training Optimization Strategies

Article Resources
Quantization 🔗
Knowledge Distillation 🔗
Pruning 🔗
DeepSpeed 🔗
Sharding 🔗
Mixed Precision Training 🔗
Inference Optimization 🔗

Evaluation Metrics

Article Resources
Classification 🔗
Regression 🔗
Generative Text Models 🔗

Open LLMs

Article Resources
Open Source LLM Space for Commercial Use 🔗
Open Source LLM Space for Research Use 🔗
LLM Training Frameworks 🔗
Effective Deployment Strategies for Language Models 🔗
Tutorials about LLM 🔗
Courses about LLM 🔗

Cost Analysis

Article Resources
Lambda Labs vs AWS Cost Analysis 🔗

Codebase Mastery: Building with Perfection

Title Repository
Instruction-based data preparation using OpenAI 🔗
Optimal Fine-Tuning using the Trainer API: From Training to Model Inference 🔗
Efficient Fine-tuning and Inference of LLMs with PEFT and LoRA 🔗
Efficient Fine-tuning and Inference of LLMs with Accelerate 🔗
Efficient Fine-tuning with T5 🔗
Train Large Language Models with LoRA and Hugging Face 🔗
Fine-Tune Your Own Llama 2 Model in a Colab Notebook 🔗
Guanaco Chatbot Demo with LLaMA-7B Model 🔗
PEFT Finetune-Bloom-560m-tagger 🔗
Finetune_Meta_OPT-6-1b_Model_bnb_peft 🔗
Finetune Falcon-7b with BNB Self-Supervised Training 🔗
FineTune LLaMa2 with QLoRa 🔗
Stable_Vicuna13B_8bit_in_Colab 🔗
GPT-Neo-X-20B-bnb2bit_training 🔗
MPT-Instruct-30B Model Training 🔗
RLHF_Training_for_CustomDataset_for_AnyModel 🔗
Fine_tuning_Microsoft_Phi_1_5b_on_custom_dataset(dialogstudio) 🔗
Finetuning OpenAI GPT-3.5 Turbo 🔗
Finetuning Mistral-7b using Autotrain-advanced 🔗
RAG LangChain Tutorial 🔗
Mistral DPO Trainer 🔗
LLM Sharding 🔗
Integrating Unstructured and Graph Knowledge with Neo4j and LangChain for Enhanced Question Answering 🔗
vLLM Benchmarking 🔗

What I am learning

After immersing myself in the recent GenAI text-based language model hype for nearly a month, I have made several observations about how these models perform on my specific tasks.

Please note that these observations are subjective and specific to my own experiences, and your conclusions may differ.

  • We need models with at least 7B parameters for solid natural language understanding; models with fewer parameters show a significant drop in performance. However, models larger than 7B require a GPU with more than 24GB of VRAM.
  • Benchmarks can be tricky as different LLMs perform better or worse depending on the task. It is crucial to find the model that works best for your specific use case. In my experience, MPT-7B is still the superior choice compared to Falcon-7B.
  • Prompts need to be reworked for each model iteration, because what works for one model often does not carry over to the next. There are potential solutions, but their effectiveness is still being evaluated.
  • For fine-tuning, you need at least one GPU with more than 24GB of VRAM; 32GB or 40GB is recommended.
  • Fine-tuning only the last few layers to speed up training may not yield satisfactory results. I tried this approach, and it didn't work well.
  • Loading models in 8-bit or 4-bit saves VRAM: a 7B model that would otherwise need about 16GB fits in roughly 10GB or under 6GB, respectively (a loading sketch follows this list). However, the savings come at the cost of noticeably slower inference and can hurt performance on text-understanding tasks.
  • Those exploring LLM applications for their companies should pay attention to licensing. A model trained with another model as a reference, or one that requires the original (restrictively licensed) weights, is not suitable for commercial settings.
  • There are three major types of LLMs: base models (like GPT-2/3), chat-tuned models, and instruction-tuned models. Base models are usually not usable as-is and require fine-tuning. Chat versions tend to be the best, but they are often not open source.
  • Not every problem needs to be solved with LLMs. Avoid forcing a solution around LLMs. Similar to the situation with deep reinforcement learning in the past, it is important to find the most appropriate approach.
  • I tried LangChain and vector databases but never ended up needing them. Plain Python, embeddings, and efficient dot-product operations worked well for me (a retrieval sketch follows this list).
  • LLMs do not need to have complete world knowledge. Humans also don't possess comprehensive knowledge but can adapt. LLMs only need to know how to utilize the available knowledge. It might be possible to create smaller models by separating the knowledge component.
  • The next wave of innovation might involve simulating "thoughts" before answering, rather than simply predicting one word after another. This approach could lead to significant advancements.
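
Below is a minimal sketch of the 8-bit/4-bit loading mentioned above, using Hugging Face transformers with bitsandbytes. The model id is only an example, and the exact memory savings will vary by model and hardware.

```python
# Minimal sketch: load a 7B model in 4-bit (or 8-bit) to cut VRAM usage.
# Assumes `transformers`, `accelerate`, and `bitsandbytes` are installed;
# the model id below is only an example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-v0.1"  # example 7B checkpoint

# 4-bit NF4 quantization; use BitsAndBytesConfig(load_in_8bit=True) for 8-bit.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spreads layers across the available GPU(s)
)

prompt = "Quantization trades inference speed for memory because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same config object is where you would switch to 8-bit loading; in both cases expect slower generation than fp16, as noted above.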

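And here is a rough sketch of the "plain Python, embeddings, and dot products" approach mentioned above, with sentence-transformers standing in for whatever embedding model you prefer; the model name and the toy documents are only examples.

```python
# Minimal sketch: retrieval with plain embeddings and a dot product,
# no LangChain or vector database. Assumes `sentence-transformers` and
# `numpy` are installed; the embedding model name is only an example.
import numpy as np
from sentence_transformers import SentenceTransformer

documents = [
    "MPT-7B is a decoder-only transformer trained on 1 trillion tokens.",
    "Falcon-7B was trained primarily on the RefinedWeb dataset.",
    "QLoRA fine-tunes 4-bit quantized models with LoRA adapters.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # example embedding model

# Normalized embeddings make the dot product equal to cosine similarity.
doc_embeddings = model.encode(documents, normalize_embeddings=True)

def search(query: str, top_k: int = 2):
    query_embedding = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_embeddings @ query_embedding  # one dot product per document
    best = np.argsort(-scores)[:top_k]         # indices of the top scores
    return [(documents[i], float(scores[i])) for i in best]

print(search("Which model is fine-tuned in 4-bit?"))
```
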
Contributing

Contributions are welcome! If you'd like to contribute to this project, feel free to open an issue or submit a pull request.

About

LLM-PowerHouse: Unleash LLMs' potential through curated tutorials, best practices, and ready-to-use code for custom training and inferencing.

License

This project is licensed under the MIT License.


Languages

Jupyter Notebook 98.9%, Python 1.1%