Suzie Oh's starred repositories
Perplexica
Perplexica is an AI-powered search engine. It is an open-source alternative to Perplexity AI.
llama-recipes
Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supports a number of candidate inference solutions such as HF TGI and vLLM for local or cloud deployment. Includes demo apps that showcase Meta Llama3 for WhatsApp & Messenger.
FlagEmbedding
Retrieval and Retrieval-augmented LLMs
Phi-3CookBook
This is a cookbook for getting started with Phi-3, a family of open AI models developed by Microsoft. Phi-3 models are the most capable and cost-effective small language models (SLMs) available, outperforming models of the same size and the next size up across a variety of language, reasoning, coding, and math benchmarks.
clean-text
🧹 Python package for text cleaning
augmentoolkit
Convert Compute And Books Into Instruct-Tuning Datasets (or classifiers)!
EasyContext
Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
textbook_quality
Generate textbook-quality synthetic LLM pretraining data
AutoCrawler
Official implementation of the paper "AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation"
Autonomous-Agents
Autonomous Agents (LLMs) research papers. Updated Daily.
llm-continual-learning-survey
Continual Learning of Large Language Models: A Comprehensive Survey
llamaduo
This project showcases an LLMOps pipeline that fine-tunes a small LLM to prepare for an outage of the service LLM. For this project, we initially chose Gemini 1.0 Pro as the service LLM and Gemma 2B/7B as the small LLM. It now supports other service LLMs such as GPT4 and Claude3.
CALM-pytorch
Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", from Google DeepMind
Vodalus-Expert-LLM-Forge
Dataset Crafting w/ RAG/Wikipedia ground truth and Efficient Fine-Tuning Using MLX and Unsloth. Includes configurable dataset annotation editor Gradio UI.
nlp-datasets
Curated notes on NLP datasets