Suzie Oh (ohsuz)

Home Page: ohsuz.dev


Organizations
bcaitech1
DSBA-Lab
Fashion-Reader
HAE-RAE
MINIONS-KR
TEAM-IKYO
team-vvave
wisdomify

Suzie Oh's starred repositories

Perplexica

Perplexica is an AI-powered search engine. It is an open-source alternative to Perplexity AI.

Language: TypeScript | License: MIT | Stargazers: 12411 | Issues: 93 | Issues: 211

llama-recipes

Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supports a number of candidate inference solutions such as HF TGI and VLLM for local or cloud deployment. Includes demo apps that showcase Meta Llama3 for WhatsApp & Messenger.

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 11357 | Issues: 93 | Issues: 310

storm

An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.

Language: Python | License: MIT | Stargazers: 9867 | Issues: 69 | Issues: 76

WizardLM

LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath

outlines

Structured Text Generation

Language: Python | License: Apache-2.0 | Stargazers: 7936 | Issues: 47 | Issues: 533

FlagEmbedding

Retrieval and Retrieval-augmented LLMs

Language: Python | License: MIT | Stargazers: 6506 | Issues: 39 | Issues: 932

datatrove

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Language: Python | License: Apache-2.0 | Stargazers: 1858 | Issues: 44 | Issues: 107

Phi-3CookBook

This is a book for getting started with Phi-3, a family of open AI models developed by Microsoft. Phi-3 models are the most capable and cost-effective small language models (SLMs) available, outperforming models of the same size and next size up across a variety of language, reasoning, coding, and math benchmarks.

Language: Jupyter Notebook | License: MIT | Stargazers: 1578 | Issues: 12 | Issues: 47

clean-text

🧹 Python package for text cleaning

Language: Python | License: NOASSERTION | Stargazers: 941 | Issues: 14 | Issues: 29

augmentoolkit

Convert Compute And Books Into Instruct-Tuning Datasets (or classifiers)!

Language: Python | License: MIT | Stargazers: 743 | Issues: 17 | Issues: 32

MergeLM

Codebase for Merging Language Models (ICML 2024)

EasyContext

Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.

Language: Python | License: Apache-2.0 | Stargazers: 579 | Issues: 9 | Issues: 39

spRAG

Retrieval engine for unstructured data

Language: Python | License: MIT | Stargazers: 518 | Issues: 6 | Issues: 8

textbook_quality

Generate textbook-quality synthetic LLM pretraining data

Language: Python | License: MIT | Stargazers: 470 | Issues: 8 | Issues: 6

kss

KSS: Korean String processing Suite

Language: Python | License: BSD-3-Clause | Stargazers: 403 | Issues: 4 | Issues: 57

Language: Python | License: Apache-2.0 | Stargazers: 395 | Issues: 12 | Issues: 10

AutoCrawler

Official implementation of the paper "AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation"

Language: Python | License: Apache-2.0 | Stargazers: 388 | Issues: 10 | Issues: 6

Autonomous-Agents

Autonomous Agents (LLMs) research papers. Updated Daily.

License: MIT | Stargazers: 320 | Issues: 26 | Issues: 0

llm-continual-learning-survey

Continual Learning of Large Language Models: A Comprehensive Survey

llamaduo

This project showcases an LLMOps pipeline that fine-tunes a small LLM to prepare for an outage of the service LLM. For this project, we initially chose Gemini 1.0 Pro as the service LLM and Gemma 2B/7B as the small LLM. It now supports other service LLMs such as GPT4 and Claude3.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 177 | Issues: 5 | Issues: 7

PruneMe

Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models

CALM-pytorch

Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google DeepMind

Language: Python | License: MIT | Stargazers: 153 | Issues: 8 | Issues: 3

Vodalus-Expert-LLM-Forge

Dataset Crafting w/ RAG/Wikipedia ground truth and Efficient Fine-Tuning Using MLX and Unsloth. Includes configurable dataset annotation editor Gradio UI.

Language: Jupyter Notebook | Stargazers: 143 | Issues: 7 | Issues: 3

nlp-datasets

Curation note of NLP datasets

muse

Let's create synthetic textbooks together :)

Language: Python | License: MIT | Stargazers: 69 | Issues: 2 | Issues: 6

Language: Jupyter Notebook | Stargazers: 46 | Issues: 1 | Issues: 0

KtrlF

[NAACL 2024] Official repository for "KTRL+F: Knowledge-Augmented In-Document Search"

Language: Python | Stargazers: 20 | Issues: 1 | Issues: 0