sungjun lee's starred repositories

ollama

Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.

llama.cpp

LLM inference in C/C++

llama

Inference code for Llama models

Language: Python | License: NOASSERTION | Stargazers: 56429 | Issues: 525 | Issues: 992

llama-recipes

Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supports a number of candidate inference solutions such as HF TGI and vLLM for local or cloud deployment. Demo apps to showcase Meta Llama for WhatsApp & Messenger.

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 15138 | Issues: 192 | Issues: 378

ZLUDA

CUDA on non-NVIDIA GPUs

Language: Rust | License: Apache-2.0 | Stargazers: 9715 | Issues: 134 | Issues: 178

OLMo

Modeling, training, eval, and inference code for OLMo

Language: Python | License: Apache-2.0 | Stargazers: 4633 | Issues: 47 | Issues: 199

whisper-vits-svc

Core Engine of Singing Voice Conversion & Singing Voice Clone

Language: Python | License: MIT | Stargazers: 2666 | Issues: 29 | Issues: 165

diff-svc

Singing Voice Conversion via diffusion model

Language: Jupyter Notebook | License: AGPL-3.0 | Stargazers: 2642 | Issues: 131 | Issues: 351

LLMDataHub

A quick guide to LLM training datasets, especially trending instruction fine-tuning datasets

EasyLM

Large language models (LLMs) made easy: EasyLM is a one-stop solution for pre-training, fine-tuning, evaluating, and serving LLMs in JAX/Flax.

Language: Python | License: Apache-2.0 | Stargazers: 2409 | Issues: 43 | Issues: 88

datatrove

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Language: Python | License: Apache-2.0 | Stargazers: 2039 | Issues: 46 | Issues: 129

magicoder

[ICML'24] Magicoder: Empowering Code Generation with OSS-Instruct

Language: Python | License: MIT | Stargazers: 1977 | Issues: 25 | Issues: 41

LLMTest_NeedleInAHaystack

Simple retrieval tests on LLMs at various context lengths to measure accuracy

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 1560 | Issues: 17 | Issues: 26
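The description above names a simple protocol: hide a fact (the "needle") in filler text, vary both the context length and the depth at which the fact sits, and check whether the model's answer contains it. Below is a minimal Python sketch of that grid, not the repo's code. The needle, filler sentence, question, and the `ask_model` callable are hypothetical. Words stand in for tokens. `truncating_model` is a toy stand-in for an LLM that only reads its first 500 words, so it misses needles placed beyond that point.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical needle and filler; the real test uses its own haystack corpus.
FILLER_SENTENCE = "The quick brown fox jumps over the lazy dog."
NEEDLE = "The secret passphrase is BLUE-HERON-42."
QUESTION = "What is the secret passphrase?"
ANSWER = "BLUE-HERON-42"


def build_haystack(context_words: int, depth_percent: float) -> str:
    """Build `context_words` words of filler with the needle inserted at
    `depth_percent` (0 = start, 100 = end). Words stand in for tokens;
    a real harness would count tokens with the model's tokenizer."""
    filler = FILLER_SENTENCE.split()
    words: List[str] = [filler[i % len(filler)] for i in range(context_words)]
    insert_at = round(len(words) * depth_percent / 100)
    return " ".join(words[:insert_at] + NEEDLE.split() + words[insert_at:])


def run_grid(
    ask_model: Callable[[str, str], str],
    context_lengths: List[int],
    depths: List[float],
) -> Dict[Tuple[int, float], bool]:
    """Score every (length, depth) pair: True if the reply contains the answer."""
    results: Dict[Tuple[int, float], bool] = {}
    for length in context_lengths:
        for depth in depths:
            context = build_haystack(length, depth)
            reply = ask_model(context, QUESTION)
            results[(length, depth)] = ANSWER in reply
    return results


def truncating_model(context: str, question: str, window: int = 500) -> str:
    """Toy stand-in for an LLM: it ignores the question and only 'sees'
    the first `window` words of the context."""
    for word in context.split()[:window]:
        if word.startswith(ANSWER):
            return f"The passphrase is {word}"
    return "I don't know."


if __name__ == "__main__":
    grid = run_grid(truncating_model, context_lengths=[200, 1000], depths=[0, 50, 100])
    for (length, depth), found in sorted(grid.items()):
        print(f"{length:>5} words, depth {depth:>3}%: {'found' if found else 'missed'}")
```

With these settings the toy model finds the needle at every depth in the 200-word haystack, but in the 1000-word haystack only at depth 0. That length-by-depth pattern is what such a grid is meant to expose.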

resource-stream

Links to CUDA-related news and learning material

zsh-in-docker

Install Zsh, Oh My Zsh and plugins inside a Docker container with one line!

Language: Shell | License: MIT | Stargazers: 928 | Issues: 7 | Issues: 17

datacomp

DataComp: In search of the next generation of multimodal datasets

Language: Python | License: NOASSERTION | Stargazers: 654 | Issues: 17 | Issues: 64

KICE_slayer_AI_Korean

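An AI aiming for Grade 1 (the top grade) on the Korean language section of the CSAT (Suneung), Korea's college entrance exam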

newspaper4k

📰 Newspaper4k, a fork of the beloved Newspaper3k: extraction of articles, titles, and metadata from news websites.

Language: HTML | License: MIT | Stargazers: 486 | Issues: 7 | Issues: 623

text-clustering

Easily embed, cluster and semantically label text datasets

Language: Python | License: Apache-2.0 | Stargazers: 461 | Issues: 35 | Issues: 5

honeybee

Official implementation of project Honeybee (CVPR 2024)

Language: Python | License: NOASSERTION | Stargazers: 399 | Issues: 15 | Issues: 21

doremi

PyTorch implementation of DoReMi, a method for optimizing the data mixture weights of language modeling datasets

Language: HTML | License: MIT | Stargazers: 304 | Issues: 5 | Issues: 30
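The description names DoReMi's goal but not the mechanism. Below is a hedged Python sketch of the domain-weight update, based on our reading of the DoReMi paper rather than on this repo's code. A small proxy model is trained alongside a fixed reference model. Each domain's weight is raised multiplicatively in proportion to the proxy's excess loss over the reference (exponentiated gradient ascent). The weights are then renormalized and mixed with the uniform distribution. The final mixture is the average of the weights over all steps. Function names and defaults are illustrative. As a simplification, the sketch clips the per-domain average excess loss at zero instead of clipping per token.

```python
import math
from typing import List


def doremi_update(
    weights: List[float],
    proxy_losses: List[float],
    reference_losses: List[float],
    step_size: float = 1.0,
    smoothing: float = 1e-3,
) -> List[float]:
    """One exponentiated-gradient step on the domain weights.

    `proxy_losses[i]` / `reference_losses[i]` are (hypothetical) average
    losses of the proxy and reference models on domain i's batch tokens.
    """
    k = len(weights)
    # Excess loss per domain, clipped at 0 (simplification: per domain, not per token).
    excess = [max(p - r, 0.0) for p, r in zip(proxy_losses, reference_losses)]
    # Multiplicative update: domains where the proxy lags gain weight.
    unnormalized = [w * math.exp(step_size * e) for w, e in zip(weights, excess)]
    total = sum(unnormalized)
    # Renormalize, then mix with uniform so no domain's weight collapses to 0.
    return [(1 - smoothing) * u / total + smoothing / k for u in unnormalized]


def average_weights(history: List[List[float]]) -> List[float]:
    """Average the domain weights over all recorded training steps."""
    k = len(history[0])
    return [sum(step[i] for step in history) / len(history) for i in range(k)]
```

For example, starting from equal weights over two domains where only the first has positive excess loss (2.0 vs. 1.0), `doremi_update([0.5, 0.5], [2.0, 1.0], [1.0, 1.0], smoothing=0.0)` shifts the mixture to about 73% / 27% (exactly e/(e+1) and 1/(e+1)).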

OBELICS

Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M documents, 115B text tokens and 353M images.

Language: Python | License: Apache-2.0 | Stargazers: 189 | Issues: 7 | Issues: 12

xlora

X-LoRA: Mixture of LoRA Experts

Language: Python | License: Apache-2.0 | Stargazers: 176 | Issues: 5 | Issues: 17

42dot_LLM

42dot LLM consists of a pre-trained language model, 42dot LLM-PLM, and a fine-tuned model, 42dot LLM-SFT, which is trained to respond to user prompts. Because it was trained on a large amount of Korean and English text, it supports both languages.

season2

Jiphyeonjeon Season 2

Language: HTML | License: NOASSERTION | Stargazers: 15 | Issues: 1 | Issues: 0

mmlu

Measuring Massive Multitask Language Understanding | ICLR 2021

Language: Python | License: MIT | Stargazers: 14 | Issues: 0 | Issues: 0

wikipedia-markdown-generator

A simple Python script to convert any Wikipedia article to Markdown.

Language: Python | License: MIT | Stargazers: 12 | Issues: 1 | Issues: 0

text-anonymization

A guide to anonymize text effortlessly using Presidio, an open-source library developed by Microsoft.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 5 | Issues: 1 | Issues: 0