YenTing (Adam) Lin (adamlin120)

Company: National Taiwan University

Home Page: yentingl.com

Twitter: @yentinglin56

YenTing (Adam) Lin's starred repositories

ai-workshop-code

Code I wrote for my AI & LLM workshops

Language: Jupyter Notebook | Stargazers: 132 | Issues: 0

llm.c

LLM training in simple, raw C/CUDA

Language: Cuda | License: MIT | Stargazers: 21068 | Issues: 0

tiktoken

tiktoken is a fast BPE tokeniser for use with OpenAI's models.

Language: Python | License: MIT | Stargazers: 10907 | Issues: 0

awesome-synthetic-datasets

awesome synthetic (text) datasets

Language: Jupyter Notebook | License: CC-BY-SA-4.0 | Stargazers: 176 | Issues: 0

cohere-toolkit

Cohere Toolkit is a collection of prebuilt components enabling users to quickly build and deploy RAG applications.

Language: TypeScript | License: MIT | Stargazers: 2404 | Issues: 0

shisa-v2

Japanese / English Bilingual LLM

Language: Python | License: Apache-2.0 | Stargazers: 7 | Issues: 0

text-dedup

All-in-one text de-duplication

Language: Python | License: Apache-2.0 | Stargazers: 529 | Issues: 0

SemDeDup

Code for "SemDeDup", a simple method for identifying and removing semantic duplicates from a dataset (data pairs which are semantically similar, but not exactly identical).

Language: Python | License: NOASSERTION | Stargazers: 78 | Issues: 0

chat-ui

Open source codebase powering the HuggingChat app

Language: TypeScript | License: Apache-2.0 | Stargazers: 6691 | Issues: 0

axolotl

Go ahead and axolotl questions

Language: Python | License: Apache-2.0 | Stargazers: 6681 | Issues: 0
Language: Python | License: MIT | Stargazers: 130 | Issues: 0
Language: Python | Stargazers: 8 | Issues: 0

Medusa

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 1974 | Issues: 0

composer

Supercharge Your Model Training

Language: Python | License: Apache-2.0 | Stargazers: 5056 | Issues: 0

lighteval

LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron.

Language: Python | License: MIT | Stargazers: 454 | Issues: 0
Language: Elixir | License: Apache-2.0 | Stargazers: 67 | Issues: 0

Sensei

Generate Synthetic Data Using OpenAI, MistralAI or AnthropicAI

Language: Python | Stargazers: 194 | Issues: 0

ml-engineering

Machine Learning Engineering Open Book

Language: Python | License: CC-BY-SA-4.0 | Stargazers: 10112 | Issues: 0

unsloth

Finetune Llama 3, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory

Language: Python | License: Apache-2.0 | Stargazers: 11880 | Issues: 0

tabbyAPI

An OAI compatible exllamav2 API that's both lightweight and fast

Language: Python | License: AGPL-3.0 | Stargazers: 348 | Issues: 0

LLaMA-Factory

Unify Efficient Fine-Tuning of 100+ LLMs

Language: Python | License: Apache-2.0 | Stargazers: 24911 | Issues: 0

dsir

DSIR large-scale data selection framework for language model training

Language: Python | License: MIT | Stargazers: 198 | Issues: 0

datatrove

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Language: Python | License: Apache-2.0 | Stargazers: 1706 | Issues: 0

mergekit

Tools for merging pretrained large language models.

Language: Python | License: LGPL-3.0 | Stargazers: 3931 | Issues: 0

ocotillo

Performant and accurate speech recognition built on PyTorch

Language: Python | License: NOASSERTION | Stargazers: 238 | Issues: 0

DL-Art-School

TorToiSe fine-tuning with DLAS

Language: Python | License: AGPL-3.0 | Stargazers: 205 | Issues: 0

tortoise-tts

A multi-voice TTS system trained with an emphasis on quality

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 12272 | Issues: 0

insanely-fast-whisper

Incredibly fast Whisper-large-v3

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 1816 | Issues: 0

RedPajama-Data

The RedPajama-Data repository contains code for preparing large datasets for training large language models.

Language: Python | License: Apache-2.0 | Stargazers: 4424 | Issues: 0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | License: Apache-2.0 | Stargazers: 21547 | Issues: 0