Yen-Ting Lin (adamlin120)

Company: National Taiwan University

Home Page: yentingl.com

Twitter: @yentinglin56

Yen-Ting Lin's starred repositories

Open-Reasoning-Tasks

A comprehensive repository of reasoning tasks for LLMs (and beyond)

Language: JavaScript | License: Apache-2.0 | Stargazers: 141 | Issues: 0

text-clustering

Easily embed, cluster and semantically label text datasets

Language: Python | License: Apache-2.0 | Stargazers: 404 | Issues: 0

flash-attention

Fast and memory-efficient exact attention

Language: Python | License: BSD-3-Clause | Stargazers: 12722 | Issues: 0

TMLU

Taiwanese Mandarin Language Modeling

Language: Python | Stargazers: 2 | Issues: 0

zh-tw-embedding-model-benchmark

Embedding model benchmark built on Traditional Chinese datasets

Language: Python | Stargazers: 9 | Issues: 0

Language: Python | License: Apache-2.0 | Stargazers: 31 | Issues: 0

TWLLM-Tutor

Taiwan-LLM Tutor: Large Language Models for Taiwanese Secondary Education

Language: Python | License: MIT | Stargazers: 16 | Issues: 0

DataDreamer

DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤

Language: Python | License: MIT | Stargazers: 754 | Issues: 0

Language: Python | Stargazers: 195 | Issues: 0

ai-workshop-code

Code I wrote for my AI & LLM workshops

Language: Jupyter Notebook | Stargazers: 172 | Issues: 0

llm.c

LLM training in simple, raw C/CUDA

Language: Cuda | License: MIT | Stargazers: 22438 | Issues: 0

tiktoken

tiktoken is a fast BPE tokeniser for use with OpenAI's models.

Language: Python | License: MIT | Stargazers: 11372 | Issues: 0

awesome-synthetic-datasets

A curated list of awesome synthetic (text) datasets

Language: Jupyter Notebook | License: CC-BY-SA-4.0 | Stargazers: 193 | Issues: 0

cohere-toolkit

Cohere Toolkit is a collection of prebuilt components enabling users to quickly build and deploy RAG applications.

Language: TypeScript | License: MIT | Stargazers: 2625 | Issues: 0

shisa-v2

Japanese / English Bilingual LLM

Language: Python | License: Apache-2.0 | Stargazers: 8 | Issues: 0

text-dedup

All-in-one text de-duplication

Language: Python | License: Apache-2.0 | Stargazers: 562 | Issues: 0

SemDeDup

Code for "SemDeDup", a simple method for identifying and removing semantic duplicates from a dataset (data pairs which are semantically similar, but not exactly identical).

Language: Python | License: NOASSERTION | Stargazers: 90 | Issues: 0

chat-ui

Open source codebase powering the HuggingChat app

Language: TypeScript | License: Apache-2.0 | Stargazers: 6940 | Issues: 0

axolotl

Go ahead and axolotl questions

Language: Python | License: Apache-2.0 | Stargazers: 7148 | Issues: 0

Language: Python | License: MIT | Stargazers: 132 | Issues: 0

Language: Python | Stargazers: 8 | Issues: 0

Medusa

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 2066 | Issues: 0

composer

Supercharge Your Model Training

Language: Python | License: Apache-2.0 | Stargazers: 5080 | Issues: 0

lighteval

LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron.

Language: Python | License: MIT | Stargazers: 508 | Issues: 0

Language: Elixir | License: Apache-2.0 | Stargazers: 69 | Issues: 0

Sensei

Generate Synthetic Data Using OpenAI, MistralAI or AnthropicAI

Language: Python | Stargazers: 213 | Issues: 0

ml-engineering

Machine Learning Engineering Open Book

Language: Python | License: CC-BY-SA-4.0 | Stargazers: 10343 | Issues: 0

unsloth

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory

Language: Python | License: Apache-2.0 | Stargazers: 13552 | Issues: 0

tabbyAPI

An OAI compatible exllamav2 API that's both lightweight and fast

Language: Python | License: AGPL-3.0 | Stargazers: 390 | Issues: 0

LLaMA-Factory

A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

Language: Python | License: Apache-2.0 | Stargazers: 28312 | Issues: 0