zye1996

Company: GMU

Location: Fairfax, VA

Home Page: zye1996.github.io

zye1996's starred repositories

jieba

"Jieba" Chinese word segmentation

Language: Python | License: MIT | Stargazers: 32689 | Issues: 1283 | Issues: 845

LLaMA-Factory

Unify Efficient Fine-Tuning of 100+ LLMs

Language: Python | License: Apache-2.0 | Stargazers: 24735 | Issues: 168 | Issues: 3985

unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Language: Python | License: MIT | Stargazers: 18975 | Issues: 296 | Issues: 1318

NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Language: Python | License: Apache-2.0 | Stargazers: 10570 | Issues: 195 | Issues: 2124

surya

OCR, layout analysis, reading order, line detection in 90+ languages

Language: Python | License: GPL-3.0 | Stargazers: 8821 | Issues: 74 | Issues: 90

ai

Build AI-powered applications with React, Svelte, Vue, and Solid

Language: TypeScript | License: NOASSERTION | Stargazers: 8403 | Issues: 59 | Issues: 611

speechbrain

A PyTorch-based Speech Toolkit

Language: Python | License: Apache-2.0 | Stargazers: 8125 | Issues: 130 | Issues: 1038

magika

Detect file content types with deep learning

Language: Python | License: Apache-2.0 | Stargazers: 7508 | Issues: 36 | Issues: 351

OLMo

Modeling, training, eval, and inference code for OLMo

Language: Python | License: Apache-2.0 | Stargazers: 4154 | Issues: 41 | Issues: 162

VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

Language: Python | License: MIT | Stargazers: 3719 | Issues: 110 | Issues: 68

trt-llm-rag-windows

A developer reference project for creating Retrieval Augmented Generation (RAG) chatbots on Windows using TensorRT-LLM

Language: Python | License: NOASSERTION | Stargazers: 2356 | Issues: 46 | Issues: 44

vearch

Distributed vector search for AI-native applications

Language: Go | License: Apache-2.0 | Stargazers: 1963 | Issues: 76 | Issues: 572

AutoPrompt

A framework for prompt tuning using Intent-based Prompt Calibration

Language: Python | License: Apache-2.0 | Stargazers: 1818 | Issues: 10 | Issues: 19

CDial-GPT

A Large-scale Chinese Short-Text Conversation Dataset and Chinese pre-training dialog models

Language: Python | License: MIT | Stargazers: 1718 | Issues: 28 | Issues: 108

whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence

Language: Python | License: AGPL-3.0 | Stargazers: 1651 | Issues: 26 | Issues: 133

BitNet

Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch

Language: Python | License: MIT | Stargazers: 1407 | Issues: 39 | Issues: 34

conversational-datasets

Large datasets for conversational AI

Language: Python | License: Apache-2.0 | Stargazers: 1258 | Issues: 74 | Issues: 30

GaLore

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Language: Python | License: Apache-2.0 | Stargazers: 1215 | Issues: 17 | Issues: 42

Chinese-medical-dialogue-data

Chinese medical dialogue dataset

Language: Python | License: MIT | Stargazers: 1044 | Issues: 20 | Issues: 9

PaddleOCR2Pytorch

PaddleOCR inference in PyTorch. Converted from [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)

Language: Python | License: Apache-2.0 | Stargazers: 804 | Issues: 15 | Issues: 82

CnSTD

CnSTD: a Python 3 package, based on PyTorch/MXNet, for Chinese/English Scene Text Detection, Mathematical Formula Detection (MFD), and Layout Analysis

Language: Python | License: Apache-2.0 | Stargazers: 645 | Issues: 14 | Issues: 46

DocumentLayoutAnalysis

Document Layout Analysis resources and repos for development with PdfPig.

MuTual

A Dataset for Multi-Turn Dialogue Reasoning

LESS

[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning

Language: Jupyter Notebook | License: MIT | Stargazers: 264 | Issues: 4 | Issues: 19

clean-dialog

A framework for cleaning Chinese dialog data

gutenberg-dialog

Build a dialog dataset from online books in many languages

Language: Python | License: MIT | Stargazers: 68 | Issues: 4 | Issues: 2
Language: Python | License: MIT | Stargazers: 57 | Issues: 1 | Issues: 2

tvsub

TVsub: DCU-Tencent Chinese-English Dialogue Corpus

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 29 | Issues: 1 | Issues: 5