horton2009's repositories
regmix
🧬 RegMix: Data Mixture as Regression for Language Model Pre-training
whisper
Robust Speech Recognition via Large-Scale Weak Supervision
faiss
A library for efficient similarity search and clustering of dense vectors.
abctools
ABC Transcription tools based on abcjs
ChatTTS
ChatTTS is a generative speech model for daily dialogue.
gpt-2
Code for the paper "Language Models are Unsupervised Multitask Learners"
CLIP
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
DeepSeek-VL
DeepSeek-VL: Towards Real-World Vision-Language Understanding
TinyGPT-V
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
FullLLM
Full stack LLM (Pre-training/finetuning, PPO(RLHF), Inference, Quant, etc.)
llm-decontaminator
Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"
Auto-GPT
An experimental open-source attempt to make GPT-4 fully autonomous.
BMTools
Tool Learning for Big Models, Open-Source Solutions of ChatGPT-Plugins
PaddleOCR
Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices)
PGL
Paddle Graph Learning (PGL) is an efficient and flexible graph learning framework based on PaddlePaddle
ERNIE
An Implementation of ERNIE For Language Understanding (including Pre-training models and Fine-tuning tools)
Paddle
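PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle ("飞桨") core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)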
gpt-2-Pytorch
Simple Text-Generator with OpenAI gpt-2 Pytorch Implementation
12306
12306 smart ticket refreshing and booking
darts-clone
A clone of Darts (Double-ARray Trie System)
decagon
Graph convolutional neural network for multirelational link prediction
pytorch-cn
PyTorch-CN documentation address
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
torch7
http://torch.ch
python-crfsuite
A python binding for crfsuite
naive-rete
Python RETE algorithm
ansj_seg
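ansj word segmentation: a true Java implementation of ict. Its segmentation quality and speed both exceed those of the open-source version of ict. Chinese word segmentation, person name recognition, part-of-speech tagging, user-defined dictionaries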
anthelion
Anthelion is a plugin for Apache Nutch to crawl semantic annotations within HTML pages
test_git
test git for mac
svdlibc
A fork of Doug Rohde's SVD C Library.